Slow auth server can hang client connections in async recording mode #4695
Comments
Is this resolved by restarting the auth server, or the node? I think I may have seen this issue on a node running 4.4.2 yesterday - new sessions would hang and not start with no obvious errors in the log, but this was resolved after restarting the node. |
@webvictim node |
Just had this happen again. Here's debugging info in case it's relevant. Version on auth/proxy:
Version on node:
Auth/proxy logs:
The node is running at
|
@webvictim do you use the |
@webvictim can you also set env vars on the node that hangs
You should see a lot of GRPC state logs |
Hanging connections: grpc/grpc-go#3980 |
Check keepalives server vs client: |
@klizhentas Yes, I use IoT mode. I’ll set the variables and see if I can get extra debugging info. |
@klizhentas I set those variables in Teleport's environment (via |
This commit fixes #4695. In async recording mode, Teleport writes all session events to disk and uploads them to the auth server later. It also emits some events synchronously to the audit log so that they show up in the global event log right away. However, if the auth server is slow, this fanout blocks the session. This commit makes the fanout of those events fast, nonblocking, and unable to fail, so sessions will not hang unless the disk writes hang. It also adds the ability to debug gRPC connection state when running in debug mode. To start logging gRPC connection state, set these environment variables when starting Teleport: GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=99 teleport start -d
@webvictim the variables were ignored; see my PR, I fixed them in the debug build |
This commit fixes #4695. In async recording mode, Teleport writes all session events to disk and uploads them to the auth server later. It also emits some events synchronously to the audit log so that they show up in the global event log right away. However, if the auth server is slow, this fanout blocks the session. This commit makes the fanout of those events fast, nonblocking, and unable to fail, so sessions will not hang unless the disk writes hang. It adds a backoff period and a timeout after which some events will be lost, but the session continues without locking up. It also adds the ability to debug gRPC connection state when running in debug mode. To start logging gRPC connection state, set these environment variables when starting Teleport: GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=99 teleport start -d
This commit fixes #4695. In async recording mode, Teleport writes all session events to disk and uploads them to the auth server later. It also emits some events synchronously to the audit log so that they show up in the global event log right away. However, if the auth server is slow, this fanout blocks the session. This commit makes the fanout of those events fast, nonblocking, and unable to fail, so sessions will not hang unless the disk writes hang. It adds a backoff period and a timeout after which some events will be lost, but the session continues without locking up.
Description
Slow or unresponsive auth service can hang client connections in async recording mode.
What happens
Teleport async recording mode sends all events to disk, and uploads them to the server later.
It uploads some events synchronously to the audit log so they show up in the global event log right away.
However, if the auth server is slow, this synchronous fanout blocks the session.