Fix: handle connection state changes #354
Merged
While trying to figure out why we are seeing a GOAWAY HTTP/2 frame coming from the "API" (actually nginx), Adrian found that the only way to handle it is to watch for connection state changes.
Add another goroutine that monitors connection state changes. If the connection goes to IDLE or SHUTDOWN, the goroutine reports an error, which breaks out of the main loop. This causes the agent to reconnect.
Our best understanding of the issue is that nginx has a hard limit on the number of requests that can be made over a single connection, and when it reaches that limit, it sends a GOAWAY frame. The pings we send to keep the connection alive make this worse, because we send one every 90 seconds and the current limit in nginx is 100 requests: 100 × 90 s = 9000 s = 2.5 hours. That does not count the additional requests that are part of the regular communication between the API and the agent, so 2.5 hours is an upper limit; that seems to match observed behavior.
See https://github.com/grafana/deployment_tools/pull/47478
Signed-off-by: Marcelo E. Magallon marcelo.magallon@grafana.com