-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reject connection attempts while closing #92465
Reject connection attempts while closing #92465
Conversation
Today if there is a constant stream of connection attempts then it's possible for the `ClusterConnectionManager` to wait forever in `close()` for `connectingRefCounter` to be fully released. With this commit we reject connection attempts while closing, avoiding this starvation situation.
Pinging @elastic/es-distributed (Team:Distributed) |
Hi @DaveCTurner, I've created a changelog YAML for you. |
Spotted this while looking at a test failure which apparently failed due to getting stuck exactly here: https://gradle-enterprise.elastic.co/s/rdlucvijy2dby. I'm not sure this is actually the reason for this failure, it is kinda implausible that we open connections at a high enough rate to cause this starvation, but still I don't see any other obvious reasons and this is worth fixing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. But a small nit: the test setup seems not very straight forward. Individual pieces/blocks of it could benefit from some extra comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
server/src/test/java/org/elasticsearch/transport/ClusterConnectionManagerTests.java
Show resolved
Hide resolved
Fair comment :) I added some more detail in 2c3b8c6, hope that helps. |
Today if there is a constant stream of connection attempts then it's possible for the `ClusterConnectionManager` to wait forever in `close()` for `connectingRefCounter` to be fully released. With this commit we reject connection attempts while closing, avoiding this starvation situation.
💔 Backport failed
You can use sqren/backport to manually backport by running |
Today if there is a constant stream of connection attempts then it's possible for the `ClusterConnectionManager` to wait forever in `close()` for `connectingRefCounter` to be fully released. With this commit we reject connection attempts while closing, avoiding this starvation situation.
Today if there is a constant stream of connection attempts then it's possible for the `ClusterConnectionManager` to wait forever in `close()` for `connectingRefCounter` to be fully released. With this commit we reject connection attempts while closing, avoiding this starvation situation.
On reflection I think this might fix the test failure mentioned above after all. We don't have to constantly open connections until the suite times out, we only have to do it until we terminate the threadpool, because we're using |
Today if there is a constant stream of connection attempts then it's possible for the
ClusterConnectionManager
to wait forever inclose()
forconnectingRefCounter
to be fully released. With this commit we reject connection attempts while closing, avoiding this starvation situation.