Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid opening outbound connections during shutdown #77539

Open
DaveCTurner opened this issue Sep 10, 2021 · 3 comments
Open

Avoid opening outbound connections during shutdown #77539

DaveCTurner opened this issue Sep 10, 2021 · 3 comments
Labels
>bug :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team >tech debt

Comments

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 10, 2021

To answer the question raised in this comment I started to look at whether we could block on the closeLock and I think the answer is no but also I now have more questions about concurrency during closing. Can we definitely not add to acceptedChannels after the server channel is closed? Also this comment:

closeLock.readLock().lock(); // ensure we don't open connections while we are closing

Does it really do that? It did in the past when we blocked the thread while the connection completes, but today I think it's possible for us to open a connection and then leak it by shutting the event loop down. It'll be ok for connections that are happening via ClusterConnectionManager#connectToNode since we closed all the connection managers down first, but we bypass that mechanism for discovery and sniffing.

Not sure there's really a bug here, I haven't tried to reproduce it or anything. If there is a bug then it's benign in production anyway since we're shutting down so everything will be cleaned up soon, but it's still untidy and might cause test issues.

@DaveCTurner DaveCTurner added >bug :Distributed/Network Http and internode communication implementations labels Sep 10, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Sep 10, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Copy link
Contributor Author

I think we hit exactly this problem in https://gradle-enterprise.elastic.co/s/3lq4o3iswxumo/console-log?task=:server:test which seems like an odd coincidence since this isn't a new bug and yet I've not seen it result in test failures ever before.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Apr 30, 2022
Today while the `ClusterConnectionManager` is closing it will reject
attempts to open _managed_ connections (i.e. using `connectToNode`), but
it still permits ad-hoc connections (i.e. using `openConnection`). This
commit extends the existing refcounting mechanism to cover both cases,
preventing all concurrent connection attempts while shutting down.

Closes elastic#86249
Relates elastic#77539
@DaveCTurner
Copy link
Contributor Author

DaveCTurner commented Apr 30, 2022

Note that #86315 means we no longer need to protect against connection attempts while closing. Should we just remove this (buggy) protection mechanism instead of trying to fix it?

DaveCTurner added a commit that referenced this issue May 2, 2022
Today while the `ClusterConnectionManager` is closing it will reject
attempts to open _managed_ connections (i.e. using `connectToNode`), but
it still permits ad-hoc connections (i.e. using `openConnection`). This
commit extends the existing refcounting mechanism to cover both cases,
preventing all concurrent connection attempts while shutting down.

Closes #86249
Relates #77539
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team >tech debt
Projects
None yet
Development

No branches or pull requests

3 participants