Avoid opening outbound connections during shutdown #77539

DaveCTurner · 2021-09-10T07:51:03Z

To answer the question raised in this comment, I started to look at whether we could block on the `closeLock`. I think the answer is no, but I now have more questions about concurrency during closing. Can we definitely not add to `acceptedChannels` after the server channel is closed? Also, this comment:

elasticsearch/server/src/main/java/org/elasticsearch/transport/TcpTransport.java

Line 282 in e6fd459

    
           closeLock.readLock().lock(); // ensure we don't open connections while we are closing

Does it really do that? It did in the past, when we blocked the thread until the connection completed, but today I think it's possible for us to open a connection and then leak it by shutting the event loop down. It'll be ok for connections made via `ClusterConnectionManager#connectToNode`, since we close all the connection managers first, but we bypass that mechanism for discovery and sniffing.
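The suspected race can be illustrated with a minimal sketch. This is not the `TcpTransport` code: the class `LeakyTransportSketch`, its methods, and the `handshake` future are all hypothetical stand-ins. It assumes only what the comment describes, namely that the read lock is held while the connect is initiated but released before the connection completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a transport; names are illustrative, not Elasticsearch's.
class LeakyTransportSketch {
    private final ReentrantReadWriteLock closeLock = new ReentrantReadWriteLock();
    private final List<String> openChannels = new ArrayList<>();
    private boolean closed = false;

    // Starts an asynchronous connect. The read lock covers only the initiation.
    CompletableFuture<String> openConnection(String node, CompletableFuture<Void> handshake) {
        closeLock.readLock().lock();
        try {
            if (closed) {
                throw new IllegalStateException("transport closed");
            }
            // The channel is registered when the handshake completes, possibly after close().
            return handshake.thenApply(ignored -> {
                synchronized (openChannels) {
                    openChannels.add(node);
                }
                return node;
            });
        } finally {
            closeLock.readLock().unlock(); // released before the connection completes
        }
    }

    // Takes the write lock, marks the transport closed and drops the channels known so far.
    void close() {
        closeLock.writeLock().lock();
        try {
            closed = true;
            synchronized (openChannels) {
                openChannels.clear();
            }
        } finally {
            closeLock.writeLock().unlock();
        }
    }

    // Channels registered after close() cleaned up, i.e. leaked in this model.
    int channelsOpenAfterClose() {
        synchronized (openChannels) {
            return closed ? openChannels.size() : 0;
        }
    }
}
```

In this model, calling `openConnection`, then `close()`, then completing the handshake leaves one channel registered that nothing will clean up. A connect initiated after `close()` is still rejected, which is the only thing the lock guarantees here.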

I'm not sure there's really a bug here; I haven't tried to reproduce it. If there is a bug, it's benign in production, since we're shutting down and everything will be cleaned up soon, but it's still untidy and might cause test issues.


elasticmachine · 2021-09-10T07:51:06Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner · 2021-09-15T16:28:20Z

I think we hit exactly this problem in https://gradle-enterprise.elastic.co/s/3lq4o3iswxumo/console-log?task=:server:test, which seems like an odd coincidence: this isn't a new bug, and yet I've never seen it cause test failures before.

Today while the `ClusterConnectionManager` is closing it will reject attempts to open _managed_ connections (i.e. using `connectToNode`), but it still permits ad-hoc connections (i.e. using `openConnection`). This commit extends the existing refcounting mechanism to cover both cases, preventing all concurrent connection attempts while shutting down. Closes elastic#86249 Relates elastic#77539
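As a rough illustration of the approach described in that commit message, the sketch below routes both kinds of connection attempt through one reference count. It is not the actual `ClusterConnectionManager`: the class name, the `AtomicInteger`-based counter and the boolean return values are assumptions made for brevity. Only the method names `connectToNode` and `openConnection` come from the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the real ClusterConnectionManager.
class RefCountedConnectionManagerSketch {
    // Starts at 1: the manager itself holds one reference until close().
    private final AtomicInteger refs = new AtomicInteger(1);
    private final AtomicBoolean closing = new AtomicBoolean(false);

    // Acquires a reference unless the count has already dropped to zero.
    private boolean tryIncRef() {
        while (true) {
            int current = refs.get();
            if (current == 0) {
                return false;
            }
            if (refs.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    private void decRef() {
        refs.decrementAndGet();
    }

    // Managed connection attempt; returns false if rejected.
    boolean connectToNode(String node) {
        return guardedAttempt(node);
    }

    // Ad-hoc connection attempt; goes through the same guard.
    boolean openConnection(String node) {
        return guardedAttempt(node);
    }

    private boolean guardedAttempt(String node) {
        if (tryIncRef() == false) {
            return false; // the manager is closed, so reject the attempt
        }
        try {
            // A real implementation would start connecting here and release
            // the reference only once the attempt has completed.
            return true;
        } finally {
            decRef();
        }
    }

    // Drops the manager's own reference exactly once.
    void close() {
        if (closing.compareAndSet(false, true)) {
            decRef();
        }
    }
}
```

Because both entry points share `guardedAttempt`, neither can start a new attempt once the count has reached zero. In this sketch, attempts already holding a reference keep the count above zero until they release it.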

DaveCTurner · 2022-04-30T08:31:26Z

Note that #86315 means we no longer need to protect against connection attempts while closing. Should we just remove this (buggy) protection mechanism instead of trying to fix it?


DaveCTurner added the >bug and :Distributed/Network (Http and internode communication implementations) labels Sep 10, 2021

elasticmachine added the Team:Distributed (Meta label for distributed team) label Sep 10, 2021

DaveCTurner mentioned this issue Sep 10, 2021

Avoid Needless Forking when Closing Transports #66834

Merged

DaveCTurner mentioned this issue Sep 15, 2021

Notify stubbable transport behaviors on clear #77774

Merged

DaveCTurner mentioned this issue Apr 28, 2022

Incomplete remote response handler after transport close in integration tests #86249

Closed

DaveCTurner mentioned this issue Apr 30, 2022

Reject openConnection attempt while closing #86315

Merged

original-brownbear added the >tech debt label Aug 3, 2022
