Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connect concurrency, can cause connection nodes to close #6964

Closed

Conversation

Projects
None yet
5 participants
@kimchy
Copy link
Member

kimchy commented Jul 22, 2014

Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.

@kimchy kimchy added bug labels Jul 22, 2014

@kimchy kimchy added v2.0.0 labels Jul 22, 2014

@bleskes

View changes

src/main/java/org/elasticsearch/transport/netty/NettyTransport.java Outdated
try {
nodeChannels.close();
} finally {
logger.debug("disconnected from [{}]", node);
logger.debug("disconnected from [{}] due to explicit disconnect call", node);

This comment has been minimized.

Copy link
@bleskes

bleskes Jul 22, 2014

Member

I think we should move this above the nodeChannels.close() so it will be logged before an eventual consequence.

@bleskes

View changes

src/main/java/org/elasticsearch/transport/netty/NettyTransport.java Outdated
@@ -793,11 +782,13 @@ private void disconnectFromNode(DiscoveryNode node, Channel channel, String reas
logger.debug("disconnected from [{}], {}", node, reason);

This comment has been minimized.

Copy link
@bleskes

bleskes Jul 22, 2014

Member

Same here - I think we should move the log before the close command.

@bleskes

This comment has been minimized.

Copy link
Member

bleskes commented Jul 22, 2014

Left two minor comments. LGTM.

Fix connect concurrency, can cause connection nodes to close
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes #6964

@kimchy kimchy closed this in 88f3afe Jul 22, 2014

kimchy added a commit that referenced this pull request Jul 22, 2014

Fix connect concurrency, can cause connection nodes to close
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes #6964

@jpountz jpountz removed the review label Jul 24, 2014

@kimchy kimchy deleted the kimchy:transport_connect_can_cause_disconnect branch Aug 18, 2014

@clintongormley clintongormley changed the title Fix connect concurrency, can cause connection nodes to close Internal: Fix connect concurrency, can cause connection nodes to close Sep 8, 2014

@andreasch

This comment has been minimized.

Copy link

andreasch commented Mar 24, 2015

Would this issue cause intermittent exceptions with the following messages?

org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes were available
or
org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available

We are running ES 1.3.2 and seeing intermittent cases where we get these 2 exceptions on the JVMs that run the transport client even though there is no load on the cluster or any sort of networking issues between the JVM running the transport client and cluster.

@bleskes

This comment has been minimized.

Copy link
Member

bleskes commented Mar 24, 2015

@andreasch I think it might, if you only have one node configured in your transport client? also if you turn on debugging logging it should see some complaints about node not connected.

@andreasch

This comment has been minimized.

Copy link

andreasch commented Mar 24, 2015

@bleskes
We do only have one node in our transport client (points to a VIP/load balancer).

"Node not connected" seems to be the root exception message for all of our exceptions with the message "org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available"

@bleskes

This comment has been minimized.

Copy link
Member

bleskes commented Mar 24, 2015

Ok. sounds plausible then. But as these things go - it's hard to be sure..

andreasch added a commit to infusionsoft/elasticsearch that referenced this pull request Mar 25, 2015

Fix connect concurrency, can cause connection nodes to close
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes elastic#6964

@clintongormley clintongormley changed the title Internal: Fix connect concurrency, can cause connection nodes to close Fix connect concurrency, can cause connection nodes to close Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.