Fix connect concurrency, can cause connection nodes to close #6964

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
5 participants
Owner

kimchy commented Jul 22, 2014

Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.

@kimchy kimchy added bug v1.4.0 labels Jul 22, 2014

@bleskes bleskes commented on an outdated diff Jul 22, 2014

...org/elasticsearch/transport/netty/NettyTransport.java
try {
nodeChannels.close();
} finally {
- logger.debug("disconnected from [{}]", node);
+ logger.debug("disconnected from [{}] due to explicit disconnect call", node);
@bleskes

bleskes Jul 22, 2014

Member

I think we should move this above the nodeChannels.close() so it will be logged before an eventual consequence.

@bleskes bleskes commented on an outdated diff Jul 22, 2014

...org/elasticsearch/transport/netty/NettyTransport.java
@@ -793,11 +782,13 @@ private void disconnectFromNode(DiscoveryNode node, Channel channel, String reas
logger.debug("disconnected from [{}], {}", node, reason);
@bleskes

bleskes Jul 22, 2014

Member

Same here - I think we should move the log before the close command.

Member

bleskes commented Jul 22, 2014

Left two minor comments. LGTM.

@kimchy kimchy Fix connect concurrency, can cause connection nodes to close
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes #6964
a112339

kimchy closed this in 88f3afe Jul 22, 2014

@kimchy kimchy added a commit that referenced this pull request Jul 22, 2014

@kimchy kimchy Fix connect concurrency, can cause connection nodes to close
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes #6964
d3c1cb1

jpountz removed the review label Jul 24, 2014

kimchy deleted the kimchy:transport_connect_can_cause_disconnect branch Aug 18, 2014

clintongormley changed the title from Fix connect concurrency, can cause connection nodes to close to Internal: Fix connect concurrency, can cause connection nodes to close Sep 8, 2014

Would this issue cause intermittent exceptions with the following messages?

org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes were available
or
org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available

We are running ES 1.3.2 and seeing intermittent cases where we get these 2 exceptions on the JVMs that run the transport client even though there is no load on the cluster or any sort of networking issues between the JVM running the transport client and cluster.

Member

bleskes commented Mar 24, 2015

@andreasch I think it might, if you only have one node configured in your transport client? also if you turn on debugging logging it should see some complaints about node not connected.

@bleskes
We do only have one node in our transport client (points to a VIP/load balancer).

"Node not connected" seems to be the root exception message for all of our exceptions with the message "org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available"

Member

bleskes commented Mar 24, 2015

Ok. sounds plausible then. But as these things go - it's hard to be sure..

@andreasch andreasch added a commit to infusionsoft/elasticsearch that referenced this pull request Mar 25, 2015

@kimchy @andreasch kimchy + andreasch Fix connect concurrency, can cause connection nodes to close
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes #6964
2a434e4

clintongormley changed the title from Internal: Fix connect concurrency, can cause connection nodes to close to Fix connect concurrency, can cause connection nodes to close Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment