Add infrastructure to manage network connections outside of Transport/TransportService #22194

s1monw · 2016-12-15T15:47:04Z

Some expert users like UnicastZenPing today establishes real connections to nodes during it's ping
phase that can be used by other parts of the system. Yet, this is potentially dangerous
and undesirable unless the nodes have been fully verified and should be connected to in the
case of a cluster state update or if we join a newly elected master. For usecases like this, this change adds the infrastructure to manually handle connections that are not publicly available on the node ie. should not be managed by Transport/TransportSerivce

UnicastZenPing today establishes real connections to nodes during it's ping phase that can be used by other parts of the system. Yet, this is potentially dangerous and undesirable unless the nodes have been fully verified and shoudl be connected to in the case of a cluster state update or if we join a newly elected master. For zen-pings this change adds dedicated connections that are only kept alive while pings are running.

bleskes

Looks Great To Me (given the current structure of the class). I left one question. Another idea I had is to strengthen the UnicastZenTests test by using a transport that explodes if one calls the connectToNode methods?

bleskes · 2016-12-15T16:37:10Z

core/src/main/java/org/elasticsearch/discovery/zen/UnicastZenPing.java

@@ -508,7 +520,9 @@ public void run() {
                    }
                });
            } else {
-                sendPingRequestToNode(sendPingsHandler.id(), timeout, pingRequest, latch, node, nodeToSend);
+                final DiscoveryNode finalNodeToSend = nodeToSend;
+                sendPingRequestToNode(() -> transportService.getConnection(finalNodeToSend),


Question - why not just get the connection from the transport service right here? does the supplier buy as something? maybe we can even try to get it first and create a new temporary only if it fails

Transport.Connection existingConnection = null; if (transportService.nodeConnected(nodeToSend)) { try { existingConnection = transportService.getConnection(nodeToSend); } catch (NodeNotConnectedException e) { // it's ok we'll create a temp connection } } if (existingConnection != null) { sendPingRequestToNode(existingConnection, sendPingsHandler.id(), timeout, pingRequest, latch, node, nodeToSend); } else {

yet, it buys us to not duplicate the exception handling.

s1monw · 2016-12-16T11:55:35Z

@elasticmachine test this please

ywelsch · 2016-12-16T15:37:40Z

core/src/main/java/org/elasticsearch/discovery/zen/UnicastZenPing.java

-                                logger.trace("[{}] connecting (light) to {}", sendPingsHandler.id(), finalNodeToSend);
-                                transportService.connectToNodeAndHandshake(finalNodeToSend, timeout.getMillis());
+                            Transport.Connection connection;
+                            if (nodeFoundByAddress && transportService.nodeConnected(finalNodeToSend)) {


How can this returns true? This code is only executed if further up !transportService.nodeConnected(nodeToSend) holds (or we created a temporary disco node which has a fresh id and is never connected).

I also wonder if we should use a temporary disco-node when nodeFoundByAddress == true and transportService.nodeConnected(nodeToSend) == false.

Last, regarding the existing comment on line 454

// if we find on the disco nodes a matching node by address, we are going to restore the connection
// anyhow down the line if its not connected...

Now that we are not reestablishing the full connection to a node anymore that's in the CS, I wonder what this will do when the node gets disconnected but not removed from the cluster state. When pinging completes (with a temporary connection) and the node stays in the cluster state that is published by the master, then the master will not reestablish the connection in ClusterService (as the node has not been removed on the CS and connectToNodes is only called on the newly added nodes).

Let's talk about this

@ywelsch these are valid points. Seeing the PR already made me realize it's time to re-write this class as it has grown too tangled. I hoped I can do it after @s1monw pushed this PR in, but I'm not sure it's worth it anymore to push through and fix all of this.

@s1monw how about you push just the transport layer changes and I will use them in a follow up PR to clean up this class?

Re node connections -

You are right too that this is not good and not optimal. Since the nodes was never removed from the cluster state, it was also never removed from NodeConnectionService (I think). IMO this is just too complex (and there is nodeFD and node joins that play a role here too, on the master). I suggest we change NodeConnectionService to always work on the ClusterState.nodes() directly rather than trying to be overly heroic and use the deltas. This will give a hard validation of every CS. This should be very cheep tests. We can do it as another follow - potentially before we do the unicast zen ping work.

How does that sound?

I will just keep the transport layer changes here...

s1monw · 2016-12-16T20:43:25Z

@bleskes i rolled back the changes to UnicastZenPing

bleskes · 2016-12-16T20:44:15Z

Thx. Still LGTM :)

…/TransportService (#22194) Some expert users like UnicastZenPing today establishes real connections to nodes during it's ping phase that can be used by other parts of the system. Yet, this is potentially dangerous and undesirable unless the nodes have been fully verified and should be connected to in the case of a cluster state update or if we join a newly elected master. For use-cases like this, this change adds the infrastructure to manually handle connections that are not publicly available on the node ie. should not be managed by `Transport`/`TransportSerivce`

The `UnicastZenPing` shows it's age and is the result of many small changes. The current state of affairs is confusing and is hard to reason about. This PR cleans it up (while following the same original intentions). Highlights of the changes are: 1) Clear 3 round flow - no interleaving of scheduling. 2) The previous implementation did a best effort attempt to wait for ongoing pings to be sent and completed. The pings were guaranteed to complete because each used the total ping duration as a timeout. This did make it hard to reason about the total ping duration and the flow of the code. All of this is removed now and ping should just complete within the given duration or not be counted (note that it was very handy for testing, but I move the needed sync logic to the test). 3) Because of (2) the pinging scheduling changed a bit, to give a chance for the last round to complete. We now ping at the beginning, 1/3 and 2/3 of the duration. 4) To offset for (3) a bit, incoming ping requests are now added to on going ping collections. 5) UnicastZenPing never establishes full blown connections (but does reuse them if there). Relates to #22120 6) Discovery host providers are only used once per pinging round. Closes #21739 7) Usage of the ability to open a connection without connecting to a node ( #22194 ) and shorter connection timeouts helps with connections piling up. Closes #19370 8) Beefed up testing and sped them up. 9) removed light profile from production code

#22194 gave us the ability to open low level temporary connections to remote node based on their address. With this use case out of the way, actual full blown connections should validate the node on the other side, making sure we speak to who we think we speak to. This helps in case where multiple nodes are started on the same host and a quick node restart causes them to swap addresses, which in turn can cause confusion down the road.

s1monw added :Internal :Distributed Coordination/Network Http and internode communication implementations >enhancement v5.2.0 v6.0.0-alpha1 labels Dec 15, 2016

s1monw requested review from bleskes and ywelsch December 15, 2016 15:47

bleskes approved these changes Dec 15, 2016

View reviewed changes

s1monw added 2 commits December 16, 2016 14:28

fix line len

2147f85

more line len fixes

a6cda7e

ywelsch reviewed Dec 16, 2016

View reviewed changes

Rollback changes to UnicastZenPing.java

bd9eee5

s1monw changed the title ~~Never connect and publish connection during unicast zen ping~~ Add infrastructure to manage network connections outside of Transport/TransportService Dec 17, 2016

s1monw merged commit 1f3eb06 into elastic:master Dec 17, 2016

bleskes mentioned this pull request Dec 20, 2016

Simplify Unicast Zen Ping #22277

Merged

bleskes mentioned this pull request Jan 27, 2017

TransportService.connectToNode should validate remote node ID #22828

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add infrastructure to manage network connections outside of Transport/TransportService #22194

Add infrastructure to manage network connections outside of Transport/TransportService #22194

s1monw commented Dec 15, 2016 •

edited

Loading

bleskes left a comment •

edited

Loading

bleskes Dec 15, 2016

s1monw Dec 16, 2016

s1monw commented Dec 16, 2016

ywelsch Dec 16, 2016

bleskes Dec 16, 2016

s1monw Dec 16, 2016

s1monw commented Dec 16, 2016

bleskes commented Dec 16, 2016

Add infrastructure to manage network connections outside of Transport/TransportService #22194

Add infrastructure to manage network connections outside of Transport/TransportService #22194

Conversation

s1monw commented Dec 15, 2016 • edited Loading

bleskes left a comment • edited Loading

Choose a reason for hiding this comment

bleskes Dec 15, 2016

Choose a reason for hiding this comment

s1monw Dec 16, 2016

Choose a reason for hiding this comment

s1monw commented Dec 16, 2016

ywelsch Dec 16, 2016

Choose a reason for hiding this comment

bleskes Dec 16, 2016

Choose a reason for hiding this comment

s1monw Dec 16, 2016

Choose a reason for hiding this comment

s1monw commented Dec 16, 2016

bleskes commented Dec 16, 2016

s1monw commented Dec 15, 2016 •

edited

Loading

bleskes left a comment •

edited

Loading