Log when probe succeeds but full connection fails #51357

DaveCTurner · 2020-01-23T15:57:15Z

It is permitted for nodes to accept transport connections at addresses other
than their publish address, which allows a good deal of flexibility when
configuring discovery. However, it is not unusual for users to misconfigure
nodes to pick a publish address which is inaccessible to other nodes. We see
this happen a lot if the nodes are on different networks separated by a proxy,
or if the nodes are running in Docker with the wrong kind of network config.

In this case we offer no useful feedback to the user unless they enable
TRACE-level logs. It's particularly tricky to diagnose because if we test
connectivity between the nodes (using their discovery addresses) then all will
appear well.

This commit adds a WARN-level log if this kind of misconfiguration is detected:
the probe connection succeeds (indicating that we are really talking to a
healthy Elasticsearch node) but the follow-up connection attempt fails.
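
A minimal sketch of the condition described above, to make the two-step check concrete. It is not the code from this change: the class `ProbeThenConnectSketch`, the `Outcome` enum, the `reachable` predicate (which stands in for a real connection attempt), the addresses, and the wording of the warning are all hypothetical.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, simplified model of the check; not Elasticsearch's actual code.
class ProbeThenConnectSketch {

    enum Outcome { PROBE_FAILED, CONNECTED, PROBE_OK_BUT_CONNECT_FAILED }

    // 'reachable' is an assumption standing in for a real connection attempt
    // made from the local node to the given address.
    static Outcome classify(String discoveryAddress, String publishAddress, Predicate<String> reachable) {
        if (!reachable.test(discoveryAddress)) {
            // Nothing answered at the discovery address, so nothing is known about the peer.
            return Outcome.PROBE_FAILED;
        }
        if (!reachable.test(publishAddress)) {
            // A node answered the probe, but the address it publishes cannot be reached.
            return Outcome.PROBE_OK_BUT_CONNECT_FAILED;
        }
        return Outcome.CONNECTED;
    }

    // Produces a warning only for the misconfiguration case; the message text is illustrative.
    static Optional<String> warningFor(Outcome outcome, String discoveryAddress, String publishAddress) {
        if (outcome != Outcome.PROBE_OK_BUT_CONNECT_FAILED) {
            return Optional.empty();
        }
        return Optional.of("probe of [" + discoveryAddress + "] succeeded but connecting to publish address ["
            + publishAddress + "] failed; check the node's publish address");
    }

    public static void main(String[] args) {
        // Only the discovery address is reachable from this node in the example.
        Set<String> reachableFromHere = Set.of("10.0.0.5:9300");
        String discovery = "10.0.0.5:9300";
        String publish = "172.17.0.2:9300";
        Outcome outcome = classify(discovery, publish, reachableFromHere::contains);
        warningFor(outcome, discovery, publish).ifPresent(msg -> System.out.println("WARN " + msg));
    }
}
```

The sketch only warns when the probe succeeds and the follow-up connection fails, since that combination is the one that points to an inaccessible publish address rather than to a node that is simply down.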

It also tidies up some loose ends in HandshakingTransportAddressConnector,
removing some TODOs that need not be completed, and registering its
accidentally-unregistered timeout settings.

Backport of #51304

elasticmachine · 2020-01-23T15:57:18Z

Pinging @elastic/es-distributed (:Distributed/Network)

DaveCTurner · 2020-01-23T16:42:20Z

@elasticmachine please run elasticsearch-ci/2

DaveCTurner added the :Distributed/Network, backport, and v7.7.0 labels Jan 23, 2020

DaveCTurner added the >enhancement label Jan 23, 2020

DaveCTurner merged commit 0152c40 into elastic:7.x Jan 23, 2020

DaveCTurner deleted the 2020-01-23-HandshakingTransportAddressConnector-fixes-7x branch January 23, 2020 17:28
