server is enabled>> on the remote cluster.
* Ensure no firewall is blocking the communication.

[[remote-clusters-unreliable-network]]
===== Remote cluster connection is unreliable

====== Symptom

The local cluster can connect to the remote cluster, but the connection does
not work reliably. For example, some cross-cluster requests may succeed while
others report connection errors, time out, or appear to be stuck waiting for
the remote cluster to respond.

When {es} detects that the connection to the remote cluster has stopped
working, it logs a message like the following:

[source,txt,subs=+quotes]
----
[2023-06-28T16:36:47,264][INFO ][o.e.t.ClusterConnectionManager] [local-node] transport connection to [{my-remote#192.168.0.42:9443}{...}] closed by remote
----

{es} also logs this message if the remote node to which it is connected shuts
down or restarts, so the message alone does not necessarily indicate a network
problem.

With some network configurations, it can take minutes or even hours for the
operating system to detect that a connection has stopped working. Until the
failure is detected and reported to {es}, requests involving the remote cluster
may time out or appear to be stuck.
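
To see why detection can take this long, consider TCP keepalives on an idle
connection. On Linux, the sysctls `net.ipv4.tcp_keepalive_time`,
`net.ipv4.tcp_keepalive_intvl` and `net.ipv4.tcp_keepalive_probes` default to
7200 seconds, 75 seconds and 9 probes. The following Python sketch gives a rough
estimate, assuming a peer that silently disappears while the connection is idle.
The helper `keepalive_detection_seconds` and the shortened values are
hypothetical, for illustration only, and the estimate ignores retransmission
timeouts, which apply instead when there is unacknowledged data in flight.

[source,python]
----
def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time for TCP keepalives to declare an idle peer dead.

    Hypothetical helper: the kernel waits `idle` seconds without traffic,
    then sends up to `probes` unanswered probes spaced `interval` seconds
    apart before it drops the connection.
    """
    return idle + interval * probes


# Linux defaults for net.ipv4.tcp_keepalive_time, tcp_keepalive_intvl
# and tcp_keepalive_probes.
linux_default = keepalive_detection_seconds(idle=7200, interval=75, probes=9)

# Hypothetical shorter values, for comparison only.
shortened = keepalive_detection_seconds(idle=60, interval=10, probes=3)

print(f"Linux defaults: {linux_default} s (about {linux_default / 3600:.1f} h)")
print(f"Shortened:      {shortened} s")
----

With the Linux defaults this prints 7875 seconds, a little over two hours,
while the shortened values bring the estimate down to 90 seconds.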

====== Resolution

* Ensure that the network between the clusters is as reliable as possible.

* Ensure that the network is configured to permit <<long-lived-connections>>.

* Ensure that the network is configured to detect faulty connections quickly.
In particular, you must enable TCP keepalives, ensure that they are fully
supported along the whole network path between the clusters, and set a short
<<system-config-tcpretries,retransmission timeout>>.

* On Linux systems, run `ss -tonie` to check the configuration of each network
connection between the clusters, including its timer information.

* If the problems persist, capture network packets at both ends of the
connection and analyze the traffic to look for delays and lost packets.
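
To illustrate what enabling TCP keepalives with short timers means at the socket
level, here is a minimal Python sketch. It is not how you configure {es}, which
opens its own connections; it only shows the underlying operating system options.
The function name `enable_keepalive` and its default values are hypothetical.
`TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are only available on some
platforms, so the sketch applies each one only if Python exposes it.

[source,python]
----
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, probes: int = 3) -> None:
    """Turn on TCP keepalives for `sock` and shorten its timers.

    Hypothetical helper. SO_KEEPALIVE is portable; the TCP_KEEP* options
    override the system-wide defaults for this socket only, and are
    skipped on platforms where Python does not expose them.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    overrides = {
        "TCP_KEEPIDLE": idle,       # seconds of idleness before the first probe
        "TCP_KEEPINTVL": interval,  # seconds between unanswered probes
        "TCP_KEEPCNT": probes,      # unanswered probes before giving up
    }
    for name, value in overrides.items():
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print("SO_KEEPALIVE:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
if hasattr(socket, "TCP_KEEPIDLE"):
    print("TCP_KEEPIDLE:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
sock.close()
----

Once a connection with keepalives enabled is established on Linux, `ss -tonie`
should report a `timer:(keepalive,...)` entry for it.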

[[remote-clusters-troubleshooting-tls-trust]]
===== TLS trust not established
