From 6114af2588e8266dc2a83a19d4aadeea9563b408 Mon Sep 17 00:00:00 2001
From: dfitzmau <dfitzmau@redhat.com>
Date: Thu, 16 Oct 2025 15:09:53 +0100
Subject: [PATCH] OSDOCS-13616-takeover: Documented Cluster latency
 requirements for etcd

---
 etcd/etcd-performance.adoc                    |  3 +++
 etcd/etcd-practices.adoc                      | 12 ++++++++-
 modules/etcd-tuning-parameters.adoc           |  1 -
 modules/recommended-cluster-latency-etcd.adoc | 27 +++++++++++++++++++
 4 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 modules/recommended-cluster-latency-etcd.adoc

diff --git a/etcd/etcd-performance.adoc b/etcd/etcd-performance.adoc
index cc40c2b4db76..b7e4ace6fe27 100644
--- a/etcd/etcd-performance.adoc
+++ b/etcd/etcd-performance.adoc
@@ -26,6 +26,9 @@ include::modules/etcd-node-scaling.adoc[leveloffset=+1]
 * link:https://docs.redhat.com/en/documentation/assisted_installer_for_openshift_container_platform/2024/html/installing_openshift_container_platform_with_the_assisted_installer/expanding-the-cluster#installing-control-plane-node-healthy-cluster_expanding-the-cluster[Expanding the cluster]
 * xref:../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]
 
+// * xref:../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/etcd-tuning-parameters.adoc#dr-restoring-cluster-state[Restoring to a previous cluster state]
+
+
 // Effects of disk latency on etcd
 include::modules/etcd-disk-latency.adoc[leveloffset=+1]
 
diff --git a/etcd/etcd-practices.adoc b/etcd/etcd-practices.adoc
index f758f65e3bda..3f4f2e675ceb 100644
--- a/etcd/etcd-practices.adoc
+++ b/etcd/etcd-practices.adoc
@@ -6,9 +6,19 @@ include::_attributes/common-attributes.adoc[]
 
 toc::[]
 
-The following documentation provides information on recommended performance and scalability practices for etcd.
+The following documentation provides information about recommended performance and scalability practices for etcd.
 
+// Storage practices for etcd
 include::modules/recommended-etcd-practices.adoc[leveloffset=+1]
+
+// Cluster latency requirements for etcd
+include::modules/recommended-cluster-latency-etcd.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../etcd/etcd-performance.adoc#etcd-tuning-parameters_etcd-performance[Setting tuning parameters for etcd]
+
+// Validating the hardware for etcd
 include::modules/etcd-verify-hardware.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
diff --git a/modules/etcd-tuning-parameters.adoc b/modules/etcd-tuning-parameters.adoc
index f756a2ad8aa1..f1b73ea697c7 100644
--- a/modules/etcd-tuning-parameters.adoc
+++ b/modules/etcd-tuning-parameters.adoc
@@ -17,7 +17,6 @@ By selecting one of the other values, you are overriding the default. If you see
 
 To change the hardware speed tolerance for etcd, complete the following steps.
 
-
 .Procedure
 
 . Check to see what the current value is by entering the following command:
diff --git a/modules/recommended-cluster-latency-etcd.adoc b/modules/recommended-cluster-latency-etcd.adoc
new file mode 100644
index 000000000000..789830bd80f6
--- /dev/null
+++ b/modules/recommended-cluster-latency-etcd.adoc
@@ -0,0 +1,27 @@
+// Module included in the following assemblies:
+//
+// * etcd/etcd-practices.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="recommended-cluster-latency-etcd_{context}"]
+= Cluster latency requirements for etcd
+
+[role="_abstract"]
+To provide a low-latency, high-availability network for etcd, address two important constraints:
+
+* Network I/O latency
+* Disk I/O latency
+
+etcd uses the Raft consensus algorithm, which must replicate every change to a majority of the cluster members before the change commits. This process is highly sensitive to network and disk performance. The minimum time for an etcd request is the round-trip time (RTT) between members, plus the time required to write the data to permanent storage.
+
+To achieve high availability, etcd must detect and recover from a leader failure quickly. This recovery depends on two key tuning parameters:
+
+Heartbeat Interval:: How often the leader sends a heartbeat to followers. Set this value close to the average RTT between members.
+Election Timeout:: How long a follower waits without receiving a heartbeat before it attempts to become the new leader. Set this value to at least 10 times the RTT to account for network variance.
+
+In a healthy cluster, the RTT between members should be less than 50 ms to ensure stability and avoid frequent leader elections. For this reason, etcd clusters are often deployed within a single data center or availability zone, which minimizes physical distance and network latency.
+
+To support a low-latency, high-availability network, especially during the leader election process, locate an arbiter site where it provides an RTT of less than 10 ms. The arbiter maintains consistency and availability in a distributed system.
+
+// Need to clarify so the impression is that the arbiter is not counted in the number of nodes
+// In the case of leader election and similar processes, the arbiter is used when clusters have an odd number of nodes, so a majority vote determines the system state.
\ No newline at end of file