6 changes: 5 additions & 1 deletion etcd/etcd-practices.adoc
@@ -6,11 +6,15 @@ include::_attributes/common-attributes.adoc[]

toc::[]

The following documentation provides information on recommended performance and scalability practices for etcd.
[role="_abstract"]
The following documentation provides information about recommended performance and scalability practices for etcd.

include::modules/recommended-etcd-practices.adoc[leveloffset=+1]

include::modules/etcd-verify-hardware.adoc[leveloffset=+1]

include::modules/etcd-cluster-latency.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* link:https://access.redhat.com/solutions/4885641[How to use `fio` to check etcd disk performance in {product-title}]
28 changes: 28 additions & 0 deletions modules/etcd-cluster-latency.adoc
@@ -0,0 +1,28 @@
// Module included in the following assemblies:
//
// * etcd/etcd-practices.adoc


:_mod-docs-content-type: CONCEPT
[id="etcd-cluster-latency_{context}"]
= Cluster latency requirements for etcd

[role="_abstract"]
You must address two important constraints to provide a low-latency, high-availability network for etcd:

* network I/O latency
* disk I/O latency

etcd uses the Raft consensus algorithm, so every change must be replicated to a majority of the cluster members before it is committed. This process is highly sensitive to network and disk performance. The minimum time for an etcd request is the round-trip time (RTT) between members plus the time required to write the data to permanent storage.
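
The disk component of this floor can be estimated directly. The following is a minimal sketch, not a substitute for the `fio` test linked in the additional resources, that times a small write followed by an `fsync` on the filesystem that backs the etcd data directory; the `/var/lib/etcd` path is an assumption for illustration.

[source,go]
----
// Rough estimate of the disk component of the etcd commit floor.
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	// Write to the same filesystem that backs the etcd data directory
	// (assumed here to be /var/lib/etcd) so the measurement reflects that disk.
	f, err := os.CreateTemp("/var/lib/etcd", "fsync-probe-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	buf := make([]byte, 8*1024) // roughly the size of a small WAL append

	const samples = 50
	var total time.Duration
	for i := 0; i < samples; i++ {
		start := time.Now()
		if _, err := f.Write(buf); err != nil {
			panic(err)
		}
		// fsync is the durable-write step that etcd waits on before committing.
		if err := f.Sync(); err != nil {
			panic(err)
		}
		total += time.Since(start)
	}
	fmt.Printf("average write+fsync latency: %v\n", total/samples)
}
----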

To achieve high availability, etcd must detect and recover from a leader failure quickly. This behavior depends on two key tuning parameters, illustrated in the sketch after the following definitions:

Heartbeat Interval:: The frequency at which the leader sends heartbeats to its followers. This value should be close to the average RTT between members.
Election Timeout:: The time that a follower waits without receiving a heartbeat before it attempts to become the new leader. This value should be at least 10 times the RTT to account for network variance.
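
As a rough illustration of these rules of thumb, the following sketch turns a measured RTT into candidate values for the etcd `--heartbeat-interval` and `--election-timeout` settings. The RTT value is an assumed example, and the output is a starting point for tuning, not an authoritative recommendation.

[source,go]
----
// Derive candidate heartbeat and election-timeout values from a measured RTT,
// following the rules of thumb described above.
package main

import (
	"fmt"
	"time"
)

func main() {
	// Assumption: average RTT observed between members, for example with ping.
	measuredRTT := 4 * time.Millisecond

	// Keep the heartbeat close to the RTT, but no lower than etcd's
	// default of 100 ms so low-latency links are not flooded with heartbeats.
	heartbeat := measuredRTT
	if heartbeat < 100*time.Millisecond {
		heartbeat = 100 * time.Millisecond
	}

	// Election timeout: at least 10 times the RTT to tolerate network variance,
	// and keep a healthy ratio to the heartbeat when the 100 ms floor applies.
	election := 10 * measuredRTT
	if election < 10*heartbeat {
		election = 10 * heartbeat
	}

	fmt.Printf("--heartbeat-interval=%d --election-timeout=%d\n",
		heartbeat.Milliseconds(), election.Milliseconds())
}
----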

In a healthy cluster, the round-trip time between members should be less than 50 ms to ensure stability and avoid frequent leader elections. This is why etcd clusters are often deployed within a single data center or availability zone to minimize physical distance and network latency.
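
One quick way to check this guidance is to time a TCP connection to each member's peer port, which approximates one network round trip. The following is a minimal sketch; the member host names are placeholders, and 2380 is the default etcd peer port.

[source,go]
----
// Check peer round-trip latency against the 50 ms guidance by timing
// TCP connections to the etcd peer port.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Assumption: example peer addresses; replace with your member host names.
	peers := []string{
		"etcd-1.example.com:2380",
		"etcd-2.example.com:2380",
	}
	for _, peer := range peers {
		start := time.Now()
		conn, err := net.DialTimeout("tcp", peer, 2*time.Second)
		if err != nil {
			fmt.Printf("%s: unreachable: %v\n", peer, err)
			continue
		}
		// The TCP handshake time approximates one round trip.
		rtt := time.Since(start)
		conn.Close()
		status := "ok"
		if rtt > 50*time.Millisecond {
			status = "above the 50 ms guidance"
		}
		fmt.Printf("%s: %v (%s)\n", peer, rtt, status)
	}
}
----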

To support a low-latency, high-availability network, especially during the leader election process, locate the arbiter site where it provides an RTT of less than 10 ms. The arbiter is the component of the network that maintains consistency and availability in a distributed system.

// Need to clarify so the impression is that the arbiter is not counted in the number of nodes
// In the case of leader election and similar processes, the arbiter is used when clusters have an odd number of nodes, so a majority vote determines the system state.