From 09dcdb4e52722e8e72a28f90a9d6cc87d12739fd Mon Sep 17 00:00:00 2001
From: David Turner
Date: Tue, 7 Sep 2021 15:20:59 +0100
Subject: [PATCH 1/4] Add nuance around stretched clusters

Today the multi-zone-cluster design docs say to keep all the nodes in a
single datacenter. This doesn't really reflect what we do in practice:
each zone in AWS/GCP/Azure/etc is a separate datacenter with decent
connectivity to the other zones in the same region. This commit adjusts
the docs to allow for this.
---
 .../high-availability/cluster-design.asciidoc | 46 +++++++++++++------
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/docs/reference/high-availability/cluster-design.asciidoc b/docs/reference/high-availability/cluster-design.asciidoc
index 4da8b69fac709..d1d070c4e61ff 100644
--- a/docs/reference/high-availability/cluster-design.asciidoc
+++ b/docs/reference/high-availability/cluster-design.asciidoc
@@ -230,24 +230,44 @@ The cluster will be resilient to the loss of any node as long as:
 [[high-availability-cluster-design-large-clusters]]
 === Resilience in larger clusters
 
-It is not unusual for nodes to share some common infrastructure, such as a power
-supply or network router. If so, you should plan for the failure of this
+It is not unusual for nodes to share some common infrastructure, such as network
+interconnects or a power supply. If so, you should plan for the failure of this
 infrastructure and ensure that such a failure would not affect too many of your
 nodes. It is common practice to group all the nodes sharing some infrastructure
 into _zones_ and to plan for the failure of any whole zone at once.
 
-Your cluster’s zones should all be contained within a single data centre. {es}
-expects its node-to-node connections to be reliable and have low latency and
-high bandwidth. Connections between data centres typically do not meet these
-expectations. Although {es} will behave correctly on an unreliable or slow
-network, it will not necessarily behave optimally. It may take a considerable
-length of time for a cluster to fully recover from a network partition since it
-must resynchronize any missing data and rebalance the cluster once the
-partition heals. If you want your data to be available in multiple data centres,
-deploy a separate cluster in each data centre and use
-<<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
+{es} expects its node-to-node connections to be reliable and have low latency
+and good bandwidth. Many of the tasks that {es} performs require multiple
+round-trips between nodes. This means that a slow or unreliable interconnect
+may have a significant effect on the performance and stability of your cluster.
+A few milliseconds of latency added to each round-trip can quickly accumulate
+into a noticeable performance penalty. {es} will automatically recover from a
+network partition as quickly as it can but your cluster may be partly
+unavailable during a partition and will need to spend time and resources to
+resynchronize any missing data and rebalance itself once a partition heals.
+
+If you have divided your cluster into zones then typically the network
+connections within each zone are of higher quality than the connections between
+the zones. You must make sure that the network connections between zones are of
+sufficiently high quality. You will see the best results by locating all your
+zones within a single data center with each zone having its own independent
+power supply and other supporting infrastructure. You can also _stretch_ your
+cluster across nearby data centers as long as the network interconnection
+between each pair of data centers is good enough.
+
+[[high-availability-cluster-design-min-network-perf]] There is no specific
+minimum network performance required to run a healthy {es} cluster. In theory a
+cluster will work correctly even if the round-trip latency between nodes is
+several hundred milliseconds. In practice if your network is that slow then the
+cluster performance will be very poor. In addition, slow networks are often
+unreliable enough to cause network partitions that will lead to periods of
+unavailability.
+
+If you want your data to be available in multiple data centers that are further
+apart or not well connected, deploy a separate cluster in each data center and
+use <<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
 clusters together. These features are designed to perform well even if the
-cluster-to-cluster connections are less reliable or slower than the network
+cluster-to-cluster connections are less reliable or performant than the network
 within each cluster.
 
 After losing a whole zone's worth of nodes, a properly-designed cluster may be

From b0fae802ca7b566666e5043c2f0b3d375895301f Mon Sep 17 00:00:00 2001
From: David Turner
Date: Tue, 7 Sep 2021 17:16:00 +0100
Subject: [PATCH 2/4] Note recovery time <-> bandwidth

---
 .../high-availability/cluster-design.asciidoc | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/docs/reference/high-availability/cluster-design.asciidoc b/docs/reference/high-availability/cluster-design.asciidoc
index d1d070c4e61ff..b780b6e17b3d8 100644
--- a/docs/reference/high-availability/cluster-design.asciidoc
+++ b/docs/reference/high-availability/cluster-design.asciidoc
@@ -236,15 +236,18 @@ infrastructure and ensure that such a failure would not affect too many of your
 nodes. It is common practice to group all the nodes sharing some infrastructure
 into _zones_ and to plan for the failure of any whole zone at once.
 
-{es} expects its node-to-node connections to be reliable and have low latency
-and good bandwidth. Many of the tasks that {es} performs require multiple
-round-trips between nodes. This means that a slow or unreliable interconnect
-may have a significant effect on the performance and stability of your cluster.
-A few milliseconds of latency added to each round-trip can quickly accumulate
-into a noticeable performance penalty. {es} will automatically recover from a
-network partition as quickly as it can but your cluster may be partly
-unavailable during a partition and will need to spend time and resources to
-resynchronize any missing data and rebalance itself once a partition heals.
+{es} expects its node-to-node connections to be reliable and to have low
+latency and adequate bandwidth. Many of the tasks that {es} performs require
+multiple round-trips between nodes. This means that a slow or unreliable
+interconnect may have a significant effect on the performance and stability of
+your cluster. A few milliseconds of latency added to each round-trip can
+quickly accumulate into a noticeable performance penalty. {es} will
+automatically recover from a network partition as quickly as it can but your
+cluster may be partly unavailable during a partition and will need to spend
+time and resources to resynchronize any missing data and rebalance itself once
+the partition heals. Recovering from a failure may involve copying a large
+amount of data between nodes so the recovery time is often determined by the
+available bandwidth.
 
 If you have divided your cluster into zones then typically the network
 connections within each zone are of higher quality than the connections between

From 3818e8f2ef5b7f527aa7abf7e565329aae6f55ff Mon Sep 17 00:00:00 2001
From: David Turner
Date: Thu, 9 Sep 2021 19:56:15 +0100
Subject: [PATCH 3/4] Apply suggestions from code review

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
---
 .../high-availability/cluster-design.asciidoc | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/docs/reference/high-availability/cluster-design.asciidoc b/docs/reference/high-availability/cluster-design.asciidoc
index b780b6e17b3d8..48f534513793a 100644
--- a/docs/reference/high-availability/cluster-design.asciidoc
+++ b/docs/reference/high-availability/cluster-design.asciidoc
@@ -230,18 +230,21 @@ The cluster will be resilient to the loss of any node as long as:
 [[high-availability-cluster-design-large-clusters]]
 === Resilience in larger clusters
 
-It is not unusual for nodes to share some common infrastructure, such as network
+It's not unusual for nodes to share common infrastructure, such as network
 interconnects or a power supply. If so, you should plan for the failure of this
 infrastructure and ensure that such a failure would not affect too many of your
 nodes. It is common practice to group all the nodes sharing some infrastructure
 into _zones_ and to plan for the failure of any whole zone at once.
 
-{es} expects its node-to-node connections to be reliable and to have low
-latency and adequate bandwidth. Many of the tasks that {es} performs require
-multiple round-trips between nodes. This means that a slow or unreliable
+{es} expects node-to-node connections to be reliable, have low latency, and
+have adequate bandwidth. Many {es} tasks require multiple round-trips between
+nodes. A slow or unreliable
 interconnect may have a significant effect on the performance and stability of
-your cluster. A few milliseconds of latency added to each round-trip can
-quickly accumulate into a noticeable performance penalty. {es} will
+your cluster.
+
+For example, a few milliseconds of latency added to each round-trip can
+quickly accumulate into a noticeable performance penalty. An unreliable network
+may have frequent network partitions. {es} will
 automatically recover from a network partition as quickly as it can but your
 cluster may be partly unavailable during a partition and will need to spend
 time and resources to resynchronize any missing data and rebalance itself once
@@ -249,21 +252,22 @@ the partition heals. Recovering from a failure may involve copying a large
 amount of data between nodes so the recovery time is often determined by the
 available bandwidth.
 
-If you have divided your cluster into zones then typically the network
-connections within each zone are of higher quality than the connections between
-the zones. You must make sure that the network connections between zones are of
+If you've divided your cluster into zones, the network
+connections within each zone are typically of higher quality than the connections between
+the zones. Ensure the network connections between zones are of
 sufficiently high quality. You will see the best results by locating all your
 zones within a single data center with each zone having its own independent
 power supply and other supporting infrastructure. You can also _stretch_ your
 cluster across nearby data centers as long as the network interconnection
 between each pair of data centers is good enough.
 
-[[high-availability-cluster-design-min-network-perf]] There is no specific
-minimum network performance required to run a healthy {es} cluster. In theory a
+[[high-availability-cluster-design-min-network-perf]]
+There is no specific
+minimum network performance required to run a healthy {es} cluster. In theory, a
 cluster will work correctly even if the round-trip latency between nodes is
-several hundred milliseconds. In practice if your network is that slow then the
+several hundred milliseconds. In practice, if your network is that slow then the
 cluster performance will be very poor. In addition, slow networks are often
-unreliable enough to cause network partitions that will lead to periods of
+unreliable enough to cause network partitions that lead to periods of
 unavailability.
 
 If you want your data to be available in multiple data centers that are further

From 486f192870b9aadbca2fa9440cd50694fb702256 Mon Sep 17 00:00:00 2001
From: David Turner
Date: Thu, 9 Sep 2021 19:58:22 +0100
Subject: [PATCH 4/4] Reformat

---
 .../high-availability/cluster-design.asciidoc | 55 +++++++++----------
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/docs/reference/high-availability/cluster-design.asciidoc b/docs/reference/high-availability/cluster-design.asciidoc
index 48f534513793a..48bdb5d72fb97 100644
--- a/docs/reference/high-availability/cluster-design.asciidoc
+++ b/docs/reference/high-availability/cluster-design.asciidoc
@@ -238,37 +238,34 @@ into _zones_ and to plan for the failure of any whole zone at once.
 
 {es} expects node-to-node connections to be reliable, have low latency, and
 have adequate bandwidth. Many {es} tasks require multiple round-trips between
-nodes. A slow or unreliable
-interconnect may have a significant effect on the performance and stability of
-your cluster.
-
-For example, a few milliseconds of latency added to each round-trip can
-quickly accumulate into a noticeable performance penalty. An unreliable network
-may have frequent network partitions. {es} will
-automatically recover from a network partition as quickly as it can but your
-cluster may be partly unavailable during a partition and will need to spend
-time and resources to resynchronize any missing data and rebalance itself once
-the partition heals. Recovering from a failure may involve copying a large
-amount of data between nodes so the recovery time is often determined by the
-available bandwidth.
-
-If you've divided your cluster into zones, the network
-connections within each zone are typically of higher quality than the connections between
-the zones. Ensure the network connections between zones are of
-sufficiently high quality. You will see the best results by locating all your
-zones within a single data center with each zone having its own independent
-power supply and other supporting infrastructure. You can also _stretch_ your
-cluster across nearby data centers as long as the network interconnection
-between each pair of data centers is good enough.
+nodes. A slow or unreliable interconnect may have a significant effect on the
+performance and stability of your cluster.
+
+For example, a few milliseconds of latency added to each round-trip can quickly
+accumulate into a noticeable performance penalty. An unreliable network may
+have frequent network partitions. {es} will automatically recover from a
+network partition as quickly as it can but your cluster may be partly
+unavailable during a partition and will need to spend time and resources to
+resynchronize any missing data and rebalance itself once the partition heals.
+Recovering from a failure may involve copying a large amount of data between
+nodes so the recovery time is often determined by the available bandwidth.
+
+If you've divided your cluster into zones, the network connections within each
+zone are typically of higher quality than the connections between the zones.
+Ensure the network connections between zones are of sufficiently high quality.
+You will see the best results by locating all your zones within a single data
+center with each zone having its own independent power supply and other
+supporting infrastructure. You can also _stretch_ your cluster across nearby
+data centers as long as the network interconnection between each pair of data
+centers is good enough.
 
 [[high-availability-cluster-design-min-network-perf]]
-There is no specific
-minimum network performance required to run a healthy {es} cluster. In theory, a
-cluster will work correctly even if the round-trip latency between nodes is
-several hundred milliseconds. In practice, if your network is that slow then the
-cluster performance will be very poor. In addition, slow networks are often
-unreliable enough to cause network partitions that lead to periods of
-unavailability.
+There is no specific minimum network performance required to run a healthy {es}
+cluster. In theory, a cluster will work correctly even if the round-trip
+latency between nodes is several hundred milliseconds. In practice, if your
+network is that slow then the cluster performance will be very poor. In
+addition, slow networks are often unreliable enough to cause network partitions
+that lead to periods of unavailability.
 
 If you want your data to be available in multiple data centers that are further
 apart or not well connected, deploy a separate cluster in each data center and
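
As a concrete companion to the zone-aware layout these patches describe, here
is a minimal sketch of the node settings involved. The zone name `zone-a`, the
remote-cluster alias `other_dc`, and the seed address `other-dc-node-1:9300`
are placeholders invented for illustration; the settings themselves
(`node.attr.*`, `cluster.routing.allocation.awareness.attributes`, and
`cluster.remote.*.seeds`) are the standard {es} mechanisms for shard
allocation awareness and remote clusters.

[source,yaml]
----
# elasticsearch.yml for a node in the example zone "zone-a".
node.attr.zone: zone-a

# Make the shard allocator aware of the zone attribute so that it spreads the
# copies of each shard across zones. Provided each shard has at least one
# replica, losing every node in one zone then leaves a copy of each shard
# intact in the surviving zones.
cluster.routing.allocation.awareness.attributes: zone

# For data centers that are further apart or not well connected, register a
# separate cluster there as a remote cluster (usable by {ccs} and {ccr})
# rather than stretching this cluster across the gap.
cluster.remote.other_dc.seeds: ["other-dc-node-1:9300"]
----

A stretched cluster spanning nearby data centers uses the same mechanism: each
data center becomes one zone, so the allocator keeps shard copies in more than
one data center.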