From 77573f80c93e7f84d38f6db6dded4fff1a9e8554 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Fri, 21 Feb 2025 08:07:54 -0700 Subject: [PATCH 01/19] updates --- deploy-manage/distributed-architecture.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md index a1a3cf9ed0..91d99f90a3 100644 --- a/deploy-manage/distributed-architecture.md +++ b/deploy-manage/distributed-architecture.md @@ -13,8 +13,6 @@ The topics in this section provides information about the architecture of {{es}} * [Node roles](distributed-architecture/clusters-nodes-shards/node-roles.md): Learn about the different roles that nodes can have in an {{es}} cluster. * [Reading and writing documents](distributed-architecture/reading-and-writing-documents.md): Learn how {{es}} replicates read and write operations across shards and shard copies. * [Shard allocation, relocation, and recovery](distributed-architecture/shard-allocation-relocation-recovery.md): Learn how {{es}} allocates and balances shards across nodes. - - * [Shard allocation awareness](distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md): Learn how to use custom node attributes to distribute shards across different racks or availability zones. - + * [Shard allocation awareness](distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md): Learn how to use custom node attributes to distribute shards across different racks or availability zones. * [Shard request cache](elasticsearch://reference/elasticsearch/configuration-reference/shard-request-cache-settings.md): Learn how {{es}} caches search requests to improve performance. From 06fb37781a6c1ac80cb3e1e6ac1729fa3be78a08 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Fri, 21 Feb 2025 08:23:08 -0700 Subject: [PATCH 02/19] first take --- deploy-manage/distributed-architecture.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md index 91d99f90a3..b9b00e4a7c 100644 --- a/deploy-manage/distributed-architecture.md +++ b/deploy-manage/distributed-architecture.md @@ -3,6 +3,16 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/_data_store_architecture.html --- +% Update the overview so Kibana is represented too. + +% Clarify which topics are relevant for which deployment types (see note from elastic.co/guide/en/elasticsearch/reference/current/nodes-shards.html) + +% Explain the role of orchestrators on the Clusters, nodes, and shards page, link up. + +% Split the kibana tasks management topic so it's concepts only - guidance goes to design guidance section. + +% Discovery and cluster formation content (7 pages): add introductory note to specify that the endpoints/settings are possibly for self-managed only, and review the content. + # Distributed architecture [_data_store_architecture] {{es}} is a distributed document store. Instead of storing information as rows of columnar data, {{es}} stores complex data structures that have been serialized as JSON documents. When you have multiple {{es}} nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node. 
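As a quick, hedged illustration of that model (the index name, document ID, and field values below are placeholders, not part of this changeset), any node that receives a request can serve it, because the cluster routes the operation to the shards that hold the data:

```console
PUT my-index/_doc/1
{
  "title": "A JSON document",
  "status": "stored and searchable from any node"
}

GET my-index/_doc/1
```

The same `GET` returns the document no matter which node handles the request.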
From e5e8fbdb76da90571c544c3f15cd5629c5c99c8f Mon Sep 17 00:00:00 2001 From: George Wallace Date: Fri, 21 Feb 2025 10:45:42 -0700 Subject: [PATCH 03/19] updates --- .../clusters-nodes-shards.md | 8 +- .../discovery-cluster-formation.md | 7 + .../kibana-tasks-management.md | 49 ++++-- .../reading-and-writing-documents.md | 19 +- .../shard-allocation-relocation-recovery.md | 13 +- .../delaying-allocation-when-node-leaves.md | 8 +- .../index-level-shard-allocation.md | 164 +++++++++++++++++- 7 files changed, 213 insertions(+), 55 deletions(-) diff --git a/deploy-manage/distributed-architecture/clusters-nodes-shards.md b/deploy-manage/distributed-architecture/clusters-nodes-shards.md index 3fe41b1c94..6de528e2ee 100644 --- a/deploy-manage/distributed-architecture/clusters-nodes-shards.md +++ b/deploy-manage/distributed-architecture/clusters-nodes-shards.md @@ -5,7 +5,7 @@ mapped_pages: # Clusters, nodes, and shards [nodes-shards] -::::{note} +::::{note} Nodes and shards are what make {{es}} distributed and scalable. These concepts aren’t essential if you’re just getting started. How you [deploy {{es}}](../../get-started/deployment-options.md) in production determines what you need to know: * **Self-managed {{es}}**: You are responsible for setting up and managing nodes, clusters, shards, and replicas. This includes managing the underlying infrastructure, scaling, and ensuring high availability through failover and backup strategies. @@ -21,17 +21,15 @@ Elastic is able to distribute your data across nodes by subdividing an index int There are two types of shards: *primaries* and *replicas*. Each document in an index belongs to one primary shard. A replica shard is a copy of a primary shard. Replicas maintain redundant copies of your data across the nodes in your cluster. This protects against hardware failure and increases capacity to serve read requests like searching or retrieving a document. -::::{tip} +::::{tip} The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time, without interrupting indexing or query operations. :::: - Shard copies in your cluster are automatically balanced across nodes to provide scale and high availability. All nodes are aware of all the other nodes in the cluster and can forward client requests to the appropriate node. This allows {{es}} to distribute indexing and query load across the cluster. If you’re exploring {{es}} for the first time or working in a development environment, then you can use a cluster with a single node and create indices with only one shard. However, in a production environment, you should build a cluster with multiple nodes and indices with multiple shards to increase performance and resilience. * To learn about optimizing the number and size of shards in your cluster, refer to [Size your shards](../production-guidance/optimize-performance/size-shards.md). * To learn about how read and write operations are replicated across shards and shard copies, refer to [Reading and writing documents](reading-and-writing-documents.md). -* To adjust how shards are allocated and balanced across nodes, refer to [Shard allocation, relocation, and recovery](shard-allocation-relocation-recovery.md). - +* To adjust how shards are allocated and balanced across nodes, refer to [Shard allocation, relocation, and recovery](shard-allocation-relocation-recovery.md). 
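To make the tip above concrete, here is a minimal sketch (the index name and values are illustrative): the primary shard count is fixed when the index is created, while the replica count can be changed on a live index.

```console
PUT my-index
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}

PUT my-index/_settings
{
  "index.number_of_replicas": 2
}
```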
\ No newline at end of file diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation.md b/deploy-manage/distributed-architecture/discovery-cluster-formation.md index 0a1f9bcc23..a09a45e33f 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation.md @@ -33,9 +33,16 @@ The following processes and settings are part of discovery and cluster formation [Settings](elasticsearch://reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md) : There are settings that enable users to influence the discovery, cluster formation, master election and fault detection processes. +[Quorum-based decision making](discovery-cluster-formation/modules-discovery-quorums.md): How {{es}} uses a quorum-based voting mechanism to make decisions even if some nodes are unavailable. +[Voting configurations](discovery-cluster-formation/modules-discovery-voting.md): How {{es}} automatically updates voting configurations as nodes leave and join a cluster. +[Bootstrapping a cluster](discovery-cluster-formation/modules-discovery-bootstrap-cluster.md): Bootstrapping a cluster is required when an {{es}} cluster starts up for the very first time. In [development mode](../deploy/self-managed/bootstrap-checks.md#dev-vs-prod-mode), with no discovery settings configured, this is automatically performed by the nodes themselves. As this auto-bootstrapping is [inherently unsafe](discovery-cluster-formation/modules-discovery-quorums.md), running a node in [production mode](../deploy/self-managed/bootstrap-checks.md#dev-vs-prod-mode) requires bootstrapping to be [explicitly configured](discovery-cluster-formation/modules-discovery-bootstrap-cluster.md). +[Adding and removing master-eligible nodes](../maintenance/add-and-remove-elasticsearch-nodes.md): It is recommended to have a small and fixed number of master-eligible nodes in a cluster, and to scale the cluster up and down by adding and removing master-ineligible nodes only. However there are situations in which it may be desirable to add or remove some master-eligible nodes to or from a cluster. This section describes the process for adding or removing master-eligible nodes, including the extra steps that need to be performed when removing more than half of the master-eligible nodes at the same time. +[Publishing the cluster state](discovery-cluster-formation/cluster-state-overview.md#cluster-state-publishing): Cluster state publishing is the process by which the elected master node updates the cluster state on all the other nodes in the cluster. +[Cluster fault detection](discovery-cluster-formation/cluster-fault-detection.md): {{es}} performs health checks to detect and remove faulty nodes. +[Settings](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md): There are settings that enable users to influence the discovery, cluster formation, master election and fault detection processes. \ No newline at end of file diff --git a/deploy-manage/distributed-architecture/kibana-tasks-management.md b/deploy-manage/distributed-architecture/kibana-tasks-management.md index ac8a95fa2c..c7031bdf3a 100644 --- a/deploy-manage/distributed-architecture/kibana-tasks-management.md +++ b/deploy-manage/distributed-architecture/kibana-tasks-management.md @@ -20,8 +20,6 @@ If you lose this index, all scheduled alerts and actions are lost. 
::::

-
-
## Running background tasks [task-manager-background-tasks]

{{kib}} background tasks are managed as follows:

@@ -30,11 +28,9 @@ If you lose this index, all scheduled alerts and actions are lost.
 * Tasks are claimed by updating them in the {{es}} index, using optimistic concurrency control to prevent conflicts. Each {{kib}} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval.
 * Tasks are run on the {{kib}} server.
 * Task Manager ensures that tasks:
-
-  * Are only executed once
-  * Are retried when they fail (if configured to do so)
-  * Are rescheduled to run again at a future point in time (if configured to do so)
-
+  * Are only executed once
+  * Are retried when they fail (if configured to do so)
+  * Are rescheduled to run again at a future point in time (if configured to do so)

::::{important}
It is possible for tasks to run late or at an inconsistent schedule.

@@ -49,21 +45,31 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/
::::

## Deployment considerations [_deployment_considerations]

{{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/).

## Scaling guidance [task-manager-scaling-guidance]

How you deploy {{kib}} largely depends on your use case. Predicting the throughput a deployment might require to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences.

However, there is a relatively straightforward method you can follow to produce a rough estimate based on your expected usage.

### Default scale [task-manager-default-scaling]

By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`).

By [estimating a rough throughput requirement](#task-manager-rough-throughput-estimation), you can work out whether the default scale is sufficient for your use case.

For details on monitoring the health of {{kib}} Task Manager, follow the guidance in [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md).

### Scaling horizontally [task-manager-scaling-horizontally]

At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload.

### Scaling vertically [task-manager-scaling-vertically]

Other times, it might be preferable to increase the throughput of individual {{kib}} instances.
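A hedged `kibana.yml` sketch of the two settings this involves (they are described in more detail just below; the values are illustrative, not recommendations):

```yaml
# Let this Kibana instance claim more concurrent tasks (the default is 10).
xpack.task_manager.capacity: 20

# Interval between polls for new tasks, in milliseconds (the default is 3000;
# lower values poll more often and increase the load on Elasticsearch).
xpack.task_manager.poll_interval: 3000
```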
@@ -87,8 +101,12 @@ Tweak the capacity with the [`xpack.task_manager.capacity`](kibana://reference/c
 Tweak the poll interval with the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull scheduled tasks at a higher rate. This setting can impact the performance of the {{es}} cluster as the workload will be higher.

### Choosing a scaling strategy [task-manager-choosing-scaling-strategy]

Each scaling strategy comes with its own considerations, and the appropriate strategy largely depends on your use case.

Task Manager, like the rest of the Elastic Stack, is designed to scale horizontally.

Scaling horizontally requires a higher degree of coordination between {{kib}} instances. One way Task Manager coordinates with other instances is by delaying its polling schedule to avoid conflicts with other instances. By using [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) to evaluate the [date of the `last_polling_delay`](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime) across a deployment, you can estimate the frequency at which Task Manager resets its delay mechanism. A higher frequency suggests {{kib}} instances conflict at a high rate, which you can address by scaling vertically rather than horizontally, reducing the required coordination.

### Rough throughput estimation [task-manager-rough-throughput-estimation]

Predicting the required throughput a deployment might need to support Task Management is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences.

However, a rough lower bound can be estimated, which is then used as a guide.

Throughput is best thought of as a measurement in tasks per minute.

A default {{kib}} instance can support up to `200/tpm`.

#### Automatic estimation [_automatic_estimation]

::::{warning}
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
::::

-
As demonstrated in [Evaluate your capacity estimation](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-capacity-estimation), the Task Manager [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) performs these estimations automatically.

These estimates are based on historical data and should not be used as predictions, but can be used as a rough guide when scaling the system.

When evaluating the proposed {{kib}} instance number under `proposed.provisioned_kibana`
::::

#### Manual estimation [_manual_estimation]

By [evaluating the workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), you can make a rough estimate as to the required throughput as a *tasks per minute* measurement.
@@ -161,7 +189,4 @@ Given the predicted workload, you can estimate a lower bound throughput of `340/ Although this is a *rough* estimate, the *tasks per minute* provides the lower bound needed to execute tasks on time. -Once you estimate *tasks per minute* , add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime). - - - +Once you estimate *tasks per minute* , add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime). \ No newline at end of file diff --git a/deploy-manage/distributed-architecture/reading-and-writing-documents.md b/deploy-manage/distributed-architecture/reading-and-writing-documents.md index ad038228e2..93d9593668 100644 --- a/deploy-manage/distributed-architecture/reading-and-writing-documents.md +++ b/deploy-manage/distributed-architecture/reading-and-writing-documents.md @@ -5,7 +5,6 @@ mapped_pages: # Reading and writing documents [docs-replication] - ## Introduction [_introduction] Each index in Elasticsearch is [divided into shards](../../deploy-manage/index.md) and each shard can have multiple copies. These copies are known as a *replication group* and must be kept in sync when documents are added or removed. If we fail to do so, reading from one copy will result in very different results than reading from another. The process of keeping the shard copies in sync and serving reads from them is what we call the *data replication model*. @@ -14,7 +13,6 @@ Elasticsearch’s data replication model is based on the *primary-backup model* This purpose of this section is to give a high level overview of the Elasticsearch replication model and discuss the implications it has for various interactions between write and read operations. - ## Basic write model [basic-write-model] Every indexing operation in Elasticsearch is first resolved to a replication group using [routing](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-create), typically based on the document ID. Once the replication group has been determined, the operation is forwarded internally to the current *primary shard* of the group. This stage of indexing is referred to as the *coordinating stage*. @@ -36,7 +34,6 @@ Each in-sync replica copy performs the indexing operation locally so that it has These indexing stages (coordinating, primary, and replica) are sequential. To enable internal retries, the lifetime of each stage encompasses the lifetime of each subsequent stage. For example, the coordinating stage is not complete until each primary stage, which may be spread out across different primary shards, has completed. Each primary stage will not complete until the in-sync replicas have finished indexing the docs locally and responded to the replica requests. 
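One practical knob tied to this model (a hedged example; the index name and value are illustrative) is the `wait_for_active_shards` parameter, which makes the coordinating stage wait until the specified number of shard copies is active before attempting the write:

```console
PUT my-index/_doc/1?wait_for_active_shards=2
{
  "message": "indexed only after at least two shard copies are active"
}
```

Note that this is a pre-flight check on copy availability, not a guarantee that the write reached that many copies.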
- ### Failure handling [_failure_handling] Many things can go wrong during indexing — disks can get corrupted, nodes can be disconnected from each other, or some configuration mistake could cause an operation to fail on a replica despite it being successful on the primary. These are infrequent but the primary has to respond to them. @@ -53,8 +50,6 @@ This is a valid scenario that can happen due to index configuration or simply be :::: - - ## Basic read model [_basic_read_model] Reads in Elasticsearch can be very lightweight lookups by ID or a heavy search request with complex aggregations that take non-trivial CPU power. One of the beauties of the primary-backup model is that it keeps all shard copies identical (with the exception of in-flight operations). As such, a single in-sync copy is sufficient to serve read requests. @@ -66,7 +61,6 @@ When a read request is received by a node, that node is responsible for forwardi 3. Send shard level read requests to the selected copies. 4. Combine the results and respond. Note that in the case of get by ID look up, only one shard is relevant and this step can be skipped. - ### Shard failures [shard-failures] When a shard fails to respond to a read request, the coordinating node sends the request to another shard copy in the same replication group. Repeated failures can result in no available shard copies. @@ -79,20 +73,15 @@ To ensure fast responses, the following APIs will respond with partial results i Responses containing partial results still provide a `200 OK` HTTP status code. Shard failures are indicated by the `timed_out` and `_shards` fields of the response header. - ## A few simple implications [_a_few_simple_implications] Each of these basic flows determines how Elasticsearch behaves as a system for both reads and writes. Furthermore, since read and write requests can be executed concurrently, these two basic flows interact with each other. This has a few inherent implications: -Efficient reads -: Under normal operation each read operation is performed once for each relevant replication group. Only under failure conditions do multiple copies of the same shard execute the same search. - -Read unacknowledged -: Since the primary first indexes locally and then replicates the request, it is possible for a concurrent read to already see the change before it has been acknowledged. +**Efficient reads**: Under normal operation each read operation is performed once for each relevant replication group. Only under failure conditions do multiple copies of the same shard execute the same search. -Two copies by default -: This model can be fault tolerant while maintaining only two copies of the data. This is in contrast to quorum-based system where the minimum number of copies for fault tolerance is 3. +**Read unacknowledged**: Since the primary first indexes locally and then replicates the request, it is possible for a concurrent read to already see the change before it has been acknowledged. +**Two copies by default**: This model can be fault tolerant while maintaining only two copies of the data. This is in contrast to quorum-based system where the minimum number of copies for fault tolerance is 3. ## Failures [_failures] @@ -104,8 +93,6 @@ A single shard can slow down indexing Dirty reads : An isolated primary can expose writes that will not be acknowledged. This is caused by the fact that an isolated primary will only realize that it is isolated once it sends requests to its replicas or when reaching out to the master. 
At that point the operation is already indexed into the primary and can be read by a concurrent read. Elasticsearch mitigates this risk by pinging the master every second (by default) and rejecting indexing operations if no master is known. - ## The Tip of the Iceberg [_the_tip_of_the_iceberg] This document provides a high level overview of how Elasticsearch deals with data. Of course, there is much more going on under the hood. Things like primary terms, cluster state publishing, and master election all play a role in keeping this system behaving correctly. This document also doesn’t cover known and important bugs (both closed and open). We recognize that [GitHub is hard to keep up with](https://github.com/elastic/elasticsearch/issues?q=label%3Aresiliency). To help people stay on top of those, we maintain a dedicated [resiliency page](https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html) on our website. We strongly advise reading it. - diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md index 653dec5050..ef514484a5 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md @@ -17,15 +17,12 @@ Over the course of normal operation, Elasticsearch allocates shard copies to nod To learn about optimizing the number and size of shards in your cluster, refer to [Size your shards](../production-guidance/optimize-performance/size-shards.md). To learn about how read and write operations are replicated across shards and shard copies, refer to [Reading and writing documents](reading-and-writing-documents.md). :::: - - ## Shard allocation [shard-allocation] Shard allocation is the process of assigning shard copies to nodes. This can happen during initial recovery, replica allocation, rebalancing, when nodes are added to or removed from the cluster, or when cluster or index settings that impact allocation are updated. By default, the primary and replica shard copies for an index can be allocated to any node in the cluster, and may be relocated to rebalance the cluster. - ### Adjust shard allocation settings [_adjust_shard_allocation_settings] You can control how shard copies are allocated using the following settings: @@ -33,7 +30,6 @@ You can control how shard copies are allocated using the following settings: * [Cluster-level shard allocation settings](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md): Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to [allocate nodes availability zones](shard-allocation-relocation-recovery/shard-allocation-awareness.md), or prevent certain nodes from being used so you can perform maintenance. * [Index-level shard allocation settings](shard-allocation-relocation-recovery/index-level-shard-allocation.md): Use these settings to control how the shard copies for a specific index are allocated. For example, you might want to allocate an index to a node in a specific data tier, or to an node with specific attributes. - ### Monitor shard allocation [_monitor_shard_allocation] If a shard copy is unassigned, it means that the shard copy is not allocated to any node in the cluster. 
This can happen if there are not enough nodes in the cluster to allocate the shard copy, or if the shard copy can’t be allocated to any node that satisfies the shard allocation filtering rules. When a shard copy is unassigned, your cluster is considered unhealthy and returns a yellow or red cluster health status. @@ -46,7 +42,6 @@ You can use the following APIs to monitor shard allocation: [Learn more about troubleshooting unassigned shard copies and recovering your cluster health](../../troubleshoot/elasticsearch/red-yellow-cluster-status.md). - ## Shard recovery [shard-recovery] Shard recovery is the process of initializing a shard copy, such as restoring a primary shard from a snapshot or creating a replica shard from a primary shard. When a shard recovery completes, the recovered shard is available for search and indexing. @@ -62,7 +57,6 @@ Recovery automatically occurs during the following processes: You can determine the cause of a shard recovery using the [recovery](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-recovery) or [cat recovery](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-recovery) APIs. - ### Adjust shard recovery settings [_adjust_shard_recovery_settings] To control how shards are recovered, for example the resources that can be used by recovery operations, and which indices should be prioritized for recovery, you can adjust the following settings: @@ -73,7 +67,6 @@ To control how shards are recovered, for example the resources that can be used Shard recovery operations also respect general shard allocation settings. - ### Monitor shard recovery [_monitor_shard_recovery] You can use the following APIs to monitor shard allocation: @@ -81,18 +74,14 @@ You can use the following APIs to monitor shard allocation: * View a list of in-progress and completed recoveries using the [cat recovery API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-recovery) * View detailed information about a specific recovery using the [index recovery API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-recovery) - ## Shard relocation [shard-relocation] Shard relocation is the process of moving shard copies from one node to another. This can happen when a node joins or leaves the cluster, or when the cluster is rebalancing. When a shard copy is relocated, it is created as a new shard copy on the target node. When the shard copy is fully allocated and recovered, the old shard copy is deleted. If the shard copy being relocated is a primary, then the new shard copy is marked as primary before the old shard copy is deleted. - ### Adjust shard relocation settings [_adjust_shard_relocation_settings] You can control how and when shard copies are relocated. For example, you can adjust the rebalancing settings that control when shard copies are relocated to balance the cluster, or the high watermark for disk-based shard allocation that can trigger relocation. These settings are part of the [cluster-level shard allocation settings](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md). -Shard relocation operations also respect shard allocation and recovery settings. - - +Shard relocation operations also respect shard allocation and recovery settings. 
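For instance, a minimal sketch of the monitoring APIs mentioned above (no request body is needed; with none, {{es}} explains the first unassigned shard it finds):

```console
GET _cat/shards?v=true

GET _cluster/allocation/explain
```

The cat output lists each shard copy with its state and assigned node, and the allocation explain response describes why an unassigned copy cannot currently be allocated.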
\ No newline at end of file diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md index 3bffc7e50d..afb5f3fb92 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md @@ -51,14 +51,12 @@ With delayed allocation enabled, the above scenario changes to look like this: This setting will not affect the promotion of replicas to primaries, nor will it affect the assignment of replicas that have not been assigned previously. In particular, delayed allocation does not come into effect after a full cluster restart. Also, in case of a master failover situation, elapsed delay time is forgotten (i.e. reset to the full initial delay). :::: - ## Cancellation of shard relocation [_cancellation_of_shard_relocation] If delayed allocation times out, the master assigns the missing shards to another node which will start recovery. If the missing node rejoins the cluster, and its shards still have the same sync-id as the primary, shard relocation will be cancelled and the synced shard will be used for recovery instead. For this reason, the default `timeout` is set to just one minute: even if shard relocation begins, cancelling recovery in favour of the synced shard is cheap. - ## Monitoring delayed unassigned shards [_monitoring_delayed_unassigned_shards] The number of shards whose allocation has been delayed by this timeout setting can be viewed with the [cluster health API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-health): @@ -69,8 +67,6 @@ GET _cluster/health <1> 1. This request will return a `delayed_unassigned_shards` value. - - ## Removing a node permanently [_removing_a_node_permanently] If a node is not going to return and you would like Elasticsearch to allocate the missing shards immediately, just update the timeout to zero: @@ -84,6 +80,4 @@ PUT _all/_settings } ``` -You can reset the timeout as soon as the missing shards have started to recover. - - +You can reset the timeout as soon as the missing shards have started to recover. 
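A hedged sketch of that reset (`1m` is simply the documented default; adjust as needed):

```console
PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "1m"
  }
}
```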
\ No newline at end of file diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md index cb03cae8aa..858850b6db 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md @@ -24,6 +24,164 @@ mapped_urls: The documentation team is working to combine content pulled from the following pages: -* [/raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md) -* [/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md) -* [/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md) \ No newline at end of file +* [/raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md) +* [/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md) +* [/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md) + +This module provides per-index settings to control the allocation of shards to nodes: + +* [Shard allocation filtering](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md): Controlling which shards are allocated to which nodes. +* [Delayed allocation](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md): Delaying allocation of unassigned shards caused by a node leaving. +* [Total shards per node](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/total-shards-per-node.md): A hard limit on the number of shards from the same index per node. +* [Data tier allocation](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/data-tier-allocation-settings.md): Controls the allocation of indices to [data tiers](../../../manage-data/lifecycle/data-tiers.md). + +## Index-level shard allocation filtering [shard-allocation-filtering] + +You can use shard allocation filters to control where {{es}} allocates shards of a particular index. These per-index filters are applied in conjunction with [cluster-wide allocation filtering](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-shard-allocation-filtering) and [allocation awareness](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md). 
+ +Shard allocation filters can be based on [custom node attributes](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/node-settings.md#custom-node-attributes) or the built-in `_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference` attributes. [Index lifecycle management](../../../manage-data/lifecycle/index-lifecycle-management.md) uses filters based on custom node attributes to determine how to reallocate shards when moving between phases. + +The `cluster.routing.allocation` settings are dynamic, enabling existing indices to be moved immediately from one set of nodes to another. Shards are only relocated if it is possible to do so without breaking another routing constraint, such as never allocating a primary and replica shard on the same node. + +For example, you could use a custom node attribute to indicate a node’s performance characteristics and use shard allocation filtering to route shards for a particular index to the most appropriate class of hardware. + + +### Enabling index-level shard allocation filtering [index-allocation-filters] + +To filter based on a custom node attribute: + +1. Specify the filter characteristics with a custom node attribute in each node’s `elasticsearch.yml` configuration file. For example, if you have `small`, `medium`, and `big` nodes, you could add a `size` attribute to filter based on node size. + + ```yaml + node.attr.size: medium + ``` + + You can also set custom attributes when you start a node: + + ```sh + ./bin/elasticsearch -Enode.attr.size=medium + ``` + +2. Add a routing allocation filter to the index. The `index.routing.allocation` settings support three types of filters: `include`, `exclude`, and `require`. For example, to tell {{es}} to allocate shards from the `test` index to either `big` or `medium` nodes, use `index.routing.allocation.include`: + + ```console + PUT test/_settings + { + "index.routing.allocation.include.size": "big,medium" + } + ``` + + If you specify multiple filters the following conditions must be satisfied simultaneously by a node in order for shards to be relocated to it: + + * If any `require` type conditions are specified, all of them must be satisfied + * If any `exclude` type conditions are specified, none of them may be satisfied + * If any `include` type conditions are specified, at least one of them must be satisfied + + For example, to move the `test` index to `big` nodes in `rack1`, you could specify: + + ```console + PUT test/_settings + { + "index.routing.allocation.require.size": "big", + "index.routing.allocation.require.rack": "rack1" + } + ``` + + + +### Index allocation filter settings [index-allocation-settings] + +`index.routing.allocation.include.{{attribute}}` +: Assign the index to a node whose `{{attribute}}` has at least one of the comma-separated values. + +`index.routing.allocation.require.{{attribute}}` +: Assign the index to a node whose `{{attribute}}` has *all* of the comma-separated values. + +`index.routing.allocation.exclude.{{attribute}}` +: Assign the index to a node whose `{{attribute}}` has *none* of the comma-separated values. 
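For example, reusing the illustrative `size` attribute from the steps above, an `exclude` filter could keep the `test` index off the smallest nodes (a sketch, not a recommendation):

```console
PUT test/_settings
{
  "index.routing.allocation.exclude.size": "small"
}
```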
+ +The index allocation settings support the following built-in attributes: + +`_name` +: Match nodes by node name + +`_host_ip` +: Match nodes by host IP address (IP associated with hostname) + +`_publish_ip` +: Match nodes by publish IP address + +`_ip` +: Match either `_host_ip` or `_publish_ip` + +`_host` +: Match nodes by hostname + +`_id` +: Match nodes by node id + +`_tier` +: Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/data-tier-allocation-settings.md) + +::::{note} +`_tier` filtering is based on [node](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/node-settings.md) roles. Only a subset of roles are [data tier](../../../manage-data/lifecycle/data-tiers.md) roles, and the generic [data role](../../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role) will match any tier filtering. +:::: + + +You can use wildcards when specifying attribute values, for example: + +```console +PUT test/_settings +{ + "index.routing.allocation.include._ip": "192.168.2.*" +} +``` + +## Index recovery prioritization [recovery-prioritization] + +Unallocated shards are recovered in order of priority, whenever possible. Indices are sorted into priority order as follows: + +* the optional `index.priority` setting (higher before lower) +* the index creation date (higher before lower) +* the index name (higher before lower) + +This means that, by default, newer indices will be recovered before older indices. + +Use the per-index dynamically updatable `index.priority` setting to customise the index prioritization order. For instance: + +```console +PUT index_1 + +PUT index_2 + +PUT index_3 +{ + "settings": { + "index.priority": 10 + } +} + +PUT index_4 +{ + "settings": { + "index.priority": 5 + } +} +``` + +In the above example: + +* `index_3` will be recovered first because it has the highest `index.priority`. +* `index_4` will be recovered next because it has the next highest priority. +* `index_2` will be recovered next because it was created more recently. +* `index_1` will be recovered last. 
This setting accepts an integer, and can be updated on a live index with the [update index settings API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-settings):

```console
PUT index_4/_settings
{
  "index.priority": 1
}
```

From 0c3cec88dd402bee4aea5838b702c6fd46d01a2e Mon Sep 17 00:00:00 2001
From: George Wallace
Date: Fri, 21 Feb 2025 14:54:05 -0700
Subject: [PATCH 04/19] further updates

---
 deploy-manage/distributed-architecture.md   | 11 ++++++++---
 .../clusters-nodes-shards/node-roles.md     |  6 +-----
 .../discovery-cluster-formation.md          |  2 ++
 .../cluster-fault-detection.md              |  2 ++
 .../cluster-state-overview.md               |  2 ++
 .../discovery-hosts-providers.md            |  2 ++
 .../modules-discovery-bootstrap-cluster.md  |  2 ++
 .../modules-discovery-quorums.md            |  2 ++
 .../modules-discovery-voting.md             |  2 ++
 .../index-level-shard-allocation.md         | 10 +++-------
 .../shard-allocation-awareness.md           |  2 ++
 11 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md
index b9b00e4a7c..db3a337120 100644
--- a/deploy-manage/distributed-architecture.md
+++ b/deploy-manage/distributed-architecture.md
@@ -19,10 +19,15 @@ mapped_pages:
 The topics in this section provides information about the architecture of {{es}} and how it stores and retrieves data:
 
-* [Nodes and shards](distributed-architecture/clusters-nodes-shards.md): Learn about the basic building blocks of an {{es}} cluster, including nodes, shards, primaries, and replicas.
-* [Node roles](distributed-architecture/clusters-nodes-shards/node-roles.md): Learn about the different roles that nodes can have in an {{es}} cluster.
+::::{note}
+{{serverless-full}} scales with your workload and automates nodes, shards, and replicas for you. Some of the content in this section does not apply to you if you are using {{serverless-full}}.
+::::
+
+* [Cluster, nodes, and shards](distributed-architecture/clusters-nodes-shards.md): Learn about the basic building blocks of an {{es}} cluster, including nodes, shards, primaries, and replicas.
+  * [Node roles](distributed-architecture/clusters-nodes-shards/node-roles.md): Learn about the different roles that nodes can have in an {{es}} cluster.
 * [Reading and writing documents](distributed-architecture/reading-and-writing-documents.md): Learn how {{es}} replicates read and write operations across shards and shard copies.
 * [Shard allocation, relocation, and recovery](distributed-architecture/shard-allocation-relocation-recovery.md): Learn how {{es}} allocates and balances shards across nodes.
   * [Shard allocation awareness](distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md): Learn how to use custom node attributes to distribute shards across different racks or availability zones.
-* [Shard request cache](elasticsearch://reference/elasticsearch/configuration-reference/shard-request-cache-settings.md): Learn how {{es}} caches search requests to improve performance.
+* [Discovery and cluster formation](distributed-architecture/discovery-cluster-formation.md): Learn about the cluster formation process, including voting, adding nodes, and publishing the cluster state.
+* [Shard request cache](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/shard-request-cache-settings.md): Learn how {{es}} caches search requests to improve performance.
diff --git a/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md b/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md index 8f759378b9..febd299f4f 100644 --- a/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md +++ b/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md @@ -9,7 +9,6 @@ Any time that you start an instance of {{es}}, you are starting a *node*. A coll Each node performs one or more roles. Roles control the behavior of the node in the cluster. - ## Set node roles [set-node-roles] You define a node’s roles by setting `node.roles` in [`elasticsearch.yml`](../../deploy/self-managed/configure-elasticsearch.md). If you set `node.roles`, the node is only assigned the roles you specify. If you don’t set `node.roles`, the node is assigned the following roles: @@ -30,10 +29,7 @@ You define a node’s roles by setting `node.roles` in [`elasticsearch.yml`](../ If you set `node.roles`, ensure you specify every node role your cluster needs. Every cluster requires the following node roles: * `master` -* - - `data_content` and `data_hot`
OR
`data` - +* `data_content` and `data_hot`
OR
`data` Some {{stack}} features also require specific node roles: diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation.md b/deploy-manage/distributed-architecture/discovery-cluster-formation.md index a09a45e33f..54ab09a200 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html +applies_to: + stack: --- # Discovery and cluster formation [modules-discovery] diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-fault-detection.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-fault-detection.md index 6fb3151682..349644bb19 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-fault-detection.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-fault-detection.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-fault-detection.html +applies_to: + stack: --- # Cluster fault detection [cluster-fault-detection] diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-state-overview.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-state-overview.md index 1c29cac851..65c8df5708 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-state-overview.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/cluster-state-overview.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-state-overview.html +applies_to: + stack: --- # Cluster state [cluster-state-overview] diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md index c032a857af..adf675cfff 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/discovery-hosts-providers.html +applies_to: + stack: --- # Discovery [discovery-hosts-providers] diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md index 7e4e850938..f5ece44bf6 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-bootstrap-cluster.html +applies_to: + stack: --- # Bootstrapping a cluster [modules-discovery-bootstrap-cluster] diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md index eba853009c..c6f661c9ca 100644 --- 
a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-quorums.html +applies_to: + stack: --- # Quorum-based decision making [modules-discovery-quorums] diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md index 4de16d8815..5190dd7b16 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-voting.html +applies_to: + stack: --- # Voting configurations [modules-discovery-voting] diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md index 858850b6db..6a215478fb 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md @@ -28,7 +28,7 @@ The documentation team is working to combine content pulled from the following p * [/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md) * [/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md) -This module provides per-index settings to control the allocation of shards to nodes: +In Elasticsearch, per-index settings allow you to control the allocation of shards to nodes through index-level shard allocation settings. These settings enable you to specify preferences or constraints for where shards of a particular index should reside. This includes allocating shards to nodes with specific attributes or avoiding certain nodes. This level of control helps optimize resource utilization, balance load, and ensure data redundancy and availability according to your deployment's specific requirements. In addition to the content in this article, there are additional resources: * [Shard allocation filtering](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md): Controlling which shards are allocated to which nodes. * [Delayed allocation](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md): Delaying allocation of unassigned shards caused by a node leaving. @@ -87,9 +87,7 @@ To filter based on a custom node attribute: } ``` - - -### Index allocation filter settings [index-allocation-settings] +### Index allocation filter settings [index-allocation-settings] `index.routing.allocation.include.{{attribute}}` : Assign the index to a node whose `{{attribute}}` has at least one of the comma-separated values. 
@@ -127,7 +125,6 @@ The index allocation settings support the following built-in attributes: `_tier` filtering is based on [node](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/node-settings.md) roles. Only a subset of roles are [data tier](../../../manage-data/lifecycle/data-tiers.md) roles, and the generic [data role](../../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role) will match any tier filtering. :::: - You can use wildcards when specifying attribute values, for example: ```console @@ -183,5 +180,4 @@ PUT index_4/_settings { "index.priority": 1 } -``` - +``` \ No newline at end of file diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md index d91216a9bd..9866160319 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-allocation-awareness.html +applies_to: + stack: --- # Shard allocation awareness [shard-allocation-awareness] From e3cf73fafff85f25ab5104defe11a31a6706277f Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 16:27:14 -0600 Subject: [PATCH 05/19] fixing issues --- .../discovery-hosts-providers.md | 15 ++- .../modules-discovery-bootstrap-cluster.md | 7 -- .../modules-discovery-quorums.md | 3 - .../modules-discovery-voting.md | 5 - .../kibana-tasks-management.md | 44 -------- .../index-level-shard-allocation.md | 74 ++++--------- .../index-modules-allocation.md | 14 --- .../recovery-prioritization.md | 48 --------- .../shard-allocation-filtering.md | 102 ------------------ 9 files changed, 32 insertions(+), 280 deletions(-) delete mode 100644 raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md delete mode 100644 raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md delete mode 100644 raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md index adf675cfff..5c0673fe61 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md @@ -25,7 +25,6 @@ By default the cluster formation module offers two seed hosts providers to confi Each seed hosts provider yields the IP addresses or hostnames of the seed nodes. If it returns any hostnames then these are resolved to IP addresses using a DNS lookup. If a hostname resolves to multiple IP addresses then {{es}} tries to find a seed node at all of these addresses. If the hosts provider does not explicitly give the TCP port of the node by then, it will implicitly use the first port in the port range given by `transport.profiles.default.port`, or by `transport.port` if `transport.profiles.default.port` is not set. 
The number of concurrent lookups is controlled by `discovery.seed_resolver.max_concurrent_resolvers` which defaults to `10`, and the timeout for each lookup is controlled by `discovery.seed_resolver.timeout` which defaults to `5s`. Note that DNS lookups are subject to [JVM DNS caching](../../deploy/self-managed/networkaddress-cache-ttl.md). - #### Settings-based seed hosts provider [settings-based-hosts-provider] The settings-based seed hosts provider uses a node setting to configure a static list of the addresses of the seed nodes. These addresses can be given as hostnames or IP addresses; hosts specified as hostnames are resolved to IP addresses during each round of discovery. @@ -42,8 +41,6 @@ discovery.seed_hosts: 1. The port will default to `transport.profiles.default.port` and fallback to `transport.port` if not specified. 2. If a hostname resolves to multiple IP addresses, {{es}} will attempt to connect to every resolved address. - - #### File-based seed hosts provider [file-based-hosts-provider] The file-based seed hosts provider configures a list of hosts via an external file. {{es}} reloads this file when it changes, so that the list of seed nodes can change dynamically without needing to restart each node. For example, this gives a convenient mechanism for an {{es}} instance that is run in a Docker container to be dynamically supplied with a list of IP addresses to connect to when those IP addresses may not be known at node startup. @@ -74,19 +71,31 @@ Host names are allowed instead of IP addresses and are resolved by DNS as descri You can also add comments to this file. All comments must appear on their lines starting with `#` (i.e. comments cannot start in the middle of a line). +<<<<<<< HEAD +======= +>>>>>>> a4b41272 (update) #### EC2 hosts provider [ec2-hosts-provider] The [EC2 discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-ec2.md) adds a hosts provider that uses the [AWS API](https://github.com/aws/aws-sdk-java) to find a list of seed nodes. +<<<<<<< HEAD +======= +>>>>>>> a4b41272 (update) #### Azure Classic hosts provider [azure-classic-hosts-provider] The [Azure Classic discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-azure-classic.md) adds a hosts provider that uses the Azure Classic API find a list of seed nodes. +<<<<<<< HEAD #### Google Compute Engine hosts provider [gce-hosts-provider] The [GCE discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-gce.md) adds a hosts provider that uses the GCE API find a list of seed nodes. +======= +#### Google Compute Engine hosts provider [gce-hosts-provider] + +The [GCE discovery plugin](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch-plugins/discovery-gce.md) adds a hosts provider that uses the GCE API find a list of seed nodes. 
+>>>>>>> a4b41272 (update) diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md index f5ece44bf6..2ec224b321 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-bootstrap-cluster.md @@ -25,7 +25,6 @@ If you leave `cluster.initial_master_nodes` in place once the cluster has formed :::: - The simplest way to create a new cluster is for you to select one of your master-eligible nodes that will bootstrap itself into a single-node cluster, which all the other nodes will then join. This simple approach is not resilient to failures until the other master-eligible nodes have joined the cluster. For example, if you have a master-eligible node with [node name](../../deploy/self-managed/important-settings-configuration.md#node-name) `master-a` then configure it as follows (omitting `cluster.initial_master_nodes` from the configuration of all other nodes): ```yaml @@ -45,7 +44,6 @@ cluster.initial_master_nodes: You must set `cluster.initial_master_nodes` to the same list of nodes on each node on which it is set in order to be sure that only a single cluster forms during bootstrapping. If `cluster.initial_master_nodes` varies across the nodes on which it is set then you may bootstrap multiple clusters. It is usually not possible to recover from this situation without losing data. :::: - ::::{admonition} Node name formats must match :name: modules-discovery-bootstrap-cluster-fqdns @@ -62,12 +60,10 @@ This message shows the node names `master-a.example.com` and `master-b.example.c :::: - ## Choosing a cluster name [bootstrap-cluster-name] The [`cluster.name`](elasticsearch://reference/elasticsearch/configuration-reference/miscellaneous-cluster-settings.md#cluster-name) setting enables you to create multiple clusters which are separated from each other. Nodes verify that they agree on their cluster name when they first connect to each other, and Elasticsearch will only form a cluster from nodes that all have the same cluster name. The default value for the cluster name is `elasticsearch`, but it is recommended to change this to reflect the logical name of the cluster. - ## Auto-bootstrapping in development mode [bootstrap-auto-bootstrap] By default each node will automatically bootstrap itself into a single-node cluster the first time it starts. If any of the following settings are configured then auto-bootstrapping will not take place: @@ -100,6 +96,3 @@ If you intended to form a new multi-node cluster but instead bootstrapped a coll 6. Remove `cluster.initial_master_nodes` from every node’s configuration. 
:::: - - - diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md index c6f661c9ca..c7fb6699a6 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md @@ -20,15 +20,12 @@ If you stop half or more of the nodes in the voting configuration at the same ti :::: - After a master-eligible node has joined or left the cluster the elected master may issue a cluster-state update that adjusts the voting configuration to match, and this can take a short time to complete. It is important to wait for this adjustment to complete before removing more nodes from the cluster. See [Removing master-eligible nodes](../../maintenance/add-and-remove-elasticsearch-nodes.md#modules-discovery-removing-nodes) for more information. - ## Master elections [_master_elections] Elasticsearch uses an election process to agree on an elected master node, both at startup and if the existing elected master fails. Any master-eligible node can start an election, and normally the first election that takes place will succeed. Elections only usually fail when two nodes both happen to start their elections at about the same time, so elections are scheduled randomly on each node to reduce the probability of this happening. Nodes will retry elections until a master is elected, backing off on failure, so that eventually an election will succeed (with arbitrarily high probability). The scheduling of master elections are controlled by the [master election settings](elasticsearch://reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md#master-election-settings). - ## Cluster maintenance, rolling restarts and migrations [_cluster_maintenance_rolling_restarts_and_migrations] Many cluster maintenance tasks involve temporarily shutting down one or more nodes and then starting them back up again. By default {{es}} can remain available if one of its master-eligible nodes is taken offline, such as during a rolling upgrade. Furthermore, if multiple nodes are stopped and then started again then it will automatically recover, such as during a full cluster restart. There is no need to take any further action with the APIs described here in these cases, because the set of master nodes is not changing permanently. diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md index 5190dd7b16..f12195a401 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-voting.md @@ -31,7 +31,6 @@ GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_con The current voting configuration is not necessarily the same as the set of all available master-eligible nodes in the cluster. Altering the voting configuration involves taking a vote, so it takes some time to adjust the configuration as nodes join or leave the cluster. Also, there are situations where the most resilient configuration includes unavailable nodes or does not include some available nodes. 
In these situations, the voting configuration differs from the set of available master-eligible nodes in the cluster. :::: - Larger voting configurations are usually more resilient, so Elasticsearch normally prefers to add master-eligible nodes to the voting configuration after they join the cluster. Similarly, if a node in the voting configuration leaves the cluster and there is another master-eligible node in the cluster that is not in the voting configuration then it is preferable to swap these two nodes over. The size of the voting configuration is thus unchanged but its resilience increases. It is not so straightforward to automatically remove nodes from the voting configuration after they have left the cluster. Different strategies have different benefits and drawbacks, so the right choice depends on how the cluster will be used. You can control whether the voting configuration automatically shrinks by using the [`cluster.auto_shrink_voting_configuration` setting](elasticsearch://reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md). @@ -40,19 +39,16 @@ It is not so straightforward to automatically remove nodes from the voting confi If `cluster.auto_shrink_voting_configuration` is set to `true` (which is the default and recommended value) and there are at least three master-eligible nodes in the cluster, Elasticsearch remains capable of processing cluster state updates as long as all but one of its master-eligible nodes are healthy. :::: - There are situations in which Elasticsearch might tolerate the loss of multiple nodes, but this is not guaranteed under all sequences of failures. If the `cluster.auto_shrink_voting_configuration` setting is `false`, you must remove departed nodes from the voting configuration manually. Use the [voting exclusions API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-post-voting-config-exclusions) to achieve the desired level of resilience. No matter how it is configured, Elasticsearch will not suffer from a "split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration` setting affects only its availability in the event of the failure of some of its nodes and the administrative tasks that must be performed as nodes join and leave the cluster. - ## Even numbers of master-eligible nodes [_even_numbers_of_master_eligible_nodes] There should normally be an odd number of master-eligible nodes in a cluster. If there is an even number, Elasticsearch leaves one of them out of the voting configuration to ensure that it has an odd size. This omission does not decrease the failure-tolerance of the cluster. In fact, improves it slightly: if the cluster suffers from a network partition that divides it into two equally-sized halves then one of the halves will contain a majority of the voting configuration and will be able to keep operating. If all of the votes from master-eligible nodes were counted, neither side would contain a strict majority of the nodes and so the cluster would not be able to make any progress. For instance if there are four master-eligible nodes in the cluster and the voting configuration contained all of them, any quorum-based decision would require votes from at least three of them. This situation means that the cluster can tolerate the loss of only a single master-eligible node. If this cluster were split into two equal halves, neither half would contain three master-eligible nodes and the cluster would not be able to make any progress. 
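One way to keep track of the counting in these examples is the strict-majority rule, sketched here for a voting configuration of $n$ nodes:

$$
\text{quorum}(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad \text{quorum}(3) = 2, \quad \text{quorum}(4) = 3, \quad \text{quorum}(5) = 3
$$

With all four master-eligible nodes voting, a quorum is three, which is why neither half of an even split can make progress.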
If the voting configuration contains only three of the four master-eligible nodes, however, the cluster is still only fully tolerant to the loss of one node, but quorum-based decisions require votes from two of the three voting nodes. In the event of an even split, one half will contain two of the three voting nodes so that half will remain available. - ## Setting the initial voting configuration [_setting_the_initial_voting_configuration] When a brand-new cluster starts up for the first time, it must elect its first master node. To do this election, it needs to know the set of master-eligible nodes whose votes should count. This initial voting configuration is known as the *bootstrap configuration* and is set in the [cluster bootstrapping process](modules-discovery-bootstrap-cluster.md). @@ -65,5 +61,4 @@ If the bootstrap configuration is not set correctly, when you start a brand-new To illustrate the problem with configuring each node to expect a certain cluster size, imagine starting up a three-node cluster in which each node knows that it is going to be part of a three-node cluster. A majority of three nodes is two, so normally the first two nodes to discover each other form a cluster and the third node joins them a short time later. However, imagine that four nodes were erroneously started instead of three. In this case, there are enough nodes to form two separate clusters. Of course if each node is started manually then it’s unlikely that too many nodes are started. If you’re using an automated orchestrator, however, it’s certainly possible to get into this situation-- particularly if the orchestrator is not resilient to failures such as network partitions. :::: - The initial quorum is only required the very first time a whole cluster starts up. New nodes joining an established cluster can safely obtain all the information they need from the elected master. Nodes that have previously been part of a cluster will have stored to disk all the information that is required when they restart. diff --git a/deploy-manage/distributed-architecture/kibana-tasks-management.md b/deploy-manage/distributed-architecture/kibana-tasks-management.md index c7031bdf3a..19d4a612a7 100644 --- a/deploy-manage/distributed-architecture/kibana-tasks-management.md +++ b/deploy-manage/distributed-architecture/kibana-tasks-management.md @@ -45,31 +45,17 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/ :::: -<<<<<<< HEAD - - -======= ->>>>>>> f70338d9 (updates) ## Deployment considerations [_deployment_considerations] {{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/). -<<<<<<< HEAD - -======= ->>>>>>> f70338d9 (updates) ## Scaling guidance [task-manager-scaling-guidance] How you deploy {{kib}} largely depends on your use case. Predicting the throughout a deployment might require to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, there is a relatively straight forward method you can follow to produce a rough estimate based on your expected usage. -<<<<<<< HEAD - -### Default scale [task-manager-default-scaling] -======= ### Default scale [task-manager-default-scaling] ->>>>>>> f70338d9 (updates) By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. 
This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`). @@ -79,21 +65,11 @@ By [estimating a rough throughput requirement](#task-manager-rough-throughput-es For details on monitoring the health of {{kib}} Task Manager, follow the guidance in [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md). -<<<<<<< HEAD - -### Scaling horizontally [task-manager-scaling-horizontally] - -At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload. - - -### Scaling vertically [task-manager-scaling-vertically] -======= ### Scaling horizontally [task-manager-scaling-horizontally] At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload. ### Scaling vertically [task-manager-scaling-vertically] ->>>>>>> f70338d9 (updates) Other times it, might be preferable to increase the throughput of individual {{kib}} instances. @@ -101,12 +77,7 @@ Tweak the capacity with the [`xpack.task_manager.capacity`](kibana://reference/c Tweak the poll interval with the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull scheduled tasks at a higher rate. This setting can impact the performance of the {{es}} cluster as the workload will be higher. -<<<<<<< HEAD - -### Choosing a scaling strategy [task-manager-choosing-scaling-strategy] -======= ### Choosing a scaling strategy [task-manager-choosing-scaling-strategy] ->>>>>>> f70338d9 (updates) Each scaling strategy comes with its own considerations, and the appropriate strategy largely depends on your use case. @@ -123,12 +94,7 @@ Task Manager, like the rest of the Elastic Stack, is designed to scale horizonta Scaling horizontally requires a higher degree of coordination between {{kib}} instances. One way Task Manager coordinates with other instances is by delaying its polling schedule to avoid conflicts with other instances. By using [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) to evaluate the [date of the `last_polling_delay`](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime) across a deployment, you can estimate the frequency at which Task Manager resets its delay mechanism. A higher frequency suggests {{kib}} instances conflict at a high rate, which you can address by scaling vertically rather than horizontally, reducing the required coordination. -<<<<<<< HEAD - -### Rough throughput estimation [task-manager-rough-throughput-estimation] -======= ### Rough throughput estimation [task-manager-rough-throughput-estimation] ->>>>>>> f70338d9 (updates) Predicting the required throughput a deployment might need to support Task Management is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. 
However, a rough lower bound can be estimated, which is then used as a guide. @@ -138,11 +104,6 @@ A default {{kib}} instance can support up to `200/tpm`. #### Automatic estimation [_automatic_estimation] -<<<<<<< HEAD -#### Automatic estimation [_automatic_estimation] - -======= ->>>>>>> f70338d9 (updates) ::::{warning} This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. :::: @@ -164,11 +125,6 @@ When evaluating the proposed {{kib}} instance number under `proposed.provisioned :::: -<<<<<<< HEAD - - -======= ->>>>>>> f70338d9 (updates) #### Manual estimation [_manual_estimation] By [evaluating the workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), you can make a rough estimate as to the required throughput as a *tasks per minute* measurement. diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md index 6a215478fb..e4f0cc9873 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md @@ -3,31 +3,12 @@ mapped_urls: - https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-allocation.html - https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-allocation-filtering.html - https://www.elastic.co/guide/en/elasticsearch/reference/current/recovery-prioritization.html +applies_to: + stack: --- # Index-level shard allocation -% What needs to be done: Refine - -% Scope notes: also reference https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-total-shards.html and https://www.elastic.co/guide/en/elasticsearch/reference/current/data-tier-shard-filtering.html - -% Use migrated content from existing pages that map to this page: - -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md -% Notes: conceptual content -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md -% Notes: conceptual content -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md -% Notes: conceptual content - -⚠️ **This page is a work in progress.** ⚠️ - -The documentation team is working to combine content pulled from the following pages: - -* [/raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md) -* [/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md) -* [/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md](../../../raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md) - In Elasticsearch, per-index settings allow you to control the allocation of shards to nodes through index-level shard allocation settings. These settings enable you to specify preferences or constraints for where shards of a particular index should reside. 
This includes allocating shards to nodes with specific attributes or avoiding certain nodes. This level of control helps optimize resource utilization, balance load, and ensure data redundancy and availability according to your deployment's specific requirements. In addition to the content in this article, there are additional resources: * [Shard allocation filtering](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md): Controlling which shards are allocated to which nodes. @@ -45,8 +26,7 @@ The `cluster.routing.allocation` settings are dynamic, enabling existing indices For example, you could use a custom node attribute to indicate a node’s performance characteristics and use shard allocation filtering to route shards for a particular index to the most appropriate class of hardware. - -### Enabling index-level shard allocation filtering [index-allocation-filters] +### Enabling index-level shard allocation filtering [index-allocation-filters] To filter based on a custom node attribute: @@ -89,39 +69,25 @@ To filter based on a custom node attribute: ### Index allocation filter settings [index-allocation-settings] -`index.routing.allocation.include.{{attribute}}` -: Assign the index to a node whose `{{attribute}}` has at least one of the comma-separated values. - -`index.routing.allocation.require.{{attribute}}` -: Assign the index to a node whose `{{attribute}}` has *all* of the comma-separated values. - -`index.routing.allocation.exclude.{{attribute}}` -: Assign the index to a node whose `{{attribute}}` has *none* of the comma-separated values. +| Setting | Description | +|---|---| +|`index.routing.allocation.include.{{attribute}}`| Assign the index to a node whose `{{attribute}}` has at least one of the comma-separated values.| +|`index.routing.allocation.require.{{attribute}}`| Assign the index to a node whose `{{attribute}}` has *all* of the comma-separated values.| +|`index.routing.allocation.exclude.{{attribute}}`| Assign the index to a node whose `{{attribute}}` has *none* of the comma-separated values. | The index allocation settings support the following built-in attributes: -`_name` -: Match nodes by node name - -`_host_ip` -: Match nodes by host IP address (IP associated with hostname) - -`_publish_ip` -: Match nodes by publish IP address - -`_ip` -: Match either `_host_ip` or `_publish_ip` - -`_host` -: Match nodes by hostname - -`_id` -: Match nodes by node id - -`_tier` -: Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/data-tier-allocation-settings.md) - -::::{note} +| Attribute | Description| +| --- | --- | +|`_name`| Match nodes by node name | +|`_host_ip`| Match nodes by host IP address (IP associated with hostname) | +|`_publish_ip`| Match nodes by publish IP address | +|`_ip`| Match either `_host_ip` or `_publish_ip` | +| `_host`| Match nodes by hostname | +|`_id`| Match nodes by node id | +|`_tier`| Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/data-tier-allocation-settings.md) | + +::::{note} `_tier` filtering is based on [node](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/node-settings.md) roles. 
Only a subset of roles are [data tier](../../../manage-data/lifecycle/data-tiers.md) roles, and the generic [data role](../../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role) will match any tier filtering. :::: @@ -180,4 +146,4 @@ PUT index_4/_settings { "index.priority": 1 } -``` \ No newline at end of file +``` diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md deleted file mode 100644 index ec7a51e024..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/index-modules-allocation.md +++ /dev/null @@ -1,14 +0,0 @@ -# Index Shard Allocation [index-modules-allocation] - -This module provides per-index settings to control the allocation of shards to nodes: - -* [Shard allocation filtering](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md): Controlling which shards are allocated to which nodes. -* [Delayed allocation](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md): Delaying allocation of unassigned shards caused by a node leaving. -* [Total shards per node](elasticsearch://reference/elasticsearch/index-settings/total-shards-per-node.md): A hard limit on the number of shards from the same index per node. -* [Data tier allocation](elasticsearch://reference/elasticsearch/index-settings/data-tier-allocation.md): Controls the allocation of indices to [data tiers](../../../manage-data/lifecycle/data-tiers.md). - - - - - - diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md deleted file mode 100644 index 3a3194338a..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/recovery-prioritization.md +++ /dev/null @@ -1,48 +0,0 @@ -# Index recovery prioritization [recovery-prioritization] - -Unallocated shards are recovered in order of priority, whenever possible. Indices are sorted into priority order as follows: - -* the optional `index.priority` setting (higher before lower) -* the index creation date (higher before lower) -* the index name (higher before lower) - -This means that, by default, newer indices will be recovered before older indices. - -Use the per-index dynamically updatable `index.priority` setting to customise the index prioritization order. For instance: - -```console -PUT index_1 - -PUT index_2 - -PUT index_3 -{ - "settings": { - "index.priority": 10 - } -} - -PUT index_4 -{ - "settings": { - "index.priority": 5 - } -} -``` - -In the above example: - -* `index_3` will be recovered first because it has the highest `index.priority`. -* `index_4` will be recovered next because it has the next highest priority. -* `index_2` will be recovered next because it was created more recently. -* `index_1` will be recovered last. 
- -This setting accepts an integer, and can be updated on a live index with the [update index settings API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-settings): - -```console -PUT index_4/_settings -{ - "index.priority": 1 -} -``` - diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md deleted file mode 100644 index ff1639946d..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/shard-allocation-filtering.md +++ /dev/null @@ -1,102 +0,0 @@ -# Index-level shard allocation filtering [shard-allocation-filtering] - -You can use shard allocation filters to control where {{es}} allocates shards of a particular index. These per-index filters are applied in conjunction with [cluster-wide allocation filtering](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-shard-allocation-filtering) and [allocation awareness](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md). - -Shard allocation filters can be based on [custom node attributes](elasticsearch://reference/elasticsearch/configuration-reference/node-settings.md#custom-node-attributes) or the built-in `_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference` attributes. [Index lifecycle management](../../../manage-data/lifecycle/index-lifecycle-management.md) uses filters based on custom node attributes to determine how to reallocate shards when moving between phases. - -The `cluster.routing.allocation` settings are dynamic, enabling existing indices to be moved immediately from one set of nodes to another. Shards are only relocated if it is possible to do so without breaking another routing constraint, such as never allocating a primary and replica shard on the same node. - -For example, you could use a custom node attribute to indicate a node’s performance characteristics and use shard allocation filtering to route shards for a particular index to the most appropriate class of hardware. - - -## Enabling index-level shard allocation filtering [index-allocation-filters] - -To filter based on a custom node attribute: - -1. Specify the filter characteristics with a custom node attribute in each node’s `elasticsearch.yml` configuration file. For example, if you have `small`, `medium`, and `big` nodes, you could add a `size` attribute to filter based on node size. - - ```yaml - node.attr.size: medium - ``` - - You can also set custom attributes when you start a node: - - ```sh - ./bin/elasticsearch -Enode.attr.size=medium - ``` - -2. Add a routing allocation filter to the index. The `index.routing.allocation` settings support three types of filters: `include`, `exclude`, and `require`. 
For example, to tell {{es}} to allocate shards from the `test` index to either `big` or `medium` nodes, use `index.routing.allocation.include`: - - ```console - PUT test/_settings - { - "index.routing.allocation.include.size": "big,medium" - } - ``` - - If you specify multiple filters the following conditions must be satisfied simultaneously by a node in order for shards to be relocated to it: - - * If any `require` type conditions are specified, all of them must be satisfied - * If any `exclude` type conditions are specified, none of them may be satisfied - * If any `include` type conditions are specified, at least one of them must be satisfied - - For example, to move the `test` index to `big` nodes in `rack1`, you could specify: - - ```console - PUT test/_settings - { - "index.routing.allocation.require.size": "big", - "index.routing.allocation.require.rack": "rack1" - } - ``` - - - -## Index allocation filter settings [index-allocation-settings] - -`index.routing.allocation.include.{{attribute}}` -: Assign the index to a node whose `{{attribute}}` has at least one of the comma-separated values. - -`index.routing.allocation.require.{{attribute}}` -: Assign the index to a node whose `{{attribute}}` has *all* of the comma-separated values. - -`index.routing.allocation.exclude.{{attribute}}` -: Assign the index to a node whose `{{attribute}}` has *none* of the comma-separated values. - -The index allocation settings support the following built-in attributes: - -`_name` -: Match nodes by node name - -`_host_ip` -: Match nodes by host IP address (IP associated with hostname) - -`_publish_ip` -: Match nodes by publish IP address - -`_ip` -: Match either `_host_ip` or `_publish_ip` - -`_host` -: Match nodes by hostname - -`_id` -: Match nodes by node id - -`_tier` -: Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](elasticsearch://reference/elasticsearch/index-settings/data-tier-allocation.md) - -::::{note} -`_tier` filtering is based on [node](elasticsearch://reference/elasticsearch/configuration-reference/node-settings.md) roles. Only a subset of roles are [data tier](../../../manage-data/lifecycle/data-tiers.md) roles, and the generic [data role](../../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role) will match any tier filtering. -:::: - - -You can use wildcards when specifying attribute values, for example: - -```console -PUT test/_settings -{ - "index.routing.allocation.include._ip": "192.168.2.*" -} -``` - From 85f247c1763b625bc12b4f7d4645826656b6ec85 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Wed, 5 Mar 2025 10:31:32 -0700 Subject: [PATCH 06/19] more updates --- deploy-manage/distributed-architecture.md | 3 +++ .../distributed-architecture/discovery-cluster-formation.md | 2 ++ 2 files changed, 5 insertions(+) diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md index db3a337120..f87c72dd11 100644 --- a/deploy-manage/distributed-architecture.md +++ b/deploy-manage/distributed-architecture.md @@ -1,6 +1,9 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/_data_store_architecture.html +applies_to: + stack: + serverless: --- % Update the overview so Kibana is represented too. 
diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation.md b/deploy-manage/distributed-architecture/discovery-cluster-formation.md index 54ab09a200..e3183437af 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation.md @@ -5,6 +5,8 @@ applies_to: stack: --- +% Discovery and cluster formation content (7 pages): add introductory note to specify that the endpoints/settings are possibly for self-managed only, and review the content. + # Discovery and cluster formation [modules-discovery] The discovery and cluster formation processes are responsible for discovering nodes, electing a master, forming a cluster, and publishing the cluster state each time it changes. From 5af6889a7974cb7546016f62a14cc60d1db0b8a6 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Wed, 12 Mar 2025 09:08:15 -0600 Subject: [PATCH 07/19] updates --- deploy-manage/distributed-architecture.md | 3 ++- .../distributed-architecture/clusters-nodes-shards.md | 5 +++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md index f87c72dd11..5b54d6aa83 100644 --- a/deploy-manage/distributed-architecture.md +++ b/deploy-manage/distributed-architecture.md @@ -23,7 +23,7 @@ applies_to: The topics in this section provides information about the architecture of {{es}} and how it stores and retrieves data: ::::{note} -{{serverless-full}} scales with your workload and automates nodes, shards, and replicas for you. Some of the content in this section does not apply to you if you are using {{serverless-full}}. +{{serverless-full}} scales with your workload and automates nodes, shards, and replicas for you. Some of the content in this section does not apply to you if you are using {{serverless-full}}. Instead, the information in this section will provide you information about how the platform works for you. :::: * [Cluster, nodes, and shards](distributed-architecture/clusters-nodes-shards.md): Learn about the basic building blocks of an {{es}} cluster, including nodes, shards, primaries, and replicas. @@ -33,4 +33,5 @@ The topics in this section provides information about the architecture of {{es}} * [Shard allocation awareness](distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md): Learn how to use custom node attributes to distribute shards across different racks or availability zones. * [Disocvery and cluster formation](distributed-architecture/discovery-cluster-formation.md): Learn about the cluster formation process including voting, adding nodes and publishing the cluster state. * [Shard request cache](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/shard-request-cache-settings.md): Learn how {{es}} caches search requests to improve performance. +* [Kibana task management](distributed-architecture/kibana-tasks-management.md): Learn how {{kib}} runs background tasks and distribute work across multiple {{kib}} instances to be persistent and scale with your deployment. 
diff --git a/deploy-manage/distributed-architecture/clusters-nodes-shards.md b/deploy-manage/distributed-architecture/clusters-nodes-shards.md index 6de528e2ee..e4d0f5b8d9 100644 --- a/deploy-manage/distributed-architecture/clusters-nodes-shards.md +++ b/deploy-manage/distributed-architecture/clusters-nodes-shards.md @@ -1,6 +1,9 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/nodes-shards.html +applies_to: + stack: + serverless: --- # Clusters, nodes, and shards [nodes-shards] @@ -14,7 +17,6 @@ Nodes and shards are what make {{es}} distributed and scalable. These concepts a :::: - You can add servers (*nodes*) to a cluster to increase capacity, and {{es}} automatically distributes your data and query load across all of the available nodes. Elastic is able to distribute your data across nodes by subdividing an index into *shards*. Each index in {{es}} is a grouping of one or more physical shards, where each shard is a self-contained Lucene index containing a subset of the documents in the index. By distributing the documents in an index across multiple shards, and distributing those shards across multiple nodes, {{es}} increases indexing and query capacity. @@ -23,7 +25,6 @@ There are two types of shards: *primaries* and *replicas*. Each document in an i ::::{tip} The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time, without interrupting indexing or query operations. - :::: Shard copies in your cluster are automatically balanced across nodes to provide scale and high availability. All nodes are aware of all the other nodes in the cluster and can forward client requests to the appropriate node. This allows {{es}} to distribute indexing and query load across the cluster. From 1984bcda00cd2a17b25ffcbd3b1f42e448c0f456 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 11:36:03 -0600 Subject: [PATCH 08/19] updates --- deploy-manage/distributed-architecture.md | 5 +- .../discovery-cluster-formation.md | 4 +- .../discovery-hosts-providers.md | 17 ---- .../kibana-tasks-management.md | 96 +------------------ 4 files changed, 6 insertions(+), 116 deletions(-) diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md index 5b54d6aa83..94b14f4eba 100644 --- a/deploy-manage/distributed-architecture.md +++ b/deploy-manage/distributed-architecture.md @@ -6,8 +6,6 @@ applies_to: serverless: --- -% Update the overview so Kibana is represented too. - % Clarify which topics are relevant for which deployment types (see note from elastic.co/guide/en/elasticsearch/reference/current/nodes-shards.html) % Explain the role of orchestrators on the Clusters, nodes, and shards page, link up. @@ -33,5 +31,4 @@ The topics in this section provides information about the architecture of {{es}} * [Shard allocation awareness](distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md): Learn how to use custom node attributes to distribute shards across different racks or availability zones. * [Disocvery and cluster formation](distributed-architecture/discovery-cluster-formation.md): Learn about the cluster formation process including voting, adding nodes and publishing the cluster state. * [Shard request cache](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/shard-request-cache-settings.md): Learn how {{es}} caches search requests to improve performance. 
-* [Kibana task management](distributed-architecture/kibana-tasks-management.md): Learn how {{kib}} runs background tasks and distribute work across multiple {{kib}} instances to be persistent and scale with your deployment. - +* [Kibana task management](distributed-architecture/kibana-tasks-management.md): Learn how {{kib}} runs background tasks and distribute work across multiple {{kib}} instances to be persistent and scale with your deployment. \ No newline at end of file diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation.md b/deploy-manage/distributed-architecture/discovery-cluster-formation.md index e3183437af..08d2186c47 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation.md @@ -5,7 +5,9 @@ applies_to: stack: --- -% Discovery and cluster formation content (7 pages): add introductory note to specify that the endpoints/settings are possibly for self-managed only, and review the content. +::::{important} +The information provided in this section is applicable to all deployment types. However, the configuration settings detailed here are only valid for self-managed {{es}} deployments. For {{ecloud}} and {{serverless}} deployments, this section should only be used for general information. +:::: # Discovery and cluster formation [modules-discovery] diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md b/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md index 5c0673fe61..7ad63296d3 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md @@ -71,31 +71,14 @@ Host names are allowed instead of IP addresses and are resolved by DNS as descri You can also add comments to this file. All comments must appear on their lines starting with `#` (i.e. comments cannot start in the middle of a line). -<<<<<<< HEAD - -======= ->>>>>>> a4b41272 (update) #### EC2 hosts provider [ec2-hosts-provider] The [EC2 discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-ec2.md) adds a hosts provider that uses the [AWS API](https://github.com/aws/aws-sdk-java) to find a list of seed nodes. -<<<<<<< HEAD - -======= ->>>>>>> a4b41272 (update) #### Azure Classic hosts provider [azure-classic-hosts-provider] The [Azure Classic discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-azure-classic.md) adds a hosts provider that uses the Azure Classic API find a list of seed nodes. -<<<<<<< HEAD - #### Google Compute Engine hosts provider [gce-hosts-provider] The [GCE discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-gce.md) adds a hosts provider that uses the GCE API find a list of seed nodes. - - -======= -#### Google Compute Engine hosts provider [gce-hosts-provider] - -The [GCE discovery plugin](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch-plugins/discovery-gce.md) adds a hosts provider that uses the GCE API find a list of seed nodes.
->>>>>>> a4b41272 (update) diff --git a/deploy-manage/distributed-architecture/kibana-tasks-management.md b/deploy-manage/distributed-architecture/kibana-tasks-management.md index 19d4a612a7..e7006dc469 100644 --- a/deploy-manage/distributed-architecture/kibana-tasks-management.md +++ b/deploy-manage/distributed-architecture/kibana-tasks-management.md @@ -45,104 +45,12 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/ :::: -## Deployment considerations [_deployment_considerations] - {{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/). -## Scaling guidance [task-manager-scaling-guidance] +By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`). How you deploy {{kib}} largely depends on your use case. Predicting the throughout a deployment might require to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences. -However, there is a relatively straight forward method you can follow to produce a rough estimate based on your expected usage. - -### Default scale [task-manager-default-scaling] - -By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`). - In practice, a {{kib}} instance will only achieve the upper bound of `200/tpm` if the duration of task execution is below the polling rate of 3 seconds. For the most part, the duration of tasks is below that threshold, but it can vary greatly as {{es}} and {{kib}} usage grow and task complexity increases (such as alerts executing heavy queries across large datasets). -By [estimating a rough throughput requirement](#task-manager-rough-throughput-estimation), you can estimate the number of {{kib}} instances required to reliably execute tasks in a timely manner. An appropriate number of {{kib}} instances can be estimated to match the required scale. - -For details on monitoring the health of {{kib}} Task Manager, follow the guidance in [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md). - -### Scaling horizontally [task-manager-scaling-horizontally] - -At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload. - -### Scaling vertically [task-manager-scaling-vertically] - -Other times it, might be preferable to increase the throughput of individual {{kib}} instances. - -Tweak the capacity with the [`xpack.task_manager.capacity`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull a higher number of tasks per interval. This setting can impact the performance of each instance as the workload will be higher. 
- -Tweak the poll interval with the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull scheduled tasks at a higher rate. This setting can impact the performance of the {{es}} cluster as the workload will be higher. - -### Choosing a scaling strategy [task-manager-choosing-scaling-strategy] - -Each scaling strategy comes with its own considerations, and the appropriate strategy largely depends on your use case. - -Scaling {{kib}} instances vertically causes higher resource usage in each {{kib}} instance, as it will perform more concurrent work. Scaling {{kib}} instances horizontally requires a higher degree of coordination, which can impact overall performance. - -A recommended strategy is to follow these steps: - -1. Produce a [rough throughput estimate](#task-manager-rough-throughput-estimation) as a guide to provisioning as many {{kib}} instances as needed. Include any growth in tasks that you predict experiencing in the near future, and a buffer to better address ad-hoc tasks. -2. After provisioning a deployment, assess whether the provisioned {{kib}} instances achieve the required throughput by evaluating the [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md) as described in [Insufficient throughput to handle the scheduled workload](../../troubleshoot/kibana/task-manager.md#task-manager-theory-insufficient-throughput). -3. If the throughput is insufficient, and {{kib}} instances exhibit low resource usage, incrementally scale vertically while [monitoring](../monitor/monitoring-data/kibana-page.md) the impact of these changes. -4. If the throughput is insufficient, and {{kib}} instances are exhibiting high resource usage, incrementally scale horizontally by provisioning new {{kib}} instances and reassess. - -Task Manager, like the rest of the Elastic Stack, is designed to scale horizontally. Take advantage of this ability to ensure mission critical services, such as Alerting, Actions, and Reporting, always have the capacity they need. - -Scaling horizontally requires a higher degree of coordination between {{kib}} instances. One way Task Manager coordinates with other instances is by delaying its polling schedule to avoid conflicts with other instances. By using [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) to evaluate the [date of the `last_polling_delay`](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime) across a deployment, you can estimate the frequency at which Task Manager resets its delay mechanism. A higher frequency suggests {{kib}} instances conflict at a high rate, which you can address by scaling vertically rather than horizontally, reducing the required coordination. - -### Rough throughput estimation [task-manager-rough-throughput-estimation] - -Predicting the required throughput a deployment might need to support Task Management is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, a rough lower bound can be estimated, which is then used as a guide. - -Throughput is best thought of as a measurements in tasks per minute. - -A default {{kib}} instance can support up to `200/tpm`. - -#### Automatic estimation [_automatic_estimation] - -::::{warning} -This functionality is in technical preview and may be changed or removed in a future release. 
Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. -:::: - -As demonstrated in [Evaluate your capacity estimation](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-capacity-estimation), the Task Manager [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) performs these estimations automatically. - -These estimates are based on historical data and should not be used as predictions, but can be used as a rough guide when scaling the system. - -We recommend provisioning enough {{kib}} instances to ensure a buffer between the observed maximum throughput (as estimated under `observed.max_throughput_per_minute`) and the average required throughput (as estimated under `observed.avg_required_throughput_per_minute`). Otherwise there might be insufficient capacity to handle spikes of ad-hoc tasks. How much of a buffer is needed largely depends on your use case, but keep in mind that estimated throughput takes into account recent spikes and, as long as they are representative of your system’s behaviour, shouldn’t require much of a buffer. - -We recommend provisioning at least as many {{kib}} instances as proposed by `proposed.provisioned_kibana`, but keep in mind that this number is based on the estimated required throughput, which is based on average historical performance, and cannot accurately predict future requirements. - -::::{warning} -Automatic capacity estimation is performed by each {{kib}} instance independently. This estimation is performed by observing the task throughput in that instance, the number of {{kib}} instances executing tasks at that moment in time, and the recurring workload in {{es}}. - -If a {{kib}} instance is idle at the moment of capacity estimation, the number of active {{kib}} instances might be miscounted and the available throughput miscalculated. - -When evaluating the proposed {{kib}} instance number under `proposed.provisioned_kibana`, we highly recommend verifying that the `observed.observed_kibana_instances` matches the number of provisioned {{kib}} instances. - -:::: - -#### Manual estimation [_manual_estimation] - -By [evaluating the workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), you can make a rough estimate as to the required throughput as a *tasks per minute* measurement. - -For example, suppose your current workload reveals a required throughput of `440/tpm`. You can address this scale by provisioning 3 {{kib}} instances, with an upper throughput of `600/tpm`. This scale would provide approximately 25% additional capacity to handle ad-hoc non-recurring tasks and potential growth in recurring tasks. - -Given a deployment of 100 recurring tasks, estimating the required throughput depends on the scheduled cadence. Suppose you expect to run 50 tasks at a cadence of `10s`, the other 50 tasks at `20m`. In addition, you expect a couple dozen non-recurring tasks every minute. - -A non-recurring task requires a single execution, which means that a single {{kib}} instance could execute all 100 tasks in less than a minute, using only half of its capacity. As these tasks are only executed once, the {{kib}} instance will sit idle once all tasks are executed. For that reason, don’t include non-recurring tasks in your *tasks per minute* calculation. Instead, include a buffer in the final *lower bound* to incur the cost of ad-hoc non-recurring tasks. 
- -A recurring task requires as many executions as its cadence can fit in a minute. A recurring task with a `10s` schedule will require `6/tpm`, as it will execute 6 times per minute. A recurring task with a `20m` schedule only executes 3 times per hour and only requires a throughput of `0.05/tpm`, a number so small it that is difficult to take it into account. - -For this reason, we recommend grouping tasks by *tasks per minute* and *tasks per hour*, as demonstrated in [Evaluate your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), averaging the *per hour* measurement across all minutes. - -It is highly recommended that you maintain at least 20% additional capacity, beyond your expected workload, as spikes in ad-hoc tasks is possible at times of high activity (such as a spike in actions in response to an active alert). - -Given the predicted workload, you can estimate a lower bound throughput of `340/tpm` (`6/tpm` * 50 + `3/tph` * 50 + 20% buffer). As a default, a {{kib}} instance provides a throughput of `200/tpm`. A good starting point for your deployment is to provision 2 {{kib}} instances. You could then monitor their performance and reassess as the required throughput becomes clearer. - -Although this is a *rough* estimate, the *tasks per minute* provides the lower bound needed to execute tasks on time. - -Once you estimate *tasks per minute* , add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime). \ No newline at end of file +For more information on scaling, see [Kibana task manager scaling considerations](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md). From 57f9fe56da7838a0d8e43f0f152deee4251bf9a6 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 11:56:03 -0600 Subject: [PATCH 09/19] finished cleanup of kibana task management --- .../kibana-tasks-management.md | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/deploy-manage/distributed-architecture/kibana-tasks-management.md b/deploy-manage/distributed-architecture/kibana-tasks-management.md index e7006dc469..148c2633ba 100644 --- a/deploy-manage/distributed-architecture/kibana-tasks-management.md +++ b/deploy-manage/distributed-architecture/kibana-tasks-management.md @@ -8,7 +8,7 @@ mapped_pages: {{kib}} Task Manager is leveraged by features such as Alerting, Actions, and Reporting to run mission critical work as persistent background tasks. These background tasks distribute work across multiple {{kib}} instances. This has three major benefits: * **Persistence**: All task state and scheduling is stored in {{es}}, so if you restart {{kib}}, tasks will pick up where they left off. -* **Scaling**: Multiple {{kib}} instances can read from and update the same task queue in {{es}}, allowing the work load to be distributed across instances. If a {{kib}} instance no longer has capacity to run tasks, you can increase capacity by adding additional {{kib}} instances. +* **Scaling**: Multiple {{kib}} instances can read from and update the same task queue in {{es}}, allowing the work load to be distributed across instances. 
If a {{kib}} instance no longer has capacity to run tasks, you can increase capacity by adding additional {{kib}} instances. For more information on scaling, see [Kibana task manager scaling considerations](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). * **Load Balancing**: Task Manager is equipped with a reactive self-healing mechanism, which allows it to reduce the amount of work it executes in reaction to an increased load related error rate in {{es}}. Additionally, when Task Manager experiences an increase in recurring tasks, it attempts to space out the work to better balance the load. ::::{important} @@ -20,18 +20,18 @@ If you lose this index, all scheduled alerts and actions are lost. :::: -## Running background tasks [task-manager-background-tasks] +## How background tasks are managed [task-manager-background-tasks] {{kib}} background tasks are managed as follows: * An {{es}} task index is polled for overdue tasks at 3-second intervals. You can change this interval using the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting. * Tasks are claimed by updating them in the {{es}} index, using optimistic concurrency control to prevent conflicts. Each {{kib}} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval. +* {{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/). * Tasks are run on the {{kib}} server. * Task Manager ensures that tasks: * Are only executed once * Are retried when they fail (if configured to do so) * Are rescheduled to run again at a future point in time (if configured to do so) - ::::{important} It is possible for tasks to run late or at an inconsistent schedule. @@ -45,12 +45,5 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/ :::: -{{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/). - -By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`). - -How you deploy {{kib}} largely depends on your use case. Predicting the throughout a deployment might require to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences. -In practice, a {{kib}} instance will only achieve the upper bound of `200/tpm` if the duration of task execution is below the polling rate of 3 seconds. For the most part, the duration of tasks is below that threshold, but it can vary greatly as {{es}} and {{kib}} usage grow and task complexity increases (such as alerts executing heavy queries across large datasets). -For more information on scaling, see [Kibana task manager scaling considerations](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md). 
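As a rough illustration of the `200/tpm` per-instance ceiling referenced in the Task Manager docs above, the sketch below derives that number from the poll interval and the number of tasks claimed per poll. It is a simplified estimation model, not Kibana's actual scheduling code, and it assumes every claimed task completes within a single poll cycle; the setting named in the comments (`xpack.task_manager.poll_interval`) is the documented one, while the helper function itself is purely illustrative.

```python
# Rough estimate of a single Kibana instance's Task Manager throughput ceiling.
# Simplifying assumption: every claimed task finishes within one poll cycle;
# real throughput drops as task duration grows.

def max_tasks_per_minute(poll_interval_ms: int = 3000, tasks_per_poll: int = 10) -> float:
    """poll_interval_ms mirrors xpack.task_manager.poll_interval (default 3000 ms);
    tasks_per_poll mirrors the default claim limit of 10 concurrent tasks."""
    polls_per_minute = 60_000 / poll_interval_ms
    return polls_per_minute * tasks_per_poll

print(max_tasks_per_minute())  # 200.0 -> the documented 200/tpm upper bound
```

With the defaults this reproduces the documented figure; lowering the poll interval or raising the claim capacity raises the ceiling, at the cost of more load on {{es}} and the {{kib}} server.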
From 8fb66bf7bba439bcbbcc1e56d6a767f8215802de Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 13:27:16 -0600 Subject: [PATCH 10/19] fixing links --- deploy-manage/distributed-architecture.md | 10 +--------- .../discovery-cluster-formation.md | 2 +- .../monitor/kibana-task-manager-health-monitoring.md | 2 +- .../kibana-alerting-production-considerations.md | 6 +++--- .../kibana-task-manager-scaling-considerations.md | 4 ++-- troubleshoot/kibana/task-manager.md | 12 ++++++------ 6 files changed, 14 insertions(+), 22 deletions(-) diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md index 94b14f4eba..29c3b472c4 100644 --- a/deploy-manage/distributed-architecture.md +++ b/deploy-manage/distributed-architecture.md @@ -6,14 +6,6 @@ applies_to: serverless: --- -% Clarify which topics are relevant for which deployment types (see note from elastic.co/guide/en/elasticsearch/reference/current/nodes-shards.html) - -% Explain the role of orchestrators on the Clusters, nodes, and shards page, link up. - -% Split the kibana tasks management topic so it's concepts only - guidance goes to design guidance section. - -% Discovery and cluster formation content (7 pages): add introductory note to specify that the endpoints/settings are possibly for self-managed only, and review the content. - # Distributed architecture [_data_store_architecture] {{es}} is a distributed document store. Instead of storing information as rows of columnar data, {{es}} stores complex data structures that have been serialized as JSON documents. When you have multiple {{es}} nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node. @@ -21,7 +13,7 @@ applies_to: The topics in this section provides information about the architecture of {{es}} and how it stores and retrieves data: ::::{note} -{{serverless-full}} scales with your workload and automates nodes, shards, and replicas for you. Some of the content in this section does not apply to you if you are using {{serverless-full}}. Instead, the information in this section will provide you information about how the platform works for you. +{{serverless-full}} scales with your workload and automates nodes, shards, and replicas for you. Some of the content in this section does not apply to you if you are using {{serverless-full}}. Instead, the information in this section will provide you information about how the platform works for you. :::: * [Cluster, nodes, and shards](distributed-architecture/clusters-nodes-shards.md): Learn about the basic building blocks of an {{es}} cluster, including nodes, shards, primaries, and replicas. diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation.md b/deploy-manage/distributed-architecture/discovery-cluster-formation.md index 08d2186c47..ce3b118ae5 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation.md @@ -6,7 +6,7 @@ applies_to: --- ::::{important} -The information provided in this section is applicable to all deployment types. However, the configuration settings detailed here are only valid for self-managed {{es}} deployments. For {{ecloud}} and {{serverless}} deployments this seciton should only be used for general information. +The information provided in this section is applicable to all deployment types. 
However, the configuration settings detailed here are only valid for self-managed {{es}} deployments. For {{ecloud}} and {{serverless-full}} deployments this section should only be used for general information. :::: # Discovery and cluster formation [modules-discovery] diff --git a/deploy-manage/monitor/kibana-task-manager-health-monitoring.md b/deploy-manage/monitor/kibana-task-manager-health-monitoring.md index 3b237c5bec..71459d5388 100644 --- a/deploy-manage/monitor/kibana-task-manager-health-monitoring.md +++ b/deploy-manage/monitor/kibana-task-manager-health-monitoring.md @@ -99,7 +99,7 @@ The health monitoring API exposes three sections: `configuration`, `workload` an | Configuration | This section summarizes the current configuration of Task Manager. This includes dynamic configurations that change over time, such as `poll_interval` and `max_workers`, which can adjust in reaction to changing load on the system. | | Workload | This section summarizes the work load across the cluster, including the tasks in the system, their types, and current status. | | Runtime | This section tracks execution performance of Task Manager, tracking task *drift*, worker *load*, and execution stats broken down by type, including duration and execution results. | -| Capacity Estimation | This section provides a rough estimate about the sufficiency of its capacity. As the name suggests, these are estimates based on historical data and should not be used as predictions. Use these estimations when following the Task Manager [Scaling guidance](../distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). | +| Capacity Estimation | This section provides a rough estimate about the sufficiency of its capacity. As the name suggests, these are estimates based on historical data and should not be used as predictions. Use these estimations when following the Task Manager [Scaling guidance](../production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). | Each section has a `timestamp` and a `status` that indicates when the last update to this section took place and whether the health of this section was evaluated as `OK`, `Warning` or `Error`. diff --git a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md index 4d872e0eba..b79899238b 100644 --- a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md +++ b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md @@ -11,7 +11,7 @@ mapped_pages: Alerting runs both rule checks and actions as persistent background tasks managed by the Task Manager. -When relying on rules and actions as mission critical services, make sure you follow the [production considerations](../distributed-architecture/kibana-tasks-management.md) for Task Manager. +When relying on rules and actions as mission critical services, make sure you follow the [production considerations](production-guidance/kibana-task-manager-scaling-considerations.md) for Task Manager. 
## Running background rule checks and actions [alerting-background-tasks] @@ -37,14 +37,14 @@ For detailed guidance, see [Alerting Troubleshooting](../../explore-analyze/aler ## Scaling guidance [alerting-scaling-guidance] -As rules and actions leverage background tasks to perform the majority of work, scaling Alerting is possible by following the [Task Manager Scaling Guidance](../distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). +As rules and actions leverage background tasks to perform the majority of work, scaling Alerting is possible by following the [Task Manager Scaling Guidance](production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). When estimating the required task throughput, keep the following in mind: * Each rule uses a single recurring task that is scheduled to run at the cadence defined by its check interval. * Each action uses a single task. However, because actions are taken per instance, alerts can generate a large number of non-recurring tasks. -It is difficult to predict how much throughput is needed to ensure all rules and actions are executed at consistent schedules. By counting rules as recurring tasks and actions as non-recurring tasks, a rough throughput [can be estimated](../distributed-architecture/kibana-tasks-management.md#task-manager-rough-throughput-estimation) as a *tasks per minute* measurement. +It is difficult to predict how much throughput is needed to ensure all rules and actions are executed at consistent schedules. By counting rules as recurring tasks and actions as non-recurring tasks, a rough throughput [can be estimated](production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-rough-throughput-estimation) as a *tasks per minute* measurement. Predicting the buffer required to account for actions depends heavily on the rule types you use, the amount of alerts they might detect, and the number of actions you might choose to assign to action groups. With that in mind, regularly [monitor the health](../monitor/kibana-task-manager-health-monitoring.md) of your Task Manager instances. diff --git a/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md b/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md index 487b7eccee..b4fb0a4205 100644 --- a/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md +++ b/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md @@ -69,7 +69,7 @@ By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This In practice, a {{kib}} instance will only achieve the upper bound of `200/tpm` if the duration of task execution is below the polling rate of 3 seconds. For the most part, the duration of tasks is below that threshold, but it can vary greatly as {{es}} and {{kib}} usage grow and task complexity increases (such as alerts executing heavy queries across large datasets). -By [estimating a rough throughput requirement](../distributed-architecture/kibana-tasks-management.md#task-manager-rough-throughput-estimation), you can estimate the number of {{kib}} instances required to reliably execute tasks in a timely manner. An appropriate number of {{kib}} instances can be estimated to match the required scale. +By [estimating a rough throughput requirement](#task-manager-rough-throughput-estimation), you can estimate the number of {{kib}} instances required to reliably execute tasks in a timely manner. 
An appropriate number of {{kib}} instances can be estimated to match the required scale. For details on monitoring the health of {{kib}} Task Manager, follow the guidance in [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md). @@ -96,7 +96,7 @@ Scaling {{kib}} instances vertically causes higher resource usage in each {{kib} A recommended strategy is to follow these steps: -1. Produce a [rough throughput estimate](../distributed-architecture/kibana-tasks-management.md#task-manager-rough-throughput-estimation) as a guide to provisioning as many {{kib}} instances as needed. Include any growth in tasks that you predict experiencing in the near future, and a buffer to better address ad-hoc tasks. +1. Produce a [rough throughput estimate](#task-manager-rough-throughput-estimation) as a guide to provisioning as many {{kib}} instances as needed. Include any growth in tasks that you predict experiencing in the near future, and a buffer to better address ad-hoc tasks. 2. After provisioning a deployment, assess whether the provisioned {{kib}} instances achieve the required throughput by evaluating the [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md) as described in [Insufficient throughput to handle the scheduled workload](../../troubleshoot/kibana/task-manager.md#task-manager-theory-insufficient-throughput). 3. If the throughput is insufficient, and {{kib}} instances exhibit low resource usage, incrementally scale vertically while [monitoring](../monitor/monitoring-data/kibana-page.md) the impact of these changes. 4. If the throughput is insufficient, and {{kib}} instances are exhibiting high resource usage, incrementally scale horizontally by provisioning new {{kib}} instances and reassess. diff --git a/troubleshoot/kibana/task-manager.md b/troubleshoot/kibana/task-manager.md index 95080482e0..b81595939a 100644 --- a/troubleshoot/kibana/task-manager.md +++ b/troubleshoot/kibana/task-manager.md @@ -50,7 +50,7 @@ For example: Refer to [Diagnose a root cause for drift](#task-manager-diagnosing-root-cause) for step-by-step instructions on identifying the correct resolution. -*Drift* is often addressed by adjusting the scaling the deployment to better suit your usage. For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). +*Drift* is often addressed by adjusting the scaling the deployment to better suit your usage. For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). ## Diagnose a root cause for drift [task-manager-diagnosing-root-cause] @@ -489,7 +489,7 @@ You can infer from these stats that this {{kib}} is using most of its capacity, * The `p90` of `load` is at 100%, and `p50` is also quite high at 80%. This means that there is little to no room for maneuvering, and a spike of work might cause Task Manager to exceed its capacity. * Tasks run soon after their scheduled time, which is to be expected. A `poll_interval` of `3000` milliseconds would often experience a consistent drift of somewhere between `0` and `3000` milliseconds. A `p50 drift` of `2999` suggests that there is room for improvement, and you could benefit from a higher throughput. 
-For details on achieving higher throughput by adjusting your scaling strategy, see [Scaling guidance](../../deploy-manage/distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). +For details on achieving higher throughput by adjusting your scaling strategy, see [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). $$$task-manager-theory-long-running-tasks$$$ **Theory**: Tasks run for too long, overrunning their schedule @@ -665,7 +665,7 @@ Keep in mind that these stats give you a glimpse at a moment in time, and even t Predicting the required throughput a deployment might need to support Task Manager is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. -[Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) provides statistics that make it easier to monitor the adequacy of the existing throughput. By evaluating the workload, the required throughput can be estimated, which is used when following the Task Manager [Scaling guidance](../../deploy-manage/distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). +[Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) provides statistics that make it easier to monitor the adequacy of the existing throughput. By evaluating the workload, the required throughput can be estimated, which is used when following the Task Manager [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). Evaluating the preceding health stats in the previous example, you see the following output under `stats.workload.value`: @@ -819,7 +819,7 @@ These rough calculations give you a lower bound to the required throughput, whic Given these inferred attributes, it would be safe to assume that a single {{kib}} instance with default settings **would not** provide the required throughput. It is possible that scaling horizontally by adding a couple more {{kib}} instances will. -For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). +For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). ### Evaluate the Capacity Estimation [task-manager-health-evaluate-the-capacity-estimation] @@ -828,7 +828,7 @@ Task Manager is constantly evaluating its runtime operations and workload. This As the name suggests, these are estimates based on historical data and should not be used as predictions. These estimations should be evaluated alongside the detailed [Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) stats before making changes to infrastructure. These estimations assume all {{kib}} instances are configured identically. -We recommend using these estimations when following the Task Manager [Scaling guidance](../../deploy-manage/distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). +We recommend using these estimations when following the Task Manager [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). 
Evaluating the health stats in the previous example, you can see the following output under `stats.capacity_estimation.value`: @@ -912,7 +912,7 @@ Evaluating by these estimates, we can infer some interesting attributes of our s You can infer from these estimates that the capacity in the current system is insufficient and at least one additional {{kib}} instance is required to keep up with the workload. -For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/distributed-architecture/kibana-tasks-management.md#task-manager-scaling-guidance). +For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). ### Inline scripts are disabled in {{es}} [task-manager-cannot-operate-when-inline-scripts-are-disabled] From d73abe270945eda5f8c3e555359ea093bf9e077d Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 13:41:32 -0600 Subject: [PATCH 11/19] updating links --- .../kibana-alerting-production-considerations.md | 6 +++--- troubleshoot/kibana/task-manager.md | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md index b79899238b..3d581be0ac 100644 --- a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md +++ b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md @@ -11,7 +11,7 @@ mapped_pages: Alerting runs both rule checks and actions as persistent background tasks managed by the Task Manager. -When relying on rules and actions as mission critical services, make sure you follow the [production considerations](production-guidance/kibana-task-manager-scaling-considerations.md) for Task Manager. +When relying on rules and actions as mission critical services, make sure you follow the [production considerations](kibana-task-manager-scaling-considerations.md) for Task Manager. ## Running background rule checks and actions [alerting-background-tasks] @@ -27,7 +27,7 @@ For more details on Task Manager, see [Running background tasks](../distributed- ::::{important} Rule and action tasks can run late or at an inconsistent schedule. This is typically a symptom of the specific usage of the cluster in question. -You can address such issues by tweaking the [Task Manager settings](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) or scaling the deployment to better suit your use case. +You can address such issues by tweaking the [Task Manager settings](kibana://reference/configuration-reference/task-manager-settings.md) or scaling the deployment to better suit your use case. For detailed guidance, see [Alerting Troubleshooting](../../explore-analyze/alerts-cases/alerts/alerting-troubleshooting.md). @@ -44,7 +44,7 @@ When estimating the required task throughput, keep the following in mind: * Each rule uses a single recurring task that is scheduled to run at the cadence defined by its check interval. * Each action uses a single task. However, because actions are taken per instance, alerts can generate a large number of non-recurring tasks. -It is difficult to predict how much throughput is needed to ensure all rules and actions are executed at consistent schedules. 
By counting rules as recurring tasks and actions as non-recurring tasks, a rough throughput [can be estimated](production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-rough-throughput-estimation) as a *tasks per minute* measurement. +It is difficult to predict how much throughput is needed to ensure all rules and actions are executed at consistent schedules. By counting rules as recurring tasks and actions as non-recurring tasks, a rough throughput [can be estimated](kibana-task-manager-scaling-considerations.md#task-manager-rough-throughput-estimation) as a *tasks per minute* measurement. Predicting the buffer required to account for actions depends heavily on the rule types you use, the amount of alerts they might detect, and the number of actions you might choose to assign to action groups. With that in mind, regularly [monitor the health](../monitor/kibana-task-manager-health-monitoring.md) of your Task Manager instances. diff --git a/troubleshoot/kibana/task-manager.md b/troubleshoot/kibana/task-manager.md index b81595939a..26094f5892 100644 --- a/troubleshoot/kibana/task-manager.md +++ b/troubleshoot/kibana/task-manager.md @@ -489,7 +489,7 @@ You can infer from these stats that this {{kib}} is using most of its capacity, * The `p90` of `load` is at 100%, and `p50` is also quite high at 80%. This means that there is little to no room for maneuvering, and a spike of work might cause Task Manager to exceed its capacity. * Tasks run soon after their scheduled time, which is to be expected. A `poll_interval` of `3000` milliseconds would often experience a consistent drift of somewhere between `0` and `3000` milliseconds. A `p50 drift` of `2999` suggests that there is room for improvement, and you could benefit from a higher throughput. -For details on achieving higher throughput by adjusting your scaling strategy, see [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +For details on achieving higher throughput by adjusting your scaling strategy, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). $$$task-manager-theory-long-running-tasks$$$ **Theory**: Tasks run for too long, overrunning their schedule @@ -665,7 +665,7 @@ Keep in mind that these stats give you a glimpse at a moment in time, and even t Predicting the required throughput a deployment might need to support Task Manager is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. -[Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) provides statistics that make it easier to monitor the adequacy of the existing throughput. By evaluating the workload, the required throughput can be estimated, which is used when following the Task Manager [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +[Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) provides statistics that make it easier to monitor the adequacy of the existing throughput. By evaluating the workload, the required throughput can be estimated, which is used when following the Task Manager [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). 
Evaluating the preceding health stats in the previous example, you see the following output under `stats.workload.value`: @@ -819,7 +819,7 @@ These rough calculations give you a lower bound to the required throughput, whic Given these inferred attributes, it would be safe to assume that a single {{kib}} instance with default settings **would not** provide the required throughput. It is possible that scaling horizontally by adding a couple more {{kib}} instances will. -For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). ### Evaluate the Capacity Estimation [task-manager-health-evaluate-the-capacity-estimation] @@ -828,7 +828,7 @@ Task Manager is constantly evaluating its runtime operations and workload. This As the name suggests, these are estimates based on historical data and should not be used as predictions. These estimations should be evaluated alongside the detailed [Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) stats before making changes to infrastructure. These estimations assume all {{kib}} instances are configured identically. -We recommend using these estimations when following the Task Manager [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +We recommend using these estimations when following the Task Manager [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). Evaluating the health stats in the previous example, you can see the following output under `stats.capacity_estimation.value`: @@ -912,7 +912,7 @@ Evaluating by these estimates, we can infer some interesting attributes of our s You can infer from these estimates that the capacity in the current system is insufficient and at least one additional {{kib}} instance is required to keep up with the workload. -For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage//production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). 
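The `stats.workload.value` and `stats.capacity_estimation.value` sections referenced in these hunks come from the Task Manager health API. The sketch below shows one way to pull them for a quick check; the host, credentials, and the specific fields printed are illustrative assumptions, and the detailed health-monitoring docs remain the authoritative reference for the response shape.

```python
# Minimal sketch: pull Task Manager health stats from a Kibana instance.
# The host and credentials are placeholders; adjust them for your deployment.
import requests

KIBANA_URL = "http://localhost:5601"  # assumption: local, self-managed Kibana

resp = requests.get(
    f"{KIBANA_URL}/api/task_manager/_health",
    auth=("elastic", "changeme"),  # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
health = resp.json()

# The sections discussed above live under stats.workload and stats.capacity_estimation.
workload = health.get("stats", {}).get("workload", {}).get("value", {})
capacity = health.get("stats", {}).get("capacity_estimation", {}).get("value", {})
print("overdue tasks:", workload.get("overdue"))
print("proposed Kibana instances:", capacity.get("proposed", {}).get("provisioned_kibana"))
```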
### Inline scripts are disabled in {{es}} [task-manager-cannot-operate-when-inline-scripts-are-disabled] From 7ac5535cb2f7d9090be13ce1839899316532c28a Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 13:46:48 -0600 Subject: [PATCH 12/19] still fixing links --- .../kibana-alerting-production-considerations.md | 2 +- troubleshoot/kibana/task-manager.md | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md index 3d581be0ac..046f1d2974 100644 --- a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md +++ b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md @@ -37,7 +37,7 @@ For detailed guidance, see [Alerting Troubleshooting](../../explore-analyze/aler ## Scaling guidance [alerting-scaling-guidance] -As rules and actions leverage background tasks to perform the majority of work, scaling Alerting is possible by following the [Task Manager Scaling Guidance](production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). +As rules and actions leverage background tasks to perform the majority of work, scaling Alerting is possible by following the [Task Manager Scaling Guidance](kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). When estimating the required task throughput, keep the following in mind: diff --git a/troubleshoot/kibana/task-manager.md b/troubleshoot/kibana/task-manager.md index 26094f5892..91328fbaa5 100644 --- a/troubleshoot/kibana/task-manager.md +++ b/troubleshoot/kibana/task-manager.md @@ -489,7 +489,7 @@ You can infer from these stats that this {{kib}} is using most of its capacity, * The `p90` of `load` is at 100%, and `p50` is also quite high at 80%. This means that there is little to no room for maneuvering, and a spike of work might cause Task Manager to exceed its capacity. * Tasks run soon after their scheduled time, which is to be expected. A `poll_interval` of `3000` milliseconds would often experience a consistent drift of somewhere between `0` and `3000` milliseconds. A `p50 drift` of `2999` suggests that there is room for improvement, and you could benefit from a higher throughput. -For details on achieving higher throughput by adjusting your scaling strategy, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +For details on achieving higher throughput by adjusting your scaling strategy, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). $$$task-manager-theory-long-running-tasks$$$ **Theory**: Tasks run for too long, overrunning their schedule @@ -665,7 +665,7 @@ Keep in mind that these stats give you a glimpse at a moment in time, and even t Predicting the required throughput a deployment might need to support Task Manager is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. -[Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) provides statistics that make it easier to monitor the adequacy of the existing throughput. 
By evaluating the workload, the required throughput can be estimated, which is used when following the Task Manager [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +[Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) provides statistics that make it easier to monitor the adequacy of the existing throughput. By evaluating the workload, the required throughput can be estimated, which is used when following the Task Manager [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). Evaluating the preceding health stats in the previous example, you see the following output under `stats.workload.value`: @@ -819,7 +819,7 @@ These rough calculations give you a lower bound to the required throughput, whic Given these inferred attributes, it would be safe to assume that a single {{kib}} instance with default settings **would not** provide the required throughput. It is possible that scaling horizontally by adding a couple more {{kib}} instances will. -For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/.md#task-manager-scaling-guidance). ### Evaluate the Capacity Estimation [task-manager-health-evaluate-the-capacity-estimation] @@ -828,7 +828,7 @@ Task Manager is constantly evaluating its runtime operations and workload. This As the name suggests, these are estimates based on historical data and should not be used as predictions. These estimations should be evaluated alongside the detailed [Health monitoring](../../deploy-manage/monitor/kibana-task-manager-health-monitoring.md) stats before making changes to infrastructure. These estimations assume all {{kib}} instances are configured identically. -We recommend using these estimations when following the Task Manager [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +We recommend using these estimations when following the Task Manager [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). Evaluating the health stats in the previous example, you can see the following output under `stats.capacity_estimation.value`: @@ -912,7 +912,7 @@ Evaluating by these estimates, we can infer some interesting attributes of our s You can infer from these estimates that the capacity in the current system is insufficient and at least one additional {{kib}} instance is required to keep up with the workload. -For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations#task-manager-scaling-guidance). +For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). 
### Inline scripts are disabled in {{es}} [task-manager-cannot-operate-when-inline-scripts-are-disabled] From d307b1b32cbd3dcab862b4fb543f7bff7d198339 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 16:27:37 -0600 Subject: [PATCH 13/19] more --- raw-migrated-files/toc.yml | 1 - troubleshoot/kibana/task-manager.md | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index 9a7b1540e2..60fb4c4808 100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -114,7 +114,6 @@ toc: - file: elasticsearch/elasticsearch-reference/security-files.md - file: elasticsearch/elasticsearch-reference/security-limitations.md - file: elasticsearch/elasticsearch-reference/semantic-search-inference.md - - file: elasticsearch/elasticsearch-reference/shard-allocation-filtering.md - file: elasticsearch/elasticsearch-reference/shard-request-cache.md - file: ingest-docs/fleet/index.md - file: ingest-docs/ingest-overview/index.md diff --git a/troubleshoot/kibana/task-manager.md b/troubleshoot/kibana/task-manager.md index 91328fbaa5..c7bb15de00 100644 --- a/troubleshoot/kibana/task-manager.md +++ b/troubleshoot/kibana/task-manager.md @@ -819,7 +819,7 @@ These rough calculations give you a lower bound to the required throughput, whic Given these inferred attributes, it would be safe to assume that a single {{kib}} instance with default settings **would not** provide the required throughput. It is possible that scaling horizontally by adding a couple more {{kib}} instances will. -For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/.md#task-manager-scaling-guidance). +For details on scaling Task Manager, see [Scaling guidance](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance). 
### Evaluate the Capacity Estimation [task-manager-health-evaluate-the-capacity-estimation] From 2c135efc7d114b1a2ebc22cbee140659bc598ae6 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 14:01:39 -0600 Subject: [PATCH 14/19] updates --- .../clusters-nodes-shards/node-roles.md | 2 ++ .../distributed-architecture/kibana-tasks-management.md | 2 ++ .../distributed-architecture/reading-and-writing-documents.md | 3 +++ .../shard-allocation-relocation-recovery.md | 2 ++ .../delaying-allocation-when-node-leaves.md | 2 ++ 5 files changed, 11 insertions(+) diff --git a/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md b/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md index febd299f4f..bba90ce66a 100644 --- a/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md +++ b/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/node-roles-overview.html +applies_to: + stack: --- # Node roles [node-roles-overview] diff --git a/deploy-manage/distributed-architecture/kibana-tasks-management.md b/deploy-manage/distributed-architecture/kibana-tasks-management.md index 148c2633ba..ca1a9b8224 100644 --- a/deploy-manage/distributed-architecture/kibana-tasks-management.md +++ b/deploy-manage/distributed-architecture/kibana-tasks-management.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/kibana/current/task-manager-production-considerations.html +applies_to: + stack: --- # Kibana tasks management [task-manager-production-considerations] diff --git a/deploy-manage/distributed-architecture/reading-and-writing-documents.md b/deploy-manage/distributed-architecture/reading-and-writing-documents.md index 93d9593668..b3abbfa3a7 100644 --- a/deploy-manage/distributed-architecture/reading-and-writing-documents.md +++ b/deploy-manage/distributed-architecture/reading-and-writing-documents.md @@ -1,6 +1,9 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-replication.html +applies_to: + stack: + serverless: --- # Reading and writing documents [docs-replication] diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md index ef514484a5..5fa24342fb 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-allocation-relocation-recovery.html +applies_to: + stack: --- # Shard allocation, relocation, and recovery [shard-allocation-relocation-recovery] diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md index afb5f3fb92..dd07a2f942 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md @@ -1,6 +1,8 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html +applies_to: + stack: --- # Delaying 
allocation when a node leaves [delayed-allocation] From 2d2e56883df7cc74ee1b3b606f4660c6291d708f Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 15:42:02 -0600 Subject: [PATCH 15/19] Apply suggestions from code review Co-authored-by: Lisa Cawley --- deploy-manage/distributed-architecture.md | 2 +- .../discovery-cluster-formation.md | 2 +- .../index-level-shard-allocation.md | 12 ++++++------ 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/deploy-manage/distributed-architecture.md b/deploy-manage/distributed-architecture.md index 29c3b472c4..e96e95dffc 100644 --- a/deploy-manage/distributed-architecture.md +++ b/deploy-manage/distributed-architecture.md @@ -22,5 +22,5 @@ The topics in this section provides information about the architecture of {{es}} * [Shard allocation, relocation, and recovery](distributed-architecture/shard-allocation-relocation-recovery.md): Learn how {{es}} allocates and balances shards across nodes. * [Shard allocation awareness](distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md): Learn how to use custom node attributes to distribute shards across different racks or availability zones. * [Discovery and cluster formation](distributed-architecture/discovery-cluster-formation.md): Learn about the cluster formation process including voting, adding nodes and publishing the cluster state. -* [Shard request cache](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/shard-request-cache-settings.md): Learn how {{es}} caches search requests to improve performance. +* [Shard request cache](elasticsearch://reference/elasticsearch/configuration-reference/shard-request-cache-settings.md): Learn how {{es}} caches search requests to improve performance. * [Kibana task management](distributed-architecture/kibana-tasks-management.md): Learn how {{kib}} runs background tasks and distributes work across multiple {{kib}} instances to be persistent and scale with your deployment. \ No newline at end of file diff --git a/deploy-manage/distributed-architecture/discovery-cluster-formation.md b/deploy-manage/distributed-architecture/discovery-cluster-formation.md index ce3b118ae5..74683be2cd 100644 --- a/deploy-manage/distributed-architecture/discovery-cluster-formation.md +++ b/deploy-manage/distributed-architecture/discovery-cluster-formation.md @@ -51,4 +51,4 @@ The following processes and settings are part of discovery and cluster formation [Cluster fault detection](discovery-cluster-formation/cluster-fault-detection.md): {{es}} performs health checks to detect and remove faulty nodes. -[Settings](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md): There are settings that enable users to influence the discovery, cluster formation, master election and fault detection processes. \ No newline at end of file +[Settings](elasticsearch://reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md): There are settings that enable users to influence the discovery, cluster formation, master election and fault detection processes. 
\ No newline at end of file diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md index e4f0cc9873..4969d06cf7 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md @@ -9,18 +9,18 @@ applies_to: # Index-level shard allocation -In Elasticsearch, per-index settings allow you to control the allocation of shards to nodes through index-level shard allocation settings. These settings enable you to specify preferences or constraints for where shards of a particular index should reside. This includes allocating shards to nodes with specific attributes or avoiding certain nodes. This level of control helps optimize resource utilization, balance load, and ensure data redundancy and availability according to your deployment's specific requirements. In addition to the content in this article, there are additional resources: +In Elasticsearch, per-index settings allow you to control the allocation of shards to nodes through index-level shard allocation settings. These settings enable you to specify preferences or constraints for where shards of a particular index should reside. This includes allocating shards to nodes with specific attributes or avoiding certain nodes. This level of control helps optimize resource utilization, balance load, and ensure data redundancy and availability according to your deployment's specific requirements. For additional details, check out: * [Shard allocation filtering](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md): Controlling which shards are allocated to which nodes. * [Delayed allocation](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md): Delaying allocation of unassigned shards caused by a node leaving. -* [Total shards per node](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/total-shards-per-node.md): A hard limit on the number of shards from the same index per node. +* [Total shards per node](elasticsearch://reference/elasticsearch/index-settings/total-shards-per-node.md): A hard limit on the number of shards from the same index per node. * [Data tier allocation](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/data-tier-allocation-settings.md): Controls the allocation of indices to [data tiers](../../../manage-data/lifecycle/data-tiers.md). ## Index-level shard allocation filtering [shard-allocation-filtering] -You can use shard allocation filters to control where {{es}} allocates shards of a particular index. These per-index filters are applied in conjunction with [cluster-wide allocation filtering](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-shard-allocation-filtering) and [allocation awareness](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md). +You can use shard allocation filters to control where {{es}} allocates shards of a particular index. 
These per-index filters are applied in conjunction with [cluster-wide allocation filtering](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#cluster-shard-allocation-filtering) and [allocation awareness](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md). -Shard allocation filters can be based on [custom node attributes](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/node-settings.md#custom-node-attributes) or the built-in `_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference` attributes. [Index lifecycle management](../../../manage-data/lifecycle/index-lifecycle-management.md) uses filters based on custom node attributes to determine how to reallocate shards when moving between phases. +Shard allocation filters can be based on [custom node attributes](elasticsearch://reference/elasticsearch/configuration-reference/node-settings.md#custom-node-attributes) or the built-in `_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference` attributes. [Index lifecycle management](../../../manage-data/lifecycle/index-lifecycle-management.md) uses filters based on custom node attributes to determine how to reallocate shards when moving between phases. The `cluster.routing.allocation` settings are dynamic, enabling existing indices to be moved immediately from one set of nodes to another. Shards are only relocated if it is possible to do so without breaking another routing constraint, such as never allocating a primary and replica shard on the same node. @@ -85,10 +85,10 @@ The index allocation settings support the following built-in attributes: |`_ip`| Match either `_host_ip` or `_publish_ip` | | `_host`| Match nodes by hostname | |`_id`| Match nodes by node id | -|`_tier`| Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/data-tier-allocation-settings.md) | +|`_tier`| Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](elasticsearch://reference/elasticsearch/index-settings/data-tier-allocation-settings.md) | ::::{note} -`_tier` filtering is based on [node](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/node-settings.md) roles. Only a subset of roles are [data tier](../../../manage-data/lifecycle/data-tiers.md) roles, and the generic [data role](../../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role) will match any tier filtering. +`_tier` filtering is based on [node](elasticsearch://reference/elasticsearch/configuration-reference/node-settings.md) roles. Only a subset of roles are [data tier](../../../manage-data/lifecycle/data-tiers.md) roles, and the generic [data role](../../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role) will match any tier filtering. 
:::: You can use wildcards when specifying attribute values, for example: From ee1c5c0241069e0086c7532c04de3a65c1f7f0c5 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 16:01:58 -0600 Subject: [PATCH 16/19] updates based on feedback --- .../clusters-nodes-shards/node-roles.md | 1 + .../distributed-architecture/reading-and-writing-documents.md | 2 +- .../shard-allocation-awareness.md | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md b/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md index bba90ce66a..e6b989f724 100644 --- a/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md +++ b/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md @@ -3,6 +3,7 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/node-roles-overview.html applies_to: stack: + self: --- # Node roles [node-roles-overview] diff --git a/deploy-manage/distributed-architecture/reading-and-writing-documents.md b/deploy-manage/distributed-architecture/reading-and-writing-documents.md index b3abbfa3a7..1ae4c0fa64 100644 --- a/deploy-manage/distributed-architecture/reading-and-writing-documents.md +++ b/deploy-manage/distributed-architecture/reading-and-writing-documents.md @@ -96,6 +96,6 @@ A single shard can slow down indexing Dirty reads : An isolated primary can expose writes that will not be acknowledged. This is caused by the fact that an isolated primary will only realize that it is isolated once it sends requests to its replicas or when reaching out to the master. At that point the operation is already indexed into the primary and can be read by a concurrent read. Elasticsearch mitigates this risk by pinging the master every second (by default) and rejecting indexing operations if no master is known. -## The Tip of the Iceberg [_the_tip_of_the_iceberg] +## The tip of the iceberg [_the_tip_of_the_iceberg] This document provides a high level overview of how Elasticsearch deals with data. Of course, there is much more going on under the hood. Things like primary terms, cluster state publishing, and master election all play a role in keeping this system behaving correctly. This document also doesn’t cover known and important bugs (both closed and open). We recognize that [GitHub is hard to keep up with](https://github.com/elastic/elasticsearch/issues?q=label%3Aresiliency). To help people stay on top of those, we maintain a dedicated [resiliency page](https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html) on our website. We strongly advise reading it. 
diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md index 9866160319..955cd3f8c6 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/shard-allocation-awareness.md @@ -3,6 +3,7 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-allocation-awareness.html applies_to: stack: + self: --- # Shard allocation awareness [shard-allocation-awareness] From b825f15680a35c6dc4b3b2751d6f86026ceb80d1 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 16:07:41 -0600 Subject: [PATCH 17/19] Apply suggestions from code review Co-authored-by: Lisa Cawley --- .../index-level-shard-allocation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md index 4969d06cf7..cc833971c6 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md @@ -14,7 +14,7 @@ In Elasticsearch, per-index settings allow you to control the allocation of shar * [Shard allocation filtering](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md): Controlling which shards are allocated to which nodes. * [Delayed allocation](../../../deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/delaying-allocation-when-node-leaves.md): Delaying allocation of unassigned shards caused by a node leaving. * [Total shards per node](elasticsearch://reference/elasticsearch/index-settings/total-shards-per-node.md): A hard limit on the number of shards from the same index per node. -* [Data tier allocation](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-settings/data-tier-allocation-settings.md): Controls the allocation of indices to [data tiers](../../../manage-data/lifecycle/data-tiers.md). +* [Data tier allocation](elasticsearch://reference/elasticsearch/index-settings/data-tier-allocation.md): Controls the allocation of indices to [data tiers](../../../manage-data/lifecycle/data-tiers.md). ## Index-level shard allocation filtering [shard-allocation-filtering] @@ -85,7 +85,7 @@ The index allocation settings support the following built-in attributes: |`_ip`| Match either `_host_ip` or `_publish_ip` | | `_host`| Match nodes by hostname | |`_id`| Match nodes by node id | -|`_tier`| Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](elasticsearch://reference/elasticsearch/index-settings/data-tier-allocation-settings.md) | +|`_tier`| Match nodes by the node’s [data tier](../../../manage-data/lifecycle/data-tiers.md) role. For more details see [data tier allocation filtering](elasticsearch://reference/elasticsearch/index-settings/data-tier-allocation.md) | ::::{note} `_tier` filtering is based on [node](elasticsearch://reference/elasticsearch/configuration-reference/node-settings.md) roles. 
Only a subset of roles are [data tier](../../../manage-data/lifecycle/data-tiers.md) roles, and the generic [data role](../../../deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#data-node-role) will match any tier filtering. From d51512f2d5bd3f3f32f8d4da7cffc96b7947cb27 Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 16:07:55 -0600 Subject: [PATCH 18/19] fix for self --- .../index-level-shard-allocation.md | 1 + 1 file changed, 1 insertion(+) diff --git a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md index cc833971c6..e2ca138809 100644 --- a/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md +++ b/deploy-manage/distributed-architecture/shard-allocation-relocation-recovery/index-level-shard-allocation.md @@ -5,6 +5,7 @@ mapped_urls: - https://www.elastic.co/guide/en/elasticsearch/reference/current/recovery-prioritization.html applies_to: stack: + self: --- # Index-level shard allocation From 0bfcfb90dbeadf054e86351019e78eee3d2b9e1a Mon Sep 17 00:00:00 2001 From: George Wallace Date: Tue, 18 Mar 2025 16:32:58 -0600 Subject: [PATCH 19/19] fixing broken links --- raw-migrated-files/toc.yml | 2 -- 1 file changed, 2 deletions(-) diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index 60fb4c4808..e02dd8d1c6 100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -104,10 +104,8 @@ toc: children: - file: elasticsearch/elasticsearch-reference/documents-indices.md - file: elasticsearch/elasticsearch-reference/esql-using.md - - file: elasticsearch/elasticsearch-reference/index-modules-allocation.md - file: elasticsearch/elasticsearch-reference/index-modules-mapper.md - file: elasticsearch/elasticsearch-reference/ip-filtering.md - - file: elasticsearch/elasticsearch-reference/recovery-prioritization.md - file: elasticsearch/elasticsearch-reference/scalability.md - file: elasticsearch/elasticsearch-reference/search-with-synonyms.md - file: elasticsearch/elasticsearch-reference/secure-cluster.md
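To make the index-level shard allocation filtering discussed in the preceding patches concrete, here is a minimal sketch that pins one index to nodes matching the built-in `_name` attribute, using the documented `index.routing.allocation.require.*` setting form with a wildcard value. The index name, node-name pattern, endpoint, and credentials are placeholders for illustration.

```python
# Sketch: restrict shards of one index to nodes whose name starts with "hot-".
import requests

ES_URL = "http://localhost:9200"  # assumption: local cluster without TLS

resp = requests.put(
    f"{ES_URL}/my-index/_settings",
    json={"index.routing.allocation.require._name": "hot-*"},
    auth=("elastic", "changeme"),  # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # {"acknowledged": true} on success
```

Because these settings are dynamic, shards relocate as soon as the update is acknowledged, subject to the other routing constraints described above.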