From b1eb5c4018fa50d5dba29fc1e281b6f6e80d4084 Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:00:14 +0100
Subject: [PATCH 01/11] Update unavailable nodes

---
 troubleshoot/monitoring/unavailable-nodes.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/troubleshoot/monitoring/unavailable-nodes.md b/troubleshoot/monitoring/unavailable-nodes.md
index 22c94598be..5eea3424a0 100644
--- a/troubleshoot/monitoring/unavailable-nodes.md
+++ b/troubleshoot/monitoring/unavailable-nodes.md
@@ -36,9 +36,12 @@ This section provides a list of common symptoms and possible actions that you ca
 Some actions described here, such as stopping indexing or Machine Learning jobs, are temporary remediations intended to get your cluster into a state where you can make configuration changes to resolve the issue.
 ::::
-
 For production deployments, we recommend setting up a dedicated monitoring cluster to collect metrics and logs, troubleshooting views, and cluster alerts.
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](../../../docs-content/deploy-manage/monitor/autoops.md).
+:::
+
 If your issue is not addressed here, then [contact Elastic support for help](/troubleshoot/index.md).
 ## Full disk on single-node deployment [ec-single-node-deployment-disk-used]

From 999c5b6639be300d7493e0da9efef80e8aa6efad Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:11:51 +0100
Subject: [PATCH 02/11] Fix broken link

---
 troubleshoot/monitoring/unavailable-nodes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/troubleshoot/monitoring/unavailable-nodes.md b/troubleshoot/monitoring/unavailable-nodes.md
index 5eea3424a0..e94b3c4161 100644
--- a/troubleshoot/monitoring/unavailable-nodes.md
+++ b/troubleshoot/monitoring/unavailable-nodes.md
@@ -39,7 +39,7 @@ Some actions described here, such as stopping indexing or Machine Learning jobs,
 For production deployments, we recommend setting up a dedicated monitoring cluster to collect metrics and logs, troubleshooting views, and cluster alerts.
 :::{important}
-  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](../../../docs-content/deploy-manage/monitor/autoops.md).
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
 :::
 
 If your issue is not addressed here, then [contact Elastic support for help](/troubleshoot/index.md).
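The unavailable-nodes page touched by patches 01 and 02 is ultimately about spotting missing nodes, which readers typically check first via the cluster health API. As a rough illustration only (the response shape follows `GET _cluster/health`, but the payload values and the `triage` helper are invented for this sketch):

```python
# Sketch: triage a cluster health response for unavailable nodes.
# The sample payload mirrors the shape of Elasticsearch's
# GET _cluster/health response; the values here are made up.

def triage(health: dict, expected_nodes: int) -> str:
    """Return a short triage message based on cluster health."""
    missing = expected_nodes - health["number_of_nodes"]
    if missing > 0:
        return f"{health['status']}: {missing} node(s) unavailable"
    return f"{health['status']}: all {expected_nodes} node(s) present"

sample = {
    "status": "yellow",
    "number_of_nodes": 2,
    "number_of_data_nodes": 2,
    "unassigned_shards": 5,
}

print(triage(sample, expected_nodes=3))  # yellow: 1 node(s) unavailable
```

In practice you would feed this the parsed JSON of a real health call; the point is simply that `number_of_nodes` falling below the expected count is the signal the page's symptoms list starts from.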
From c441a179f3980f5ea2c81d10aad35c88ea38d930 Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:17:03 +0100
Subject: [PATCH 03/11] Update unavailable shards

---
 troubleshoot/monitoring/unavailable-shards.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/troubleshoot/monitoring/unavailable-shards.md b/troubleshoot/monitoring/unavailable-shards.md
index 8b44530c88..3730767e0d 100644
--- a/troubleshoot/monitoring/unavailable-shards.md
+++ b/troubleshoot/monitoring/unavailable-shards.md
@@ -1,7 +1,7 @@
 ---
 navigation_title: "Unavailable shards"
 mapped_urls:
-  - https://www.elastic.co/guide/en/cloud/current/ec-scenario_why_are_shards_unavailable.html
+
   - https://www.elastic.co/guide/en/cloud-heroku/current/echscenario_why_are_shards_unavailable.html
   - https://www.elastic.co/guide/en/cloud-heroku/current/ech-analyze_shards_with-api.html
   - https://www.elastic.co/guide/en/cloud-heroku/current/ech-analyze_shards_with-kibana.html
@@ -32,6 +32,10 @@ If a cluster has unassigned shards, you might see an error message such as this
   :alt: Unhealthy deployment error message
 :::
 
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::
+
 If your issue is not addressed here, then [contact Elastic support for help](/troubleshoot/index.md).
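The unavailable-shards page updated in patch 03 directs readers to analyze unassigned shards with the {{es}} allocation explain API. A minimal sketch of summarizing such a response — the field names (`unassigned_info`, `node_allocation_decisions`, `deciders`) follow the real `GET _cluster/allocation/explain` output, but the sample payload itself is invented:

```python
# Sketch: summarize why a shard is unassigned from a
# GET _cluster/allocation/explain response.

def summarize(explain: dict) -> str:
    shard = f"[{explain['index']}][{explain['shard']}]"
    reason = explain.get("unassigned_info", {}).get("reason", "unknown")
    blockers = [
        d["explanation"]
        for node in explain.get("node_allocation_decisions", [])
        for d in node.get("deciders", [])
    ]
    return f"{shard} unassigned ({reason}); {len(blockers)} blocking decider(s)"

sample = {
    "index": "my-index",
    "shard": 0,
    "primary": False,
    "unassigned_info": {"reason": "NODE_LEFT"},
    "node_allocation_decisions": [
        {"deciders": [{"decider": "same_shard",
                       "explanation": "a copy of this shard is already allocated to this node"}]}
    ],
}

print(summarize(sample))
# [my-index][0] unassigned (NODE_LEFT); 1 blocking decider(s)
```

The decider explanations are usually the fastest route to the root cause (disk watermarks, allocation filtering, same-shard rules, and so on).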
 ## Analyze unassigned shards using the {{es}} API [ec-analyze_shards_with-api]

From 1c2b4ac9e1aa5dcbf8ed34e384d4b135c33118b2 Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:19:50 +0100
Subject: [PATCH 04/11] Update performance degrading over time

---
 troubleshoot/monitoring/performance.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/troubleshoot/monitoring/performance.md b/troubleshoot/monitoring/performance.md
index 6bb842f80a..779d858005 100644
--- a/troubleshoot/monitoring/performance.md
+++ b/troubleshoot/monitoring/performance.md
@@ -18,3 +18,7 @@ When you look in the **Cluster Performance Metrics** section of the [{{ecloud}}
 Between just after 00:10 and 00:20, excessively high CPU usage consumes all CPU credits until no more credits are available. CPU credits enable boosting the assigned CPU resources temporarily to improve performance on smaller clusters up to and including 8 GB of RAM when it is needed most, but CPU credits are by their nature limited. You accumulate CPU credits when you use less than your assigned share of CPU resources, and you consume credits when you use more CPU resources than assigned. As you max out your CPU resources, CPU credits permit your cluster to consume more than 100% of the assigned resources temporarily, which explains why CPU usage exceeds 100%, with usage peaks that reach well over 400% for one node. As CPU credits are depleted, CPU usage gradually drops until it returns to 100% at 00:30 when no more CPU credits are available. You can also notice that after 00:30 credits gradually begin to accumulate again.
 
 If you need your cluster to be able to sustain a certain level of performance, you cannot rely on CPU boosting to handle the workload except temporarily. To ensure that performance can be sustained, consider increasing the size of your cluster.
+
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::

From 320b88b484fab5a6c09f041f8442307e432ede2c Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:20:57 +0100
Subject: [PATCH 05/11] Update cluster really highly available

---
 troubleshoot/monitoring/high-availability.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/troubleshoot/monitoring/high-availability.md b/troubleshoot/monitoring/high-availability.md
index 06c4efbc62..463b6306df 100644
--- a/troubleshoot/monitoring/high-availability.md
+++ b/troubleshoot/monitoring/high-availability.md
@@ -22,3 +22,7 @@ Cluster performance metrics are shown per node and are color-coded to indicate w
 This CPU usage graph indicates that your cluster is load-balancing between the nodes in the different availability zones as designed, but the workload is too high to be able to handle the loss of an availability zone. For a cluster to be able to handle the failure of a node, it should be considered at capacity when it uses 50% of its resources. In this case, two of the nodes are already maxed out and the third one is around 50%. If any one of the three nodes were to fail, the volume of user requests would overwhelm the remaining nodes. On smaller clusters up to and including 8 GB of RAM, CPU boosting can temporarily relieve some of the pressure, but you should not rely on this feature for high availability. On larger clusters, CPU boosting is not available.
 
 Even if your cluster is performing well, you still need to make sure that there is sufficient spare capacity to deal with the outage of an entire availability zone. For this cluster to remain highly available at all times, you either need to increase its size or reduce its workload.
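The performance page updated in patch 04 explains CPU credits in prose: credits accrue while usage stays below the assigned share, drain when usage boosts above it, and once depleted the node falls back to 100% of its share. A toy model of that behavior — all numbers (accrual rates, caps) are illustrative, not the actual Elastic Cloud accounting:

```python
# Toy model of the CPU-credit behavior the patched page describes.
# Units and rates are invented for illustration only.

def run(usage_pct, assigned=100.0, credits=30.0):
    """Track credits per interval; usage above `assigned` drains them."""
    history = []
    for u in usage_pct:
        if u > assigned:
            credits = max(0.0, credits - (u - assigned) / 10)
        else:
            credits = min(60.0, credits + (assigned - u) / 10)
        # With no credits left, the node cannot boost past its share.
        effective = u if credits > 0 or u <= assigned else assigned
        history.append((effective, round(credits, 1)))
    return history

# A spike to 400% drains credits; usage then falls back to the assigned 100%,
# and credits begin to accumulate again once demand drops.
for eff, cr in run([50, 400, 400, 400, 50]):
    print(eff, cr)
```

The shape matches the metrics walkthrough: boosted usage well above 100% while credits last, a drop back to the assigned share at depletion, then gradual re-accumulation.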
+
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::
\ No newline at end of file

From ee4932e12c424233acf69eee719d98253b1e59e4 Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:22:08 +0100
Subject: [PATCH 06/11] Update memory pressure

---
 troubleshoot/monitoring/high-memory-pressure.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/troubleshoot/monitoring/high-memory-pressure.md b/troubleshoot/monitoring/high-memory-pressure.md
index 7c04dca657..10b832e6bf 100644
--- a/troubleshoot/monitoring/high-memory-pressure.md
+++ b/troubleshoot/monitoring/high-memory-pressure.md
@@ -29,6 +29,9 @@ In our example, the **Index Response Times** metric shows that high memory press
 If the performance impact from high memory pressure is not acceptable, you need to increase the cluster size or reduce the workload.
 
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::

 ## Increase the deployment size [ec_increase_the_deployment_size]

From 47abe557bee69eef4e81752a03faca2f598776c5 Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:23:03 +0100
Subject: [PATCH 07/11] Update cluster response time

---
 troubleshoot/monitoring/cluster-response-time.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/troubleshoot/monitoring/cluster-response-time.md b/troubleshoot/monitoring/cluster-response-time.md
index c442bb2ddc..a0ed0fdb56 100644
--- a/troubleshoot/monitoring/cluster-response-time.md
+++ b/troubleshoot/monitoring/cluster-response-time.md
@@ -19,4 +19,8 @@ Memory pressure is not the culprit. The **Memory Pressure per Node** metric is a
 So what caused the sudden increase in response times? The key to the puzzle lies in the **Number of Requests** metric, which indicates the number of requests that a cluster receives per second. Beginning shortly before 13:32, there was a substantial increase in the number of user requests per second. The number of requests per second continued to rise until the requests began to plateau as your cluster reached its maximum throughput, which in turn caused response times to rise. The number of requests remained at a high level for approximately five minutes, until they started to drop off again around 13:40. Overall, the sustained increase of user requests lasted a bit over 10 minutes, consistent with the slowdown you observed.
 
-This cluster was sized to handle a certain number of user requests. As the user requests exceeded the maximum throughput that a cluster of this size could sustain, response times increased. To avoid such a slowdown, you either need to control the volume of user requests that reaches the {{es}} cluster or you need to size your cluster to be able to accommodate a sudden increase in user requests.
\ No newline at end of file
+This cluster was sized to handle a certain number of user requests. As the user requests exceeded the maximum throughput that a cluster of this size could sustain, response times increased. To avoid such a slowdown, you either need to control the volume of user requests that reaches the {{es}} cluster or you need to size your cluster to be able to accommodate a sudden increase in user requests.
+
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::
\ No newline at end of file

From b2aafc67cdc8724cff9b1d2df07de09f26f5e326 Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:24:04 +0100
Subject: [PATCH 08/11] Update deployment health warnings

---
 troubleshoot/monitoring/deployment-health-warnings.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/troubleshoot/monitoring/deployment-health-warnings.md b/troubleshoot/monitoring/deployment-health-warnings.md
index 8c6577be39..69315d135e 100644
--- a/troubleshoot/monitoring/deployment-health-warnings.md
+++ b/troubleshoot/monitoring/deployment-health-warnings.md
@@ -27,3 +27,7 @@ If multiple health warnings appear for one of your deployments, or if your deplo
 **Warning about system changes**
 
 If the warning refers to a system change, check the deployment’s [Activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) page.
+
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
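The cluster-response-time page updated in patch 07 reasons about saturation: once the request rate exceeds the cluster's maximum throughput, the excess queues up and response times climb until demand drops. A sketch of that backlog effect — the rates and the 1000 req/s capacity are invented:

```python
# Sketch of the saturation effect the patched page describes: demand
# above capacity accumulates as a backlog, which is what users feel
# as rising response times. All numbers are illustrative.

def queue_depths(rates_per_s, max_throughput_per_s):
    """Requests still waiting at the end of each one-second interval."""
    backlog, depths = 0, []
    for r in rates_per_s:
        backlog = max(0, backlog + r - max_throughput_per_s)
        depths.append(backlog)
    return depths

# Demand spikes above the 1000 req/s capacity, backlog grows, then drains.
print(queue_depths([800, 1200, 1400, 900, 600], 1000))  # [0, 200, 600, 500, 100]
```

Note that the backlog keeps growing even after the input rate starts falling, as long as it stays above capacity — which matches the page's observation that the slowdown outlasts the peak itself.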
+:::
\ No newline at end of file

From b091044b7aa2f60ed924dd0c13904fac3a780b7d Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:24:59 +0100
Subject: [PATCH 09/11] Update node bootlooping

---
 troubleshoot/monitoring/node-bootlooping.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/troubleshoot/monitoring/node-bootlooping.md b/troubleshoot/monitoring/node-bootlooping.md
index 8cd9ea897b..5d398aa95a 100644
--- a/troubleshoot/monitoring/node-bootlooping.md
+++ b/troubleshoot/monitoring/node-bootlooping.md
@@ -38,6 +38,9 @@ Following are some frequent causes of a failed configuration change:
 If you’re unable to remediate the failing plan’s root cause, you can attempt to reset the deployment to the latest successful {{es}} configuration by performing a [no-op plan](/troubleshoot/monitoring/deployment-health-warnings.md). For an example, see this [video walkthrough](https://www.youtube.com/watch?v=8MnXZ9egBbQ).
 
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::

 ## Secure settings [ec-config-change-errors-secure-settings]

From a868224bc59ab9aeb478ba263d264901b91460fd Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:29:41 +0100
Subject: [PATCH 10/11] Update Access performance metrics

---
 raw-migrated-files/cloud/cloud/ec-saas-metrics-accessing.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/raw-migrated-files/cloud/cloud/ec-saas-metrics-accessing.md b/raw-migrated-files/cloud/cloud/ec-saas-metrics-accessing.md
index dd1b72703f..b11d83c981 100644
--- a/raw-migrated-files/cloud/cloud/ec-saas-metrics-accessing.md
+++ b/raw-migrated-files/cloud/cloud/ec-saas-metrics-accessing.md
@@ -4,6 +4,10 @@ Cluster performance metrics are available directly in the [{{ecloud}} Console](h
 For advanced views or production monitoring, [enable logging and monitoring](../../../deploy-manage/monitor/stack-monitoring/elastic-cloud-stack-monitoring.md). The monitoring application provides more advanced views for Elasticsearch and JVM metrics, and includes a configurable retention period.
 
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::
+
 To access cluster performance metrics:
 
 1. Log in to the [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
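The node-bootlooping page updated in patch 09 suggests resetting a deployment to its latest successful {{es}} configuration via a no-op plan. The selection step behind that advice can be sketched as picking the most recent healthy plan from the deployment's activity history — the record shape and `plan_id`/`status` fields below are invented placeholders, not the actual Activity page or API schema:

```python
# Sketch: find the most recent successful plan so it can be reapplied
# unchanged (a "no-op plan"). History records are hypothetical.

def latest_successful_plan(history):
    """History is ordered newest-first; return the first successful plan."""
    for plan in history:
        if plan["status"] == "success":
            return plan["plan_id"]
    return None  # nothing to fall back to; root-cause work is required

history = [
    {"plan_id": "plan-104", "status": "error"},
    {"plan_id": "plan-103", "status": "error"},
    {"plan_id": "plan-102", "status": "success"},
]

print(latest_successful_plan(history))  # plan-102
```

In the Console this corresponds to re-saving the last configuration that applied cleanly, without changing any settings.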
From 46af2308cbc6217eb685937b7ce465b3654ed337 Mon Sep 17 00:00:00 2001
From: Arianna Laudazzi
Date: Mon, 3 Mar 2025 13:52:10 +0100
Subject: [PATCH 11/11] Update monitoring setup

---
 .../stack-monitoring/elastic-cloud-stack-monitoring.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/deploy-manage/monitor/stack-monitoring/elastic-cloud-stack-monitoring.md b/deploy-manage/monitor/stack-monitoring/elastic-cloud-stack-monitoring.md
index 6ad9cfe399..d0d011d708 100644
--- a/deploy-manage/monitor/stack-monitoring/elastic-cloud-stack-monitoring.md
+++ b/deploy-manage/monitor/stack-monitoring/elastic-cloud-stack-monitoring.md
@@ -60,6 +60,13 @@ $$$ech-logging-and-monitoring-production$$$
 
 $$$ech-logging-and-monitoring-retention$$$
 
+% Please leave the AutoOps banner in the final content of this page
+
+:::{important}
+  If you’re using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to [Monitor with AutoOps](/deploy-manage/monitor/autoops.md).
+:::
+
+
 **This page is a work in progress.** The documentation team is working to combine content pulled from the following pages:
 
 * [/raw-migrated-files/cloud/cloud-heroku/ech-monitoring.md](/raw-migrated-files/cloud/cloud-heroku/ech-monitoring.md)