Troubleshooting Elasticsearch Upgrades #6396

Draft
stefnestor wants to merge 2 commits into elastic:main from stefnestor:stef_esUpgradeIssues

@stefnestor
Member

Summary

Follow-up of #5848 to create separate troubleshooting page.

Generative AI disclosure

  1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
  • Yes
  • No

@stefnestor stefnestor requested review from a team as code owners May 9, 2026 15:03
@stefnestor stefnestor marked this pull request as draft May 9, 2026 15:03
@github-actions
Contributor

github-actions Bot commented May 9, 2026

Elastic Docs AI PR menu

Check the box to run an AI review for this pull request.

Powered by GitHub Agentic Workflows and docs-actions. For more information, reach out to the docs team.

Contributor

@github-actions github-actions Bot left a comment

Docs review summary

Focus areas

  • Style and clarity: Typos found in two files (earlier ersion in elasticsearch.md, due dilligence in the new file); grammar issue in discovery-troubleshooting.md. "Kindly" should be dropped per style guide.
  • Jargon: No unexplained jargon introduced.
  • Frontmatter and applies_to: `applies_to: stack:` is missing a lifecycle value (e.g., `ga`) in the new file; this will likely fail validation or render incorrectly.
  • Content type fit: The new page is declared type: troubleshooting, which is appropriate. However, the required Symptoms and Resolution sections are entirely unfilled template placeholders. The Resolution section still contains the literal template stepper block. The page is linked from the upgrade guide, meaning users who follow that tip will land on an incomplete page.
  • Parent issue satisfaction: This PR is a follow-up to #5848. The page structure is in place but the core content (Symptoms, Resolution) is not yet written, so the issue is partially satisfied.

Notes

  • The empty ## heading on line 18 of troubleshooting-upgrades.md will likely cause a build or rendering failure — this is the highest-priority fix.
  • Vale found no findings (eligible-files list was empty in the pre-fetch), so no Vale-sourced nits to report.

Generated by Docs review agent for issue #6396 · ● 325.5K

description: "Common upgrade issues and resolutions."
type: troubleshooting
applies_to:
stack:
Contributor

`applies_to: stack:` is missing a lifecycle value. Refer to the cumulative-docs reference for valid values (e.g., `ga`, `beta`, `coming`). Without a value this will likely render incorrectly or fail validation.


# Troubleshoot upgrades [troubleshooting-upgrades]

Usually, [{{es}} upgrades](/deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md) proceed smoothly due to [planning](/deploy-manage/upgrade/plan-upgrade.md) and [preparation](/deploy-manage/upgrade/prepare-to-upgrade.md) due dilligence.
Contributor

Typo: `due dilligence` → `due diligence`.

Suggested change
Usually, [{{es}} upgrades](/deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md) proceed smoothly due to [planning](/deploy-manage/upgrade/plan-upgrade.md) and [preparation](/deploy-manage/upgrade/prepare-to-upgrade.md) due dilligence.
Usually, [{{es}} upgrades](/deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md) proceed smoothly due to [planning](/deploy-manage/upgrade/plan-upgrade.md) and [preparation](/deploy-manage/upgrade/prepare-to-upgrade.md) due diligence.

* _(Recommended)_ **A rolling restart**

This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the old-version. Running more than two versions of {{es}} in the same cluster is not supported.
This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the earlier ersion. Running more than two versions of {{es}} in the same cluster is not supported.
Contributor

Typo: `earlier ersion` should be `earlier version`.

Suggested change
This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the earlier ersion. Running more than two versions of {{es}} in the same cluster is not supported.
This option allows you to upgrade your cluster one node at a time without interrupting service. Running multiple versions of {{es}} in the same cluster beyond the duration of an upgrade is not supported, as shards cannot be replicated from upgraded nodes to nodes running the earlier version. Running more than two versions of {{es}} in the same cluster is not supported.

Master elections only involve master-eligible nodes, so focus your attention on the master-eligible nodes in this situation. These nodes' logs indicate the requirements for a master election, such as the discovery of a certain set of nodes. The [Health]({{es-apis}}operation/operation-health-report) API on these nodes also provides useful information about the situation.
If there is no elected master node and no node can win an election, all nodes repeatedly log messages about the problem using a [logger](/deploy-manage/monitor/logging-configuration.md) called `org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper`. By default, this happens every 10 seconds.

During this time the {{es}} will induce `MasterNotDiscoveredException` errors and which its API will report like:
Contributor

Grammar issue: `the {{es}} will induce ... errors and which its API will report like:` has two problems: the article "the" before `{{es}}` and the spurious "and which".

Suggested rewrite:

Suggested change
During this time the {{es}} will induce `MasterNotDiscoveredException` errors and which its API will report like:
During this time, {{es}} returns `MasterNotDiscoveredException` errors. Its API reports:

This guide outlines {{es}} logs which indicate either upgrade blocking issues or fatal node start-up errors.


##
Contributor

This H2 heading is empty (## with no title). It should either be given a title (e.g., ## Monitor upgrade progress) or removed. An untitled heading will also likely fail docs build validation.
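
An untitled heading like this one can be caught locally before pushing. The following is a minimal sketch only; it is not part of the docs build tooling, and the function name `find_empty_headings` is hypothetical. It assumes ATX-style headings (`#` through `######`) and skips fenced code blocks with a simple toggle, so it may misjudge unusual fence nesting.

```python
import re

# An ATX heading marker (1-6 '#', up to 3 leading spaces) with nothing after it.
EMPTY_HEADING = re.compile(r"^\s{0,3}#{1,6}\s*$")

# Fence markers are built programmatically so this snippet can itself sit inside a fence.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def find_empty_headings(markdown_text):
    """Return the 1-based line numbers of headings that have no title text."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE_MARKERS):
            # Toggle on every fence line; lines inside a fence are not headings.
            in_fence = not in_fence
            continue
        if not in_fence and EMPTY_HEADING.match(line):
            hits.append(number)
    return hits


if __name__ == "__main__":
    # Hypothetical sample text, shaped like the page under review.
    sample = "# Troubleshoot upgrades\n\nIntro text.\n\n##\n\n## Symptoms\n"
    print(find_empty_headings(sample))
```

Running it against the text of a page and getting a non-empty list back would point to the line numbers that need a title or removal.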

{{es}} maintains the data in the data paths of the older nodes and will recover the cluster to health using this data after the nodes are fully upgraded. Therefore, to bring these nodes back into the cluster, upgrade them.

:::{note} :applies_to: { ece:, ess: }
Usually you can "Reapply" your latest [Deployment activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) {{es}} upgrade to finish upgrading. If the node out of cluster causes [Cluster health](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-health) status of `red`, then plans will be blocked for data safety. If this is the case, kindly [contact us](/troubleshoot/index.md#contact-us) with {{ech}} deployment ID or [{{ece}} diagnostic](/troubleshoot/deployments/cloud-enterprise/run-ece-diagnostics-tool.md) flagged `--deployments` for problematic deployment.
Contributor

Avoid "kindly" — the Elastic style guide treats it the same as "please", which should be omitted unless asking users to wait or tolerate inconvenience.

Suggested change
Usually you can "Reapply" your latest [Deployment activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) {{es}} upgrade to finish upgrading. If the node out of cluster causes [Cluster health](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-health) status of `red`, then plans will be blocked for data safety. If this is the case, kindly [contact us](/troubleshoot/index.md#contact-us) with {{ech}} deployment ID or [{{ece}} diagnostic](/troubleshoot/deployments/cloud-enterprise/run-ece-diagnostics-tool.md) flagged `--deployments` for problematic deployment.
Usually you can "Reapply" your latest [Deployment activity](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) {{es}} upgrade to finish upgrading. If the node out of cluster causes [Cluster health](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-health) status of `red`, then plans will be blocked for data safety. If this is the case, [contact us](/troubleshoot/index.md#contact-us) with {{ech}} deployment ID or [{{ece}} diagnostic](/troubleshoot/deployments/cloud-enterprise/run-ece-diagnostics-tool.md) flagged `--deployments` for problematic deployment.

Avoid linking to GitHub issues, pull requests, or internal discussions. Resources should be stable, user-facing documentation.
-->

- [Related documentation link]
Contributor

Placeholder links should be filled in with real targets or removed before publishing:

  • [Related documentation link]
  • [Contrib/upstream reference]

In a testing or development environment with only one or two master-eligible nodes, you cannot avoid stopping half or more of the master-eligible nodes, so the cluster will always become unavailable at some point during the upgrade. When you restart the master-eligible nodes after this unavailability, the cluster will re-form with a single upgraded node, which is therefore fully-upgraded and will reject older nodes' attempts to re-join the cluster. Upgrade the master-eligible nodes last to avoid these rejections.


## Symptoms
Contributor

The required Symptoms and Resolution sections (and the optional Diagnosis, Best practices, Resources sections) contain only template placeholder comments. The resolution section still has the literal stepper code block from the template. These need to be filled in before the page goes live — the page is currently non-functional for users who land on it from the link added in deploy-manage/upgrade/deployment-or-cluster/elasticsearch.md.
