Gracefully handle upgrades where multiple Kibana nodes and/or versions are running #84389

Closed
geekpete opened this issue Nov 26, 2020 · 2 comments
Labels
enhancement New value added to drive a business result

Comments

@geekpete
Member

Describe the feature:

Currently the Kibana upgrade docs advise:

Shut down all Kibana instances. Running more than one Kibana version against the same Elasticsearch index is unsupported. Upgrading while older Kibana instances are running can cause data loss or upgrade failures.

Kibana could use the optimistic concurrency control already available in Elasticsearch to decide which of the currently online Kibana nodes should perform the upgrade, and to do so gracefully, with fail-safe verification before the upgrade begins.

If an old-version node and a new-version node are both running, writes to a document in a Kibana-controlled system index could let the nodes identify themselves and agree that the newer-version node is in control of the upgrade.
If there are two or more nodes of the same version, the same mechanism should pick one of them to be in charge of the upgrade, and the others should avoid any upgrade action once the control document confirms they are not the upgrading node. An upgrade should not occur unless the controlling node is confirmed in the document and the other nodes agree to stand down.
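
To make the idea concrete, here is a minimal sketch of how nodes might claim the control document. It is not Kibana code: `InMemoryControlIndex`, `try_claim_upgrade` and the document fields are hypothetical names, and the in-memory store only imitates the compare-and-set behaviour that Elasticsearch exposes through the `if_seq_no` and `if_primary_term` parameters of its index API (the primary term is left out for brevity). The step where the other nodes acknowledge and stand down is also omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale view of the control document."""


@dataclass(frozen=True)
class ControlDoc:
    owner: str                 # id of the node claiming the upgrade
    version: Tuple[int, ...]   # Kibana version of that node, e.g. (7, 10, 0)
    seq_no: int                # sequence number assigned on write


class InMemoryControlIndex:
    """Hypothetical stand-in for a single document in a system index."""

    def __init__(self) -> None:
        self._doc: Optional[ControlDoc] = None
        self._next_seq = 0

    def get(self) -> Optional[ControlDoc]:
        return self._doc

    def put(self, owner: str, version: Tuple[int, ...],
            if_seq_no: Optional[int]) -> ControlDoc:
        # Compare-and-set: reject the write unless the caller saw the latest doc.
        current = self._doc.seq_no if self._doc else None
        if current != if_seq_no:
            raise VersionConflict(f"expected seq_no {if_seq_no}, found {current}")
        self._doc = ControlDoc(owner, version, self._next_seq)
        self._next_seq += 1
        return self._doc


def parse_version(text: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))


def try_claim_upgrade(index: InMemoryControlIndex, node_id: str,
                      node_version: str, max_attempts: int = 5) -> bool:
    """Return True if this node currently holds the upgrade claim."""
    mine = parse_version(node_version)
    for _ in range(max_attempts):
        doc = index.get()
        if doc is not None and doc.owner == node_id:
            return True
        if doc is not None and doc.version >= mine:
            return False  # a same-or-newer node already holds the claim: stand down
        try:
            index.put(node_id, mine, if_seq_no=doc.seq_no if doc else None)
            return True
        except VersionConflict:
            continue  # another node wrote first; re-read and decide again
    return False
```

With this sketch, a 7.9.3 node that claims first is displaced by a 7.10.0 node, and a second 7.10.0 node arriving later stands down. A node would still need to re-read the document immediately before starting the upgrade to confirm it remains the owner.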

For Cloud/ECE, having this graceful upgrade functionality in Kibana would also remove the need for additional logic in the higher-level orchestration layer to avoid upgrade collisions between Kibana nodes.

As a secondary objective, non-controlling Kibana nodes could remain up and running to provide some level of service until the upgrade is complete, even if that service is limited in some ways, rather than requiring a full outage in order to upgrade.
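
One way to picture this limited service is a per-node mode that gates incoming requests. The sketch below is purely illustrative: `NodeMode`, `node_mode` and `is_allowed` are hypothetical names, and treating only `GET` and `HEAD` as safe is a simplifying assumption (many read operations in practice are sent as `POST`). It also assumes the controlling node serves no traffic while it upgrades.

```python
from enum import Enum


class NodeMode(Enum):
    NORMAL = "normal"        # no upgrade running: full service
    READ_ONLY = "read_only"  # upgrade running elsewhere: limited service
    UPGRADING = "upgrading"  # this node is performing the upgrade


def node_mode(node_id: str, controller_id: str,
              upgrade_in_progress: bool) -> NodeMode:
    """Derive a node's mode from who owns the control document."""
    if not upgrade_in_progress:
        return NodeMode.NORMAL
    if node_id == controller_id:
        return NodeMode.UPGRADING
    return NodeMode.READ_ONLY


def is_allowed(mode: NodeMode, http_method: str) -> bool:
    """Decide whether a request may be served in the given mode."""
    if mode is NodeMode.NORMAL:
        return True
    if mode is NodeMode.READ_ONLY:
        # Assumption: only side-effect-free methods are served during an upgrade.
        return http_method.upper() in {"GET", "HEAD"}
    return False  # assumption: the upgrading node serves no traffic
```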

Describe a specific use case for the feature:

Graceful upgrades for Kibana, where the number and versions of running Kibana nodes don't need to be manually controlled and a Kibana upgrade can be performed as a rolling upgrade without the risk of a race condition when upgrading the Kibana system indices.

@geekpete geekpete added the enhancement New value added to drive a business result label Nov 26, 2020
@geekpete
Member Author

#81536

@rudolf
Contributor

rudolf commented Dec 14, 2020

Closing in favour of #66056

From #52202

Note: Rolling upgrades introduce significant complexity for plugins and risk of bugs. We assume that as long as the downtime window is predictable, downtime as such is not a problem for our users. Since this allows us to have a dramatically simpler system we won't aim to implement rolling upgrades unless this assumption is proven wrong.

However, with the algorithm in #66056 it would be possible to implement read-only functionality from outdated nodes during an upgrade to limit the impact (though it's not a goal we're currently working towards).

@rudolf rudolf closed this as completed Dec 14, 2020