Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail node containing ancient closed index #44264

Conversation

DaveCTurner
Copy link
Contributor

Today we fail the node at startup if it contains an index that is too old to be
compatible with the current version, unless that index is closed. If the index
is closed then the node will start up and this puts us into a bad state: the
index cannot be opened and must be reindexed using an earlier version, but we
offer no way to get that index into a node running an earlier version so that
it can be reindexed. Downgrading the node in-place is decidedly unsupported and
cannot be expected to work since the node already started up and upgraded the
rest of its metadata. Since #41731 we actively reject downgrades to versions ≥
v7.2.0 too.

This commit prevents the node from starting in the presence of any too-old
indices (closed or not). In particular, it does not write any upgraded metadata
in this situation, increasing the chances an in-place downgrade might be
successful. We still actively reject the downgrade using #41731, because we
wrote the node metadata file before checking the index metadata, but at least
there is a way to override this check.

Relates #21830, #44230

Today we fail the node at startup if it contains an index that is too old to be
compatible with the current version, unless that index is closed. If the index
is closed then the node will start up and this puts us into a bad state: the
index cannot be opened and must be reindexed using an earlier version, but we
offer no way to get that index into a node running an earlier version so that
it can be reindexed. Downgrading the node in-place is decidedly unsupported and
cannot be expected to work since the node already started up and upgraded the
rest of its metadata. Since elastic#41731 we actively reject downgrades to versions ≥
v7.2.0 too.

This commit prevents the node from starting in the presence of any too-old
indices (closed or not). In particular, it does not write any upgraded metadata
in this situation, increasing the chances an in-place downgrade might be
successful. We still actively reject the downgrade using elastic#41731, because we
wrote the node metadata file before checking the index metadata, but at least
there is a way to override this check.

Relates elastic#21830, elastic#44230
@DaveCTurner DaveCTurner added >bug :Core/Infra/Resiliency Keep running when everything is ok. Die quickly if things go horribly wrong. v8.0.0 v6.8.2 v7.4.0 labels Jul 12, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@DaveCTurner
Copy link
Contributor Author

@henningandersen observes that this could be quite a bad breaking change for people who are unknowingly running with these ancient closed indices already, so I think we can't backport this.

@DaveCTurner
Copy link
Contributor Author

@elasticmachine please run elasticsearch-ci/default-distro

@ywelsch
Copy link
Contributor

ywelsch commented Jul 12, 2019

@henningandersen observes that this could be quite a bad breaking change for people who are unknowingly running with these ancient closed indices already, so I think we can't backport this.

Perhaps we could offer an extension to the elasticsearch-node tool that would allow detaching an index. This could also help in situations where the metadata for a given index has become corrupted and there are no other copies for this index metadata.

@DaveCTurner
Copy link
Contributor Author

We check all the indices' versions at joining time, so there's no way for another node to join the master of a cluster containing such an old index. It can't be imported as a dangling index, so the only way to get a multi-node cluster containing such an ancient closed index is to do a full cluster restart with something like recover_after_nodes set to the full size of the cluster, because in that case the state used in join validation hasn't yet been recovered. If any node subsequently drops out of the cluster it'll never rejoin. Rolling upgrades are impossible in this state. In practice I think this means if anyone is actually broken by this change then they must only be running a one-node cluster, and there we have a reasonable path forward: go back to the earlier version (perhaps overriding the version check), deal with the broken index, and then attempt the upgrade again.

I am +1 to extending elasticsearch-node to help deal with broken metadata (see also #37286) but that's a bigger project.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks @DaveCTurner , left one minor comment only.

public IndexMetaData newIndexMeta(String name, Settings indexSettings) {
final Version createdVersion = VersionUtils.randomVersionBetween(random(),
Version.CURRENT.minimumIndexCompatibilityVersion(), VersionUtils.getPreviousVersion());
final Version upgradedVersion = VersionUtils.randomVersionBetween(random(), createdVersion, VersionUtils.getPreviousVersion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, upgradedVersion was before created version. It seems it is now always equal or after. But the code does work for before too I think so maybe this should be a random version between minimum compatibility version and current?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I pushed 4c0ee9a. It doesn't really make sense (or at least represents an unsupported situation) because the index metadata can't have been upgraded in a version prior to the version in which the index was created, at least not without doing a downgrade. But nothing in these tests cares about that, so it's no big deal.


private static Version randomEarlierCompatibleVersion() {
return VersionUtils.randomVersionBetween(random(),
Version.CURRENT.minimumIndexCompatibilityVersion(), VersionUtils.getPreviousVersion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VersionUtils.getPreviousVersion() uses only released versions I think. At least that stopped me a bit from adding a method like this in https://github.com/elastic/elasticsearch/pull/44235/files#diff-de7450af97775c27cab88faea6a38869R482

Copy link
Contributor Author

@DaveCTurner DaveCTurner Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

So it does. That means we won't be able to remove the 7.x version constants in master until 8.0 is released. I bet that bites us one day.

Addressed here in 4ac6f7f.

@DaveCTurner DaveCTurner merged commit 0df12a8 into elastic:master Jul 15, 2019
@DaveCTurner DaveCTurner deleted the 2019-07-12-reject-unsupported-closed-indices branch July 15, 2019 14:19
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >bug :Core/Infra/Resiliency Keep running when everything is ok. Die quickly if things go horribly wrong. v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants