Fail node containing ancient closed index #44264

DaveCTurner · 2019-07-12T08:56:57Z

Today we fail the node at startup if it contains an index that is too old to be
compatible with the current version, unless that index is closed. If the index
is closed then the node will start up and this puts us into a bad state: the
index cannot be opened and must be reindexed using an earlier version, but we
offer no way to get that index into a node running an earlier version so that
it can be reindexed. Downgrading the node in-place is decidedly unsupported and
cannot be expected to work since the node already started up and upgraded the
rest of its metadata. Since #41731 we actively reject downgrades to versions ≥
v7.2.0 too.

This commit prevents the node from starting in the presence of any too-old
indices (closed or not). In particular, it does not write any upgraded metadata
in this situation, increasing the chances an in-place downgrade might be
successful. We still actively reject the downgrade using #41731, because we
wrote the node metadata file before checking the index metadata, but at least
there is a way to override this check.

Relates #21830, #44230
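
As a rough illustration of the behaviour change described above, here is a minimal sketch. It does not use the real Elasticsearch classes: `IndexInfo`, `tooOldIndices`, and `ensureNoAncientIndices` are hypothetical names, and versions are simplified to a single major-version integer. The point is only that the old check skipped closed indices, whereas the new one inspects every index and is meant to run before any upgraded metadata is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AncientIndexCheck {

    /** Hypothetical stand-in for index metadata; not the Elasticsearch IndexMetaData class. */
    static final class IndexInfo {
        final String name;
        final int createdMajor; // major version in which the index was created (simplification)
        final boolean closed;

        IndexInfo(String name, int createdMajor, boolean closed) {
            this.name = name;
            this.createdMajor = createdMajor;
            this.closed = closed;
        }
    }

    /** Returns the names of indices created before minCompatibleMajor; closed indices are skipped only if skipClosed is true. */
    static List<String> tooOldIndices(List<IndexInfo> indices, int minCompatibleMajor, boolean skipClosed) {
        List<String> tooOld = new ArrayList<>();
        for (IndexInfo index : indices) {
            if (skipClosed && index.closed) {
                continue; // old behaviour: closed indices were never inspected
            }
            if (index.createdMajor < minCompatibleMajor) {
                tooOld.add(index.name);
            }
        }
        return tooOld;
    }

    /** New behaviour: throws if any index, open or closed, is too old. Intended to be called before writing metadata. */
    static void ensureNoAncientIndices(List<IndexInfo> indices, int minCompatibleMajor) {
        List<String> tooOld = tooOldIndices(indices, minCompatibleMajor, false);
        if (!tooOld.isEmpty()) {
            throw new IllegalStateException("indices " + tooOld + " were created before major version "
                + minCompatibleMajor + " and cannot be read by this node");
        }
    }

    public static void main(String[] args) {
        List<IndexInfo> indices = Arrays.asList(
            new IndexInfo("logs-2019", 7, false),
            new IndexInfo("archive-2016", 5, true));

        // Old behaviour: the closed ancient index goes unnoticed and the node would start.
        System.out.println("old check finds: " + tooOldIndices(indices, 6, true));

        // New behaviour: the same index now blocks startup.
        try {
            ensureNoAncientIndices(indices, 6);
        } catch (IllegalStateException e) {
            System.out.println("startup rejected: " + e.getMessage());
        }
    }
}
```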

elasticmachine · 2019-07-12T08:56:59Z

Pinging @elastic/es-core-infra

DaveCTurner · 2019-07-12T09:18:57Z

@henningandersen observes that this could be quite a bad breaking change for people who are unknowingly running with these ancient closed indices already, so I think we can't backport this.

DaveCTurner · 2019-07-12T09:26:44Z

@elasticmachine please run elasticsearch-ci/default-distro

ywelsch · 2019-07-12T09:26:49Z

> @henningandersen observes that this could be quite a bad breaking change for people who are unknowingly running with these ancient closed indices already, so I think we can't backport this.

Perhaps we could offer an extension to the elasticsearch-node tool that would allow detaching an index. This could also help in situations where the metadata for a given index has become corrupted and there are no other copies for this index metadata.

DaveCTurner · 2019-07-12T09:51:55Z

We check all the indices' versions at joining time, so there's no way for another node to join the master of a cluster containing such an old index. It can't be imported as a dangling index, so the only way to get a multi-node cluster containing such an ancient closed index is to do a full cluster restart with something like recover_after_nodes set to the full size of the cluster, because in that case the state used in join validation hasn't yet been recovered. If any node subsequently drops out of the cluster it'll never rejoin. Rolling upgrades are impossible in this state. In practice I think this means if anyone is actually broken by this change then they must only be running a one-node cluster, and there we have a reasonable path forward: go back to the earlier version (perhaps overriding the version check), deal with the broken index, and then attempt the upgrade again.

I am +1 to extending elasticsearch-node to help deal with broken metadata (see also #37286) but that's a bigger project.

henningandersen

LGTM.

Thanks @DaveCTurner , left one minor comment only.

henningandersen · 2019-07-12T11:30:15Z

server/src/test/java/org/elasticsearch/cluster/metadata/MetaDataIndexUpgradeServiceTests.java

+    public IndexMetaData newIndexMeta(String name, Settings indexSettings) {
+        final Version createdVersion = VersionUtils.randomVersionBetween(random(),
+            Version.CURRENT.minimumIndexCompatibilityVersion(), VersionUtils.getPreviousVersion());
+        final Version upgradedVersion = VersionUtils.randomVersionBetween(random(), createdVersion, VersionUtils.getPreviousVersion());


Previously, upgradedVersion was before the created version. It now seems to be always equal to or after it. I think the code works for the "before" case too, so maybe this should be a random version between the minimum compatibility version and current?

Ok, I pushed 4c0ee9a. It doesn't really make sense (or at least represents an unsupported situation) because the index metadata can't have been upgraded in a version prior to the version in which the index was created, at least not without doing a downgrade. But nothing in these tests cares about that, so it's no big deal.
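
To make the difference between the two shapes of test helper concrete, here is a small sketch. It is not the code from 4c0ee9a: versions are modelled as plain integers, and `randomBetween`, `dependentPair`, and `independentPair` are hypothetical names standing in for the `VersionUtils`-based helpers. In the first shape the upgraded version is drawn from a range that starts at the created version, so it can never be earlier. In the second, both are drawn independently from the same range, which also covers the unsupported "upgraded before created" case.

```java
import java.util.Random;

class UpgradedVersionSketch {

    /** Random int in [lo, hi], both inclusive; a stand-in for a "random version between" helper. */
    static int randomBetween(Random random, int lo, int hi) {
        return lo + random.nextInt(hi - lo + 1);
    }

    /** Upgraded is drawn from [created, previous], so it is never before created. */
    static int[] dependentPair(Random random, int minCompatible, int previous) {
        int created = randomBetween(random, minCompatible, previous);
        int upgraded = randomBetween(random, created, previous);
        return new int[] {created, upgraded};
    }

    /** Created and upgraded are drawn independently, so upgraded may fall before created. */
    static int[] independentPair(Random random, int minCompatible, int previous) {
        int created = randomBetween(random, minCompatible, previous);
        int upgraded = randomBetween(random, minCompatible, previous);
        return new int[] {created, upgraded};
    }

    public static void main(String[] args) {
        Random random = new Random();
        int[] dependent = dependentPair(random, 0, 9);
        int[] independent = independentPair(random, 0, 9);
        System.out.println("dependent:   created=" + dependent[0] + " upgraded=" + dependent[1]);
        System.out.println("independent: created=" + independent[0] + " upgraded=" + independent[1]);
    }
}
```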

ywelsch · 2019-07-12T12:50:26Z

server/src/test/java/org/elasticsearch/cluster/metadata/MetaDataIndexUpgradeServiceTests.java

+
+    private static Version randomEarlierCompatibleVersion() {
+        return VersionUtils.randomVersionBetween(random(),
+            Version.CURRENT.minimumIndexCompatibilityVersion(), VersionUtils.getPreviousVersion());


I think VersionUtils.getPreviousVersion() uses only released versions. At least, that held me back from adding a method like this in https://github.com/elastic/elasticsearch/pull/44235/files#diff-de7450af97775c27cab88faea6a38869R482

So it does. That means we won't be able to remove the 7.x version constants in master until 8.0 is released. I bet that bites us one day.

Addressed here in 4ac6f7f.

DaveCTurner added the >bug, :Core/Infra/Resiliency, v8.0.0, v6.8.2, and v7.4.0 labels Jul 12, 2019

DaveCTurner requested review from ywelsch and henningandersen July 12, 2019 08:56

DaveCTurner mentioned this pull request Jul 12, 2019

Better handling of ancient indices #44230

Closed

DaveCTurner removed v6.8.2 v7.4.0 labels Jul 12, 2019

DaveCTurner added the >breaking label Jul 12, 2019

DaveCTurner added 2 commits July 12, 2019 11:25

Add breaking-changes doc

5ddc5a6

Merge branch 'master' into 2019-07-12-reject-unsupported-closed-indices

06e7ef7

henningandersen approved these changes Jul 12, 2019

Test cases where the upgraded version is before the created version too

4c0ee9a

ywelsch approved these changes Jul 12, 2019

DaveCTurner added 2 commits July 12, 2019 14:37

Check unreleased versions too

4ac6f7f

Merge branch 'master' into 2019-07-12-reject-unsupported-closed-indices

ae3764c

DaveCTurner merged commit 0df12a8 into elastic:master Jul 15, 2019

DaveCTurner deleted the 2019-07-12-reject-unsupported-closed-indices branch July 15, 2019 14:19

jakelandis mentioned this pull request Feb 22, 2021

DRAFT [META] REST Compatible API V7 completeness #68905

Closed

jakelandis removed the v8.0.0 label Jul 26, 2021

jakelandis added the v8.0.0-alpha1 label Jul 26, 2021
