Don't recover from buggy version #9925

Merged
merged 1 commit into from Mar 2, 2015

Projects

None yet

6 participants

@s1monw
Contributor
s1monw commented Feb 27, 2015

This commit forces a full recovery if the source node is < 1.4.0 and
prevents any recoveries from pre 1.3.2 nodes to
work around #7210

Closes #9922

note: this is just a start, I need to fix some BWC test first before this can be pulled in but I wanted to get the discussion going

@rmuir
Contributor
rmuir commented Feb 27, 2015

+1 this looks great.

@rjernst
Member
rjernst commented Feb 27, 2015

+1

@mikemccand mikemccand commented on an outdated diff Feb 27, 2015
...rg/elasticsearch/indices/recovery/RecoveryTarget.java
@@ -157,13 +158,24 @@ private void doRecovery(final RecoveryStatus recoveryStatus) {
logger.trace("collecting local files for {}", recoveryStatus);
final Map<String, StoreFileMetaData> existingFiles;
- try {
- existingFiles = recoveryStatus.store().getMetadataOrEmpty().asMap();
- } catch (Exception e) {
- logger.debug("error while listing local files, recovery as if there are none", e);
- onGoingRecoveries.failRecovery(recoveryStatus.recoveryId(),
- new RecoveryFailedException(recoveryStatus.state(), "failed to list local files", e), true);
- return;
+ final Version sourceNodeVersion = recoveryStatus.sourceNode().version();
+ if (sourceNodeVersion.before(Version.V_1_3_2) && recoverySettings.compress()) { // don't recover from pre 1.3.2 if compression is on?
+ throw new ElasticsearchIllegalStateException("Can't recovery from node "
@mikemccand
mikemccand Feb 27, 2015 Contributor

Can't recovery -> Can't recover

@mikemccand mikemccand commented on an outdated diff Feb 27, 2015
...rg/elasticsearch/indices/recovery/RecoveryTarget.java
- } catch (Exception e) {
- logger.debug("error while listing local files, recovery as if there are none", e);
- onGoingRecoveries.failRecovery(recoveryStatus.recoveryId(),
- new RecoveryFailedException(recoveryStatus.state(), "failed to list local files", e), true);
- return;
+ final Version sourceNodeVersion = recoveryStatus.sourceNode().version();
+ if (sourceNodeVersion.before(Version.V_1_3_2) && recoverySettings.compress()) { // don't recover from pre 1.3.2 if compression is on?
+ throw new ElasticsearchIllegalStateException("Can't recovery from node "
+ + recoveryStatus.sourceNode() + " with [" + RecoverySettings.INDICES_RECOVERY_COMPRESS
+ + " : true] due to compression bugs - see issue #7210 for details" );
+ }
+ if (sourceNodeVersion.onOrAfter(Version.V_1_4_0)) {
+ try {
+ existingFiles = recoveryStatus.store().getMetadataOrEmpty().asMap();
+ } catch (Exception e) {
+ logger.debug("error while listing local files, recovery as if there are none", e);
@mikemccand
mikemccand Feb 27, 2015 Contributor

Remove the ", recovery as if there are none"? Because we are failing the recovery instead right?

@mikemccand
Contributor

+1, just left a couple minor comments.

@s1monw
Contributor
s1monw commented Mar 2, 2015

@mikemccand @rmuir @rjernst pushed a new commit with unittests...

@bleskes bleskes commented on an outdated diff Mar 2, 2015
...asticsearch/indices/recovery/RecoveryTargetTests.java
+ client().admin().indices().prepareFlush("test").setWaitIfOngoing(true).setForce(true).get();
+ RecoveryTarget recoveryTarget = getInstanceFromNode(RecoveryTarget.class);
+ IndexService idxService = getInstanceFromNode(IndicesService.class).indexService("test");
+ Store store = idxService.shard(0).store();
+ store.incRef();
+ try {
+ int iters = randomIntBetween(10, 20);
+ for (int i = 0; i < iters; i++) {
+ Version version = randomVersion();
+ DiscoveryNode discoNode = new DiscoveryNode("123", new LocalTransportAddress("123"), version);
+ boolean withCompression = randomBoolean();
+ try {
+ recoveryTarget.existingFiles(discoNode, store, withCompression);
+ assertTrue(discoNode.version() + " " + withCompression, version.onOrAfter(Version.V_1_3_2) || withCompression == false);
+ } catch (ElasticsearchIllegalStateException ex) {
+ // all is good
@bleskes
bleskes Mar 2, 2015 Member

can we check that we expect this exception? i.e., when version is before 1.3.2 and compression is on?

@bleskes
Member
bleskes commented Mar 2, 2015

+1 on the change. I think the 1.3.2 part is better implemented as an allocation decider to prevent the master repeatedly trying to allocate it and failing. I checked and it's fairly easy to integrate this into NodeVersionAllocationDecider by adding RecoverySettings to the constructor

private Decision isVersionCompatible(final RoutingNodes routingNodes, final String sourceNodeId, final RoutingNode target, RoutingAllocation allocation) {
        final RoutingNode source = routingNodes.node(sourceNodeId);
        if (recoverySettings.compress() && source.node().getVersion().before(Version.V_1_3_2)) {
            return allocation.decision(Decision.NO, NAME, "source node version [%s] has a known compression bug preventing allocation",
                    source.node().version());
        } else if (target.node().version().onOrAfter(source.node().version())) {
            /* we can allocate if we can recover from a node that is younger or on the same version
             * if the primary is already running on a newer version that won't work due to possible
             * differences in the lucene index format etc.*/
            return allocation.decision(Decision.YES, NAME, "target node version [%s] is same or newer than source node version [%s]",
                    target.node().version(), source.node().version());
        }  else {
            return allocation.decision(Decision.NO, NAME, "target node version [%s] is older than source node version [%s]",
                    target.node().version(), source.node().version());
        }
    }
@s1monw
Contributor
s1monw commented Mar 2, 2015

@bleskes here is a new commit

@bleskes bleskes and 1 other commented on an outdated diff Mar 2, 2015
...ing/allocation/NodeVersionAllocationDeciderTests.java
+// clusterState = ClusterState.builder(clusterState).nodes(DiscoveryNodes.builder()
+// .put(newNode("old0", Version.V_1_3_1))
+// .put(newNode("new1"))
+// .put(newNode("new0"))).build();
+//
+// clusterState = stabilize(clusterState, service);
+//
+// clusterState = ClusterState.builder(clusterState).nodes(DiscoveryNodes.builder()
+// .put(newNode("new2"))
+// .put(newNode("new1"))
+// .put(newNode("new0"))).build();
+//
+// clusterState = stabilize(clusterState, service);
+// routingTable = clusterState.routingTable();
+// for (int i = 0; i < routingTable.index("test").shards().size(); i++) {
+// assertThat(routingTable.index("test").shard(i).shards().size(), equalTo(3));
@bleskes
bleskes Mar 2, 2015 Member

left overs?

@s1monw
s1monw Mar 2, 2015 Contributor

hmm yeah :D

@bleskes bleskes commented on the diff Mar 2, 2015
...asticsearch/test/ElasticsearchAllocationTestCase.java
@@ -71,9 +73,13 @@ public static AllocationDeciders randomAllocationDeciders(Settings settings, Nod
Constructor<? extends AllocationDecider> constructor = deciderClass.getConstructor(Settings.class, NodeSettingsService.class);
list.add(constructor.newInstance(settings, nodeSettingsService));
} catch (NoSuchMethodException e) {
- Constructor<? extends AllocationDecider> constructor = null;
- constructor = deciderClass.getConstructor(Settings.class);
- list.add(constructor.newInstance(settings));
+ try {
+ Constructor<? extends AllocationDecider> constructor = deciderClass.getConstructor(Settings.class);
+ list.add(constructor.newInstance(settings));
+ } catch (NoSuchMethodException e1) {
+ Constructor<? extends AllocationDecider> constructor = deciderClass.getConstructor(Settings.class, RecoverySettings.class);
@s1monw
s1monw Mar 2, 2015 Contributor

oh well :) could be worse

@bleskes
Member
bleskes commented Mar 2, 2015

LGTM. Thx @s1monw

@s1monw s1monw [RECOVERY] Don't recover from buggy version
This commit forces a full recovery if the source node is < 1.4.0 and
prevents any recoveries from pre 1.3.2 nodes if compression is enabled to
work around #7210

Closes #9922
dd78370
@s1monw s1monw merged commit dd78370 into elastic:1.x Mar 2, 2015

1 check passed

CLA Commit author is a member of Elasticsearch
Details
@clintongormley
Member

Closes #9922

@clintongormley clintongormley changed the title from [RECOVERY] Don't recover from buggy version to Don't recover from buggy version Jun 8, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment