Don't recover from buggy version #9925

Merged: 1 commit merged into elastic:1.x on Mar 2, 2015

Conversation

s1monw (Contributor) commented Feb 27, 2015

This commit forces a full recovery if the source node is < 1.4.0 and
prevents any recoveries from pre 1.3.2 nodes to
work around #7210

Closes #9922

Note: this is just a start; I need to fix some BWC tests first before this can be pulled in, but I wanted to get the discussion going.
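
For readers skimming the thread, the two gates described above amount to roughly the following checks on the source node's version. This is only a sketch under assumed names (checkSourceVersion and forceFullRecovery are illustrative, not methods from the actual patch); the real change is in the recovery code quoted further down.

// Sketch only: illustrates the two version gates, not the actual diff in this PR.
void checkSourceVersion(DiscoveryNode sourceNode, RecoverySettings recoverySettings) {
    final Version sourceVersion = sourceNode.version();
    if (sourceVersion.before(Version.V_1_3_2) && recoverySettings.compress()) {
        // pre-1.3.2 sources combined with recovery compression hit #7210, so refuse outright
        throw new ElasticsearchIllegalStateException("can't recover from node [" + sourceNode
                + "] with compression enabled, please upgrade the source node first");
    }
    if (sourceVersion.before(Version.V_1_4_0)) {
        // don't reuse existing files on the target; copy everything from the source instead
        forceFullRecovery(); // hypothetical hook standing in for "report no reusable files"
    }
}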

rmuir (Contributor) commented Feb 27, 2015

+1 this looks great.

rjernst (Member) commented Feb 27, 2015

+1

return;
final Version sourceNodeVersion = recoveryStatus.sourceNode().version();
if (sourceNodeVersion.before(Version.V_1_3_2) && recoverySettings.compress()) { // don't recover from pre 1.3.2 if compression is on?
throw new ElasticsearchIllegalStateException("Can't recovery from node "
Review comment on the diff hunk quoted above (Contributor):

Can't recovery -> Can't recover

mikemccand (Contributor) commented:

+1, just left a couple minor comments.

s1monw (Contributor, Author) commented Mar 2, 2015

@mikemccand @rmuir @rjernst pushed a new commit with unit tests...

recoveryTarget.existingFiles(discoNode, store, withCompression);
assertTrue(discoNode.version() + " " + withCompression, version.onOrAfter(Version.V_1_3_2) || withCompression == false);
} catch (ElasticsearchIllegalStateException ex) {
// all is good
Review comment on the test code quoted above (Contributor):

can we check that we expect this exception? i.e., when version is before 1.3.2 and compression is on?
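
One way to make that expectation explicit, reusing the names from the quoted test (a sketch only, not necessarily what was pushed later in the thread):

try {
    recoveryTarget.existingFiles(discoNode, store, withCompression);
    // no exception: only legal when the source is on or after 1.3.2, or compression is off
    assertTrue(discoNode.version() + " " + withCompression,
            version.onOrAfter(Version.V_1_3_2) || withCompression == false);
} catch (ElasticsearchIllegalStateException ex) {
    // the exception should only be thrown for a pre-1.3.2 source with compression enabled
    assertTrue(discoNode.version() + " " + withCompression,
            version.before(Version.V_1_3_2) && withCompression);
}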

bleskes (Contributor) commented Mar 2, 2015

+1 on the change. I think the 1.3.2 part is better implemented as an allocation decider, to prevent the master from repeatedly trying to allocate the shard and failing. I checked, and it's fairly easy to integrate this into NodeVersionAllocationDecider by adding RecoverySettings to the constructor:

private Decision isVersionCompatible(final RoutingNodes routingNodes, final String sourceNodeId, final RoutingNode target, RoutingAllocation allocation) {
    final RoutingNode source = routingNodes.node(sourceNodeId);
    if (recoverySettings.compress() && source.node().getVersion().before(Version.V_1_3_2)) {
        return allocation.decision(Decision.NO, NAME, "source node version [%s] has a known compression bug preventing allocation",
                source.node().version());
    } else if (target.node().version().onOrAfter(source.node().version())) {
        /* we can allocate if we can recover from a node that is younger or on the same version;
         * if the primary is already running on a newer version that won't work due to possible
         * differences in the lucene index format etc. */
        return allocation.decision(Decision.YES, NAME, "target node version [%s] is same or newer than source node version [%s]",
                target.node().version(), source.node().version());
    } else {
        return allocation.decision(Decision.NO, NAME, "target node version [%s] is older than source node version [%s]",
                target.node().version(), source.node().version());
    }
}
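
For completeness, the constructor wiring bleskes mentions would look roughly like this (a sketch; the exact constructor shape of NodeVersionAllocationDecider on the 1.x branch may differ):

private final RecoverySettings recoverySettings;

@Inject
public NodeVersionAllocationDecider(Settings settings, RecoverySettings recoverySettings) {
    super(settings);
    this.recoverySettings = recoverySettings; // consumed by isVersionCompatible() above
}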

s1monw (Contributor, Author) commented Mar 2, 2015

@bleskes here is a new commit

// clusterState = stabilize(clusterState, service);
// routingTable = clusterState.routingTable();
// for (int i = 0; i < routingTable.index("test").shards().size(); i++) {
// assertThat(routingTable.index("test").shard(i).shards().size(), equalTo(3));
Review comment on the test code quoted above (Contributor):

leftovers?

s1monw (Contributor, Author) replied:

hmm yeah :D

bleskes (Contributor) commented Mar 2, 2015

LGTM. Thx @s1monw

The merged commit's message:

This commit forces a full recovery if the source node is < 1.4.0 and
prevents any recoveries from pre 1.3.2 nodes if compression is enabled to
work around elastic#7210

Closes elastic#9922
s1monw merged commit dd78370 into elastic:1.x on Mar 2, 2015
clintongormley commented:

Closes #9922

clintongormley added the >bug, v2.0.0-beta1, release highlight, v1.5.0, :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source), and resiliency labels on Mar 19, 2015
clintongormley changed the title from "[RECOVERY] Don't recover from buggy version" to "Don't recover from buggy version" on Jun 8, 2015