
Bootstrap a new history_uuid when force allocating a stale primary #33432

Merged
merged 6 commits into elastic:master from new_history_uuid_stale_primary on Sep 8, 2018

Conversation

dnhatn
Member

@dnhatn dnhatn commented Sep 5, 2018

This commit ensures that we bootstrap a new history_uuid when force
allocating a stale primary. A stale primary should never be the source
of an operation-based recovery to another shard which exists before the
forced-allocation.

Closes #26712
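For context, a minimal Lucene-level sketch of what "bootstrapping a new history_uuid" means (illustrative only, not the PR's actual code): the shard keeps its segments, but the commit user-data entry stored under Engine.HISTORY_UUID_KEY ("history_uuid") is replaced with a fresh UUID, so the force-allocated copy can no longer be treated as sharing operation history with older copies.

import java.io.IOException;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class BootstrapHistoryUuidSketch {

    // Stamps a fresh "history_uuid" into the latest Lucene commit's user data,
    // marking this copy's history as disjoint from any other existing copy.
    static void bootstrapNewHistory(Directory dir) throws IOException {
        // Carry over whatever user data the last commit already had...
        Map<String, String> userData = new HashMap<>(SegmentInfos.readLatestCommit(dir).getUserData());
        // ...but overwrite the history marker with a brand-new UUID.
        userData.put("history_uuid", UUID.randomUUID().toString());

        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer()) // analyzer is irrelevant, nothing is indexed
            .setOpenMode(IndexWriterConfig.OpenMode.APPEND)                      // keep the existing segments
            .setCommitOnClose(false);
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            writer.setLiveCommitData(userData.entrySet());
            writer.commit();
        }
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
            bootstrapNewHistory(dir);
        }
    }
}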

@dnhatn dnhatn added the >enhancement, :Distributed/Recovery, v7.0.0, and v6.5.0 labels Sep 5, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn dnhatn added the review label Sep 5, 2018
Contributor

@bleskes bleskes left a comment


LGTM. Nice that we can do this now via store with little hassle.

@@ -107,25 +112,68 @@ public int hashCode() {
}

/**
* recovery from an existing on-disk store or a fresh copy
* Recovery from a refresh copy
Contributor


refresh -> fresh?


Set<String> newHistoryUUIds = Arrays.stream(client().admin().indices().prepareStats("test").clear().get().getShards())
.map(shard -> shard.getCommitStats().getUserData().get(Engine.HISTORY_UUID_KEY)).collect(Collectors.toSet());
assertThat(newHistoryUUIds, everyItem(not(isIn(historyUUIDs))));
Contributor


maybe assert that all new history uuids are the same, just for fun?
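A sketch of that extra assertion (hypothetical addition, assuming Hamcrest's hasSize matcher is statically imported like the other matchers in this test): since newHistoryUUIds is a Set, "all shards report the same new history uuid" is equivalent to the set containing exactly one element.

assertThat(newHistoryUUIds, hasSize(1)); // a single distinct value means every shard copy got the same new history UUID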

@dnhatn
Member Author

dnhatn commented Sep 8, 2018

Thanks @bleskes for reviewing.

@dnhatn dnhatn merged commit 94e4cb6 into elastic:master Sep 8, 2018
@dnhatn dnhatn deleted the new_history_uuid_stale_primary branch September 8, 2018 23:30
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 9, 2018
* master:
  CORE: Make Pattern Exclusion Work with Aliases (elastic#33518)
  Reverse logic for CCR license checks (elastic#33549)
  Add latch countdown on failure in CCR license tests (elastic#33548)
  HLRC: Add put stored script support to high-level rest client (elastic#31323)
  Create temporary directory if needed in CCR test
  Add license checks for auto-follow implementation (elastic#33496)
  Bootstrap a new history_uuid when force allocating a stale primary (elastic#33432)
  INGEST: Remove Outdated TODOs (elastic#33458)
  Logging: Clean up skipping test
  Logging: Skip test if it'd fail
  CRUD: AwaitsFix entire wait_for_refresh close test
  Painless: Add Imported Static Method (elastic#33440)
dnhatn added a commit that referenced this pull request Sep 10, 2018
…33432)

This commit ensures that we bootstrap a new history_uuid when force
allocating a stale primary. A stale primary should never be the source
of an operation-based recovery to another shard which exists before the
forced-allocation.

Closes #26712
dnhatn added a commit that referenced this pull request Sep 10, 2018
@ywelsch
Contributor

ywelsch commented Sep 10, 2018

Note that if the node crashes between shard initialization and shard recovery, the allocation id (both in the in-sync set and on the node's local shard copy) will be adjusted but the history UUID will not, which is unsafe. I would have preferred to put the history UUID into the index metadata (at a per-shard level, similar to the in-sync allocation ids), and then realign the shard's history UUID with the one in the index metadata, avoiding the above problem.

@bleskes
Contributor

bleskes commented Sep 10, 2018

@ywelsch that's a good insight. If we put the history uuid in the index metadata, I think we need to work out its exact semantics and how it correlates to the Lucene index version (or maybe we want to move to storing it in the state file together with the history uuid). I think we can also consider another alternative - only add the allocation id of the stale copy to the in-sync set once it has been recovered (potentially clearing the in-sync set when the command is run). I also wonder if we should do the same for recovery from snapshot (if I read the code correctly, I think we have the same problem).

@ywelsch
Contributor

ywelsch commented Sep 19, 2018

only add the allocation id of the stale copy to the in-sync set once it has been recovered (potentially clearing the in-sync set when the command is run)

I would be OK with activating it only after it has been recovered, but I would also like for the allocate stale primary command to immediately take effect (in terms of resetting the in-sync set). An empty in-sync set currently has the semantics of "allocate a fresh shard copy" (i.e. not an existing one), so that's not an option. We could just make it a singleton set with a random allocation id or keep the set as is as long as the newly allocated primary has not been activated.

@bleskes
Contributor

bleskes commented Sep 19, 2018

I would also like for the allocate stale primary command to immediately take effect (in terms of resetting the in-sync set).

Agreed.

We could just make it a singleton set with a random allocation id

I first thought this was a hack, but then I thought that maybe we should set it to [ "_forced_allocation" ] which will increase the transparency of what happened until we get back to "normal" (i.e., the primary was started)

@ywelsch
Contributor

ywelsch commented Sep 19, 2018

maybe we should set it to [ "_forced_allocation"

works for me.
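To make the transitions agreed on above concrete, here is a toy, stand-alone sketch (illustrative only; the marker value and method names are placeholders, not Elasticsearch internals): the command resets the in-sync set to a visible placeholder right away, and the placeholder is swapped for the real allocation id once the forced primary has started.

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model of the in-sync allocation-id set around a forced stale-primary allocation.
class InSyncSetSketch {

    // Hypothetical placeholder; keeps the set non-empty, since an empty set means "allocate a fresh copy".
    static final String FORCED_ALLOCATION_MARKER = "_forced_allocation";

    // Effect of the allocate-stale-primary command: reset the set immediately, but leave a visible marker.
    static Set<String> onAllocateStalePrimaryCommand() {
        return Collections.singleton(FORCED_ALLOCATION_MARKER);
    }

    // Once the forced primary has recovered and started, replace the marker with its real allocation id.
    static Set<String> onPrimaryStarted(Set<String> inSyncIds, String primaryAllocationId) {
        Set<String> updated = new HashSet<>(inSyncIds);
        updated.remove(FORCED_ALLOCATION_MARKER);
        updated.add(primaryAllocationId);
        return updated;
    }
}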

vladimirdolzhenko pushed a commit to vladimirdolzhenko/elasticsearch that referenced this pull request Sep 28, 2018
vladimirdolzhenko added a commit that referenced this pull request Nov 7, 2018
removes fake allocation id after recovery is done

Relates to #33432
vladimirdolzhenko added a commit that referenced this pull request Nov 7, 2018
removes fake allocation id after recovery is done

Relates to #33432

(cherry picked from commit f789d49)
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Nov 13, 2018
…4140)

removes fake allocation id after recovery is done

Relates to elastic#33432