Put a fake allocation id on allocate stale primary command #34140

vladimirdolzhenko · 2018-09-28T12:13:44Z

Put a fake allocation id on allocate stale primary command; remove it after recovery is done

Eliminates the case where a node crashes after AllocateStalePrimary (or after recovery from a snapshot), between shard initialization and shard recovery, when the allocation id has already been adjusted.

Relates to #33432
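
A minimal sketch of the intended lifecycle, assuming a toy in-memory model of one shard's in-sync allocation ids. The class `InSyncIdsSketch`, its methods, and the value of `FAKE_ALLOCATION_ID` are hypothetical and are not taken from the Elasticsearch code base; only the idea (put a marker id on allocate stale primary, replace it with the real id once recovery is done) comes from this PR.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Toy model of one shard's in-sync allocation ids; not the Elasticsearch implementation. */
final class InSyncIdsSketch {
    // Placeholder value (assumption): the real constant is defined in RecoverySource.
    static final String FAKE_ALLOCATION_ID = "_fake_allocation_id_";

    private Set<String> inSyncIds;

    InSyncIdsSketch(Set<String> initial) {
        this.inSyncIds = new HashSet<>(initial);
    }

    /** Allocate stale primary: forget the old ids and record only the marker. */
    void forceStalePrimary() {
        inSyncIds = new HashSet<>(Collections.singleton(FAKE_ALLOCATION_ID));
    }

    /** Recovery is done: swap the marker for the real allocation id. */
    void primaryStarted(String realAllocationId) {
        inSyncIds.remove(FAKE_ALLOCATION_ID);
        inSyncIds.add(realAllocationId);
    }

    Set<String> inSyncIds() {
        return Collections.unmodifiableSet(inSyncIds);
    }
}
```

In this model, a crash after `forceStalePrimary()` but before `primaryStarted(...)` leaves only the marker in the set, so no real allocation id is recorded as in sync until recovery has completed.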

elasticmachine · 2018-09-28T12:13:45Z

Pinging @elastic/es-distributed

bleskes

I left some comments. I also think we need some unit tests. Maybe here: AllocationCommandsTests

bleskes · 2018-10-02T07:48:06Z

server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java

@@ -69,6 +69,13 @@ public void shardInitialized(ShardRouting unassignedShard, ShardRouting initiali
    @Override
    public void shardStarted(ShardRouting initializingShard, ShardRouting startedShard) {
        addAllocationId(startedShard);
+        if (startedShard.primary()
+            // started shard has to have null recoverySource; have to pick up recoverySource from its initializing state
+            && (initializingShard.recoverySource() instanceof RecoverySource.ExistingStoreRecoverySource


Why are snapshot and restore relevant here? I hope we can make this more explicit, depending on ExistingStoreRecoverySource#FORCE_STALE_PRIMARY_INSTANCE in some form or fashion

fbdf6d7, details here: #34140 (comment)

bleskes · 2018-10-02T07:48:31Z

server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java

@@ -156,9 +164,12 @@ public MetaData applyChanges(MetaData oldMetaData, RoutingTable newRoutingTable)
                // forcing an empty primary resets the in-sync allocations to the empty set (ShardRouting.allocatedPostIndexCreate)
                indexMetaDataBuilder.putInSyncAllocationIds(shardId.id(), Collections.emptySet());
            } else {
+                assert recoverySource instanceof RecoverySource.ExistingStoreRecoverySource
+                    || recoverySource instanceof RecoverySource.SnapshotRecoverySource


Same comment as before.

Snapshot/restore behaves like allocating a stale primary (confirmed by @ywelsch).

Enforced the check and extended the comment in fbdf6d7.

The RecoverySource.ExistingStoreRecoverySource check was introduced because I was misled by a BalanceConfigurationTests test failure (fixed in d3df30c):

The test used an oversimplified allocator, which led to inconsistent allocation decisions (and, after that, an inconsistent cluster state). When nodes are removed, some shards can become completely unassigned. In such cases PrimaryShardAllocator reports the allocation decision for that shard as NO_VALID_SHARD_COPY, while the NoopGatewayAllocator used in the test ignores that.

bleskes · 2018-10-02T07:50:28Z

server/src/main/java/org/elasticsearch/cluster/routing/IndexRoutingTable.java

@@ -141,7 +141,8 @@ boolean validate(MetaData metaData) {

                if (shardRouting.primary() && shardRouting.initializing() &&
                    shardRouting.recoverySource().getType() == RecoverySource.Type.EXISTING_STORE &&
-                    inSyncAllocationIds.contains(shardRouting.allocationId().getId()) == false)
+                    inSyncAllocationIds.contains(shardRouting.allocationId().getId()) == false &&
+                    inSyncAllocationIds.contains(RecoverySource.ExistingStoreRecoverySource.FORCED_ALLOCATION_ID) == false)


I think we want to go for hard equality here: the inSync set should be exactly the FORCED_ALLOCATION_ID.

addressed in e331b1f
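
The difference between containment and hard equality can be sketched as follows. This is an illustration under assumptions, not the code from e331b1f: the class `InSyncValidationSketch`, the method name, and the placeholder value of `FORCED_ALLOCATION_ID` are all hypothetical.

```java
import java.util.Collections;
import java.util.Set;

/** Sketch of the validation rule for an initializing existing-store primary. */
final class InSyncValidationSketch {
    // Placeholder value (assumption), standing in for the real constant.
    static final String FORCED_ALLOCATION_ID = "_placeholder_forced_id_";

    static boolean isValidInitializingPrimary(String shardAllocationId,
                                              Set<String> inSyncAllocationIds) {
        // Normal case: the shard copy is already known to be in sync.
        boolean knownCopy = inSyncAllocationIds.contains(shardAllocationId);
        // Forced case: the set must be exactly {FORCED_ALLOCATION_ID},
        // not merely contain the marker among other ids.
        boolean forcedOnly =
            inSyncAllocationIds.equals(Collections.singleton(FORCED_ALLOCATION_ID));
        return knownCopy || forcedOnly;
    }
}
```

With hard equality, a set such as `{FORCED_ALLOCATION_ID, other}` is rejected, whereas a `contains` check would have accepted it.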

bleskes · 2018-10-02T07:51:52Z

server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java

+            && (initializingShard.recoverySource() instanceof RecoverySource.ExistingStoreRecoverySource
+            || initializingShard.recoverySource() instanceof RecoverySource.SnapshotRecoverySource)) {
+            Updates updates = changes(startedShard.shardId());
+            updates.removedAllocationIds.add(RecoverySource.ExistingStoreRecoverySource.FORCED_ALLOCATION_ID);


can we make sure this is the only one?

In e331b1f I added an assert check in another place, where we have both the old and the new allocation ids.
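
One way to express the "only one" check, given the old and new in-sync sets, is sketched below. The class `ForcedIdRemovalSketch`, the method `onPrimaryStarted`, and the placeholder constant value are hypothetical, and the sketch throws an exception instead of using `assert` so that it behaves the same with or without `-ea`.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: remove the marker id when a primary starts, insisting it was the only id. */
final class ForcedIdRemovalSketch {
    // Placeholder value (assumption), standing in for the real constant.
    static final String FORCED_ALLOCATION_ID = "_placeholder_forced_id_";

    static Set<String> onPrimaryStarted(Set<String> oldInSyncIds, String startedAllocationId) {
        Set<String> newInSyncIds = new HashSet<>(oldInSyncIds);
        if (oldInSyncIds.contains(FORCED_ALLOCATION_ID)) {
            // The marker must never coexist with real allocation ids.
            if (oldInSyncIds.size() != 1) {
                throw new IllegalStateException(
                    "marker id must be the only in-sync id, but got " + oldInSyncIds);
            }
            newInSyncIds.remove(FORCED_ALLOCATION_ID);
        }
        newInSyncIds.add(startedAllocationId);
        return newInSyncIds;
    }
}
```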

vladimirdolzhenko · 2018-10-03T12:17:27Z

extended AllocationCommandsTests with AllocateStalePrimaryAllocationCommand in 8b697ef

bleskes

@vladimirdolzhenko sorry for the delay. I left some comments

server/src/main/java/org/elasticsearch/cluster/routing/IndexRoutingTable.java

server/src/main/java/org/elasticsearch/cluster/routing/RecoverySource.java

server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java

server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java

server/src/test/java/org/elasticsearch/cluster/routing/allocation/AllocationCommandsTests.java

...er/src/test/java/org/elasticsearch/cluster/routing/allocation/BalanceConfigurationTests.java

vladimirdolzhenko · 2018-10-30T10:46:04Z

run gradle build tests

bleskes

Thx @vladimirdolzhenko, I left some more small comments

server/src/main/java/org/elasticsearch/cluster/routing/allocation/IndexMetaDataUpdater.java

server/src/test/java/org/elasticsearch/cluster/routing/AllocationIdIT.java

vladimirdolzhenko · 2018-11-05T17:03:21Z

@bleskes I addressed your comments, could you please have another look? Thanks.

bleskes

LGTM. Thanks for all the iterations.

vladimirdolzhenko · 2018-11-07T19:18:31Z

Thanks @bleskes for the review and comments.

Put a fake allocation id on allocate stale primary command; remove it after recovery is done Relates to elastic#33432

62ce29b

vladimirdolzhenko added >enhancement, :Distributed/Recovery, v7.0.0 labels Sep 28, 2018

vladimirdolzhenko requested a review from ywelsch September 28, 2018 15:11

Vladimir Dolzhenko added 3 commits September 29, 2018 21:16

added a test case to spot the problem if allocation id is adjusted before recovery is done (historyUUID is adjusted):

3185235

Merge remote-tracking branch 'remotes/origin/master' into forced_allocation_allocation_id

2023a04

simplify test case to spot the problem if allocation id is adjusted before recovery is done (historyUUID is adjusted)

02e5e05

bleskes reviewed Oct 2, 2018

Vladimir Dolzhenko added 5 commits October 2, 2018 13:23

extended testAllocateCommand: added AllocateStalePrimaryAllocationCommand

8b697ef

enforced check that there is only one allocation id on adding/removing fake allocation id

e331b1f

fix BalanceConfigurationTests: using TestGatewayAllocator instead of NoopGatewayAllocator that generated inconsistent cluster state

d3df30c

enforce check for fake_allocation: for existing store it could be only FORCE_STALE_PRIMARY_INSTANCE

fbdf6d7

Merge remote-tracking branch 'remotes/origin/master' into forced_allocation_allocation_id

f0d71c5

Merge remote-tracking branch 'remotes/origin/master' into forced_allocation_allocation_id

6b3cfaa

vladimirdolzhenko requested a review from bleskes October 5, 2018 10:05

bleskes suggested changes Oct 23, 2018

Vladimir Dolzhenko added 10 commits October 25, 2018 17:06

inline addAllocationId; add assert on initializingShard.allocationId() equal to startedShard.allocationId()

8f909af

java doc for FORCED_ALLOCATION_ID; use front and back underscore for the fake allocation id

3554615

S/R deserves its own allocation id

70a8379

fix assert on initializingShard.allocationId() equal to startedShard.allocationId()

9ac69bd

fixed index routing table validation message for fake allocation id case

46183b1

extract AllocateStalePrimaryCommand to its own test method

e14d094

simplify AllocationIdIT

72a24a4

simplify AllocationIdIT; don't restart master

e06ba85

Merge remote-tracking branch 'remotes/origin/master' into forced_allocation_allocation_id

a1440e3

after merge compilation fix

e741bf5

vladimirdolzhenko requested a review from bleskes October 30, 2018 14:12

bleskes suggested changes Nov 5, 2018

Vladimir Dolzhenko added 10 commits November 5, 2018 11:46

Merge branch 'origin/master' into forced_allocation_allocation_id

30dffa6

move updates earlier and reuse it

99dc666

drop getNodeIdByName as AllocateStalePrimaryAllocationCommand can use node name instead of node id

e8e3925

dropped unnecessary settings

3f02dce

handle single historyUUID

dd2fe3b

reuse ESIntegTestCase.createIndex

1d8899e

drop redundant assertBusy

d141d9e

comment on the reason behind the test

e4e45c7

S&R leftover

2b6d535

drop useless open index (index is still opened)

07a795b

vladimirdolzhenko requested a review from bleskes November 5, 2018 17:03

Merge remote-tracking branch 'remotes/origin/master' into forced_allocation_allocation_id

78dfaa3

bleskes approved these changes Nov 7, 2018

vladimirdolzhenko merged commit f789d49 into elastic:master Nov 7, 2018

vladimirdolzhenko deleted the forced_allocation_allocation_id branch November 7, 2018 19:18

vladimirdolzhenko added a commit that referenced this pull request Nov 7, 2018

Put a fake allocation id on allocate stale primary command (#34140)

4fd9a9c

removes fake allocation id after recovery is done Relates to #33432 (cherry picked from commit f789d49)

vladimirdolzhenko added the v6.6.0 label Nov 7, 2018

pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Nov 13, 2018

Put a fake allocation id on allocate stale primary command (elastic#34140) removes fake allocation id after recovery is done Relates to elastic#33432

7f38c83

DaveCTurner mentioned this pull request Nov 13, 2018

Sporadic failure in testForceStaleReplicaToBePromotedToPrimary #35497

Closed

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
