
Restoring from snapshot should force generation of a new history uuid #26694

Merged: 9 commits merged into elastic:master on Sep 19, 2017

Conversation

5 participants
@bleskes (Member) commented Sep 18, 2017

Restoring a shard from snapshot throws the primary back in time, violating assumptions and calling the validity of global checkpoints into question. To avoid problems, we should make sure that a shard that was restored will never be the source of an ops-based recovery to a shard that existed before the restore. To this end we introduced the notion of a `history_uuid` in #26577 and required that source and target have the same history to allow ops-based recoveries. This PR makes sure that a shard gets a new uuid after restore.

As suggested by @ywelsch, I derived the creation of a `history_uuid` from the `RecoverySource` of the shard. Store recovery will only generate a uuid if it doesn't already exist (we can make this stricter once we no longer need to deal with 5.x indices). Peer recovery follows the same logic (note that this is different from the approach in #26557; I went this way because it means that shards always have a history uuid after being recovered on a 6.x node, and a rolling restart is then enough for old indices to move over to the new seq-no model). Local shards and snapshot recovery force the generation of a new history uuid.
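The mapping described above can be sketched as a small decision function over the recovery-source type. This is a simplified illustration, not the actual Elasticsearch code: the enum below mirrors `RecoverySource.Type`, and the per-case behavior is a summary of the PR description, not the real engine logic.

```java
// Simplified sketch of deriving the history-uuid decision from the
// recovery source. Names mirror RecoverySource.Type, but this is not
// the real Elasticsearch implementation.
public class HistoryUuidPolicy {
    enum RecoverySourceType { EMPTY_STORE, EXISTING_STORE, PEER, SNAPSHOT, LOCAL_SHARDS }

    // true: always generate a fresh history uuid;
    // false: keep an existing uuid, generating one only if missing (5.x indices).
    static boolean forceNewHistoryUUID(RecoverySourceType type) {
        switch (type) {
            case SNAPSHOT:       // restore throws the shard back in time -> new history
            case LOCAL_SHARDS:   // shrink/split starts a new history
                return true;
            case EMPTY_STORE:    // fresh index: a uuid is generated anyway
            case EXISTING_STORE: // store recovery: create only if absent
            case PEER:           // replica receives the primary's uuid during recovery
                return false;
            default:
                throw new AssertionError("unknown recovery source " + type);
        }
    }
}
```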

Relates #10708
Closes #26544

bleskes added some commits Sep 17, 2017

@ywelsch (Contributor) left a comment

Left one request for change and one question; otherwise looks good to me.

```java
@@ -2080,10 +2080,24 @@ private DocumentMapperForType docMapper(String type) {

    private EngineConfig newEngineConfig(EngineConfig.OpenMode openMode) {
        Sort indexSort = indexSortSupplier.get();
        final boolean forceNewHistoryUUID;
        switch (shardRouting.recoverySource().getType()) {
            case EMPTY_STORE:
```

@ywelsch (Contributor) commented Sep 19, 2017

EMPTY_STORE should force a new history uuid (see also my comment here).
EMPTY_STORE is used for two situations:

  • creating a shard of a fresh index: Here, we trivially want to force a fresh UUID.
  • force allocating an empty shard: Here, we are creating a fresh history, so we want to force a fresh UUID too.

Note that this leaves the case of allocating a stale primary, which should also force a fresh history. At the moment, however, we have no way to detect that based on the recovery source object. For this, if the recovery source is EXISTING_STORE, we can compare the on-disk allocation id (ShardStateMetaData.FORMAT.load(...)) with the expected allocation id in the shard routing object, and if they mismatch, force a new history uuid. Loading the ShardStateMetaData can happen in the IndexShard constructor.

I'm ok if you do this in a follow-up, but I still think EMPTY_STORE should be correctly set here to force a new history uuid.
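The stale-primary detection suggested in this comment could look roughly like the following. This is a hypothetical sketch, not code from the PR: in Elasticsearch the on-disk allocation id would come from `ShardStateMetaData.FORMAT.load(...)`, while here it is simply passed in as a string, and the method name is invented.

```java
// Hypothetical sketch of the stale-primary check suggested in the review.
// A mismatch between the allocation id persisted on disk and the one in
// the current shard routing means the cluster force-allocated a stale
// copy, which starts a new history and therefore needs a fresh uuid.
public class StalePrimaryCheck {
    static boolean needsNewHistoryUuid(String onDiskAllocationId, String routingAllocationId) {
        return onDiskAllocationId == null || !onDiskAllocationId.equals(routingAllocationId);
    }
}
```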

@bleskes (Author, Member) commented Sep 19, 2017

EMPTY_STORE will always result in a new history uuid, as we create a new index writer and it will not have any user data. I'm fine with setting forceNewHistoryUUID to true to be explicit (and later on morphing this into just `generateHistoryUUID` once all existing indices are known to have one and we can be explicit about the moments we generate new ones).

Re forcing stale allocations - good catch. As you say, that's a more complicated one indeed. I think we should do it as a follow-up and discuss possible approaches. I'm thinking about a new recovery source. I think this is an edge case we should be explicit about.

```java
    assert openMode == EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG : "translog uuid didn't change but open mode is " + openMode;
}
if (needsCommit) {
    commitIndexWriter(indexWriter, translog, openMode == EngineConfig.OpenMode.OPEN_INDEX_CREATE_TRANSLOG
```

@ywelsch (Contributor) commented Sep 19, 2017

`commitIndexWriter` still has a check `if (historyUUID != null)`. Is that one obsolete now?

@bleskes (Author, Member) commented Sep 19, 2017

good catch. fixing.

@bleskes bleskes requested a review from ywelsch Sep 19, 2017

@ywelsch (Contributor) left a comment

LGTM

@bleskes bleskes merged commit 04385a9 into elastic:master Sep 19, 2017

2 checks passed: CLA (commit author is a member of Elasticsearch), elasticsearch-ci (build finished)

@bleskes bleskes deleted the bleskes:history_uuid_by_recovery_source branch Sep 19, 2017

bleskes added a commit that referenced this pull request Sep 19, 2017

Restoring from snapshot should force generation of a new history uuid (#26694)

Restoring a shard from snapshot throws the primary back in time, violating assumptions and calling the validity of global checkpoints into question. To avoid problems, we should make sure that a shard that was restored will never be the source of an ops-based recovery to a shard that existed before the restore. To this end we introduced the notion of a `history_uuid` in #26577 and required that source and target have the same history to allow ops-based recoveries. This PR makes sure that a shard gets a new uuid after restore.

As suggested by @ywelsch, I derived the creation of a `history_uuid` from the `RecoverySource` of the shard. Store recovery will only generate a uuid if it doesn't already exist (we can make this stricter once we no longer need to deal with 5.x indices). Peer recovery follows the same logic (note that this is different from the approach in #26557; I went this way because it means that shards always have a history uuid after being recovered on a 6.x node, and a rolling restart is then enough for old indices to move over to the new seq-no model). Local shards and snapshot recovery force the generation of a new history uuid.

Relates #10708
Closes #26544

bleskes added a commit that referenced this pull request Sep 19, 2017

Restoring from snapshot should force generation of a new history uuid (#26694)

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 20, 2017

Merge branch 'master' into global-checkpoint-sync
* master: (67 commits)
  Restoring from snapshot should force generation of a new history uuid (elastic#26694)
  test: Use a single primary shard so that the exception can caught in the same way
  Move pre-6.0 node checkpoint to SequenceNumbers
  Invalid JSON request body caused endless loop (elastic#26680)
  added comment
  fix line length violation
  Moved the check to fetch phase. This basically means that we throw a better error message instead of an AOBE and not adding more restrictions.
  inner hits: Do not allow inner hits that use _source and have a non nested object field as parent
  Separate Painless Whitelist Loading from the Painless Definition (elastic#26540)
  convert more admin requests to writeable (elastic#26566)
  Handle release of 5.6.1
  Allow `InputStreamStreamInput` array size validation where applicable (elastic#26692)
  Update global checkpoint with permit after recovery
  Filter pre-6.0 nodes for checkpoint invariants
  Skip bad request REST test on pre-6.0
  Reenable BWC tests after disabling for backport
  Add global checkpoint tracking on the primary
  [Test] Fix reference/cat/allocation/line_8 test failure
  [Docs] improved description for fs.total.available_in_bytes (elastic#26657)
  Fix discovery-file plugin to use custom config path
  ...

bleskes added a commit that referenced this pull request Sep 21, 2017

Generating and committing a history_uuid on existing old indices destroys translog recovery info (#26734)

This is a bug introduced in #26694. The issue comes from the attempt to share the code that commits a new history uuid and/or a new translog uuid. This goes wrong for an existing 5.6 index that is recovered from store:

1) A new history uuid is generated, as it doesn't exist in the index.
2) The translog is opened and its uuid doesn't change.
3) Committing the new history uuid uses the standard commitIndexWriter method.
4) The latter asks the translog for the oldest file generation that is required to recover from the local checkpoint + 1.
5) The local checkpoint on old indices is -1 (until something is indexed), so the translog is asked to recover from position 0. That excludes any operations in the translog that do not have a seq no assigned, causing the FullClusterRestart bwc tests to fail.

To fix this, the commit moves away from the attempt to share the committing code between a new translog uuid and a new history uuid. Instead we do what we did before and open the translog with a potential commit. Afterwards we commit the history uuid if needed. This comes at the expense of opening up the method that commits an index writer in the engine.
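The corrected ordering described above boils down to a two-step sequence: open (and possibly commit) the translog first, then commit the history uuid separately only if one had to be generated. The sketch below is illustrative only; all names are invented, and the real logic lives in the engine.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the fixed recovery ordering from #26734.
// All names are invented; this is not the engine's actual API.
public class RecoverySequence {
    static List<String> openEngine(boolean historyUuidMissing) {
        List<String> steps = new ArrayList<>();
        // 1. Open the translog first (this may itself commit a new translog uuid).
        steps.add("open-translog");
        // 2. Only afterwards, and only if needed, commit a freshly generated
        //    history uuid in its own commit; the translog is already open, so
        //    this commit cannot truncate recoverable operations.
        if (historyUuidMissing) {
            steps.add("commit-history-uuid");
        }
        return steps;
    }
}
```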

bleskes added a commit that referenced this pull request Sep 21, 2017

Generating and committing a history_uuid on existing old indices destroys translog recovery info (#26734)

@colings86 colings86 removed the v6.0.0 label Sep 22, 2017

@lcawl lcawl removed the v6.1.0 label Dec 12, 2017

@colings86 colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
