-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recovery from snapshot may leave seq# gaps #27850
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
bleskes
added
:Sequence IDs
:Distributed/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
>bug
v6.1.1
v6.2.0
v7.0.0
labels
Dec 16, 2017
jasontedor
approved these changes
Dec 16, 2017
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
final Index index = resolveIndex(indexName); | ||
final IndexShard primary = internalCluster().getInstance(IndicesService.class, dataNode).getShardOrNull(new ShardId(index, 0)); | ||
// create a gap in the sequence numbers | ||
getEngineFromShard(primary).seqNoService().generateSeqNo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
thx @jasontedor |
bleskes
added a commit
that referenced
this pull request
Dec 18, 2017
When snapshotting the primary we capture a lucene commit at an arbitrary moment from a sequence number perspective. This means that it is possible that the commit misses operations and that there is a gap between the local checkpoint in the commit and the maximum sequence number. When we restore, this will create a primary that "misses" operations and currently will mean that the sequence number system is stuck (i.e., the local checkpoint will be stuck). To fix this we should fill in gaps when we restore, in a similar fashion to normal store recovery.
bleskes
added a commit
that referenced
this pull request
Dec 18, 2017
When snapshotting the primary we capture a lucene commit at an arbitrary moment from a sequence number perspective. This means that it is possible that the commit misses operations and that there is a gap between the local checkpoint in the commit and the maximum sequence number. When we restore, this will create a primary that "misses" operations and currently will mean that the sequence number system is stuck (i.e., the local checkpoint will be stuck). To fix this we should fill in gaps when we restore, in a similar fashion to normal store recovery.
martijnvg
added a commit
that referenced
this pull request
Dec 18, 2017
* es/6.x: (170 commits) Allow TrimFilter to be used in custom normalizers (#27758) recovery from snapshot should fill gaps (#27850) Remove unused class PreBuiltTokenFilters (#27839) Reject scroll query if size is 0 (#22552) (#27842) Mutes ‘Rollover no condition matched’ YAML test Make randomNonNegativeLong() draw from a uniform distribution (#27856) Adapt rest test after backport. Relates #27833 Handle case where the hole vertex is south of the containing polygon(s) (#27685) Move range field mapper back to core Fix publication of elasticsearch-cli to Maven Do not use system properties when building the HttpAsyncClient (#27829) Optimize version map for append-only indexing (#27752) Add NioGroup for use in different transports (#27737) adapt field collapsing skip test version. relates #27833 Add version support for inner hits in field collapsing (#27822) (#27833) Clarify that number of threads is set by packages Register HTTP read timeout setting Fixes Checkstyle Remove `operationThreaded` from Java API (#27836) Fixes failing BytesSizeValues tests ...
clintongormley
added
:Distributed/Engine
Anything around managing Lucene and the Translog in an open shard.
and removed
:Sequence IDs
labels
Feb 14, 2018
jpountz
removed
the
:Distributed/Engine
Anything around managing Lucene and the Translog in an open shard.
label
Jan 29, 2019
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
>bug
:Distributed/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
v6.1.1
v6.2.0
v7.0.0-beta1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When snapshotting the primary we capture a lucene commit at an arbitrary moment from a sequence number perspective. This means that it is possible that the commit misses operations and that there is a gap between the local checkpoint in the commit and the maximum sequence number.
When we restore, this will create a primary that "misses" operations and currently will mean that the sequence number system is stuck (i.e., the local checkpoint will be stuck). To fix this we should fill in gaps when we restore, in a similar fashion to normal store recovery.