Add index commit id to searcher #63963

dnhatn · 2020-10-21T02:07:19Z

This commit adds an id, which is composed of the ids of segment files, to ElasticsearchDirectoryReader. With this id, we can retry search requests on another shard copy if its latest "snapshot" has the same segment files as the failing shard.
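
As a rough illustration of the idea in the description, here is a minimal sketch that derives one id by hashing the ids of all segments in a commit. It is not the code from this PR: the class name `SegmentIdDigest`, the `searcherId` method, and the use of plain `byte[]` values in place of Lucene's segment infos are all assumptions, and it assumes segment ids have a fixed length so that concatenating them is unambiguous.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;

// Hypothetical helper, not the PR's implementation.
final class SegmentIdDigest {

    // Hashes the segment ids in commit order and returns a printable id.
    static String searcherId(List<byte[]> segmentIds) {
        final MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        for (byte[] segmentId : segmentIds) {
            md.update(segmentId);
        }
        return Base64.getUrlEncoder().withoutPadding().encodeToString(md.digest());
    }

    public static void main(String[] args) {
        List<byte[]> copyA = List.of(new byte[] {1, 2}, new byte[] {3, 4});
        List<byte[]> copyB = List.of(new byte[] {1, 2}, new byte[] {3, 4});
        List<byte[]> copyC = List.of(new byte[] {1, 2}, new byte[] {9, 9});
        // Copies with the same segment files get the same id.
        System.out.println(searcherId(copyA).equals(searcherId(copyB)));
        // A copy with a different segment gets a different id.
        System.out.println(searcherId(copyA).equals(searcherId(copyC)));
    }
}
```

Two copies compare equal only when every segment id matches, which is the property the retry logic would rely on.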

server/src/main/java/org/elasticsearch/common/lucene/index/ElasticsearchDirectoryReader.java

dnhatn · 2020-10-21T17:30:40Z

I am labeling this WIP, although it's ready, as I am not sure about the approach in the PR. The way we compose the searcher id might be too aggressive. If we need to improve search resiliency for searchable snapshots only, then we can use the commit id instead.

elasticmachine · 2020-10-21T17:31:06Z

Pinging @elastic/es-distributed (:Distributed/Engine)

henningandersen

I believe there are two goals we could have with this id:

1. The ability to repeat the fetch phase if it fails. Here we need identical doc-ids on the replica.
2. The ability to continue to use a PIT despite failures in the cluster. Here we do not need identical doc-ids.

I wonder if the second part is better served by maxSeqNo instead, whenever the global checkpoint (gcp) equals maxSeqNo? Plus primary term for correctness.

It might be two completely separate ids and thus handled in another PR. I only wanted to raise this here to gather consensus that we cannot come up with a single id or failover scheme that can serve both purposes.
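
One possible reading of the maxSeqNo suggestion, sketched under assumptions: a copy only exposes such an id when its global checkpoint has caught up with maxSeqNo, and the id combines the primary term with maxSeqNo. The class `SeqNoSearcherId`, the method `seqId`, and the `term:seqNo` string format are hypothetical and do not come from this PR.

```java
import java.util.Optional;

// Hypothetical sketch of a sequence-number based id; names and format are assumptions.
final class SeqNoSearcherId {

    // Returns an id only when every operation up to maxSeqNo is covered by the
    // global checkpoint, i.e. there are no operations still in flight.
    static Optional<String> seqId(long primaryTerm, long maxSeqNo, long globalCheckpoint) {
        if (globalCheckpoint != maxSeqNo) {
            return Optional.empty();
        }
        return Optional.of(primaryTerm + ":" + maxSeqNo);
    }

    public static void main(String[] args) {
        System.out.println(seqId(3, 42, 42)); // id is present
        System.out.println(seqId(3, 42, 40)); // no id while the checkpoint lags
    }
}
```

Such an id says nothing about segment layout, so it would fit the second goal (continuing a PIT) but not the first (identical doc-ids).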

henningandersen · 2020-10-22T06:54:48Z

server/src/main/java/org/elasticsearch/common/lucene/index/ElasticsearchDirectoryReader.java

+        // are always different although they can have the identical segment files.
+        final MessageDigest md = MessageDigests.sha256();
+        for (SegmentCommitInfo sci : segmentInfos) {
+            final byte[] segmentId = sci.getId();


It is not clear to me that this is valid across different shard copies. The id generation starts somewhere random and then increments. I acknowledge the risk is small and I did not dig deeply into whether this increases the risk of collisions over using more standard uuid generation.

dnhatn · 2020-10-26T15:50:45Z

@jimczi If we implement #56828 using a composition of indexUUID, shardId, and sequence number (instead of internal document ID), then PIT can go with Henning's option-2. WDYT?

dnhatn · 2020-11-18T04:12:08Z

@henningandersen @jimczi

We (Jim, Francisco, and I) discussed this PR and agreed to add two IDs (i.e., commitID and seqID) to searchers. I implemented the commitID using an external UUID that is generated when an engine flushes (see 2f410eb). However, I found that using an external UUID isn't better than the id of an index commit. Hence, I re-implemented it using the id of an index commit. Would you please take a look? Thank you!

dnhatn · 2020-12-02T16:54:30Z

> Could we also set replicas=1 and then check that the recovered copy has the same commit id?

We flush after copying segment files in peer recovery to associate a new translog UUID, and this flush changes the commit id.

We can avoid this limitation by having an external commit id and changing it whenever the InternalEngine performs a flush. I implemented this but reverted it, as it helps only in this situation. I am happy to bring it back if you prefer.

henningandersen · 2020-12-09T06:55:05Z

> but I reverted it as it helps only this situation

Is the purpose of the change not that we can fail over a search to another replica copy containing the same data? I suppose the approach taken here could work for searchable snapshots, but it would be nice to have an approach that also works for shards that already went through file based peer recovery and did not see any changes since (like most warm indices today). Or is there a good reason that we cannot/should not handle that scenario?

s1monw

LGTM

server/src/main/java/org/elasticsearch/common/lucene/Lucene.java

This reverts commit 5ec5781.

dnhatn · 2020-12-10T17:42:09Z

> Is the purpose of the change not that we can fail over a search to another replica copy containing the same data?

Yes, that's the original purpose of this PR.

> I suppose the approach taken here could work for searchable snapshots.

Yes, it works with searchable snapshots.

> It would be nice to have an approach that also works for shards that already went through file based peer recovery and did not see any changes since (like most warm indices today). Or is there a good reason that we cannot/should not handle that scenario?

The problem is that we flush to create a new index commit to associate with a new local translog on replicas after performing a file-based recovery. This generates a new commit_id, although the content of the index commit doesn't change. To circumvent this, we can have an external commit id that doesn't change in file-based recoveries (please see 5ec5781).
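
To make the limitation concrete, here is a toy model, not Elasticsearch code. It assumes a commit carries its own id plus the ids of its segments, and that the post-recovery flush writes a new commit over unchanged segments. The names `RecoveryFlushModel`, `Commit`, and `flushWithoutChanges` are hypothetical.

```java
import java.util.List;
import java.util.UUID;

// Toy model of the post-recovery flush; all names here are hypothetical.
final class RecoveryFlushModel {

    static final class Commit {
        final String commitId;
        final List<String> segmentIds;

        Commit(String commitId, List<String> segmentIds) {
            this.commitId = commitId;
            this.segmentIds = segmentIds;
        }
    }

    // Models a flush that only re-associates the translog: the segments stay
    // the same, but the new commit gets a fresh id.
    static Commit flushWithoutChanges(Commit source) {
        return new Commit(UUID.randomUUID().toString(), source.segmentIds);
    }

    public static void main(String[] args) {
        Commit primary = new Commit("commit-on-primary", List.of("seg-1", "seg-2"));
        Commit replica = flushWithoutChanges(primary);
        // The commit ids differ even though the content is identical.
        System.out.println(primary.commitId.equals(replica.commitId));
        // The segment ids still match.
        System.out.println(primary.segmentIds.equals(replica.segmentIds));
    }
}
```

In this model a commit-id comparison rejects the replica as a retry target, while a comparison based on segment ids would accept it.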

henningandersen · 2020-12-10T20:16:42Z

~~Thanks @dnhatn , I think we should go for the external commit id like you did in 5ec5781.~~

henningandersen · 2020-12-11T08:19:31Z

Chatted with @dnhatn about the main issue of the external commit id: relying on Lucene not merging when associating a new translog with the commit. In order to continue this path of development, let us stick to the approach you have here (using the commit id of the segment infos), or revisit your original proposal of relying on individual segment commit-ids.

dnhatn · 2020-12-12T15:20:33Z

Thanks everyone for reviewing. I am merging this PR as is and will revisit the previous approach later.

This change assigns the id of an index commit to a searcher, so we can retry search requests on another shard copy if they have the same index commit.
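
A sketch of how a coordinating layer might use such an id when a request fails, under stated assumptions: it knows the id of the searcher on the failed copy and the id currently reported by each other copy. The class `RetryTargets`, the method `candidates`, and the node-name keys are hypothetical, and this is not the retry logic in Elasticsearch.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical selection of retry targets by searcher id.
final class RetryTargets {

    // Returns the nodes whose copy reports the same id as the failed copy,
    // sorted by name so the result is deterministic.
    static List<String> candidates(String failedCopyId, Map<String, String> idByNode) {
        if (failedCopyId == null) {
            return List.of(); // no id means no safe retry target
        }
        return idByNode.entrySet().stream()
            .filter(e -> failedCopyId.equals(e.getValue()))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> idByNode = Map.of("node-a", "c1", "node-b", "c2", "node-c", "c1");
        System.out.println(candidates("c1", idByNode)); // nodes holding commit c1
    }
}
```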

The commit id introduced in #63963 does not work well with searchable snapshots, as we create a new index commit when restoring from snapshots. This change revisits the approach of generating an id from the ids of the segments of an index commit. Relates #63963

dnhatn added 5 commits October 20, 2020 22:05

Assign id to searcher

e9cc404

stylecheck

e445182

fix tests

23146bb

fix tests

68e2e12

more tests

4db30e6

dnhatn added WIP :Distributed/Engine labels Oct 21, 2020

dnhatn marked this pull request as ready for review October 21, 2020 17:31

elasticmachine added the Team:Distributed label Oct 21, 2020

dnhatn requested review from jimczi, henningandersen and jpountz October 21, 2020 17:31

henningandersen reviewed Oct 22, 2020

dnhatn added 3 commits November 16, 2020 13:31

Merge branch 'master' into add-searcher-id

ae4c4c9

new direction

13116cc

Use es commit id

2f410eb

dnhatn removed the request for review from jpountz November 17, 2020 18:36

dnhatn added 3 commits November 17, 2020 14:37

try extract

800c9be

Merge branch 'master' into add-searcher-id

3892d32

Simplify

47f310d

dnhatn changed the title ~~Add id to searcher~~ Add index commit id to searcher Nov 18, 2020

dnhatn requested a review from henningandersen November 18, 2020 04:12

dnhatn added >enhancement v8.0.0 and removed WIP labels Nov 18, 2020

dnhatn requested a review from henningandersen December 2, 2020 16:54

s1monw approved these changes Dec 9, 2020

dnhatn added 4 commits December 10, 2020 09:43

Merge branch 'master' into add-searcher-id

b65f083

add javadocs

51d4ef4

Add external commit_id

5ec5781

Revert "Add external commit_id"

e94306d

This reverts commit 5ec5781.

dnhatn merged commit 0e8e02f into elastic:master Dec 12, 2020

dnhatn deleted the add-searcher-id branch December 12, 2020 15:30

dnhatn added the backport pending label Dec 12, 2020

dnhatn mentioned this pull request Dec 12, 2020

Add index commit id to searcher #66221

Merged

dnhatn added a commit that referenced this pull request Dec 12, 2020

Add index commit id to searcher (#63963)

a779f61

This change assigns the id of an index commit to a searcher, so we can retry search requests on another shard copy if they have the same index commit.

dnhatn removed the backport pending label Dec 12, 2020

dnhatn mentioned this pull request Dec 21, 2020

Assign id to searcher using ids of segments #66668

Merged

dnhatn mentioned this pull request Dec 21, 2020

Assign id to searcher using ids of segments #66699

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
