
Decouple recoveries from engine flush #10624

Merged
merged 1 commit into elastic:master from bleskes:gen_translog on May 5, 2015

Conversation

6 participants
bleskes (Member) commented Apr 16, 2015

In order to safely complete recoveries / relocations we have to keep all operations performed since the start of the recovery available for replay. At the moment we do so by preventing the engine from flushing, thus making sure the operations are kept in the translog. A side effect of this is that the translog keeps growing until the recovery is done. This is not a problem in itself, as we do need these operations, but if another recovery starts concurrently it may have an unnecessarily long translog to replay. Also, if we shut down the engine at this point (like when a node is restarted) we have to recover a long translog when we come back.

To avoid this, the translog is changed to be based on multiple files instead of a single one. This allows recoveries to keep hold of the files they need while allowing the engine to flush and do a Lucene commit (which creates a new translog file under the hood).

Change highlights:

  • Refactor Translog file management to allow for multiple files.
  • Translog maintains a list of referenced files: those held by outstanding recoveries and those containing operations not yet committed to Lucene.
  • A new Translog.View concept is introduced, allowing recoveries to get a reference to all currently uncommitted translog files plus all future translog files created until the view is closed. They can use this view to iterate over operations.
  • Recovery phase3 is removed. That phase replayed operations while preventing new writes to the engine. This is unneeded because standard indexing also sends all operations performed since the start of the recovery to the recovering shard. Replaying all ops in the view acquired at recovery start is enough to guarantee no operation is lost.
  • Opening and closing the translog is now the responsibility of the IndexShard. ShadowIndexShards do not open the translog.
  • Moved the ownership of translog fsyncing to the translog itself, changing the responsible setting to index.translog.sync_interval (was index.gateway.local.sync).
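The interplay between views and flushes can be pictured with a toy reference-counting model. This is a hypothetical sketch (TranslogModel/ViewModel are illustrative names, not the actual Elasticsearch classes): a generation is deleted only when neither uncommitted data nor any open view references it, so a flush no longer forces recoveries to lose the operations they need.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TranslogModel {
    final Map<Long, Integer> refCount = new HashMap<>(); // generation -> reference count
    final List<ViewModel> views = new ArrayList<>();
    long currentGen = 1;

    TranslogModel() {
        refCount.put(currentGen, 1); // pinned: contains operations not yet in Lucene
    }

    /** A Lucene commit rolls to a new file; the old one survives if a view holds it. */
    void flush() {
        long old = currentGen;
        currentGen++;
        refCount.merge(currentGen, 1, Integer::sum); // pin the new generation
        for (ViewModel v : views) {                  // views also track future files
            v.generations.add(currentGen);
            refCount.merge(currentGen, 1, Integer::sum);
        }
        decRef(old); // committed data no longer pins the old generation
    }

    /** A recovery acquires a view over all current (and future) generations. */
    ViewModel acquireView() {
        ViewModel v = new ViewModel();
        v.generations.add(currentGen);
        refCount.merge(currentGen, 1, Integer::sum);
        views.add(v);
        return v;
    }

    void closeView(ViewModel v) {
        views.remove(v);
        for (long gen : v.generations) {
            decRef(gen);
        }
    }

    private void decRef(long gen) {
        int remaining = refCount.merge(gen, -1, Integer::sum);
        if (remaining == 0) {
            refCount.remove(gen); // nothing needs it anymore: the file can be deleted
        }
    }

    static class ViewModel {
        final List<Long> generations = new ArrayList<>();
    }
}
```

In this model a flush after `acquireView()` leaves generation 1 alive (the view pins it) even though its data is committed; closing the view finally releases it.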

There are still some nocommits and some open issues around the fact that ShadowIndexShards don't have a translog (I have some ideas for solutions, but I want to discuss them before making the PR even bigger). Finally, testConcurrentWriteViewsAndSnapshot has a concurrency issue (a test bug) I need to solve, but I think we can start the review cycles. @s1monw @dakrone can you have a look?


public Boolean isHeldByCurrentThread() {
    if (holdingThreads == null) {
        return null;

dakrone (Member), Apr 16, 2015:

I feel weird about returning null here, if holdingThreads is null, can we just return false? Or are we signaling something special (asserts aren't enabled) with this?

bleskes (Author, Member), Apr 17, 2015:

just "I don't know" semantics. I will change it to throw an exception saying it's only supported when assertions are enabled.

translog.committedTranslogId(committedTranslogId);
} catch (FileNotFoundException ex) {
// nocommit - test this!
if (engineConfig.getIndexSettings().getAsBoolean("index.ignore_unknown_translog", false)) {

dakrone (Member), Apr 16, 2015:

Can you make this a static string?

bleskes (Author, Member), Apr 17, 2015:

will do

*/
- private Tuple<Long, Long> loadTranslogIds(IndexWriter writer, Translog translog) throws IOException {
+ private Long loadCommittedTranslogId(IndexWriter writer, Translog translog) throws IOException {

dakrone (Member), Apr 16, 2015:

Should probably annotate this as @Nullable since it can return null, just to make it explicit

@bleskes bleskes force-pushed the bleskes:gen_translog branch from fa27693 to 7095b10 Apr 16, 2015

public ReleasableLock(Lock lock) {
    this.lock = lock;
    boolean useHoldingThreads = false;
    assert (useHoldingThreads = true);

dakrone (Member), Apr 16, 2015:

Maybe you can leave a comment here about what the holdingThreads ThreadLocal is used for, that way it doesn't get accidentally changed during a cleanup at a later time.

bleskes (Author, Member), Apr 17, 2015:

added java docs on the holdingThreads field
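For readers unfamiliar with the `assert (useHoldingThreads = true)` line above: the JVM only executes assert statements when run with `-ea`, so the embedded assignment flips the flag only when assertions are enabled. A minimal standalone demonstration of the idiom:

```java
// The assignment inside the assert runs only under -ea; without it, the
// whole assert statement (including its side effect) is skipped.
class AssertionsDetector {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert (enabled = true); // side effect happens only when assertions are enabled
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```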

@@ -466,7 +476,7 @@ public StoreCloseListener(ShardId shardId, boolean ownsShard) {

@Override
public void handle(ShardLock lock) {
assert lock.getShardId().equals(shardId) : "shard Id mismatch, expected: " + shardId + " but got: " + lock.getShardId();

dakrone (Member), Apr 16, 2015:

"Id" can be lowercase here?

bleskes (Author, Member), Apr 17, 2015:

sure.

*/
- private Tuple<Long, Long> loadTranslogIds(IndexWriter writer, Translog translog) throws IOException {
+ private Long loadCommittedTranslogId(IndexWriter writer, Translog translog) throws IOException {

dakrone (Member), Apr 16, 2015:

Should be annotated with @Nullable

bleskes (Author, Member), Apr 17, 2015:

added.

long translogId = translogIdGenerator.incrementAndGet();
translog.newTransientTranslog(translogId);
indexWriter.setCommitData(Collections.singletonMap(Translog.TRANSLOG_ID_KEY, Long.toString(translogId)));
translogId = translog.newTranslog();
logger.trace("starting commit for flush; commitTranslog=true");

dakrone (Member), Apr 16, 2015:

Can you add the translogId to the log message here? It makes tracking stuff down on shared filesystems much easier.

dakrone (Member), Apr 16, 2015:

Nevermind, I see it later on in the commitIndexWriter method :)

@@ -797,7 +786,7 @@ public void forceMerge(final boolean flush, int maxNumSegments, boolean onlyExpu
}

@Override
- public SnapshotIndexCommit snapshotIndex() throws EngineException {
+ public SnapshotIndexCommit snapshotIndex(boolean flushFirst) throws EngineException {

dakrone (Member), Apr 16, 2015:

flushFirst is never used? This method is unconditionally flushing right now.

dakrone (Member), Apr 16, 2015:

Can you add javadoc about what kind of flush happens in this method (non-translog, waiting for ongoing)

bleskes (Author, Member), Apr 17, 2015:

argh, got lost in rebase. Added.

@@ -78,7 +73,10 @@ public IndexShardGateway(ShardId shardId, @IndexSettings Settings indexSettings,

this.waitForMappingUpdatePostRecovery = indexSettings.getAsTime("index.gateway.wait_for_mapping_update_post_recovery", TimeValue.timeValueSeconds(30));
syncInterval = indexSettings.getAsTime("index.gateway.sync", TimeValue.timeValueSeconds(5));
if (syncInterval.millis() > 0) {
if (indexShard instanceof ShadowIndexShard) {

dakrone (Member), Apr 16, 2015:

Since you have access to the IndexShard you can use indexShard.routingEntry().primary() == false && IndexMetaData.isIndexUsingShadowReplicas(indexSettings) for this.

bleskes (Author, Member), Apr 17, 2015:

yeah, maybe we should have a utility method somewhere? I think the best alternative would be to change the Guice context not to create this as it is not needed, but I wanted to keep the scope of the change small. Wondering if we should just keep instanceof and do a bigger rewrite as a separate change?

bleskes (Author, Member), Apr 24, 2015:

all of this is now moved to the FsTranslog class, so not needed.

@@ -251,6 +250,7 @@ public IndexShard(ShardId shardId, IndexSettingsService indexSettingsService, In
logger.debug("state: [CREATED]");

this.checkIndexOnStartup = indexSettings.get("index.shard.check_on_startup", "false");
this.translog = newTranslog(shardId, indexSettings, indexSettingsService, bigArrays, indexStore);

dakrone (Member), Apr 16, 2015:

I'm kind of concerned about the Guice interaction here if this call throws an exception, since this is an injected constructor. I know Guice freaks out when an exception is thrown in a constructor (for instance, if it couldn't create the translog file due to a full disk or something).

bleskes (Author, Member), Apr 17, 2015:

yeah, I hear you. The problem is that some places in the code rely on the translog being there. Ex. constructor IndexShardGateway :

 } else if (syncInterval.millis() > 0) {
            this.indexShard.translog().syncOnEachOperation(false);

maybe we can do this as another change as well?

@@ -1228,6 +1221,11 @@ protected Engine newEngine(boolean skipTranslogRecovery, EngineConfig config) {
return engineFactory.newReadWriteEngine(config, skipTranslogRecovery);
}

/* create a new translog if needed. can return a null if not needed */
protected FsTranslog newTranslog(ShardId shardId, Settings indexSettings, IndexSettingsService indexSettingsService,

dakrone (Member), Apr 16, 2015:

Can you mark this as @Nullable?

dakrone (Member), Apr 16, 2015:

Also, please annotate with @IndexSettings Settings indexSettings, there's nothing worse than indexSettings being renamed to settings at a later time and then not knowing which Settings it actually is.

dakrone (Member), Apr 16, 2015:

Actually, now that I look at it more, you don't need the index settings if you have the IndexSettingsService, because you can call indexSettingsService.getSettings(), so we should remove the redundant argument?

bleskes (Author, Member), Apr 17, 2015:

yep. reshuffled things.

@@ -80,7 +80,7 @@ public void snapshot(final SnapshotId snapshotId, final IndexShardSnapshotStatus
}

try {
- SnapshotIndexCommit snapshotIndexCommit = indexShard.snapshotIndex();
+ SnapshotIndexCommit snapshotIndexCommit = indexShard.snapshotIndex(true);

dakrone (Member), Apr 16, 2015:

Can you add a comment why we should flush when taking a snapshot? It may be something we need to add to the documentation too.

bleskes (Author, Member), Apr 17, 2015:

added

@@ -49,6 +50,7 @@

@Override
public void write(StreamOutput out, Translog.Operation op) throws IOException {
// nocommit: do we want to throw an UnsupportedOperationException?

dakrone (Member), Apr 16, 2015:

I don't think we need this any more on master, since we can break backwards compatibility. In that case we can throw the UOE

/**
* Returns the largest translog id present in all locations or <tt>-1</tt> if no translog is present.
/** notifies the translog that translogId was committed into lucene, allowing it
* to release all previous translogs

dakrone (Member), Apr 16, 2015:

To clarify, does this mean that if translog "6" is committed, that it also means all the data from translogs "5" and "4" exists? Or just "6"?

bleskes (Author, Member), Apr 17, 2015:

data from 5 & 4 is guaranteed to be part of lucene and the lucene commit point points to 6 for the rest. Data from 6 may or may not be in lucene. I clarified the comment
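That invariant can be expressed as a small pruning rule (hypothetical names, illustrating the semantics rather than the real code): everything strictly older than the committed generation is safe to release; the committed generation itself is not, because its operations may or may not have made it into the Lucene commit.

```java
import java.util.SortedSet;
import java.util.TreeSet;

class GenerationPruner {
    /** Keep the committed generation and anything newer; older ones are in Lucene. */
    static SortedSet<Long> keepAfterCommit(SortedSet<Long> liveGenerations, long committedGeneration) {
        // tailSet is inclusive of committedGeneration: keep N and anything newer
        return new TreeSet<>(liveGenerations.tailSet(committedGeneration));
    }
}
```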

Snapshot snapshot();

/** this smallest translog id in this view */
long minTrasnlogId();

dakrone (Member), Apr 16, 2015:

Typo here, "Trasnlog" -> "Translog"

bleskes (Author, Member), Apr 17, 2015:

oops. IntelliJ isn't happy with "translog" anyway, which makes this hard to spot.

Create create = (Create) o;

if (timestamp != create.timestamp) {
return false;

dakrone (Member), Apr 16, 2015:

This might be more readable and succinct with && chaining, what do you think?

return timestamp == create.timestamp &&
    ttl == create.ttl &&
    version == create.version &&
    id.equals(create.id) &&
    type.equals(create.type) &&
    ... etc ...

bleskes (Author, Member), Apr 17, 2015:

auto-generated code at work. Changed.

@@ -34,13 +34,12 @@
public class TranslogStats implements ToXContent, Streamable {

private long translogSizeInBytes = 0;
- private int estimatedNumberOfOperations = 0;
+ private int estimatedNumberOfOperations = -1;

dakrone (Member), Apr 16, 2015:

Why not leave this at 0? You can keep the assert and then you don't have to worry about someone serializing an empty TranslogStats and running into problems with out.writeVInt(estimatedNumberOfOperations)

bleskes (Author, Member), Apr 17, 2015:

I wanted to distinguish between cases where we don't know (like when opening a translog file for the first time) and an actual 0. I think it's important.

return lastWrittenPosition;
}

@Override
public Translog.Location add(BytesReference data) throws IOException {
- rwl.writeLock().lock();
- try {
+ try (ReleasableLock _ = writeLock.acquire()) {

dakrone (Member), Apr 16, 2015:

I think java 9 is going to choke on _ here, if I recall correctly

bleskes (Author, Member), Apr 17, 2015:

renamed all of them to `lock`
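As background to the `_` discussion: `_` as an identifier was deprecated in Java 8 and became a compile error in Java 9, hence dakrone's warning. The try-with-resources lock wrapper itself can be sketched as follows (illustrative shape only; the real ReleasableLock also tracks holding threads for assertions, as seen earlier in this thread):

```java
import java.util.concurrent.locks.Lock;

class ReleasableLockSketch implements AutoCloseable {
    private final Lock lock;

    ReleasableLockSketch(Lock lock) {
        this.lock = lock;
    }

    ReleasableLockSketch acquire() {
        lock.lock();
        return this; // lets callers write: try (ReleasableLockSketch l = writeLock.acquire()) { ... }
    }

    @Override
    public void close() {
        lock.unlock(); // runs even if the critical section throws
    }
}
```

This is why the `rwl.writeLock().lock(); try { ... } finally { ... }` boilerplate collapses into a single try-with-resources statement.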

System.arraycopy(buffer, (int) (location.translogLocation - lastWrittenPosition), data, 0, location.size);
return data;
protected void readBytes(ByteBuffer targetBuffer, long position) throws IOException {
try (ReleasableLock _ = readLock.acquire()) {

dakrone (Member), Apr 16, 2015:

Another _ java 9 will be mad at

}
public FsChannelImmutableReader immutableReader() throws TranslogException {
if (channelReference.tryIncRef() == false) {
throw new ElasticsearchIllegalStateException("can't increment channel [" + channelReference + "] ref count");

dakrone (Member), Apr 16, 2015:

Can you throw a TranslogException here also (encapsulating the ElasticsearchIllegalStateException if you want) so that the shardId is captured?

bleskes (Author, Member), Apr 17, 2015:

done
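The tryIncRef guard this snippet relies on can be sketched like so (hypothetical names; the real implementation is more elaborate). The key property: once the count reaches zero the underlying channel is closed for good, so new readers must use tryIncRef and handle failure rather than incrementing blindly.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ChannelRefSketch {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one reference
    volatile boolean closed = false;

    boolean tryIncRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false; // already closed: a new reader cannot revive it
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // last reference gone: close the file channel here
        }
    }
}
```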

public void sync() throws IOException {
if (!syncNeeded()) {
return;
}
rwl.writeLock().lock();
try {
try (ReleasableLock _ = writeLock.acquire()) {

dakrone (Member), Apr 16, 2015:

hooray for try-with-resources!

return this.stream;
}

boolean assertAttach(FsChannelReader owner) {

dakrone (Member), Apr 16, 2015:

Can you add javadocs for these two methods? (assertAttach and assertDetach)

channelReference.decRef();
}
public FsChannelImmutableReader immutableReader() throws TranslogException {
if (channelReference.tryIncRef() == false) {

s1monw (Contributor), Apr 23, 2015:

I guess we can use the double incRef pattern here too?

bleskes (Author, Member), Apr 24, 2015:

not sure I follow what you mean?

@@ -116,7 +111,5 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeVLong(startTime);
out.writeVInt(phase2Operations);
out.writeVLong(phase2Time);
out.writeVInt(phase3Operations);

s1monw (Contributor), Apr 23, 2015:

g00d :)

s1monw (Contributor) commented Apr 23, 2015

I like what I see a lot. Yet, I think we still need to work on the unittest end to add more basic tests, it's just a feeling but we should have more tests that just make use of these classes as they are intended. I can help here once we are closer!

bleskes (Member, Author) commented Apr 24, 2015

@s1monw @dakrone pushed another commit addressing yours and @kimchy's feedback.

@s1monw

This comment has been minimized.

Copy link
Contributor

s1monw commented Apr 28, 2015

@bleskes I added some answers to your comments

@bleskes bleskes force-pushed the bleskes:gen_translog branch from cd7418f to a15eaf6 Apr 28, 2015

bleskes (Member, Author) commented Apr 28, 2015

@s1monw I pushed another round. Also added a test and removed the last no commit. I think this is ready now?

s1monw (Contributor) commented Apr 28, 2015

@bleskes I added some minor comments on the commit - LGTM, feel free to push once fixed

s1monw (Contributor) commented Apr 29, 2015

LGTM

@bleskes bleskes force-pushed the bleskes:gen_translog branch from 0557843 to 3a6f7a0 Apr 29, 2015

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 29, 2015

Decouple recoveries from engine flush

Closes elastic#10624

@bleskes bleskes force-pushed the bleskes:gen_translog branch from 3a6f7a0 to 20d8cb7 Apr 29, 2015

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 29, 2015

Decouple recoveries from engine flush

Closes elastic#10624

@bleskes bleskes force-pushed the bleskes:gen_translog branch from 20d8cb7 to 9e0697c Apr 30, 2015

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 30, 2015

Decouple recoveries from engine flush

Closes elastic#10624

@bleskes bleskes force-pushed the bleskes:gen_translog branch from 9e0697c to 45c1c9c Apr 30, 2015

Decouple recoveries from engine flush
In order to safely complete recoveries / relocations we have to keep all operations performed since the start of the recovery available for replay. At the moment we do so by preventing the engine from flushing, thus making sure the operations are kept in the translog. A side effect of this is that the translog keeps growing until the recovery is done. This is not a problem in itself, as we do need these operations, but if another recovery starts concurrently it may have an unnecessarily long translog to replay. Also, if we shut down the engine at this point (like when a node is restarted) we have to recover a long translog when we come back.

To avoid this, the translog is changed to be based on multiple files instead of a single one. This allows recoveries to keep hold of the files they need while allowing the engine to flush and do a Lucene commit (which creates a new translog file under the hood).

Change highlights:
- Refactor Translog file management to allow for multiple files.
- Translog maintains a list of referenced files: those held by outstanding recoveries and those containing operations not yet committed to Lucene.
- A new Translog.View concept is introduced, allowing recoveries to get a reference to all currently uncommitted translog files plus all future translog files created until the view is closed. They can use this view to iterate over operations.
- Recovery phase3 is removed. That phase replayed operations while preventing new writes to the engine. This is unneeded because standard indexing also sends all operations performed since the start of the recovery to the recovering shard. Replaying all ops in the view acquired at recovery start is enough to guarantee no operation is lost.
- IndexShard now creates the translog together with the engine. The translog is closed by the engine on close. ShadowIndexShards do not open the translog.
- Moved the ownership of translog fsyncing to the translog itself, changing the responsible setting to `index.translog.sync_interval` (was `index.gateway.local.sync`)

Closes #10624

@bleskes bleskes force-pushed the bleskes:gen_translog branch from 45c1c9c to d596f5c Apr 30, 2015

@s1monw s1monw merged commit d596f5c into elastic:master May 5, 2015

1 check passed

CLA Commit author is a member of Elasticsearch
Details

@s1monw s1monw removed in progress labels May 5, 2015

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jan 7, 2016

Translog based flushes can be disabled after replication relocation or slow recovery

elastic#10624 decoupled translog flush from ongoing recoveries. In the process, translog creation was delayed to the moment the engine is created (during recovery, after copying files from the primary). On the other side, TranslogService, in charge of translog based flushes, starts a background checker as soon as the shard is allocated. That checker performs its first check after 5s, expecting the translog to be there. However, if the file copying phase of the recovery takes >5s (likely!) or local recovery is slow, the check can run into an exception and never recover. The end result is that translog based flushes are completely disabled.

Note that this is mitigated by shard inactivity, which triggers a synced flush after 5m of no indexing.

Closes elastic#15814
yiguolei commented Apr 22, 2016

@bleskes When a user deletes a doc during recovery, the recovery process will still send an index request for that doc to the replica, and the replica will index the doc again, so the doc will appear again since we removed phase3

@bleskes bleskes deleted the bleskes:gen_translog branch May 11, 2016

bleskes (Member, Author) commented May 11, 2016

@yiguolei sorry for the late response, but doc deletes are versioned just like any other write operation and can arrive out of order at the replicas. When the indexing operation is replayed, it will not override the delete operation.
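A toy model of that versioning rule (simplified, not the real engine) shows why the replayed index operation cannot resurrect the deleted doc: every operation carries a version, and the replica drops anything older than what it has already applied for that doc.

```java
import java.util.HashMap;
import java.util.Map;

class ReplicaModel {
    private final Map<String, Long> versions = new HashMap<>(); // doc id -> last applied version
    private final Map<String, String> docs = new HashMap<>();

    /** Pass null as the document to model a delete operation. */
    void apply(String id, long version, String documentOrNull) {
        Long current = versions.get(id);
        if (current != null && version <= current) {
            return; // stale replayed operation: ignore it
        }
        versions.put(id, version);
        if (documentOrNull == null) {
            docs.remove(id);
        } else {
            docs.put(id, documentOrNull);
        }
    }

    boolean exists(String id) {
        return docs.containsKey(id);
    }
}
```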
