Reset replica engine before primary-replica resync #32867

dnhatn · 2018-08-15T03:04:25Z

When a replica starts following a newly promoted primary, it may have
some operations which don't exist on the new primary. We need to reset
replicas to the global checkpoint before executing primary-replica
resync. These two steps will align replicas to the primary.

This change resets an engine of a replica to the safe commit when
detecting a new primary term, then reindex operations from the local
translog up to the global checkpoint.

When a replica starts following a newly promoted primary, it may have some operations which don't exist on the new primary. We need to reset replicas to the global checkpoint before executing primary-replica resync. These two steps will align replicas to the primary. This change resets an engine of a replica to the safe commit when detecting a new primary term, then reindex operations from the local translog up to the global checkpoint.

elasticmachine · 2018-08-15T03:04:27Z

Pinging @elastic/es-distributed

dnhatn · 2018-08-15T03:05:30Z

/cc @not-napoleon

s1monw

I did an initial pass and left some comments

s1monw · 2018-08-15T07:34:49Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -388,17 +388,23 @@ public InternalEngine recoverFromTranslog() throws IOException {
        return this;
    }

+    // for testing
+    final Engine recoverFromTranslog() throws IOException {


is this necessary? can we just use the recoverFromTranslog(Long.MAX_VALUE) instead in the tests?

s1monw · 2018-08-15T07:52:26Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

@@ -196,6 +196,11 @@
    protected volatile long pendingPrimaryTerm; // see JavaDocs for getPendingPrimaryTerm
    protected volatile long operationPrimaryTerm;
    protected final AtomicReference<Engine> currentEngineReference = new AtomicReference<>();
+
+    private final AtomicReference<Engine> resettingEngineReference = new AtomicReference<>();


To me having 2 AtomicReference in flight is very very confusing. I think we can simplify this by introducing an EngineReference class that we make final here and add some the reset logic internally. Or, alterantively keep the AtomicReference<EngineReference>.

it could look like this:

class EngineReference { private volatile Engine activeEngine; private volatile Engine pendingEngine; synchronized boolean hasPendingEngine() { return pendingEngine != null; } synchronized void makeActiveReadOnly() { // do the lockdown thing... } synchronized void swapPendingEngine() { // do the swap... and close the current etc. } }

this looks more contained and we can maybe test it in isolation?

s1monw · 2018-08-15T08:06:21Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

-        return applyIndexOperation(seqNo, operationPrimaryTerm, version, null, autoGeneratedTimeStamp, isRetry,
-            Engine.Operation.Origin.REPLICA, sourceToParse);
+                                                           boolean isRetry, SourceToParse sourceToParse) throws IOException {
+        return applyIndexOperation(getEngine(), seqNo, operationPrimaryTerm, version, null, autoGeneratedTimeStamp,


I was wondering why you did that and I think I understand what you are trying to do. You try to make sure we always get the latest engine ie. the locked down one if we swap it. but there is still a race imo. inside applyIndexOperation you might have an engine that is already closed unless you put a lock around it. The swap might be atomic but the reference might still receive writes after you locked it down, is this ok?

Before swapping engines, we drain all IndexShardOperationPermits (backed by Semaphore) and a write operation requires an IndexShardPermit. I think we are okay here.

++ maybe we add this as a comment somewhere or even as an assertion?

Don't we need to use getEngineForResync here? Assume that there are documents already replicated by the new primary before this replica has received all resync operations. Also wondering why there have been no test failures, maybe test coverage is not good? Also note that getEngineForResync is probably not the best name for this. I think there's a bigger issue here, let's sync about this tomorrow.

s1monw · 2018-08-15T08:07:58Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -2234,6 +2240,22 @@ public SeqNoStats getSeqNoStats(long globalCheckpoint) {
        return localCheckpointTracker.getStats(globalCheckpoint);
    }

+    @Override
+    public Engine lockDownEngine() throws IOException {


ideally we would do this entirely outside of the engine and maybe just pass and engine to the ctor of ReadOnlyEngine? do we need to make sure we don't receive writes after we did this or why do we acquire a write lock?

I will move this to ctor of ReadOnlyEngine.

ywelsch

I've just given this an initial look.

ywelsch · 2018-08-15T16:38:40Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+
+    final boolean canResetEngine() {
+        // TODO: do not reset the following shard
+        return indexSettings.getIndexVersionCreated().onOrAfter(Version.V_6_4_0);


AFAICS (correct me if I'm wrong) you had to it this way because we don't know on what node version the primary is (i.e. if it is going to send maxSeqNo or not), and the shard is reset when we acquire the replica operation permit (i.e. possibly before we receive the first resync request). It's a shame because it means we can't ensure consistency for older indices. The only other solution I can think of right now would be to always send the maximum sequence number with the replication request (same as we do for the global checkpoint). We could then pass this to acquireReplicaOperationPermit (same as the global checkpoint).

@ywelsch Yeah, you understood it correctly. I had the same thought but did not go with that option as I wasn't sure if it's a right trade-off. I am glad that you suggest it. Should we make that change into this PR or a separate prerequisite PR to reduce noise in this PR?

ywelsch · 2018-08-15T19:49:44Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+    }
+
+    private void resetEngineUpToLocalCheckpoint(long recoverUpToSeqNo) throws IOException {
+        synchronized (mutex) {


If I see this correctly, you're doing recoverFromTranslog under the mutex here? This can potentially block the cluster state update thread for minutes.

ywelsch · 2018-08-15T19:54:43Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+            // the resetting engine will be activated only if its local_checkpoint at least this guard.
+            minRequiredCheckpointForResettingEngine.set(currentMaxSeqNo);
+            resettingEngineReference.set(resettingEngine);
+            changeState(IndexShardState.RECOVERING, "reset engine from=" + currentMaxSeqNo + " to=" + globalCheckpoint);


why move the state back to RECOVERING?

ywelsch · 2018-08-15T19:56:22Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+    }
+
+    private void completeResettingEngineWithLocalHistory() throws IOException {
+        synchronized (mutex) {


same comment as above. You can't do stuff that possibly blocks the mutex for minutes

ywelsch · 2018-08-15T20:18:40Z

server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java

+
+    @Override
+    public void refresh(String source) throws EngineException {
+        // noop


refreshes should not be happening? If so, should we throw an UnsupportedOperationException here?

ywelsch · 2018-08-15T20:19:42Z

server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java

+
+    @Override
+    public CommitId flush(boolean force, boolean waitIfOngoing) throws EngineException {
+        throw new UnsupportedOperationException();


do we want to assert that all of these methods are never called?

ywelsch · 2018-08-15T20:25:04Z

server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java

+/**
+ * An engine that does not accept writes, and always points stats, searcher to the last commit.
+ */
+final class ReadOnlyEngine extends Engine {


I wonder if we should give this a different name, in particular because we might have something similar for frozen indices. There it might be a more complete version of readonly, with possibility to take a Translog.Snapshot. Maybe we could call this SearchOnlyEngine.

ywelsch · 2018-08-15T20:42:29Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

-        return applyIndexOperation(seqNo, operationPrimaryTerm, version, null, autoGeneratedTimeStamp, isRetry,
-            Engine.Operation.Origin.REPLICA, sourceToParse);
+                                                           boolean isRetry, SourceToParse sourceToParse) throws IOException {
+        return applyIndexOperation(getEngine(), seqNo, operationPrimaryTerm, version, null, autoGeneratedTimeStamp,


dnhatn · 2018-08-17T01:43:49Z

@s1monw and @ywelsch It's ready for another round. Can you please take a look? Thank you!

s1monw

I left some comments. looks good!

s1monw · 2018-08-20T11:30:33Z

server/src/main/java/org/elasticsearch/index/engine/SearchOnlyEngine.java

+
+    @Override
+    protected void closeNoLock(String reason, CountDownLatch closedLatch) {
+        try {


I think you should protect this against double counting down the closeLatch by wrapping this entire try block in

if (isClosed.compareAndSet(false, true)) { }

s1monw · 2018-08-20T11:31:34Z

server/src/main/java/org/elasticsearch/index/engine/SearchOnlyEngine.java

+        this.seqNoStats = engine.getSeqNoStats(engine.getLastSyncedGlobalCheckpoint());
+        this.translogStats = engine.getTranslogStats();
+        this.lastCommittedSegmentInfos = engine.getLastCommittedSegmentInfos();
+        Searcher searcher = engine.acquireSearcher("lockdown", SearcherScope.INTERNAL);


can you leave a comment here that we keep a reference to the store implicitly through the searcher? I do wonder if we should make it explicit

s1monw · 2018-08-20T11:32:29Z

server/src/main/java/org/elasticsearch/index/engine/SearchOnlyEngine.java

+        this.lastCommittedSegmentInfos = engine.getLastCommittedSegmentInfos();
+        Searcher searcher = engine.acquireSearcher("lockdown", SearcherScope.INTERNAL);
+        try {
+            this.searcherManager = new SearcherManager(searcher.getDirectoryReader(),


this searcher manager seems to be unclosed. I think you should close it as well in the closeNoLock method?

s1monw · 2018-08-20T11:33:35Z

server/src/main/java/org/elasticsearch/index/engine/SearchOnlyEngine.java

+        store.incRef();
+        Releasable releasable = store::decRef;
+        try (ReleasableLock ignored = readLock.acquire()) {
+            final EngineSearcher searcher = new EngineSearcher(source, searcherManager, store, logger);


can you try to exercise this method to make sure we open a new searcher and close / release everything

Ah, getDocIds method in SearchOnlyEngineTests#testSearchOnlyEngine acquires searchers.

s1monw · 2018-08-20T11:35:01Z

server/src/main/java/org/elasticsearch/index/engine/SearchOnlyEngine.java

+    private final SearcherManager searcherManager;
+    private final Searcher lastCommitSearcher;
+
+    public SearchOnlyEngine(Engine engine) throws IOException {


I do wonder if it would make more sense to open this entire thing off a store directly and maybe just pass and EngineConfig to this. it would make it more generic and less bound to an engine. WDYT?

something like this:

public SearchOnlyEngine(EngineConfig config) { super(config); try { Store store = config.getStore(); store.incRef(); DirectoryReader reader = null; boolean success = false; try { this.lastCommittedSegmentInfos = Lucene.readSegmentInfos(store.directory()); this.translogStats = new TranslogStats(0, 0, 0, 0, 0); final SequenceNumbers.CommitInfo seqNoStats = SequenceNumbers.loadSeqNoInfoFromLuceneCommit(lastCommittedSegmentInfos.userData.entrySet()); long maxSeqNo = seqNoStats.maxSeqNo; long localCheckpoint = seqNoStats.localCheckpoint; this.seqNoStats = new SeqNoStats(maxSeqNo, localCheckpoint, localCheckpoint); reader = SeqIdGeneratingDirectoryReader.wrap(ElasticsearchDirectoryReader.wrap(DirectoryReader .open(store.directory()), config.getShardId()), config.getPrimaryTermSupplier().getAsLong()); this.indexCommit = reader.getIndexCommit(); this.searcherManager = new SearcherManager(reader, new SearcherFactory()); success = true; } finally { if (success == false) { IOUtils.close(reader, store::decRef); } } } catch (IOException e) { throw new UncheckedIOException(e); // this is stupid } }

I did something similar a while back so I had it ready... I am not sure it safe to use 💯

I adopted this, but I have to pass SeqNoStats from outside because we use a "reset" local checkpoint which may not equal to the value from an index commit.

s1monw · 2018-08-20T11:37:02Z

server/src/main/java/org/elasticsearch/index/engine/SearchOnlyEngine.java

+
+    @Override
+    public void maybePruneDeletes() {
+


nit extra newline

s1monw · 2018-08-20T11:38:10Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

@@ -1266,14 +1269,16 @@ public void trimOperationOfPreviousPrimaryTerms(long aboveSeqNo) {

    // package-private for testing
    int runTranslogRecovery(Engine engine, Translog.Snapshot snapshot) throws IOException {
-        recoveryState.getTranslog().totalOperations(snapshot.totalOperations());
-        recoveryState.getTranslog().totalOperationsOnStart(snapshot.totalOperations());
+        if (isEngineResetting() == false) {


any reason we can't just run this the same way we do if we are not resetting?

s1monw · 2018-08-20T11:40:06Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+    private Engine createNewEngine(EngineConfig config) throws IOException {
+        assert Thread.holdsLock(mutex);
+        if (state == IndexShardState.CLOSED) {
+            throw new AlreadyClosedException(shardId + " can't create engine - shard is closed");


not sure, should we through IndexShardNotClosedException instead?

Yes, we should throw IndexShardClosedException. AlreadyClosedException was a left-over when we folded Engine to IndexShard.

s1monw · 2018-08-20T11:42:19Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+        }
+
+        @Override
+        public void close() throws IOException {


make this synchronized too. it's safer since you modify both references

good catch!

s1monw · 2018-08-20T11:45:31Z

server/src/main/java/org/elasticsearch/index/translog/Translog.java

                .map(BaseTranslogReader::newSnapshot).toArray(TranslogSnapshot[]::new);
-            return newMultiSnapshot(snapshots);
+            Snapshot snapshot = newMultiSnapshot(snapshots);


maybe you just return snapshot if upToSeqNo is == Long.MAX_VALUE?

ywelsch · 2018-08-20T11:23:24Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+        synchronized void closePendingEngine() throws IOException {
+            final Engine closing = this.pendingEngine;
+            this.pendingEngine = null;
+            IOUtils.close(closing);


can you restrict the mutex to the first two lines and call close outside the mutex?

ywelsch · 2018-08-20T11:23:37Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+        }
+
+        @Override
+        public void close() throws IOException {


should this be synchronized so we get a consistent snapshot of the two engines?
Also, again, please do the closing outside the lock.

dnhatn · 2018-08-21T03:30:25Z

@s1monw and @ywelsch I've addressed your comments. Can you please give it another go? I will beef up integration tests as Yannick suggested.

ywelsch

This PR is too much to review in one sitting. Can you open a PR just for the recoverFromTranslog change where we can now specify an upper bound?

ywelsch · 2018-08-21T09:34:53Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+                                    getEngine().flush(true, true); // force=true to make sure that we roll a translog generation
+                                    getEngine().resetLocalCheckpoint(localCheckpoint);
+                                }
+                                logger.info("detected new primary with primary term [{}], resetting local checkpoint from [{}] to [{}]",


this log message does not contain the right "before" local checkpoint as you moved it to after the local checkpoint reset

ywelsch · 2018-08-21T09:58:09Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

-                                getEngine().resetLocalCheckpoint(localCheckpoint);
-                                getEngine().rollTranslogGeneration();
+                                if (isEngineResetting()) {
+                                    engineHolder.closePendingEngine();


I wonder if this is insufficient in the presence of cascading primary failures. Assume that you have a primary failover, which wants to index sequence number range 2 to 5 (because global checkpoint on new primary was 2, and resync trim-off is 5). Now, while resyncing, the global checkpoint moves from 2 to 3, and the new primary fails. Another primary is selected, which, for our purposes, has the global checkpoint 3. In that case the doc with sequence number 3 will only be in the translog and the pending Lucene index. By throwing the pending Lucene index away here, we now have to reset the local checkpoint and replay from sequence number 2 (to seq number 3).
What the implementation does here though is to not reset the local checkpoint to number 2, but leave it at 3, which, if this new IndexCommit is flushed, will lead to the situation where the local checkpoint info in the index commit is wrong (i.e. it might not contain the operation number 3).

I think this is okay because we start another engine after that in this case. Moreover, we stick with the "reset" local checkpoint (expose the local checkpoint of the active engine) while resetting the engine; thus the global checkpoint won't advance.

ywelsch · 2018-08-21T10:31:23Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+                return null;
+            } else {
+                engineHolder.makeActiveEngineSearchOnly();
+                final Engine pendingEngine = createNewEngine(newEngineConfig());


this trims unsafe commits, possibly cleaning up segments that are referenced by the active search only engine?

We open a directory reader in the constructor of a search-only engine and keeps that reader until we manually close the search-only engine. Holding that reader would prevent the segment files of the last commit from deleting during trimming unsafe commits.

ywelsch · 2018-08-21T10:36:57Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+
+    private void completePendingEngineWithLocalHistory() throws IOException {
+        final Engine pendingEngine;
+        synchronized (mutex) {


why do you need to do this under the mutex?

ywelsch · 2018-08-21T10:49:12Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+                    this.pendingEngine = null;
+                }
+            }
+            IOUtils.close(closing);


How do we ensure that searches are not accessing acquireSearcher on the closed engine and switching to the new engine? Also, is there a test that checks that searches (with preference set to this node) continue to work during this transition.

There is a small interval in that callers might acquire searchers from the closed engine and hit AlreadClosedException. We can void this entirely by retrying on "AlreadClosedException" if the accessing engine is different than the current active engine. However, I am not sure if we should do it.

There is a test which continuously acquires searchers and makes sure that all acknowledged writes are maintained during the transition. https://github.com/elastic/elasticsearch/pull/32867/files#diff-b268f5fefa5837ece96b957e46f628cbR674 (getShardDocUIDs acquires a searcher and uses that searcher to collect document Ids).

However, I am not sure if we should do it.

why is that? We're building all this machinery to have search availability during the transition, except for this very short moment?

I had the same idea about retrying. An alternative would be to do refcounting for closing the engine, to ensure that we only actually close once all in-flight acquireSearcher calls have been completed.

Sorry, I did not think this carefully. I thought we have to implement the retry on all methods that we support in the search-only, but I was wrong. We only need to implement the retry for "get" and "acquire searcher". These two methods should be simple. Thanks for this great question.

This change allows an engine to recover from its local translog up to the given seqno. The extended API can be used in these use cases: 1. When a replica starts following a new primary, it resets its index to the safe commit, then replays its local translog up to the current global checkpoint (see elastic#32867). 2. When a replica starts a peer-recovery, it can initialize the start_sequence_number to the persisted global checkpoint instead of the local checkpoint of the safe commit. A replica will then replay its local translog up to that global checkpoint before accepting remote translog from the primary. This change will increase the chance of operation-based recovery. I will make this in a follow-up. Relates elastic#32867

dnhatn · 2018-08-21T17:39:11Z

@ywelsch I opened #33032 for the translog change.

This commit allows us to use different TranslogRecoveryRunner when recovering an engine from its local translog. This change is a prerequisite for the commit-based rollback PR (elastic#32867). See elastic#32867 (comment)

This commit allows us to use different TranslogRecoveryRunner when recovering an engine from its local translog. This change is a prerequisite for the commit-based rollback PR. Relates #32867

dnhatn · 2018-09-06T16:10:27Z

Discussed this with @bleskes on another channel. We are going to split this PR into 3 smaller PRs so we can review.

Reset replica engine to the global checkpoint on promotion (Reset replica engine to global checkpoint on promotion #33473)
Restore local history on the new primary upon promotion (Restore local history from translog on promotion #33616). This is to deal with cascading fail-over.
Support search on replicas during promotion (Exposed engine must include all operations below global checkpoint during rollback #36159).

This commit allows us to use different TranslogRecoveryRunner when recovering an engine from its local translog. This change is a prerequisite for the commit-based rollback PR. Relates #32867

This change adds an engine implementation that opens a reader on an existing index but doesn't permit any refreshes or modifications to the index. Relates to elastic#32867 Relates to elastic#32844

This change adds an engine implementation that opens a reader on an existing index but doesn't permit any refreshes or modifications to the index. Relates to #32867 Relates to #32844

When a replica starts following a newly promoted primary, it may have some operations which don't exist on the new primary. Thus we need to throw those operations to align a replica with the new primary. This can be done by first resetting an engine from the safe commit, then replaying the local translog up to the global checkpoint. Relates #32867

If a shard was serving as a replica when another shard was promoted to primary, then its Lucene index was reset to the global checkpoint. However, if the new primary fails before the primary/replica resync completes and we are now being promoted, we have to restore the reverted operations by replaying the translog to avoid losing acknowledged writes. Relates elastic#32867

When a replica starts following a newly promoted primary, it may have some operations which don't exist on the new primary. Thus we need to throw those operations to align a replica with the new primary. This can be done by first resetting an engine from the safe commit, then replaying the local translog up to the global checkpoint. Relates #32867

Relates #elastic#32867

If a shard was serving as a replica when another shard was promoted to primary, then its Lucene index was reset to the global checkpoint. However, if the new primary fails before the primary/replica resync completes and we are now being promoted, we have to restore the reverted operations by replaying the translog to avoid losing acknowledged writes. Relates #33473 Relates #32867

Today we expose a new engine immediately during Lucene rollback. The new engine is started with a safe commit which might not include all acknowledged operation. With this change, we won't expose the new engine until it has recovered from the local translog. Note that this solution is not complete since it's able to reserve only acknowledged operations before the global checkpoint. This is because we replay translog up to the global checkpoint during rollback. A per-doc Lucene rollback would solve this issue entirely. Relates #32867

dnhatn · 2018-12-09T02:29:52Z

All subtasks of this PR were done and merged. I am closing this.

Today we expose a new engine immediately during Lucene rollback. The new engine is started with a safe commit which might not include all acknowledged operation. With this change, we won't expose the new engine until it has recovered from the local translog. Note that this solution is not complete since it's able to reserve only acknowledged operations before the global checkpoint. This is because we replay translog up to the global checkpoint during rollback. A per-doc Lucene rollback would solve this issue entirely. Relates #32867

dnhatn added >enhancement :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.5.0 labels Aug 15, 2018

dnhatn requested review from s1monw, bleskes, ywelsch and jasontedor August 15, 2018 03:04

s1monw requested changes Aug 15, 2018

View reviewed changes

Remove no-arg method

2e10680

ywelsch reviewed Aug 15, 2018

View reviewed changes

dnhatn added 3 commits August 16, 2018 14:39

Remove lockdown method

216b890

Use engine holder

0ebb7e3

Merge branch 'master' into rollback-on-resync

95f3e9f

dnhatn requested review from s1monw and ywelsch August 17, 2018 01:43

s1monw reviewed Aug 20, 2018

View reviewed changes

ywelsch reviewed Aug 20, 2018

View reviewed changes

dnhatn added 2 commits August 20, 2018 18:03

Merge branch 'master' into rollback-on-resync

ce10e16

use last commit + fix synchronization

92021f7

ywelsch suggested changes Aug 21, 2018

View reviewed changes

dnhatn mentioned this pull request Aug 21, 2018

Allow engine to recover from translog upto a seqno #33032

Merged

assert same doc ids

b60d0e8

dnhatn mentioned this pull request Sep 5, 2018

Pass TranslogRecoveryRunner to engine from outside #33449

Merged

dnhatn mentioned this pull request Sep 6, 2018

Reset replica engine to global checkpoint on promotion #33473

Merged

s1monw mentioned this pull request Sep 10, 2018

Add read-only Engine #33563

Merged

s1monw added a commit that referenced this pull request Sep 11, 2018

Add read-only Engine (#33563)

517cfc3

This change adds an engine implementation that opens a reader on an existing index but doesn't permit any refreshes or modifications to the index. Relates to #32867 Relates to #32844

s1monw added a commit that referenced this pull request Sep 11, 2018

Add read-only Engine (#33563)

fe4c886

This change adds an engine implementation that opens a reader on an existing index but doesn't permit any refreshes or modifications to the index. Relates to #32867 Relates to #32844

dnhatn mentioned this pull request Sep 12, 2018

Restore local history from translog on promotion #33616

Merged

dnhatn added the WIP label Sep 12, 2018

gwbrown pushed a commit to gwbrown/elasticsearch that referenced this pull request Sep 14, 2018

TEST: Adapt change in recover from translog in engine

1f5b428

Relates #elastic#32867

colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018

dnhatn removed v6.6.0 v7.0.0 labels Oct 25, 2018

dnhatn mentioned this pull request Dec 3, 2018

Exposed engine must include all operations below global checkpoint during rollback #36159

Merged

dnhatn closed this Dec 9, 2018

dnhatn deleted the rollback-on-resync branch December 9, 2018 02:29

Reset replica engine before primary-replica resync #32867

Reset replica engine before primary-replica resync #32867

Conversation

dnhatn commented Aug 15, 2018

elasticmachine commented Aug 15, 2018

dnhatn commented Aug 15, 2018

s1monw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ywelsch left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn commented Aug 17, 2018

s1monw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn Aug 21, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn commented Aug 21, 2018

ywelsch left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn Aug 22, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn Aug 21, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnhatn Aug 23, 2018 • edited

Choose a reason for hiding this comment

dnhatn commented Aug 21, 2018

dnhatn commented Sep 6, 2018 • edited

dnhatn commented Dec 9, 2018

dnhatn Aug 21, 2018 •

edited

dnhatn Aug 22, 2018 •

edited

dnhatn Aug 21, 2018 •

edited

dnhatn Aug 23, 2018 •

edited

dnhatn commented Sep 6, 2018 •

edited