Flush immediately after a remote recovery finishes (unless there are ongoing ones) #9439
Conversation
To properly replicate, we currently stop flushing during recovery so we can replay the translog once the file copying is done. Once recovery is done, the translog will be flushed by a background thread that, by default, kicks in every 5s. In case of a recovery failure and a quick re-assignment of a new shard copy, we may fail to flush before starting a new recovery, causing it to deal with a potentially even longer translog. This commit makes sure we flush immediately when the ongoing recovery count goes to 0. I also added a simple recovery benchmark.
assert onGoingRecoveries.get() >= 0 : "ongoingRecoveries must be >= 0 but was: " + onGoingRecoveries.get();
if (left == 0) {
I really don't understand the concurrency here, but I just wanted to verify that there is no risk that another recovery kicks off right here, after a first recovery just finished and dropped the count to 0, and then we clear the xlog out from under that second recovery, losing documents?
this is protected in the flush logic itself, so we can safely try:
and on the recovery side:
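A self-contained sketch of the interplay being described here, using plain JDK types in place of the actual engine classes (all names in it are illustrative, not the real source):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class RecoveryFlushGuard {
    private final AtomicInteger onGoingRecoveries = new AtomicInteger();
    private final ReentrantLock writeLock = new ReentrantLock();

    // Flush side: bail out while any recovery is in flight, so a flush can
    // never clear the translog out from under an in-flight recovery.
    void flush() {
        writeLock.lock();
        try {
            if (onGoingRecoveries.get() > 0) {
                throw new IllegalStateException("recovery is in progress, flush is not allowed");
            }
            // ... commit Lucene and trim the translog ...
        } finally {
            writeLock.unlock();
        }
    }

    // Recovery side: the counter is bumped under the same lock the flush takes,
    // so a recovery that kicks off "right here" either waits for the flush to
    // finish or trips the counter check above and aborts the flush.
    void startRecovery() {
        writeLock.lock();
        try {
            onGoingRecoveries.incrementAndGet();
        } finally {
            writeLock.unlock();
        }
    }

    // The change in this PR: when the last recovery ends, try to flush right
    // away rather than waiting for the periodic background flush.
    void endRecovery() {
        int left = onGoingRecoveries.decrementAndGet();
        assert left >= 0 : "ongoingRecoveries must be >= 0 but was: " + left;
        if (left == 0) {
            try {
                flush();
            } catch (IllegalStateException e) {
                // a concurrent recovery sneaked in; it will flush when it ends
            }
        }
    }
}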
OK, I see, looks good. Thanks @bleskes
LGTM, but I left one concurrency question...
LGTM
assert onGoingRecoveries.get() >= 0 : "ongoingRecoveries must be >= 0 but was: " + onGoingRecoveries.get();
if (left == 0) {
    try {
        flush(new Engine.Flush().type(Flush.Type.COMMIT_TRANSLOG));
For the 1.x and master forward-port, how are you going to implement the flush? Are you going to set force and waitIfOngoing?
I think both can be left at their defaults: force=false and waitIfOngoing=false. If the flush then does nothing, it means there is already an ongoing flush or there is nothing to do. In both cases there is no need for an extra flush to reduce the translog.
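Spelled out against the flush call from the hunk above, that reads roughly as follows (a sketch; the force and waitIfOngoing setters on Engine.Flush are assumed from this discussion, both kept at their defaults):

// Sketch only: force=false means a flush with nothing to commit is skipped;
// waitIfOngoing=false means we return immediately if another flush is already
// running. Either way the translog gets trimmed without an extra flush.
flush(new Engine.Flush()
        .type(Flush.Type.COMMIT_TRANSLOG)
        .force(false)
        .waitIfOngoing(false));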
sounds good to me
LGTM, left one comment
When we clean up after recoveries, we currently close the recovery counter first, followed by the different snapshots. Since the recovery counter may issue a flush on close (introduced in elastic#9439), the snapshot references prevent the flush from cleaning up the current translog file. This commit changes the order of the close calls. Note: this is not a problem on master, as we moved the translog delete logic, making it kick in, if needed, when the ref counter goes to 0. Closes elastic#9760
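The reordering amounts to something like the following (a sketch with assumed names; the point is only the close order):

import java.io.Closeable;
import java.io.IOException;

class RecoveryCleanup {
    // Close the translog snapshot first so it stops pinning the current
    // translog file, then close the recovery counter, whose close may trigger
    // the elastic#9439 flush and can now actually delete the old translog.
    static void cleanUp(Closeable translogSnapshot, Closeable recoveryCounter) throws IOException {
        try {
            translogSnapshot.close();
        } finally {
            recoveryCounter.close();
        }
    }
}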