Refactor RecoveryTarget state management #8092

Closed
wants to merge 11 commits into
from

Projects

None yet

3 participants

@bleskes
Member
bleskes commented Oct 15, 2014

The PR rewrites the state controls in the RecoveryTarget family classes to make it easier to guarantee that:

  • recovery resources are only cleared once there are no ongoing requests
  • recovery is automatically canceled when the target shard is closed/removed
  • canceled recoveries do not leave temp files behind when canceled.

Highlights of the change:

  1. All temporary files are cleared upon failure/cancel (see #7315 )
  2. All newly created files are always temporary
  3. Doesn't list local files on the cluster state update thread (which throw unwanted exception)
  4. Recoveries are canceled by a listener to IndicesLifecycle.beforeIndexShardClosed, so we don't need to explicitly call it.
  5. Simplifies RecoveryListener to only notify when a recovery is done or failed. Removed subtleties like ignore and retry (they are dealt with internally)

Relates to #7893

bleskes added some commits Oct 6, 2014
@bleskes bleskes Recovery: clean up temporary files when canceling recovery
At the moment, we leave around temporary files if a peer (replica) recovery is canceled. Those files will normally be cleaned up once the shard is started else but in case of errors this can lead to trouble. If recovery are started and canceled often, we may cause nodes to run out of disk space.

Closes #7893
4005aa2
@bleskes bleskes temp file names registry - not there yet. 99e67a2
@bleskes bleskes wip 046967e
@bleskes bleskes Some more cleanup and java docs aaafcf2
@bleskes bleskes Beter encapsulate temporary files 676957d
@bleskes bleskes Fix compilation after rebasing to 1.x 32a9225
@bleskes bleskes changed the title from Recovery: refactor RecoveryTarget state managemnt to Recovery: refactor RecoveryTarget state management Oct 15, 2014
@s1monw s1monw self-assigned this Oct 15, 2014
@s1monw s1monw commented on an outdated diff Oct 16, 2014
...sticsearch/indices/recovery/RecoveriesCollection.java
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.elasticsearch.cluster.node.DiscoveryNode;
+import org.elasticsearch.common.Nullable;
+import org.elasticsearch.common.logging.ESLogger;
+import org.elasticsearch.common.util.concurrent.ConcurrentCollections;
+import org.elasticsearch.index.shard.IndexShardClosedException;
+import org.elasticsearch.index.shard.ShardId;
+import org.elasticsearch.index.shard.service.IndexShard;
+import org.elasticsearch.index.shard.service.InternalIndexShard;
+
+import java.util.Map;
+
@s1monw
s1monw Oct 16, 2014 Contributor

can we get a javadoc string what this class does?

@s1monw s1monw commented on the diff Oct 16, 2014
...earch/indices/cluster/IndicesClusterStateService.java
this.shardRouting = shardRouting;
this.indexService = indexService;
this.indexMetaData = indexMetaData;
}
@Override
- public void onRecoveryDone() {
- shardStateAction.shardStarted(shardRouting, indexMetaData.getUUID(), "after recovery (replica) from node [" + request.sourceNode() + "]");
- }
-
- @Override
- public void onRetryRecovery(TimeValue retryAfter, RecoveryStatus recoveryStatus) {
- recoveryTarget.retryRecovery(request, retryAfter, recoveryStatus, PeerRecoveryListener.this);
- }
-
- @Override
- public void onIgnoreRecovery(boolean removeShard, String reason) {
@s1monw
s1monw Oct 16, 2014 Contributor

I assume the onIgnoreRecovery is unused?

@bleskes
bleskes Oct 16, 2014 Member

Yes. it is now removed from the listener interface.

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...sticsearch/indices/recovery/RecoveriesCollection.java
+ private final Map<Long, RecoveryStatus> onGoingRecoveries = ConcurrentCollections.newConcurrentMap();
+
+ final private ESLogger logger;
+
+ public RecoveriesCollection(ESLogger logger) {
+ this.logger = logger;
+ }
+
+ /**
+ * Starts are new recovery for the given shard, source node and state
+ *
+ * @return the id of the new recovery.
+ */
+ public long startRecovery(InternalIndexShard indexShard, DiscoveryNode sourceNode, RecoveryState state, RecoveryTarget.RecoveryListener listener) {
+ RecoveryStatus status = new RecoveryStatus(indexShard, sourceNode, state, listener);
+ onGoingRecoveries.put(status.recoveryId(), status);
@s1monw
s1monw Oct 16, 2014 Contributor

can be make this a putIfAbsent and assert it's not there?

@bleskes
bleskes Oct 16, 2014 Member

will do

@s1monw s1monw commented on an outdated diff Oct 16, 2014
...sticsearch/indices/recovery/RecoveriesCollection.java
+ } else {
+ return null;
+ }
+ }
+ }
+ return null;
+ }
+
+
+ /** cancel all ongoing recoveries for the given shard. typically because the shards is closed */
+ public void cancelRecoveriesForShard(ShardId shardId, String reason) {
+ for (RecoveryStatus status : onGoingRecoveries.values()) {
+ if (status.shardId().equals(shardId)) {
+ cancelRecovery(status.recoveryId(), reason);
+ }
+
@s1monw
s1monw Oct 16, 2014 Contributor

extra newline here?

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...sticsearch/indices/recovery/RecoveriesCollection.java
@@ -0,0 +1,142 @@
+package org.elasticsearch.indices.recovery;/*
@s1monw
s1monw Oct 16, 2014 Contributor

this file header looks wrong

@bleskes
bleskes Oct 16, 2014 Member

Argh IntelliJ. will fix.

@s1monw s1monw commented on the diff Oct 16, 2014
...ticsearch/index/gateway/IndexShardGatewayService.java
@@ -62,6 +62,7 @@ public IndexShardGatewayService(ShardId shardId, @IndexSettings Settings indexSe
this.shardGateway = shardGateway;
this.snapshotService = snapshotService;
this.recoveryState = new RecoveryState(shardId);
+ this.recoveryState.setType(RecoveryState.Type.GATEWAY);
@s1monw
s1monw Oct 16, 2014 Contributor

should this go into the ctor?

@s1monw
s1monw Oct 16, 2014 Contributor

nevermind it gets updated

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
+ private final long recoveryId;
+ private final InternalIndexShard indexShard;
+ private final RecoveryState state;
+ private final DiscoveryNode sourceNode;
+ private final String tempFilePrefix;
+ private final Store store;
+ private final RecoveryTarget.RecoveryListener listener;
+
+ private AtomicReference<Thread> waitingRecoveryThread = new AtomicReference<>();
+
+ AtomicBoolean finished = new AtomicBoolean();
+
+ // we start with 1 which will be decremented on cancel/close/failure
+ final AtomicInteger refCount = new AtomicInteger(1);
+
+ private volatile ConcurrentMap<String, IndexOutput> openIndexOutputs = ConcurrentCollections.newConcurrentMap();
@s1monw
s1monw Oct 16, 2014 Contributor

unrelated but I think openIndexOutputs can be final and no need to be volatile?

@bleskes
bleskes Oct 16, 2014 Member

Yes, it now can (given the new access patterns). Good point.

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
+ private final static AtomicLong idGenerator = new AtomicLong();
+
+ private final String RECOVERY_PREFIX = "recovery.";
+
+ private final ShardId shardId;
+ private final long recoveryId;
+ private final InternalIndexShard indexShard;
+ private final RecoveryState state;
+ private final DiscoveryNode sourceNode;
+ private final String tempFilePrefix;
+ private final Store store;
+ private final RecoveryTarget.RecoveryListener listener;
+
+ private AtomicReference<Thread> waitingRecoveryThread = new AtomicReference<>();
+
+ AtomicBoolean finished = new AtomicBoolean();
@s1monw
s1monw Oct 16, 2014 Contributor

make finished private final?

@bleskes
bleskes Oct 16, 2014 Member

+1. will do.

@s1monw s1monw commented on the diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
+ }
+
+ public Store store() {
+ return store;
+ }
+
+ /** set a thread that should be interrupted if the recovery is canceled */
+ public void setWaitingRecoveryThread(Thread thread) {
+ waitingRecoveryThread.set(thread);
+ }
+
+ /**
+ * clear the thread set by {@link #setWaitingRecoveryThread(Thread)}, making sure we
+ * do not override another thread.
+ */
+ public void clearWaitingRecoveryThread(Thread threadToClear) {
@s1monw
s1monw Oct 16, 2014 Contributor

can we return and assert that the CAS op was successful?

@bleskes
bleskes Oct 16, 2014 Member

I can definitely return the value. I'm a bit conflicted regarding the assert as strictly speaking we can't guarantee it will work due to the retry logic when may set the thread before the clear command of the previous thread run. In practice it shouldn't be a problem because it only kicks in after 500ms. But still, I'm not sure it adds value to assert here?

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
}
-
- public synchronized void cancel() {
- canceled = true;
+
+ /** renames all temporary files to their true name, potentially overriding existing files */
+ public void renameAllTempFiles() throws IOException {
+ Iterator<String> tempFileIterator = tempFileNames.iterator();
+ final Directory directory = store.directory();
+ while (tempFileIterator.hasNext()) {
+ String tempFile = tempFileIterator.next();
+ String origFile = originalNameForTempFile(tempFile);
+ // first, go and delete the existing ones
+ try {
+ directory.deleteFile(origFile);
+ } catch (FileNotFoundException e) {
@s1monw
s1monw Oct 16, 2014 Contributor

you also need to catch NoSuchFileException it's OS dependent

@bleskes
bleskes Oct 16, 2014 Member

KK. I only copied the old code. Will change.

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
}
-
- public synchronized void cancel() {
- canceled = true;
+
+ /** renames all temporary files to their true name, potentially overriding existing files */
+ public void renameAllTempFiles() throws IOException {
+ Iterator<String> tempFileIterator = tempFileNames.iterator();
+ final Directory directory = store.directory();
@s1monw
s1monw Oct 16, 2014 Contributor

just for kicks I think we should inc the refcount on the store here before we access it

@bleskes
bleskes Oct 16, 2014 Member

Maybe better is to except if the ref count of the local object is <0 (which guarantees the store is kept alive)? Semantically you should only call methods on this object when having a ref count.

@s1monw s1monw commented on the diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
+ public void renameAllTempFiles() throws IOException {
+ Iterator<String> tempFileIterator = tempFileNames.iterator();
+ final Directory directory = store.directory();
+ while (tempFileIterator.hasNext()) {
+ String tempFile = tempFileIterator.next();
+ String origFile = originalNameForTempFile(tempFile);
+ // first, go and delete the existing ones
+ try {
+ directory.deleteFile(origFile);
+ } catch (FileNotFoundException e) {
+
+ } catch (Throwable ex) {
+ logger.debug("failed to delete file [{}]", ex, origFile);
+ }
+ // now, rename the files... and fail it it won't work
+ store.renameFile(tempFile, origFile);
@s1monw
s1monw Oct 16, 2014 Contributor

what happens if this rename doesn't work here?

@bleskes
bleskes Oct 16, 2014 Member

Then we should fail the shard imho. I copied the old code. I'll double check that this is what happens.

@bleskes
bleskes Oct 19, 2014 Member

Checked. That's correct - shard is failed..

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
- if (canceled || outputs == null) {
- return null;
+
+ /** cancel the recovery. calling this method will clean temporary files and release the store
+ * unless this object is in use (in which case it will be cleaned once all ongoing users call
+ * {@link #decRef()} or {@link #close()}
+ *
+ * if {@link #setWaitingRecoveryThread(Thread)} was used, the thread will be interrupted.
+ */
+ public void cancel(String reason) {
+ if (finished.compareAndSet(false, true)) {
+ logger.debug("recovery canceled (reason: [{}])", reason);
+ // release the initial reference. recovery files will be cleaned as soon as ref count goes to zero, potentially now
+ decRef();
+
+ Thread thread = waitingRecoveryThread.get();
@s1monw
s1monw Oct 16, 2014 Contributor

make this final?

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
- openIndexOutputs = null;
- if (outputs == null) {
- return null;
+ /**
+ * fail the recovery and call listener
+ *
+ * @param e exception that encapsulating the failure
+ * @param sendShardFailure indicates whether to notify the master of the shard failure
+ **/
+ public void fail(RecoveryFailedException e, boolean sendShardFailure) {
+ if (finished.compareAndSet(false, true)) {
+ try {
+ listener.onRecoveryFailure(state, e, sendShardFailure);
+ } catch (Exception notifyError) {
+ logger.error("unexpected failure while notifying listener of failure", notifyError);
+ }
@s1monw
s1monw Oct 16, 2014 Contributor

decRef should happen in a finally

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
+ private String getTempNameForFile(String origFile) {
+ return tempFilePrefix + origFile;
+ }
+
+ /** return true if the give file is a temporary file name issued by this recovery */
+ private boolean isTempFile(String filename) {
+ return tempFileNames.contains(filename);
+ }
+
+ public IndexOutput getOpenIndexOutput(String key) {
+ return openIndexOutputs.get(key);
+ }
+
+ /** returns the original file name for a temporary file name issued by this recovery */
+ private String originalNameForTempFile(String tempFile) {
+ assert isTempFile(tempFile);
@s1monw
s1monw Oct 16, 2014 Contributor

make this a hard exception?

@bleskes
bleskes Oct 16, 2014 Member

yeah, can do. We don't care about performance here anyway.

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
+ String tempFileName = getTempNameForFile(fileName);
+ // add first, before it's created
+ tempFileNames.add(tempFileName);
+ IndexOutput indexOutput = store.createVerifyingOutput(tempFileName, IOContext.DEFAULT, metaData);
+ openIndexOutputs.put(fileName, indexOutput);
+ return indexOutput;
+ }
+
+ /**
+ * Used when holding a reference to this recovery status. Does *not* mean the recovery is done or finished.
+ * For that use {@link #cancel(String)} or {@link #markAsDone()}
+ *
+ * @throws Exception
+ */
+ @Override
+ public void close() throws Exception {
@s1monw
s1monw Oct 16, 2014 Contributor

prevent double closing here? maybe you can reuse the finished.compareAndSet(false, true) pattern?

@bleskes
bleskes Oct 16, 2014 Member

This is not really for closing but to decrement the ref count. The close name is really bad but it's what AutoClosable dictates and it allows using a nice try-with clause for the reference. See JavaDoc. Or did I misunderstand you?

@s1monw
s1monw Oct 16, 2014 Contributor

if you use autocloseable you can call the close method more than once by definition in the interface. I think once we called it once you should not decrement the refCount again?

@s1monw s1monw commented on the diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
+ if (refCount.compareAndSet(i, i + 1)) {
+ return true;
+ }
+ } else {
+ return false;
+ }
+ } while (true);
+ }
+
+ /**
+ * Decreases the refCount of this Store instance.If the refCount drops to 0, the recovery process this status represents
+ * is seen as done and resources and temporary files are deleted.
+ *
+ * @see #tryIncRef
+ */
+ public final void decRef() {
@s1monw
s1monw Oct 16, 2014 Contributor

I wonder if we can somehow factor this refcoutning logic out into a util class. something like

public class RefCounted {

public final void decRef() {
//...
}

public final boolean tryIncRef() {
//...
}

public final void incRef() {
//...
}

public inteface CloseListener {
  public void close(); // called when we reach 0
}
}

I think we can then also just use this in Store.java?

@bleskes
bleskes Oct 16, 2014 Member

I think this will be good though might be tricky if it requires multiple inheritance. If you mean a class that is used internally as a field and proxied, I think it means too much overhead to be worth it?

@s1monw
s1monw Oct 16, 2014 Contributor

I think the logic is super tricky here and it can be a standalone class that you pass a listener too to get notified if you really get closed we alrady have that tricky logic in 2 places now

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
}
- public synchronized IndexOutput openAndPutIndexOutput(String key, String fileName, StoreFileMetaData metaData, Store store) throws IOException {
- if (isCanceled()) {
- return null;
+ private void closeInternal() {
+ try {
+ // clean open index outputs
+ Iterator<Entry<String, IndexOutput>> iterator = openIndexOutputs.entrySet().iterator();
@s1monw
s1monw Oct 16, 2014 Contributor

I think you can just do:

IOUtils.closeWhileHandlingException(openIndexOutputs.values());
openIndexOutputs.clear();
@bleskes
bleskes Oct 16, 2014 Member

That's strictly speaking thread unsafe. I think we can do this here in practice, but part of the refactoring was not to have to worry about these kind of things. I think we should keep it as is?

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryStatus.java
}
- final ConcurrentMap<String, IndexOutput> outputs = openIndexOutputs;
- IndexOutput indexOutput = store.createVerifyingOutput(fileName, IOContext.DEFAULT, metaData);
- outputs.put(key, indexOutput);
- return indexOutput;
+ legacyChecksums.clear();
@s1monw
s1monw Oct 16, 2014 Contributor

any reason why we don't do this inside the try/finally?

@bleskes
bleskes Oct 16, 2014 Member

Not really. It was that way. Will move it in.

@s1monw s1monw and 1 other commented on an outdated diff Oct 16, 2014
...rg/elasticsearch/indices/recovery/RecoveryTarget.java
}
- if (onGoingRecovery.isCanceled()) {
- onGoingRecovery.sentCanceledToSource = true;
- throw new IndexShardClosedException(request.shardId());
+ indexOutput.writeBytes(content.array(), content.arrayOffset(), content.length());
+ onGoingRecovery.state().getIndex().addRecoveredByteCount(content.length());
+ RecoveryState.File file = onGoingRecovery.state().getIndex().file(request.name());
+ if (file != null) {
+ file.updateRecovered(request.length());
+ }
+ if (indexOutput.getFilePointer() >= request.length() || request.lastChunk()) {
+ Store.verify(indexOutput);
@s1monw
s1monw Oct 16, 2014 Contributor

I think since you change the finally part you should do this like:

try {
   Store.verify(indexOutput);
} finally {
   indexOutput.close();
}

just to make sure we are closing the stream asap

@s1monw
Contributor
s1monw commented Oct 16, 2014

I left a bunch of comments - I really like that change btw :) 👍

@s1monw s1monw removed the review label Oct 16, 2014
@bleskes bleskes added the review label Oct 19, 2014
@bleskes
Member
bleskes commented Oct 19, 2014

@s1monw I pushed some commits to address your comments. Can we do another round?

@s1monw
Contributor
s1monw commented Oct 21, 2014

LGTM

@s1monw s1monw and 1 other commented on an outdated diff Oct 22, 2014
...rg/elasticsearch/indices/recovery/RecoveryTarget.java
return;
}
// create a new recovery status, and process...
- final RecoveryStatus recoveryStatus = new RecoveryStatus(request.recoveryId(), indexShard, request.sourceNode());
- recoveryStatus.recoveryState.setType(request.recoveryType());
- recoveryStatus.recoveryState.setSourceNode(request.sourceNode());
- recoveryStatus.recoveryState.setTargetNode(request.targetNode());
- recoveryStatus.recoveryState.setPrimary(indexShard.routingEntry().primary());
- onGoingRecoveries.put(recoveryStatus.recoveryId, recoveryStatus);
-
+ RecoveryState recoveryState = new RecoveryState(indexShard.shardId());
+ recoveryState.setType(recoveryType);
+ recoveryState.setSourceNode(sourceNode);
+ recoveryState.setTargetNode(clusterService.localNode());
+ recoveryState.setPrimary(indexShard.routingEntry().primary());
+ final long recoveryId = onGoingRecoveries.startRecovery(indexShard, sourceNode, recoveryState, listener);
threadPool.generic().execute(new Runnable() {
@s1monw
s1monw Oct 22, 2014 Contributor

I actually think we should use AbstractRunnable here

     threadPool.generic().execute(new AbstractRunnable() {
            @Override
            public void onFailure(Throwable t) {
              // call listener
            }

            @Override
            protected void doRun() throws Exception {
 RecoveriesCollection.StatusRef statusRef = onGoingRecoveries.getStatus(recoveryId);
                if (statusRef == null) {
                    return;
                }
                try {
                    doRecovery(statusRef.status());
                } finally {
                    // make sure we never interrupt the thread after we have released it back to the pool
                    statusRef.status().clearWaitingRecoveryThread(Thread.currentThread());
                    statusRef.close();
                }
            }

WDYT?

@s1monw s1monw commented on an outdated diff Oct 22, 2014
...rg/elasticsearch/indices/recovery/RecoveryTarget.java
}
- public void retryRecovery(final StartRecoveryRequest request, TimeValue retryAfter, final RecoveryStatus status, final RecoveryListener listener) {
+ protected void retryRecovery(final long recoveryId, TimeValue retryAfter) {
+ logger.trace("will retrying recovery with id [{}] in [{}]", recoveryId, retryAfter);
threadPool.schedule(retryAfter, ThreadPool.Names.GENERIC, new Runnable() {
@s1monw
s1monw Oct 22, 2014 Contributor

see above about abstract runnable

@s1monw s1monw commented on an outdated diff Oct 22, 2014
...rg/elasticsearch/indices/recovery/RecoveryTarget.java
- // in general, no need to clean the shard on ignored recovery, since we want to try and reuse it later
- // it will get deleted in the IndicesStore if all are allocated and no shard exists on this node...
+ logger.trace("collecting local files for {}", recoveryStatus);
+ Map<String, StoreFileMetaData> existingFiles = ImmutableMap.of();
+ try {
+ existingFiles = recoveryStatus.store().getMetadata().asMap();
@s1monw
s1monw Oct 22, 2014 Contributor

I am not sure if we should catch an exception here IMO the exception should bubble up

@bleskes
Member
bleskes commented Oct 23, 2014

@s1monw I pushed another commit based on the latest feedback. I'm giving CI an hour or two to chew this and will then merge.

@bleskes bleskes added a commit that referenced this pull request Oct 23, 2014
@bleskes bleskes Recovery: refactor RecoveryTarget state management
This commit rewrites the state controls in the RecoveryTarget family classes to make it easier to guarantee that:
- recovery resources are only cleared once there are no ongoing requests
- recovery is automatically canceled when the target shard is closed/removed
- canceled recoveries do not leave temp files behind when canceled.

Highlights of the change:
1) All temporary files are cleared upon failure/cancel (see #7315 )
2) All newly created files are always temporary
3) Doesn't list local files on the cluster state update thread (which throw unwanted exception)
4) Recoveries are canceled by a listener to IndicesLifecycle.beforeIndexShardClosed, so we don't need to explicitly call it.
5) Simplifies RecoveryListener to only notify when a recovery is done or failed. Removed subtleties like ignore and retry (they are dealt with internally)

Closes #8092 , Closes #7315
ced753c
@bleskes bleskes added a commit that closed this pull request Oct 23, 2014
@bleskes bleskes Recovery: refactor RecoveryTarget state management
This commit rewrites the state controls in the RecoveryTarget family classes to make it easier to guarantee that:
- recovery resources are only cleared once there are no ongoing requests
- recovery is automatically canceled when the target shard is closed/removed
- canceled recoveries do not leave temp files behind when canceled.

Highlights of the change:
1) All temporary files are cleared upon failure/cancel (see #7315 )
2) All newly created files are always temporary
3) Doesn't list local files on the cluster state update thread (which throw unwanted exception)
4) Recoveries are canceled by a listener to IndicesLifecycle.beforeIndexShardClosed, so we don't need to explicitly call it.
5) Simplifies RecoveryListener to only notify when a recovery is done or failed. Removed subtleties like ignore and retry (they are dealt with internally)

Closes #8092 , Closes #7315
24bc8d3
@bleskes bleskes closed this in 24bc8d3 Oct 23, 2014
@bleskes bleskes deleted the enhancement/recovery_target_rewrite branch Oct 23, 2014
@bleskes
Member
bleskes commented Oct 23, 2014

merged. Thx @s1monw !

@bleskes bleskes added the resiliency label Oct 26, 2014
@bleskes bleskes added a commit that referenced this pull request Oct 29, 2014
@bleskes bleskes Recovery: change check for finished to a ref count check
we current check that the recovery is not finished when people access the status local variables. This is wrong and we should check for the refcount being > 0 as it is OK to use the status after it has marked as finished but there are still on going, in-flight reference to it.

Relates to #8092

Closes #8271
8eac79c
@bleskes bleskes added a commit that referenced this pull request Oct 29, 2014
@bleskes bleskes Recovery: change check for finished to a ref count check
we current check that the recovery is not finished when people access the status local variables. This is wrong and we should check for the refcount being > 0 as it is OK to use the status after it has marked as finished but there are still on going, in-flight reference to it.

Relates to #8092

Closes #8271
f57e981
@clintongormley clintongormley added :Recovery and removed review labels Mar 19, 2015
@clintongormley clintongormley changed the title from Recovery: refactor RecoveryTarget state management to Refactor RecoveryTarget state management Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment