Resiliency: Cancelling a recovery may leave temporary files behind #7893

Closed
bleskes opened this Issue Sep 26, 2014 · 1 comment

bleskes commented Sep 26, 2014

We currently cancel a recovery when the shard is no longer assigned to the target node, or when the primary shard (the source of the copy) moves to another node, among other scenarios. That cancel logic doesn't clean up the temporary files created during the recovery.

Normally that's not a problem, as the files are cleaned up once the shard is safely recovered somewhere else (or locally). However, if a node runs into continuous failure cycles, the leftover files can fill up the disk and cause bigger problems, such as corrupting other shards on the node.
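
To illustrate the kind of cleanup the cancel path is missing, here is a minimal sketch in plain Java. It is not the Elasticsearch implementation: the class name, the `deleteTempFiles` method, and the `recovery.` file-name prefix are all assumptions made for illustration. It only shows the idea of deleting a recovery's temporary files from the shard directory when that recovery is cancelled.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Elasticsearch.
final class RecoveryTempCleanup {

    // Assumed naming convention: temporary recovery files start with this prefix.
    static final String TEMP_PREFIX = "recovery.";

    // Deletes every regular file in shardDir whose name starts with TEMP_PREFIX
    // and returns how many files were removed.
    static int deleteTempFiles(Path shardDir) throws IOException {
        int deleted = 0;
        // The glob matches file names only, so fully recovered files are left alone.
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(shardDir, TEMP_PREFIX + "*")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file) && Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

A cancel handler would call something like `RecoveryTempCleanup.deleteTempFiles(shardDir)` after the in-flight file transfers have stopped, so that repeated start/cancel cycles do not accumulate files on disk.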

@clintongormley clintongormley changed the title from Recovery: cancelling a recovery may leave temporary files behind to Resiliency: Cancelling a recovery may leave temporary files behind Sep 26, 2014
@s1monw s1monw added v1.4.0 and removed v1.4.0.Beta1 labels Sep 30, 2014
@bleskes bleskes added a commit to bleskes/elasticsearch that referenced this issue Oct 7, 2014
@bleskes bleskes Recovery: clean up temporary files when canceling recovery
At the moment, we leave temporary files behind if a peer (replica) recovery is canceled. Those files will normally be cleaned up once the shard is started elsewhere, but in case of errors this can lead to trouble. If recoveries are started and canceled often, we may cause nodes to run out of disk space.

Closes #7893
663e426
@bleskes bleskes added a commit to bleskes/elasticsearch that referenced this issue Oct 14, 2014
@bleskes bleskes Recovery: clean up temporary files when canceling recovery
At the moment, we leave temporary files behind if a peer (replica) recovery is canceled. Those files will normally be cleaned up once the shard is started elsewhere, but in case of errors this can lead to trouble. If recoveries are started and canceled often, we may cause nodes to run out of disk space.

Closes #7893
4005aa2
@bleskes bleskes added v1.5.0 and removed v1.4.0 labels Nov 3, 2014
bleskes commented Nov 3, 2014

fixed with #8092

@bleskes bleskes closed this Nov 3, 2014