Resiliency: Cancelling a recovery may leave temporary files behind #7893
We currently cancel recoveries when the shard is no longer assigned to the target node, or when the primary shard (the source of the copy) is moved to another node, among other scenarios. That cancel logic doesn't clean up any temporary files created during the recovery.
Normally that's not a problem, as the files are cleaned up once the shard is safely recovered somewhere else (or locally). However, if recoveries keep failing and restarting in a continuous cycle, the leftover files can fill up the disk, causing bigger problems such as corrupting other shards on the node.
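One possible shape for the fix, sketched under assumptions: the recovery target records every temporary file it creates, and cancelling deletes them. The class `RecoveryTempFiles`, its methods, and the `recovery.` file prefix are hypothetical illustrations, not Elasticsearch's actual recovery code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: tracks the temporary files a recovery creates so that
 * cancel() can remove them instead of leaving them behind on disk.
 */
class RecoveryTempFiles {
    // Hypothetical prefix for temp files; not taken from Elasticsearch.
    private static final String TEMP_PREFIX = "recovery.";

    private final Path shardDir;
    private final List<Path> tempFiles = new ArrayList<>();
    private boolean cancelled = false;

    RecoveryTempFiles(Path shardDir) {
        this.shardDir = shardDir;
    }

    /** Creates an empty temp file for an incoming file and remembers it for cleanup. */
    synchronized Path createTempFile(String name) throws IOException {
        // Refuse new files after cancel so nothing is created once cleanup has run.
        if (cancelled) {
            throw new IllegalStateException("recovery already cancelled");
        }
        Path tmp = shardDir.resolve(TEMP_PREFIX + name);
        Files.createFile(tmp);
        tempFiles.add(tmp);
        return tmp;
    }

    /** Cancels the recovery and deletes every temp file it created. */
    synchronized void cancel() {
        cancelled = true;
        for (Path tmp : tempFiles) {
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException e) {
                // Keep going: one failed delete should not leak the remaining files.
            }
        }
        tempFiles.clear();
    }

    /** Number of temp files currently tracked (useful for tests). */
    synchronized int trackedTempFiles() {
        return tempFiles.size();
    }
}
```

Tracking files explicitly only covers recoveries cancelled in a live process; a complementary approach would be to delete any files matching the temp prefix when a shard directory is opened, which would also catch leftovers from a crashed node.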