Creating a snapshot does not verify that all nodes are writing to the same blobstore #81907
Labels: >bug, :Distributed Coordination/Snapshot/Restore, Supportability, Team:Distributed
Snapshots work by writing to a blobstore in which the same blob can be accessed at the same path from every node. By default we check at registration time that the repository really is shared across nodes in this way; this catches configuration and permission errors, including cases where the underlying blobstore is not properly shared. Users who need to register a blobstore which is unavailable at registration time but will become available later on can bypass this check.
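For illustration, the registration-time bypass is the `verify=false` request parameter, and the same check can be run on demand later with the verify repository API. A minimal example against a shared-filesystem repository (repository name and path are placeholders):

```
# register without the read/write verification step
PUT _snapshot/my_fs_repo?verify=false
{
  "type": "fs",
  "settings": {
    "location": "/mnt/shared-backups"
  }
}

# run the verification later, once the blobstore is mounted on every node
POST _snapshot/my_fs_repo/_verify
```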
Today, if the blobstore is accessible but not shared (and the user bypasses the registration-time checks that would prevent this), snapshot creation reports success because we create snapshots without ever reading a blob that another node has written. Listing, restoring, and deleting snapshots may also sometimes appear to succeed. However, it is definitely not safe to rely on such a setup to protect your data.
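For example (placeholder names again), creating a snapshot against such a repository reports success today, even though no blob written by one node is ever read back by another:

```
PUT _snapshot/my_fs_repo/snap-1?wait_for_completion=true
```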
We should not report success when creating a snapshot in such a setup. We can detect this sort of problem by having the master read at least one blob written by every data node during snapshot creation. We mustn't verify too many blobs (e.g. one per shard) since this would be slow and expensive without adding much extra protection.
I propose that the master reads the first `BlobStoreIndexShardSnapshot` that each data node writes, and fails the snapshot if that read fails. I think we don't need to re-check this on every snapshot creation; it should be enough to remember past successes of nodes that have remained in the cluster since. Possibly we should re-check every 24h or so just in case the repository gets unmounted out from under us.
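To make the intent concrete, here is a rough sketch of the bookkeeping described above (plain Java, not the real snapshot-service code; `BlobReader` and `readFirstShardSnapshotBlob` are made-up placeholders): the master reads back one blob per not-yet-verified data node, fails the snapshot if any read fails, remembers successes for nodes that stay in the cluster, and re-checks after roughly 24 hours.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class NodeWriteVerifier {

    // Node IDs whose writes the master has successfully read back, with the time of the check.
    private final Map<String, Long> verifiedAtMillis = new ConcurrentHashMap<>();

    private static final long RECHECK_INTERVAL_MILLIS = 24L * 60 * 60 * 1000; // re-check roughly daily

    interface BlobReader {
        // Hypothetical hook: reads the first BlobStoreIndexShardSnapshot blob the given
        // node wrote during the current snapshot, throwing if the read fails.
        void readFirstShardSnapshotBlob(String nodeId) throws Exception;
    }

    /**
     * Called on the master after data nodes have written their shard-level blobs.
     * Throws if any node's write cannot be read back, so the snapshot is failed
     * rather than reported as successful.
     */
    void verifyDataNodeWrites(Set<String> dataNodeIds, BlobReader reader) throws Exception {
        long now = System.currentTimeMillis();
        for (String nodeId : dataNodeIds) {
            Long lastVerified = verifiedAtMillis.get(nodeId);
            if (lastVerified != null && now - lastVerified < RECHECK_INTERVAL_MILLIS) {
                continue; // recently verified and continuously in the cluster since: skip
            }
            reader.readFirstShardSnapshotBlob(nodeId); // throws -> snapshot fails
            verifiedAtMillis.put(nodeId, now);
        }
    }

    // Forget nodes that leave the cluster so they are re-verified if they rejoin.
    void onNodeRemoved(String nodeId) {
        verifiedAtMillis.remove(nodeId);
    }
}
```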