-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshot UUIDs in blob names #19421
Snapshot UUIDs in blob names #19421
Conversation
045a386
to
4a0ad6e
Compare
@@ -72,14 +72,14 @@ public InputStream readBlob(String blobName) throws IOException { | |||
logger.trace("readBlob({})", blobName); | |||
|
|||
if (!blobExists(blobName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that this is a part of another PR, but since we are modifying it here - why did we need it in the first place? It looks like we are handling file not found condition when we try to read from the stream anyway. Why do it twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gfyoung what do you mean by lenient? What does it do if the file doesn't exist and you are trying to read the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imotov : What I meant was that it didn't throw Exceptions during testing (e.g. when I tried deleting a blob that didn't exist). Of course, this was using a mock, so I don't know if that would be the case in real life, but testing purposes, the existence check was necessary for it to pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The better strategy is to change the mocks to conform to how the real APIs would behave (i.e. throw a 404 if the blob doesn't exist), in which case, the blobExists check is extraneous. I will take care of this in a separate commit as part of the PR
@abeyad Looks good. I left a few minor comments. |
c8f1258
to
6a9f488
Compare
Makes deleting snapshots more robust by first deleting the snapshot from the index generational file, then handling individual deletion file errors with log messages instead of failing the entire operation.
for reads and deletes if the blob does not exist.
to snapshot/restore) and the index to UUID mapping is stored in the repository index file.
a219c70
to
f0c9f6e
Compare
Azure and Google cloud blob containers, as the APIs for both return a 404 in the case of a missing object, which we already handle through a NoSuchFileFoundException.
f0c9f6e
to
299b8a7
Compare
@@ -223,6 +228,9 @@ | |||
|
|||
private final ChecksumBlobStoreFormat<BlobStoreIndexShardSnapshots> indexShardSnapshotsFormat; | |||
|
|||
// flag to indicate if the index gen file has been checked for updating from pre 5.0 versions | |||
private volatile boolean indexGenChecked; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that this might cause some issues. We don't control the underlying storage, which might change on us without our knowledge. I would like to run a couple of scenarios by you when you have a chance to see if we should/can make it more robust.
@abeyad left a couple of comments |
upgrades if it determines the read data is in the legacy format. It writes the upgraded version if it is not a read-only repository and caches the repository data if it is a read-only repository.
0148175
to
58d6b9d
Compare
RepositoryData repositoryData = updateIndexGenIfNecessary(); | ||
if (repositoryData == null) { | ||
repositoryData = readIndexGen(); | ||
RepositoryData repositoryData = readIndexGen(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you collapse readIndexGen
into getRepositoryData
you will be able to remove otherwise unnecessary isLegacyFormat flag from RepositoryData.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @imotov, I'll push a commit with those changes
Left a minor comment. Otherwise, LGTM. |
@imotov thanks for the review! |
This PR adds to #18815 to enable safer behavior with respect to snapshots. #18815 makes blob deletions strict - if the blob doesn't exist, an exception is thrown instead of silently failing. However, this presents an issue with some of our tests scenarios (and possibly real life occurrences) where the inability to delete a blob because it was deleted by someone else would cause the snapshot deletion itself to fail, potentially leaving other blobs around. This could even happen if the snap-.dat blob was successfully deleted but the machine crashed before the other blobs in the snapshot could be deleted. Our current mechanism uses a listing of the snap-.dat blobs to determine what the current snapshots are. If deletions can't be relied upon, then we can't be sure that the existence of
snap-A.dat
in the repository implies thatA
is a current snapshot.This PR uses the index generational files from #19002 to retrieve the snapshot UUID for snapshots and name all blobs by the snapshot UUID instead of the snapshot name. If a snapshot
A
was deleted, then recreated, but not all ofA
's files were deleted, then we would have to worry about overwriting existing blobs, which is problematic. By naming blobs with the snapshot UUID, we avoid this issue.This PR also introduces a unique index ID (a UUID) for indices in the snapshots, so index folders can be named by the UUID and avoid problems such as #7540.
Relates #18156