-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Retry bulk-delete items in GCS #138951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Retry bulk-delete items in GCS #138951
Conversation
|
@DaveCTurner, @joshua-adams-1 |
|
@nicktindall, I didn't have chance to review GCS retry refactoring PR. But want to verify that we still keep retry strategy for non stream calls. |
Yes, only get blob should be affected by my changes |
joshua-adams-1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading #138364 this looks good to me. I'm happy to approve once the CI issues are resolved. Could we also add unit tests for the deleteBlobs function?
|
|
||
| private static boolean isRetryErrCode(int code) { | ||
| return switch (code) { | ||
| case 408 | 429 | 500 | 502 | 503 | 504 -> true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we replace these raw values with
HttpURLConnectionlike here - Is it worth explaining why we can retry on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather we used the names in org.elasticsearch.rest.RestStatus but yes names >> numbers here, and if there's any docs about why we should retry these codes then it'd be great to link them in a comment.
| if (failedItems.isEmpty() == false) { | ||
| final var retryBlobId = failedItems.getLast().blobId; | ||
| try { | ||
| client().deleteBlob(retryBlobId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- I assume this blocks?
- Does
storage.delete(blobId);use an exponential back off retry strategy?
| // remaining items go the next bulk | ||
| failedItems.removeLast(); | ||
| } catch (StorageException e) { | ||
| throw new IOException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also log the other elements in failedItems? Otherwise they would fail quietly
| final var retryBlobId = failedItems.getLast().blobId; | ||
| try { | ||
| client().deleteBlob(retryBlobId); | ||
| // remaining items go the next bulk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // remaining items go the next bulk | |
| // remaining items go into the next bulk |
| if (isRetryErrCode(errCode)) { | ||
| failedItems.add(new DeleteFailure(deleteResult.blobId, e.getCode())); | ||
| } else { | ||
| throw new IOException("Failed to process bulk delete, non-retryable error for blobId=" + deleteResult.blobId, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question: If we fail to delete N blobs, are these subsequently cleaned up?
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sensible to me, but needs supporting changes in GoogleCloudStorageHttpHandler to exercise the retries properly.
|
|
||
| private static boolean isRetryErrCode(int code) { | ||
| return switch (code) { | ||
| case 408 | 429 | 500 | 502 | 503 | 504 -> true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather we used the names in org.elasticsearch.rest.RestStatus but yes names >> numbers here, and if there's any docs about why we should retry these codes then it'd be great to link them in a comment.
Add retry logic for bulk-delete items in GCS blob store.