-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add option in NIO for re-opening the channel to retry for some errors #1715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option in NIO for re-opening the channel to retry for some errors #1715
Conversation
It also retries a few times on exceptions that are not normally considered retriable (500, 503). Also add corresponding test code.
|
Changes Unknown when pulling 52606c2 on jean-philippe-martin:jp_retries into ** on GoogleCloudPlatform:master**. |
| /** | ||
| * Returns the number of times we try re-opening a channel if it's closed unexpectedly | ||
| * while reading. | ||
| */ |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| private long size; | ||
|
|
||
| /** | ||
| * @param maxReopen max. number of times to try re-opening the channel if it closes on us unexpectedly. |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| @CheckReturnValue | ||
| @SuppressWarnings("resource") | ||
| static CloudStorageReadChannel create(Storage gcsStorage, BlobId file, long position) | ||
| static CloudStorageReadChannel create(Storage gcsStorage, BlobId file, long position, int maxReopen) |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| innerOpen(); | ||
| continue; | ||
| } else if (exs.getCode() == 500 && retries < maxRetries) { | ||
| // server error. Not retryable yet retrying works in my experience. |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| int maxRetries = 3; | ||
| dst.mark(); | ||
| while (true) { | ||
| try { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| } | ||
| } | ||
|
|
||
| private void retryDelay(int attempt) { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| return new AutoValue_OptionChannelReopen(retryCount); | ||
| } | ||
|
|
||
| abstract int retry(); |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
e10e72a to
d67872a
Compare
|
Changes Unknown when pulling d67872a on jean-philippe-martin:jp_retries into ** on GoogleCloudPlatform:master**. |
|
Changes Unknown when pulling d67872a on jean-philippe-martin:jp_retries into ** on GoogleCloudPlatform:master**. |
| import com.google.auto.value.AutoValue; | ||
|
|
||
| @AutoValue | ||
| abstract class OptionMaxChannelReopen implements CloudStorageOption.OpenCopy { |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| private SeekableByteChannel newReadChannel(Path path, Set<? extends OpenOption> options) | ||
| throws IOException { | ||
| initStorage(); | ||
| int maxChannelReopens = ((CloudStorageFileSystem)path.getFileSystem()).config().maxChannelReopens(); |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| throw new UnsupportedOperationException(option.toString()); | ||
| } | ||
| } else if (option instanceof OptionMaxChannelReopen) { | ||
| maxChannelReopens = ((OptionMaxChannelReopen)option).maxChannelReopen(); |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| innerOpen(); | ||
| continue; | ||
| } else if ((exs.getCode() == 500 || exs.getCode() == 503) && retries < maxRetries) { | ||
| // Retrying works in practice. |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
|
Changes Unknown when pulling bcabfbc on jean-philippe-martin:jp_retries into ** on GoogleCloudPlatform:master**. |
|
Thank you! |
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
While exercising the NIO code on large inputs (~600GB) I noticed occasional errors that crashed the program yet could probably have been retried. Those were:
The program was using the default retry options (3 retries). Looking at the specific exceptions I see that those StorageException had
retryable=false, so the normal retry mechanism isn't triggered. This makes perfect sense for "connection closed prematurely".This PR adds an option that allows NIO to re-open the channel in case of "connection closed prematurely" error. The same option will also retry on 503 and 500 errors (without re-opening the channel in that case). The PR also includes the corresponding test cases.
Re-running the benchmarks with this updated code, I was able to see those errors again and observe that the NIO retry worked: the final output was correct.