@jean-philippe-martin

While exercising the NIO code on large inputs (~600GB) I noticed occasional errors that crashed the program yet could probably have been retried. Those were:

  • 503 Service Unavailable
  • 500 Server error
  • Connection closed prematurely

The program was using the default retry options (3 retries). Looking at the specific exceptions, I see that each StorageException had retryable=false, so the normal retry mechanism was not triggered. That makes perfect sense for "connection closed prematurely", where the channel has to be re-opened before a retry can succeed.

This PR adds an option that allows NIO to re-open the channel when a "connection closed prematurely" error occurs. The same option also retries on 503 and 500 errors (without re-opening the channel in those cases). The PR also includes the corresponding test cases.
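
The behaviour described above can be sketched in isolation. This is a minimal, self-contained illustration and not the code from this PR: `SketchStorageException`, `ReopenableSource`, and `readWithRetries` are hypothetical stand-ins, and matching on the exception message is only an assumption about how a prematurely closed connection might be detected.

```java
/** Hypothetical stand-in for StorageException: carries an HTTP-like status code. */
class SketchStorageException extends RuntimeException {
  final int code;

  SketchStorageException(int code, String message) {
    super(message);
    this.code = code;
  }
}

/** Hypothetical byte source that can fail and be re-opened. */
interface ReopenableSource {
  /** Returns the number of bytes read; may throw SketchStorageException. */
  int read();

  /** Re-opens the underlying channel (assumed to resume at the current position). */
  void reopen();
}

class ReadRetrySketch {
  /**
   * Attempts a read. Re-opens the source on "connection closed prematurely"
   * (at most maxReopens times) and retries in place on 500/503
   * (at most maxRetries times). Any other failure is rethrown.
   */
  static int readWithRetries(ReopenableSource src, int maxReopens, int maxRetries) {
    int reopens = 0;
    int retries = 0;
    while (true) {
      try {
        return src.read();
      } catch (SketchStorageException e) {
        // Assumption: the closed-connection case is recognised by its message.
        boolean closedEarly =
            e.getMessage() != null
                && e.getMessage().contains("Connection closed prematurely");
        if (closedEarly && reopens < maxReopens) {
          reopens++;
          src.reopen();
          continue;
        }
        if ((e.code == 500 || e.code == 503) && retries < maxRetries) {
          retries++;
          continue; // same channel, just try again
        }
        throw e;
      }
    }
  }
}
```

In this sketch the two budgets are tracked separately, so a run of 503 responses does not use up the re-open allowance. Whether the actual change shares one counter or keeps two is not stated in the description.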

Re-running the benchmarks with this updated code, I was able to see those errors again and observe that the NIO retry worked: the final output was correct.

It also retries a few times on exceptions that are not normally
considered retriable (500, 503).

Also add corresponding test code.
@googlebot added the "cla: yes" label on Mar 8, 2017
@jean-philippe-martin changed the title from "Add optional in NIO for re-opening the channel to retry for some errors" to "Add option in NIO for re-opening the channel to retry for some errors" on Mar 8, 2017
@coveralls

Coverage Status

Coverage decreased (-0.04%) to 81.45% when pulling 919e67b on jean-philippe-martin:jp_retries into 59459fc on GoogleCloudPlatform:master.

@coveralls

Coverage Status

Coverage decreased (-0.03%) to 81.464% when pulling 919e67b on jean-philippe-martin:jp_retries into 59459fc on GoogleCloudPlatform:master.

@coveralls

Coverage Status

Changes Unknown when pulling 52606c2 on jean-philippe-martin:jp_retries into GoogleCloudPlatform:master.

/**
 * Returns the number of times we try re-opening a channel if it's closed unexpectedly
 * while reading.
 */

private long size;

/**
* @param maxReopen max. number of times to try re-opening the channel if it closes on us unexpectedly.

@CheckReturnValue
@SuppressWarnings("resource")
- static CloudStorageReadChannel create(Storage gcsStorage, BlobId file, long position)
+ static CloudStorageReadChannel create(Storage gcsStorage, BlobId file, long position, int maxReopen)

  innerOpen();
  continue;
} else if (exs.getCode() == 500 && retries < maxRetries) {
  // Server error. Not flagged as retryable, yet retrying works in my experience.

int maxRetries = 3;
dst.mark();
while (true) {
  try {

}
}

private void retryDelay(int attempt) {
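
The body of `retryDelay` is not shown in this excerpt. For illustration only, one common way to space out retries is capped exponential backoff. The sketch below assumes that approach, and `RetryDelaySketch`, `delayMillis`, and the 1 second base and 30 second cap are all hypothetical choices, not values taken from the PR.

```java
class RetryDelaySketch {
  /**
   * Returns the wait in milliseconds before the given attempt (1-based):
   * base, 2*base, 4*base, ... never exceeding capMillis.
   */
  static long delayMillis(int attempt, long baseMillis, long capMillis) {
    if (attempt < 1) {
      throw new IllegalArgumentException("attempt must be >= 1");
    }
    // The shift is capped at 30 so that, for a base of a few seconds or less,
    // the product stays well within the range of a long.
    int shift = Math.min(attempt - 1, 30);
    return Math.min(capMillis, baseMillis * (1L << shift));
  }

  /** Sleeps for the backoff interval; hypothetical 1 s base and 30 s cap. */
  static void sleepBeforeRetry(int attempt) throws InterruptedException {
    Thread.sleep(delayMillis(attempt, 1000L, 30_000L));
  }
}
```

With these assumed constants, three retries would wait roughly 1, 2, and 4 seconds before the error is finally surfaced.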

return new AutoValue_OptionChannelReopen(retryCount);
}

abstract int retry();

@coveralls

Coverage Status

Changes Unknown when pulling d67872a on jean-philippe-martin:jp_retries into GoogleCloudPlatform:master.

import com.google.auto.value.AutoValue;

@AutoValue
abstract class OptionMaxChannelReopen implements CloudStorageOption.OpenCopy {

private SeekableByteChannel newReadChannel(Path path, Set<? extends OpenOption> options)
    throws IOException {
  initStorage();
  int maxChannelReopens = ((CloudStorageFileSystem) path.getFileSystem()).config().maxChannelReopens();

    throw new UnsupportedOperationException(option.toString());
  }
} else if (option instanceof OptionMaxChannelReopen) {
  maxChannelReopens = ((OptionMaxChannelReopen) option).maxChannelReopen();
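
The two fragments above suggest that the re-open limit starts from the file system configuration and is then overridden by a per-open option if one is supplied. A minimal sketch of that precedence follows, using hypothetical stand-in types (`SketchOpenOption`, `SketchMaxChannelReopen`, `SketchOtherOption`) rather than the real NIO classes.

```java
import java.util.List;

/** Hypothetical marker interface standing in for java.nio.file.OpenOption. */
interface SketchOpenOption {}

/** Hypothetical stand-in for OptionMaxChannelReopen. */
final class SketchMaxChannelReopen implements SketchOpenOption {
  final int maxChannelReopen;

  SketchMaxChannelReopen(int maxChannelReopen) {
    this.maxChannelReopen = maxChannelReopen;
  }
}

/** Any unrelated option; present only to show that it is ignored here. */
final class SketchOtherOption implements SketchOpenOption {}

class ReopenOptionSketch {
  /**
   * Starts from the file-system-wide default and lets a per-open option
   * override it. If several such options are passed, the last one wins.
   */
  static int resolveMaxReopens(
      int fileSystemDefault, List<? extends SketchOpenOption> options) {
    int max = fileSystemDefault;
    for (SketchOpenOption option : options) {
      if (option instanceof SketchMaxChannelReopen) {
        max = ((SketchMaxChannelReopen) option).maxChannelReopen;
      }
    }
    return max;
  }
}
```

Under this reading, a caller who passes no option simply gets whatever the file system was configured with.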

  innerOpen();
  continue;
} else if ((exs.getCode() == 500 || exs.getCode() == 503) && retries < maxRetries) {
  // Retrying works in practice.

@coveralls

Coverage Status

Changes Unknown when pulling bcabfbc on jean-philippe-martin:jp_retries into GoogleCloudPlatform:master.

@garrettjonesgoogle merged commit f72746c into googleapis:master on Mar 15, 2017
@jean-philippe-martin

Thank you!

chingor13 pushed a commit that referenced this pull request Jan 22, 2026
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>