
feat: retry certain RESOURCE_EXHAUSTED errors observed during ReadRows and report retry attempts #1257

Merged

1 commit merged into googleapis:master from retry_attempts on Aug 24, 2021

Conversation

@esert-g (Contributor) commented Aug 20, 2021

The BigQuery Storage Read service will start returning a retryable RESOURCE_EXHAUSTED error in the next few weeks when a read session's parallelism is considered excessive, so this PR expands the retry handling logic for ReadRows with two changes:

  1. If a ReadRows request fails with a RESOURCE_EXHAUSTED error and the error has an associated RetryInfo, it is now considered retryable and the retry delay is set according to the RetryInfo.
  2. If the client decides to retry, it now notifies the user through the provided RetryAttemptListener object (see the sketch after this list). This will be useful as a negative feedback mechanism for future SplitReadStream requests, which in turn will reduce the likelihood of receiving the new retryable RESOURCE_EXHAUSTED error.
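
To show how a caller might consume these retry notifications, here is a minimal sketch of a listener that just counts attempts. The names RetryAttemptListener and onRetryAttempt come from this PR, but the interface shape sketched here, its parameters, and how it gets registered on the client are assumptions for illustration only.

import io.grpc.Metadata;
import io.grpc.Status;
import java.util.concurrent.atomic.AtomicLong;

// Assumed shape of the listener interface added by this PR; the real interface is
// declared in the client settings and its exact signature may differ.
interface RetryAttemptListener {
  void onRetryAttempt(Status prevStatus, Metadata prevMetadata);
}

// Hypothetical implementation that counts ReadRows retry attempts.
class CountingRetryAttemptListener implements RetryAttemptListener {
  private final AtomicLong retries = new AtomicLong();

  @Override
  public void onRetryAttempt(Status prevStatus, Metadata prevMetadata) {
    // Each call means the client is about to retry a ReadRows stream; the count can
    // serve as negative feedback before issuing further SplitReadStream requests.
    retries.incrementAndGet();
  }

  long retryCount() {
    return retries.get();
  }
}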

@esert-g esert-g requested a code owner review Aug 20, 2021
@esert-g esert-g requested a review from shollyman Aug 20, 2021
@google-cla google-cla bot added the cla: yes label Aug 20, 2021
@product-auto-label product-auto-label bot added the api: bigquerystorage label Aug 20, 2021
@shollyman shollyman added the kokoro:force-run label Aug 23, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Aug 23, 2021
@shollyman shollyman requested a review from stephaniewang526 Aug 23, 2021
@shollyman shollyman added the kokoro:force-run label Aug 23, 2021
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Aug 23, 2021
@shollyman (Contributor) left a comment

LGTM, with some minor nits.

private RetryAttemptListener readRowsRetryAttemptListener = null;

/**
* If a non null readRowsRetryAttemptListener is provided, client will call onRetryAtempt function
@shollyman (Contributor) commented Aug 24, 2021


nit: s/onRetryAtempt/onRetryAttempt here and in the other versions.

@esert-g (Contributor, Author) commented Aug 24, 2021


Fixed

public Duration retryDelay = null;
}

private static final Metadata.Key<RetryInfo> KEY_RETRY_INFO =
@shollyman (Contributor) commented Aug 24, 2021


Interesting, we'll need to see if we have compatible key resolvers for other langs. I've not seen this before, but apparently it's the descriptor full name plus a "-bin" suffix?

@esert-g (Contributor, Author) commented Aug 24, 2021


That's exactly what it does. I'm having a hard time finding external docs about why it is supposed to be like that, but you can find other libraries interacting with gcp services using the same keys, e.g. https://github.com/googleapis/google-cloud-go/blob/master/spanner/retry.go#L33
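
For context, in grpc-java a binary metadata key like this is usually built straight from the proto descriptor, which is where the lowercased full name plus the "-bin" suffix comes from. A minimal sketch follows; it is not necessarily the exact code in this PR, and the retryDelayFrom helper plus the use of java.time.Duration are illustrative assumptions.

import com.google.rpc.RetryInfo;
import io.grpc.Metadata;
import io.grpc.protobuf.ProtoUtils;
import java.time.Duration;

final class RetryInfoKeyExample {
  // ProtoUtils.keyForProto derives the key name from the descriptor full name plus
  // the "-bin" suffix used for binary-valued metadata, i.e. "google.rpc.retryinfo-bin",
  // matching the key used in the Go client linked above.
  private static final Metadata.Key<RetryInfo> KEY_RETRY_INFO =
      ProtoUtils.keyForProto(RetryInfo.getDefaultInstance());

  // Returns the server-suggested retry delay from the trailers, or null if absent.
  static Duration retryDelayFrom(Metadata trailers) {
    RetryInfo retryInfo = trailers.get(KEY_RETRY_INFO);
    if (retryInfo == null || !retryInfo.hasRetryDelay()) {
      return null;
    }
    return Duration.ofSeconds(retryInfo.getRetryDelay().getSeconds())
        .plusNanos(retryInfo.getRetryDelay().getNanos());
  }
}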

Errors.IsRetryableStatusResult result = Errors.isRetryableStatus(status, metadata);
if (result.isRetryable) {
// If result.retryDelay isn't null, we know exactly how long we must wait, so both regular
// and randomized delays are the same.
@shollyman (Contributor) commented Aug 24, 2021


Should there still be variance for the randomized delay, i.e. result.retryDelay + jitter? Looks like the previous impl didn't jitter either, so this can likely be ignored if it's not been a source of issues.

@esert-g (Contributor, Author) commented Aug 24, 2021


I don't think it is needed in this case.
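
To make the trade-off concrete, here is a small sketch of the delay selection being discussed; it is not the code in this PR, and all names and constants are assumptions. When the server supplies a RetryInfo delay it is used verbatim (no jitter); otherwise an exponential backoff window with full jitter applies.

import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class BackoffExample {
  // attempt starts at 0; serverSuggestedDelay is null when the error carried no RetryInfo.
  static Duration nextDelay(Duration serverSuggestedDelay, int attempt) {
    if (serverSuggestedDelay != null) {
      // The server told us exactly how long to wait, so the regular and randomized
      // delays are the same, as the comment in the diff above notes.
      return serverSuggestedDelay;
    }
    // Exponential window capped at 60 seconds, then full jitter within the window.
    long windowMillis = Math.min(1000L << Math.min(attempt, 10), 60_000L);
    return Duration.ofMillis(ThreadLocalRandom.current().nextLong(windowMillis + 1));
  }
}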

@esert-g esert-g force-pushed the retry_attempts branch 2 times, most recently from c8c2e70 to e6f0db2, Aug 24, 2021
Handle certain RESOURCE_EXHAUSTED errors and report the retry attempts.
@shollyman shollyman added the automerge label Aug 24, 2021
@shollyman shollyman changed the title Retry certain RESOURCE_EXHAUSTED errors observed during ReadRows and report retry attempts feat: retry certain RESOURCE_EXHAUSTED errors observed during ReadRows and report retry attempts Aug 24, 2021
@gcf-merge-on-green gcf-merge-on-green bot merged commit d56e1ca into googleapis:master Aug 24, 2021
16 checks passed
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge label Aug 24, 2021
gcf-merge-on-green bot pushed a commit that referenced this issue Aug 24, 2021
Labels: api: bigquerystorage, cla: yes
3 participants