feat: Implemented BigQueryRetryAlgorithm to retry on the basis of the configured re-triable error messages #1426

Merged
merged 38 commits into from Jul 14, 2021

@prash-mi prash-mi commented Jul 3, 2021

Implemented com.google.cloud.bigquery.BigQueryRetryAlgorithm, which uses com.google.cloud.bigquery.BigQueryRetryConfig to retry based on the configured retryable error messages.

Modified BigQueryImpl.getQueryResults to call BigQueryRetryHelper.runWithRetries, so that it retries using BigQueryRetryAlgorithm.

Fixes #1250 ☕️

By default we retry on the following error:

{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "location" : "q",
    "locationType" : "parameter",
    "message" : "Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors at [5:1]",
    "reason" : "invalidQuery"
  } ],
  "message" : "Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors at [5:1]",
  "status" : "INVALID_ARGUMENT"
}
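
To make the message-based retry idea concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the class `MessageBasedRetrySketch`, its methods, and the constant `RATE_LIMIT_MSG` are hypothetical names, and the matched substring is assumed from the error payload above. Backoff between attempts is omitted for brevity.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical sketch; this is not the library's BigQueryRetryAlgorithm.
final class MessageBasedRetrySketch {
  // Assumed retryable substring, taken from the error payload quoted above.
  static final String RATE_LIMIT_MSG = "Exceeded rate limits:";

  /** Returns true if the error message contains any configured retryable substring. */
  static boolean shouldRetry(Throwable error, List<String> retryableMessages) {
    if (error == null || error.getMessage() == null) {
      return false;
    }
    String message = error.getMessage().toLowerCase(Locale.ROOT);
    for (String candidate : retryableMessages) {
      if (message.contains(candidate.toLowerCase(Locale.ROOT))) {
        return true;
      }
    }
    return false;
  }

  /** Calls the supplier until it succeeds, a non-retryable error occurs, or attempts run out. */
  static <V> V runWithRetries(Supplier<V> call, List<String> retryableMessages, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.get();
      } catch (RuntimeException e) {
        if (!shouldRetry(e, retryableMessages)) {
          throw e; // Not a configured message: fail immediately.
        }
        last = e; // A real implementation would back off before the next attempt.
      }
    }
    throw last; // All attempts failed with a retryable message.
  }
}
```

Matching on a message substring is what lets an error reported with code 400 and reason "invalidQuery" be treated as retryable, which a status-code check alone would not do.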

@prash-mi prash-mi requested a review from as a code owner Jul 3, 2021
@prash-mi prash-mi requested a review from stephaniewang526 Jul 3, 2021
@product-auto-label product-auto-label bot added the api: bigquery label Jul 3, 2021
@google-cla google-cla bot added the cla: yes label Jul 3, 2021
@@ -1362,7 +1366,7 @@ private static QueryResponse getQueryResults(
               : jobId.getLocation());
     try {
       GetQueryResultsResponse results =
-          runWithRetries(
+          BigQueryRetryHelper.runWithRetries(

@stephaniewang526 stephaniewang526 Jul 4, 2021

Can we also add this to the TableResult query(...) methods? That should also allow you to verify idempotency using requestId for the jobs.query API.

@prash-mi prash-mi Jul 5, 2021

Sure, I have added BigQueryRetryHelper.runWithRetries to TableResult query(...) and implemented the test case testFastQueryRateLimitIdempotency to test idempotency.

@stephaniewang526 stephaniewang526 Jul 6, 2021

Okay, great, but I think we need it for the jobs.insert endpoint too (line 1269). There are two query paths here; the one you've updated is the "fast" query path (jobs.query).

@prash-mi prash-mi Jul 13, 2021

Sure, I have applied BigQueryRetryHelper.runWithRetries in the QueryResponse waitForQueryResults method, which is used by the TableResult getQueryResults method.
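
The idempotency point raised earlier in this thread can be illustrated with a small sketch: if the same request ID is reused across retry attempts, a server that deduplicates by request ID creates the job only once. Everything here is hypothetical, including `FakeQueryServer`, which is an in-memory stand-in and not the BigQuery API, and it is not the `testFastQueryRateLimitIdempotency` test from this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of request-ID-based idempotency; not the library's implementation.
final class RequestIdIdempotencySketch {

  /** In-memory stand-in for a server that deduplicates queries by request ID. */
  static final class FakeQueryServer {
    private final Map<String, String> jobsByRequestId = new HashMap<>();
    private int jobsCreated = 0;
    private int failuresLeft;

    FakeQueryServer(int transientFailures) {
      this.failuresLeft = transientFailures;
    }

    String query(String requestId, String sql) {
      // Create the job only the first time this request ID is seen.
      String jobId = jobsByRequestId.get(requestId);
      if (jobId == null) {
        jobId = "job-" + (++jobsCreated);
        jobsByRequestId.put(requestId, jobId);
      }
      // Simulate a transient failure after the job was already created.
      if (failuresLeft > 0) {
        failuresLeft--;
        throw new IllegalStateException("Exceeded rate limits: simulated failure");
      }
      return jobId;
    }

    int jobsCreated() {
      return jobsCreated;
    }
  }

  /** Retries the query, reusing one request ID so repeated attempts are deduplicated. */
  static String queryWithRetries(FakeQueryServer server, String sql, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    String requestId = UUID.randomUUID().toString(); // Generated once, outside the loop.
    IllegalStateException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return server.query(requestId, sql);
      } catch (IllegalStateException e) {
        last = e;
      }
    }
    throw last;
  }
}
```

Generating the ID inside the loop instead would make every attempt look like a new request, so each retry could create another job.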

@@ -237,6 +237,10 @@
   }

   private final BigQueryRpc bigQueryRpc;
+  private static final BigQueryRetryConfig DEFAULT_RATE_LIMIT_EXCEEDED_RETRY_CONFIG =

@stephaniewang526 stephaniewang526 Jul 6, 2021

Should we just name this DEFAULT_RETRY_CONFIG? It currently covers only the rate-limit-exceeded message, and we can potentially add more in the future.

@prash-mi prash-mi Jul 12, 2021

Sure, updated it.

@@ -237,6 +237,10 @@
   }

   private final BigQueryRpc bigQueryRpc;
+  private static final BigQueryRetryConfig DEFAULT_RATE_LIMIT_EXCEEDED_RETRY_CONFIG =
+      BigQueryRetryConfig.newBuilder()
+          .retryOnMessage(BigQueryErrorMessages.RATE_LIMIT_EXCEEDED_MSG)

@stephaniewang526 stephaniewang526 Jul 6, 2021

I find this a little extraneous -- why do we need to specify every message we need to retry on? If we build a config, I imagine we just retry on all the specified error messages in the config.

@prash-mi prash-mi Jul 12, 2021

Sure, Stephanie. As discussed, I have tried to model it along the lines of com.google.cloud.ExceptionHandler, which may give us additional flexibility to configure hooks like retryOnStatus or retryOnReason later as necessary. Happy to refactor it as needed.
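
As a rough illustration of the builder-with-hooks design described here, the sketch below shows how a `retryOnMessage` hook could sit next to a later `retryOnReason` hook. The class `RetryConfigSketch` and all of its methods are hypothetical and are not the library's BigQueryRetryConfig; `retryOnReason` in particular is only the possible future hook mentioned above, not an existing API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; this is not the library's BigQueryRetryConfig.
final class RetryConfigSketch {
  private final Set<String> messages;
  private final Set<String> reasons;

  private RetryConfigSketch(Builder builder) {
    // Copy the builder state so the built config is immutable.
    this.messages = Collections.unmodifiableSet(new LinkedHashSet<>(builder.messages));
    this.reasons = Collections.unmodifiableSet(new LinkedHashSet<>(builder.reasons));
  }

  static Builder newBuilder() {
    return new Builder();
  }

  /** True if the message contains a configured substring or the reason is configured. */
  boolean shouldRetry(String message, String reason) {
    if (reason != null && reasons.contains(reason)) {
      return true;
    }
    if (message != null) {
      for (String configured : messages) {
        if (message.contains(configured)) {
          return true;
        }
      }
    }
    return false;
  }

  static final class Builder {
    private final Set<String> messages = new LinkedHashSet<>();
    private final Set<String> reasons = new LinkedHashSet<>();

    /** Registers message substrings that should trigger a retry. */
    Builder retryOnMessage(String... values) {
      Collections.addAll(messages, values);
      return this;
    }

    /** Assumed future hook: registers error reasons that should trigger a retry. */
    Builder retryOnReason(String... values) {
      Collections.addAll(reasons, values);
      return this;
    }

    RetryConfigSketch build() {
      return new RetryConfigSketch(this);
    }
  }
}
```

Each hook only adds to its own set, so a new kind of matcher can be introduced later without changing how existing configs are built.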

@stephaniewang526 stephaniewang526 commented Jul 13, 2021

@prash-mi I tried running the ITs a couple of times and they continue to fail here:

java.lang.AssertionError: expected null, but was:<[gs://bigquery-prod-upload-us/prod-scotty-b30183f1-0dcf-4712-83aa-d9d9ac06a899]>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotNull(Assert.java:756)
	at org.junit.Assert.assertNull(Assert.java:738)
	at org.junit.Assert.assertNull(Assert.java:748)
	at com.google.cloud.bigquery.it.ITBigQueryTest.testInsertFromFile(ITBigQueryTest.java:3121)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)

Could you try running locally to see if this happens?

@stephaniewang526 stephaniewang526 merged commit 44d9795 into googleapis:master Jul 14, 2021
18 checks passed