
BigQuery: raise a custom exception if 400 BadRequest is encountered due to "internal error during execution" #23

Closed
bencaine1 opened this issue Feb 12, 2019 · 14 comments
Labels: api: bigquery (Issues related to the googleapis/python-bigquery API) · priority: p3 (Desirable enhancement or fix; may not be included in next release) · type: feature request ('Nice-to-have' improvement, new feature or different behavior or design) · wontfix (This will not be worked on)

Comments

bencaine1 commented Feb 12, 2019

OS: Linux dc32b7e8763a 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux
Python version: Python 2.7.6
google-cloud-bigquery: 1.8.0

We're getting flaky 400 BadRequest errors on our query jobs. We've been seeing this issue for a while on and off, but last night starting at around 7pm we saw a spike in these failures.

These errors are not caught by the default Retry objects because 400 usually signifies a malformed query or a missing table, rather than a transient error.

A fix might be to add a clause catching 400s with this exact error message to _should_retry at https://github.com/googleapis/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/retry.py#L30 and/or RETRY_PREDICATE at https://github.com/googleapis/google-cloud-python/blob/master/api_core/google/api_core/future/polling.py#L32.
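For illustration, a minimal sketch of what such a clause might look like (this is not the library's actual _should_retry implementation; the matched message text and the set of existing retryable exception types here are assumptions):

from google.api_core import exceptions
from google.api_core import retry

# Assumed marker text, taken from the error message in the stack trace below.
_BACKEND_ERROR_TEXT = "internal error during execution"

def _should_retry(exc):
    # Existing behavior: retry the usual transient error codes.
    if isinstance(
        exc,
        (
            exceptions.TooManyRequests,
            exceptions.InternalServerError,
            exceptions.ServiceUnavailable,
        ),
    ):
        return True
    # Proposed addition: treat this specific 400 as transient as well.
    return isinstance(exc, exceptions.BadRequest) and _BACKEND_ERROR_TEXT in str(exc)

DEFAULT_RETRY = retry.Retry(predicate=_should_retry)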

Code example

from google.api_core.future import polling
from google.cloud.bigquery import retry as bq_retry

query_job = self.gclient.query(
    query,
    job_config=config,
    retry=bq_retry.DEFAULT_RETRY.with_deadline(max_wait_secs),
)
query_job._retry = polling.DEFAULT_RETRY.with_deadline(max_wait_secs)
return query_job.result(timeout=max_wait_secs)

Stack trace

One example:

  File "/opt/conda/lib/python2.7/site-packages/verily/bigquery_wrapper/bq.py", line 108, in _wait_for_job
    return query_job.result(timeout=max_wait_secs)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2762, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 703, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 122, in result
    self._blocking_poll(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2736, in _blocking_poll
    super(QueryJob, self)._blocking_poll(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 101, in _blocking_poll
    retry_(self._done_or_raise)()
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 179, in retry_target
    return target()
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 80, in _done_or_raise
    if not self.done():
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2723, in done
    location=self.location,
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/client.py", line 672, in _get_query_results
    retry, method="GET", path=path, query_params=extra_params
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/client.py", line 382, in _call_api
    return call()
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 179, in retry_target
    return target()
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/_http.py", line 319, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 GET https://www.googleapis.com/bigquery/v2/projects/packard-campbell-synth/queries/9bcea2cb-1747-4a1e-9ac8-e1de40f00d08?timeoutMs=10000&location=US&maxResults=0: The job encountered an internal error during execution and was unable to complete successfully.
tseaver (Contributor) commented Feb 12, 2019

@tswast I'm pretty sure that this is a back-end issue: the API shouldn't be returning '400 Bad Request' for internal server errors. Can you confirm?

tswast (Contributor) commented Feb 12, 2019

@shollyman Is this related to another BigQuery backend rollout?

tswast (Contributor) commented Feb 12, 2019

I agree that 400 Bad Request is the wrong response code for this error.

tswast (Contributor) commented Feb 12, 2019

I've filed bug 124319762 internally to track this issue. I see several similar reports internally, so this is likely not new behavior.

@bencaine1 If you have a support plan, I recommend filing a ticket with them to raise the priority of this issue on the BigQuery backend.

tseaver (Contributor) commented Feb 12, 2019

@tswast Do you want to leave this issue open? (I.e., do you imagine we will be making changes here to work around the 400?)

tswast (Contributor) commented Feb 12, 2019

Let's close this. The client workaround would be to look for certain text in the response body and ignore the response code, which I'd prefer not to do if we can avoid it.

tswast closed this as completed Feb 12, 2019
barrywhart commented

I saw this error again today. Has the underlying BigQuery issue been fixed? Is there another new issue with the same symptom?

If this bug continues to recur with various BigQuery bugs, I think there is (sadly) a case for having the client retry, because otherwise the non-Google customer application becomes responsible for retrying. That seems ... worse.

tswast reopened this Nov 22, 2019
tswast (Contributor) commented Nov 22, 2019

Backend issue was closed as infeasible. Backend engineers say:

Connection error is the only retryable error returned as a job result, so you only need to add the logic to retry jobs.insert() if jobs.getQueryResults() returns 400 and the error reason is set to "jobBackendError" (which means that the job failed with the connection error), but I don't think you can reuse the job ID for this case.

Since the job ID cannot be reused, this error requires the whole job to be retried from the beginning. I think it's reasonable to do this, though it will likely be a bit difficult, as the failure won't be discovered until .result() is called.
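For illustration, a rough sketch of what retrying from the beginning could look like in application code (the helper name and the message check are assumptions, not library behavior):

from google.api_core.exceptions import BadRequest

def run_query_with_retry(client, sql, max_attempts=3):
    # Each call to client.query() submits a brand-new job with a fresh job ID,
    # which is required because the failed job's ID cannot be reused.
    for attempt in range(max_attempts):
        job = client.query(sql)
        try:
            return job.result()
        except BadRequest as exc:
            if ("internal error during execution" in str(exc)
                    and attempt < max_attempts - 1):
                continue  # resubmit the whole job from scratch
            raise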

tswast changed the title from "BigQuery: flaky 400 BadRequest errors" to "BigQuery: retry queries from the beginning if 400 BadRequest is encountered due to 'internal error during execution'" Nov 22, 2019
plamut transferred this issue from googleapis/google-cloud-python Feb 4, 2020
product-auto-label bot added the api: bigquery label Feb 4, 2020
plamut added the type: feature request label Feb 4, 2020
pietrodn commented

If the error reason is "jobBackendError", it should definitely be included in the BigQuery error table, so that developers can deal with it appropriately.

pietrodn commented

If the job ID is not reusable, and it is not possible to retry the job from within the library, google-cloud-bigquery could catch this BadRequest exception and re-raise it as an InternalServerError exception, so that application code using the library can transparently retry the whole class of server errors.
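A hedged sketch of that idea (not current library behavior; the helper name is hypothetical):

from google.api_core import exceptions

def _reraise_backend_error(exc):
    # Translate the misleading 400 into a 5xx-style error so callers can retry
    # server errors uniformly; "raise ... from exc" keeps the original traceback.
    if (isinstance(exc, exceptions.BadRequest)
            and "internal error during execution" in str(exc)):
        raise exceptions.InternalServerError(
            str(exc), errors=getattr(exc, "errors", ())
        ) from exc
    raise exc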

HemangChothani (Contributor) commented

it is not possible to retry the job from within the library

@pietrodn It is possible to retry the job from within the library; to do so, use the client.create_job method, which creates a new job ID on every retry.

def create_job(self, job_config, retry=DEFAULT_RETRY, timeout=None):
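For illustration, a hedged usage sketch (the job_config dict below is an assumption about the API-representation format that create_job accepts):

from google.cloud import bigquery

client = bigquery.Client()
# create_job assigns a fresh job ID on every call, so resubmitting the same
# configuration retries the job without reusing the failed job ID.
job = client.create_job(
    job_config={"query": {"query": "SELECT 1", "useLegacySql": False}}
)
rows = job.result()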

tswast (Contributor) commented Dec 1, 2020

google-cloud-bigquery could catch this BadRequest exception and re-raise it as an InternalServerError exception,

I think this is a reasonable feature request.

tswast changed the title from "BigQuery: retry queries from the beginning if 400 BadRequest is encountered due to 'internal error during execution'" to "BigQuery: raise a custom exception if 400 BadRequest is encountered due to 'internal error during execution'" Dec 1, 2020
tswast (Contributor) commented Dec 2, 2020

Some requirements for a custom exception (a sketch of one possible shape follows the list):

  • Backwards compatible -- inherit from the Google API error base class
  • Preserves stacktrace -- use the exception wrapping mechanism to preserve all the context from the original exception.
  • Clear that it's a BigQuery-related custom exception from the name. Example: BigQueryServerError.
  • The message of this exception includes a code snippet on how to recreate the query job (to retry it) with Client.create_job.
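One possible shape for such an exception (illustrative only; BigQueryServerError is the example name from the list above, and the wrapping helper is an assumption about how it could be wired up):

from google.api_core import exceptions

class BigQueryServerError(exceptions.GoogleAPICallError):
    """The backend reported an internal error but surfaced it as a 400 response."""
    code = 500

def _wrap_backend_error(exc):
    if "internal error during execution" in str(exc):
        # "raise ... from exc" preserves the original exception and traceback.
        raise BigQueryServerError(
            str(exc)
            + "\nThe job ID cannot be reused; recreate the job, e.g. "
            "client.create_job(job_config).",
            errors=getattr(exc, "errors", ()),
        ) from exc
    raise exc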

shollyman added the priority: p3 label Aug 29, 2022
chalmerlowe added the wontfix label Nov 2, 2022
chalmerlowe (Contributor) commented

At this point, I am going to close this item as "Will not fix" due to competing priorities.
