BigQuery: client.create_table raises Conflict instead of AlreadyExists #123

Closed
jpuig-mind opened this issue Jun 4, 2020 · 11 comments · Fixed by #171
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: docs Improvement to the documentation for an API.

Comments

@jpuig-mind

Environment details

  1. API: BigQuery

  2. OS type and version:

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 20.04 LTS
    Release:        20.04
    Codename:       focal
  3. Python version and virtual environment information:

    $ python3 --version
    Python 3.8.2
  4. google-cloud-bigquery version:

    $ pip show google-cloud-bigquery
    Name: google-cloud-bigquery
    Version: 1.24.0
    Summary: Google BigQuery API client library
    Home-page: https://github.com/GoogleCloudPlatform/google-cloud-python
    Author: Google LLC
    Author-email: googleapis-packages@google.com
    License: Apache 2.0
    Location: /home/REDACTED/.local/lib/python3.8/site-packages
    Requires: protobuf, google-api-core, google-cloud-core, google-auth, google-resumable-media, six
    Required-by:

Steps to reproduce

  1. Have or create a table in BigQuery
  2. Try to create it again using client.create_table()
  3. Confirm that the raised exception is google.api_core.exceptions.Conflict
    • One would expect it to be google.api_core.exceptions.AlreadyExists
    • For comparison, client.delete_table() on a non-existent table raises google.api_core.exceptions.NotFound

Code example

Slightly modified version of https://cloud.google.com/bigquery/docs/tables#python

from google.cloud import bigquery
from google.api_core.exceptions import AlreadyExists, Conflict

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

table = bigquery.Table(table_id)
table = client.create_table(table)  # Make an API request.
try:
    table = client.create_table(table)  # Make an API request. Second time.
except AlreadyExists:
    print("Caught google.api_core.exceptions.AlreadyExists")
except Conflict:
    print("Caught google.api_core.exceptions.Conflict")

table = client.create_table(table)  # Make an API request. Third time for Stack Trace

Stack trace

$ /usr/bin/python3 REDACTED/example.py
Caught google.api_core.exceptions.Conflict
Traceback (most recent call last):
  File "REDACTED/example.py", line 19, in <module>
    table = client.create_table(table)  # Make an API request. Third time for Stack Trace
  File "/home/REDACTED/.local/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 543, in create_table
    api_response = self._call_api(
  File "/home/REDACTED/.local/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 556, in _call_api
    return call()
  File "/home/REDACTED/.local/lib/python3.8/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/home/REDACTED/.local/lib/python3.8/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/home/REDACTED/.local/lib/python3.8/site-packages/google/cloud/_http.py", line 423, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Conflict: 409 POST https://bigquery.googleapis.com/bigquery/v2/projects/REDACTED/datasets/REDACTED/tables: Already Exists: Table REDACTED:REDACTED.your_table_name
@skelliest

Also, there does not seem to be a way to distinguish between the two types of Conflict errors, AlreadyExists and Aborted. This is problematic because the two errors imply different things about whether the table exists. AlreadyExists is supposed to have grpc_status_code set to grpc.StatusCode.ALREADY_EXISTS, but on my machine it is not.
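To make the point concrete, here is a minimal, self-contained sketch. It does not import `google.api_core`: the classes below are hypothetical stand-ins that mimic the hierarchy described in this thread (`AlreadyExists` and `Aborted` both subclass `Conflict`, which carries HTTP code 409), and plain strings stand in for `grpc.StatusCode` values. The labeled assumption is that translating an HTTP response picks the exception class by status code alone, so a 409 always becomes the base `Conflict`, with no gRPC status attached.

```python
# Hypothetical stand-ins mimicking the exception hierarchy discussed in
# this thread; these are NOT the real google.api_core classes.
from typing import Dict, Optional, Type


class Conflict(Exception):
    code = 409                                  # HTTP 409 Conflict
    grpc_status_code: Optional[str] = None      # unset for HTTP-only errors

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


class AlreadyExists(Conflict):
    grpc_status_code = "ALREADY_EXISTS"


class Aborted(Conflict):
    grpc_status_code = "ABORTED"


# Assumption: the first class registered for a status code wins, so 409
# maps to the base Conflict class, never to one of its subclasses.
HTTP_CODE_TO_EXCEPTION: Dict[int, Type[Conflict]] = {}
for cls in (Conflict, AlreadyExists, Aborted):
    HTTP_CODE_TO_EXCEPTION.setdefault(cls.code, cls)


def exception_from_status(status: int, message: str) -> Conflict:
    """Hypothetical helper: build an exception from the HTTP status only."""
    return HTTP_CODE_TO_EXCEPTION[status](message)


err = exception_from_status(409, "Already Exists: Table your-project:your_dataset.your_table_name")
print(type(err).__name__)               # Conflict
print(isinstance(err, AlreadyExists))   # False: `except AlreadyExists` misses it
print(err.grpc_status_code)             # None
```

This matches the behavior in the original report: the 409 is caught by `except Conflict` but not by `except AlreadyExists`.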

@HemangChothani HemangChothani transferred this issue from googleapis/google-cloud-python Jun 8, 2020
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jun 8, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jun 8, 2020
@HemangChothani HemangChothani added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Jun 8, 2020
@yoshi-automation yoshi-automation added triage me I really want to be triaged. and removed triage me I really want to be triaged. labels Jun 8, 2020
@HemangChothani HemangChothani self-assigned this Jun 9, 2020
@HemangChothani
Contributor

HemangChothani commented Jun 9, 2020

@jpuig-mind @skelliest I agree with your statement. The client returns the HTTP status code rather than a grpc_status_code because BigQuery is a REST client: it works over HTTP, not gRPC, so it surfaces HTTP status codes.

As you can also see in the source code below, to handle this case, catch google.api_core.exceptions.Conflict.

except google.api_core.exceptions.Conflict:
    if not exists_ok:
        raise
    return self.get_table(table.reference, retry=retry)
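The `exists_ok` branch quoted above can be exercised without network access. The sketch below is hypothetical: `FakeClient` is an in-memory stand-in (not `google.cloud.bigquery.Client`) and `Conflict` is a stand-in class, but `create_table` mirrors the quoted control flow: a duplicate raises `Conflict`, and with `exists_ok=True` the existing table is fetched instead.

```python
# Hypothetical in-memory stand-in for the BigQuery client; it only mirrors
# the control flow of the quoted create_table snippet.
from typing import Dict


class Conflict(Exception):
    """Stand-in for google.api_core.exceptions.Conflict (HTTP 409)."""


class FakeClient:
    def __init__(self) -> None:
        self._tables: Dict[str, dict] = {}

    def _insert(self, table_id: str) -> dict:
        # Simulates the REST call: a duplicate table ID yields HTTP 409.
        if table_id in self._tables:
            raise Conflict(f"Already Exists: Table {table_id}")
        self._tables[table_id] = {"id": table_id}
        return self._tables[table_id]

    def get_table(self, table_id: str) -> dict:
        return self._tables[table_id]

    def create_table(self, table_id: str, exists_ok: bool = False) -> dict:
        try:
            return self._insert(table_id)
        except Conflict:
            if not exists_ok:
                raise  # caller sees Conflict, not AlreadyExists
            return self.get_table(table_id)


client = FakeClient()
client.create_table("your_dataset.your_table_name")
try:
    client.create_table("your_dataset.your_table_name")
except Conflict as exc:
    print("Caught Conflict:", exc)
table = client.create_table("your_dataset.your_table_name", exists_ok=True)
print(table["id"])
```

With the real client, passing `exists_ok=True` to `client.create_table()` takes the same branch, so callers who are happy with an existing table do not need to catch anything.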

@jpuig-mind
Author

@HemangChothani Thanks for the clarification.

For clarity then, shouldn't the exception handling be something more like the following?

except google.api_core.exceptions.Conflict as e:
    if not exists_ok:
        raise google.api_core.exceptions.AlreadyExists(e.message, errors=e.errors, response=e.response)
    return self.get_table(table.reference, retry=retry)

I suggest this solution because it's the BigQuery client that should hold the "if Conflict then AlreadyExists" knowledge, taking the guesswork out of the user's hands.

I apologise if I made any mistakes; Python is not my first language.

Cheers!

@HemangChothani
Contributor

HemangChothani commented Jun 9, 2020

@jpuig-mind I think it's a bit difficult to distinguish between AlreadyExists and Aborted errors. With your snippet, an error is first raised from google/cloud/_http.py because the status code is not 2XX.

code snippet:

    if not 200 <= response.status_code < 300:             
        raise exceptions.from_http_response(response)

and then the exception you suggest is raised on top of it:

    except google.api_core.exceptions.Conflict as e:
        if not exists_ok:
            raise google.api_core.exceptions.AlreadyExists(e.message, errors=e.errors, response=e.response)

I don't think it's a good idea, but I would still like to ask @plamut.
Do you have any suggestions here?

@HemangChothani
Contributor

@plamut any suggestion?

@plamut
Contributor

plamut commented Jun 10, 2020

@HemangChothani If I understand correctly, the concern is that we first create an exception from the HTTP response (in api_core), but would then again have to transform the very same exception in BigQuery as proposed above?

Ideally, we should only translate the exception once (in api-core). However, if api-core does not have all information to pick the right exception type in all cases (e.g. exists_ok flag), then it's up to BigQuery to decide which flavor of Conflict error to actually raise.

It would probably also be beneficial to keep the link with the original exception:

except api_core.exceptions.Conflict as exc:
    if not exists_ok:
        rebranded_error = api_core.exceptions.AlreadyExists(exc.message, errors=exc.errors, response=exc.response)
        six.raise_from(rebranded_error, exc)

Or did I misunderstand the question?
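For reference, on Python 3 alone `six.raise_from(rebranded_error, exc)` can be written as `raise rebranded_error from exc`. The sketch below uses hypothetical stand-in classes and functions (not the real `api_core` or BigQuery code) to show what keeping the link buys: the `AlreadyExists` that reaches the caller carries the original `Conflict` on `__cause__`, so the HTTP-level error is not lost.

```python
# Hypothetical stand-ins; not the real google.api_core exception classes.
class Conflict(Exception):
    pass


class AlreadyExists(Conflict):
    pass


def insert_table() -> None:
    # Simulates the HTTP layer turning a 409 response into a plain Conflict.
    raise Conflict("Already Exists: Table your_dataset.your_table_name")


def create_table(exists_ok: bool = False) -> None:
    try:
        insert_table()
    except Conflict as exc:
        if not exists_ok:
            rebranded_error = AlreadyExists(str(exc))
            raise rebranded_error from exc  # Python 3 form of six.raise_from
        # With exists_ok=True, the real client would fetch the existing table here.


try:
    create_table()
except AlreadyExists as err:
    print(type(err).__name__, "caused by", type(err.__cause__).__name__)
    # prints: AlreadyExists caused by Conflict
```

Because `AlreadyExists` subclasses `Conflict`, existing `except Conflict` handlers would keep catching the rebranded error.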

@HemangChothani
Contributor

@plamut Thanks for the help, I appreciate it. The main concern is how to distinguish between AlreadyExists and Aborted errors, since both inherit from the Conflict class. The only option I found is to compare exc.message with a static string, which might not be appropriate.
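A minimal sketch of the message-based check, assuming (as in the stack trace from the original report) that the 409 message contains the text "Already Exists". The helper `is_already_exists` and the `Conflict` stand-in are hypothetical, and the second message is made up to represent any other 409. String matching like this is fragile: a change in the server's wording would silently break it.

```python
# Hypothetical helper: classify a 409 Conflict by its message text.
# Assumes the server message contains "Already Exists", as in the stack
# trace from the original report; this is not a stable API contract.
class Conflict(Exception):
    """Stand-in for google.api_core.exceptions.Conflict."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


ALREADY_EXISTS_MARKER = "already exists"


def is_already_exists(exc: Conflict) -> bool:
    # Case-insensitive substring match to tolerate minor wording changes.
    return ALREADY_EXISTS_MARKER in exc.message.lower()


dup = Conflict(
    "409 POST https://bigquery.googleapis.com/bigquery/v2/projects/p/datasets/d/tables: "
    "Already Exists: Table p:d.your_table_name"
)
other = Conflict("409 some other conflict")  # made-up message for illustration
print(is_already_exists(dup))    # True
print(is_already_exists(other))  # False
```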

@plamut
Contributor

plamut commented Jun 11, 2020

@HemangChothani Distinguishing based on a message string is indeed neither ideal nor 100% future-proof, but if that's our best bet, we might have no other choice if we want to fix this for the users ... :/

Since it's probably not too much work, I'd say we at least open a PR for better visibility and discuss there whether relying on the error message is still "good enough" for our needs.

@HemangChothani
Contributor

@paul1319 OK, I will open a PR soon, thank you.

@HemangChothani
Contributor

Based on the comment received on PR #131 (comment), I think the Googlers disagree with this solution and suggest leaving the behavior as it is, consistent with the other manual libraries, which are based on HTTP.

@plamut Can I close the PR and the issue?

@plamut
Contributor

plamut commented Jul 10, 2020

@HemangChothani It appears so, yes. Let's be consistent with other manual libraries then, and instead explain this behavior in a docstring as mentioned in the same comment.

@plamut plamut added type: docs Improvement to the documentation for an API. and removed type: question Request for information or clarification. Not an issue. labels Jul 17, 2020
5 participants