
[BigQuery, BigQuery Storage]: Add option to use BigQuery Storage API to download results in BigQuery DB-API #16

Closed · tswast opened this issue Oct 14, 2019 · 1 comment · Fixed by #36


tswast commented Oct 14, 2019

Is your feature request related to a problem? Please describe.

In projects like Superset that use the SQLAlchemy connector, downloading large query results can be quite slow. The BigQuery Storage API speeds this up for to_dataframe / pandas, but not when BigQuery is used via the DB-API / SQLAlchemy. See: googleapis/python-bigquery-sqlalchemy#41
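
For context, the pandas path already accepts a BQ Storage client today. Roughly (the query is just an illustration):

```python
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

bq_client = bigquery.Client()
bqstorage_client = bigquery_storage_v1beta1.BigQueryStorageClient()

query_job = bq_client.query(
    "SELECT name, number FROM `bigquery-public-data.usa_names.usa_1910_2013`"
)
# Fast path: result rows are streamed via the BQ Storage API.
df = query_job.to_dataframe(bqstorage_client=bqstorage_client)
```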

Describe the solution you'd like

When creating a DB-API Connection, provide a way to supply a BQ Storage client, in addition to a BQ client. Use this client to download results for the relevant methods in the Cursor object.
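
A rough sketch of how this might look from the caller's side (the second argument is what's proposed here, not the current API):

```python
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1
from google.cloud.bigquery import dbapi

bq_client = bigquery.Client()
bqstorage_client = bigquery_storage_v1beta1.BigQueryStorageClient()

# Proposed: accept a BQ Storage client alongside the BigQuery client.
connection = dbapi.connect(bq_client, bqstorage_client)
cursor = connection.cursor()
cursor.execute("SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013`")
rows = cursor.fetchall()  # large results downloaded via the BQ Storage API
```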

Describe alternatives you've considered

Could have a use_bqstorage_api option, but this would be inconsistent with the current constructor, which expects a client.
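
For contrast, a hypothetical snippet of the flag-based alternative next to the proposed client-based form (reusing the clients from the sketch above):

```python
# Rejected alternative: a boolean flag. Inconsistent, because the existing
# constructor takes an already-constructed client object, not options.
connection = dbapi.connect(bq_client, use_bqstorage_api=True)

# Proposed form: pass a constructed BQ Storage client, matching that style.
connection = dbapi.connect(bq_client, bqstorage_client)
```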

/cc @yiga2

@tswast tswast self-assigned this Oct 14, 2019
@tswast tswast assigned plamut and unassigned tswast Jan 8, 2020

tswast commented Jan 8, 2020

The proposal here is to add a bqstorage_client argument to the Connection class:

https://github.com/googleapis/google-cloud-python/blob/b387134827dbc3be0e1b431201e0875798002fda/bigquery/google/cloud/bigquery/dbapi/connection.py#L21-L26
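
A minimal sketch of that constructor change (the bqstorage_client name comes from this proposal; the final signature may differ):

```python
class Connection(object):
    """DB-API Connection to Google BigQuery.

    Args:
        client (google.cloud.bigquery.Client): BigQuery client.
        bqstorage_client: Optional BQ Storage API client, used to
            download large query results more quickly when possible.
    """

    def __init__(self, client, bqstorage_client=None):
        self._client = client
        self._bqstorage_client = bqstorage_client
```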

Instead of always calling list_rows to fetch the query results,

https://github.com/googleapis/google-cloud-python/blob/b387134827dbc3be0e1b431201e0875798002fda/bigquery/google/cloud/bigquery/dbapi/cursor.py#L215

attempt to call the BQ Storage API first. This can be modelled after to_dataframe, where an iterable over rows is abstracted away.

https://github.com/googleapis/google-cloud-python/blob/b387134827dbc3be0e1b431201e0875798002fda/bigquery/google/cloud/bigquery/table.py#L1407-L1436

Just as with to_dataframe, there will be cases where the BQ Storage API fails but tabledata.list succeeds, so we'll need to fall back to tabledata.list even when a BQ Storage API client is available.
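
A sketch of that fallback shape, modelled on to_dataframe (the helper names and single-stream read session are illustrative assumptions, not the shipped code):

```python
import google.api_core.exceptions
from google.cloud import bigquery_storage_v1beta1


def _fetch_row_iterable(query_job, bq_client, bqstorage_client=None):
    """Return an iterable of result rows, preferring the BQ Storage API."""
    if bqstorage_client is not None:
        try:
            # Fast path: stream the destination table via the BQ Storage API.
            return _bqstorage_rows(query_job.destination, bqstorage_client)
        except google.api_core.exceptions.GoogleAPICallError:
            # Some result tables can't be read with the BQ Storage API;
            # fall through to tabledata.list, mirroring to_dataframe.
            pass
    # Universal (but slower) path: page through rows via tabledata.list.
    return bq_client.list_rows(query_job.destination)


def _bqstorage_rows(table, bqstorage_client):
    # Open a single-stream read session on the query's destination table.
    table_ref = bigquery_storage_v1beta1.types.TableReference(
        project_id=table.project,
        dataset_id=table.dataset_id,
        table_id=table.table_id,
    )
    session = bqstorage_client.create_read_session(
        table_ref,
        "projects/{}".format(table.project),
        requested_streams=1,
    )
    if not session.streams:
        return iter([])  # empty table: nothing to read
    position = bigquery_storage_v1beta1.types.StreamPosition(
        stream=session.streams[0]
    )
    # ReadRowsStream.rows() yields mapping-like rows decoded from Avro.
    return bqstorage_client.read_rows(position).rows(session)
```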

Changes from to_dataframe logic:

@plamut plamut transferred this issue from googleapis/google-cloud-python Feb 4, 2020
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 4, 2020
@plamut plamut added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Feb 4, 2020