
Feature Request: Add support for 'Allow Large Results' to BigQuery connector #15

Closed
jreback opened this issue Feb 26, 2017 · 7 comments
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jreback
Contributor

jreback commented Feb 26, 2017

xref pandas-dev/pandas#10474

gbq.py currently returns an error if the result of a query is what Google considers to be 'large'. The Google API allows jobs to be submitted with a flag that allows large results. It would be very beneficial to expose this as an option in the BigQuery connector.
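
For reference, the flag lives on the query job configuration in the BigQuery REST API. A minimal sketch of that resource is below; the project, dataset, and table names are placeholders:

# Sketch of a BigQuery query-job configuration that permits large results.
# With legacy SQL, allowLargeResults must be paired with a destination table.
# The project/dataset/table names are placeholders, not part of pandas-gbq.
job_configuration = {
    'query': {
        'query': 'SELECT * FROM [mydataset.mytable]',
        'allowLargeResults': True,
        'destinationTable': {
            'projectId': 'my-project',
            'datasetId': 'mydataset',
            'tableId': 'query_results'
        }
    }
}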

@jreback jreback added the type: feature request label Feb 26, 2017
@parthea parthea self-assigned this Mar 6, 2017
@jreback jreback modified the milestone: 0.3.0 Mar 11, 2017
@tswast
Collaborator

tswast commented Jun 12, 2017

People have posted some pretty elaborate workarounds on Stack Overflow: https://stackoverflow.com/questions/34201923/python-bigquery-allowlargeresults-with-pandas-io-gbq/34203369

@jasonqng
Contributor

jasonqng commented Oct 6, 2017

This can now be done in the open PR #25 by passing it via a configuration setting:

read_gbq(sql, configuration={"allow_large_results": True})

This uses the new google-cloud-python API.

@jreback jreback removed this from the 0.3.0 milestone Nov 25, 2017
@yahyamortassim

@jasonqng you also have to add destinationTable.

@tswast
Collaborator

tswast commented Nov 30, 2017

Actually, I think the current API does support this, even without #25. The read_gbq function accepts a configuration keyword argument, which is a BigQuery job configuration resource, so to allow large results one would do either of the following.

Standard SQL:

pd.read_gbq(
    query,
    'my-project',
    dialect='standard',
    configuration={
        'query': {
            'destinationTable': {
                'projectId': 'my-project',
                'datasetId': 'mydataset',
                'tableId': 'mytable'
            }
        }
    })

Legacy SQL:

pd.read_gbq(
    query,
    'my-project',
    dialect='legacy',
    configuration={
        'query': {
            'allowLargeResults': True,
            'destinationTable': {
                'projectId': 'my-project',
                'datasetId': 'mydataset',
                'tableId': 'mytable'
            }
        }
    })

Admittedly this is a bit onerous to do. We may wish to provide a friendlier interface for options such as these.
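
One possible shape for such an interface, sketched here as a hypothetical wrapper around read_gbq (the helper name and parameters are illustrative, not part of pandas-gbq):

import pandas as pd

def read_gbq_large(query, project_id, dataset_id, table_id, dialect='legacy'):
    # Hypothetical convenience wrapper: route results through a destination
    # table so a query with a large result set does not fail.
    configuration = {
        'query': {
            'allowLargeResults': True,  # ignored for standard SQL
            'destinationTable': {
                'projectId': project_id,
                'datasetId': dataset_id,
                'tableId': table_id
            }
        }
    }
    return pd.read_gbq(query, project_id, dialect=dialect,
                       configuration=configuration)

# df = read_gbq_large(query, 'my-project', 'mydataset', 'mytable')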

@Gitman-code

The updated answer on Stack Overflow suggests just using dialect='standard', as tswast did, but more simply as
pd.read_gbq(query, 'my-super-project', dialect='standard')

and notes: "allowLargeResults: For standard SQL queries, this flag is ignored and large results are always allowed." This worked for me, but maybe it is not generic.

@tswast
Collaborator

tswast commented Nov 30, 2017

I'm glad that worked for you. I believe there may be some size threshold where a destination table is required, even with standard SQL, but perhaps the threshold is larger than it was for legacy SQL.

@tswast tswast closed this as completed Feb 12, 2018
@tswast
Collaborator

tswast commented Feb 12, 2018

Closing, as this can be passed in via the configuration argument to read_gbq.
