Pandas GBQ Integration Tests #309

Closed
alimcmaster1 opened this issue Feb 6, 2020 · 8 comments
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-pandas API.)
type: question (Request for information or clarification. Not an issue.)

Comments

alimcmaster1 commented Feb 6, 2020

The pandas CI tests in test_gbq.py can be flaky.

The following API call times out:

address = ('bigquery.googleapis.com', 443), timeout = 60, source_address = None
error = NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f25022abf98>: Failed to establish a new connection: [Errno 110] Connection timed out',)

Called from https://github.com/pydata/pandas-gbq/blob/master/pandas_gbq/gbq.py#L1312

Can we allow to_gbq to set a more generous timeout? Or is there another solution here?

Example test failures on our master branch:

https://travis-ci.org/pandas-dev/pandas/jobs/646554600?utm_medium=notification&utm_source=github_status
https://travis-ci.org/pandas-dev/pandas/jobs/646597032?utm_medium=notification&utm_source=github_status

cc @jreback @tswast
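
Until a timeout option is settled, one stopgap on the test side is to retry the call when the connection fails. The sketch below is only an illustration and is not part of pandas-gbq: `retry_on_timeout` is a hypothetical helper, and the exception types to catch are passed in by the caller, since the error in the traceback above comes from urllib3 rather than from a built-in exception class.

```python
import time


def retry_on_timeout(func, exceptions=(ConnectionError, TimeoutError),
                     attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry with exponential backoff on the given exceptions.

    Hypothetical helper for illustration only. `exceptions` should list
    whatever the HTTP stack actually raises in the failing test.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

A test could wrap its to_gbq call in a zero-argument function and pass that as `func`. Retrying hides the flakiness rather than explaining it, so it is best treated as a temporary measure.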

Collaborator

tswast commented Feb 7, 2020

It appears to be create_dataset that is timing out. 60 seconds feels like an unreasonably long time for dataset creation, so this feels like a bad backend release or just a network flake to me.

How often are these errors showing up?

tswast added the type: question label Feb 7, 2020

@alimcmaster1
Author

Thanks @tswast

Much more frequently than normal - several examples today.

Recent example:

https://travis-ci.org/pandas-dev/pandas/jobs/647143875?utm_medium=notification&utm_source=github_status

Collaborator

tswast commented Feb 7, 2020

I did some digging. googleapis/google-cloud-python#10219 added a default timeout to all requests. It seems the 60-second default was not high enough. I'm talking with some teammates at Google to see what might be causing this and what a more realistic timeout should be.

@alimcmaster1
Author

Aah thanks very much for the investigation @tswast - much appreciated! Let us know if there is anything we should do on our side.

Collaborator

tswast commented Feb 11, 2020

@plamut believes this should be fixed in the latest version of the google-auth library.

plamut commented Feb 11, 2020

I would try with google-auth>=1.11.0, but it might not fix (all) timeout issues, especially if an unnecessary timeout is raised in a different place (it might actually happen in the BigQuery client itself, too, but would need to investigate a bit deeper to confirm 100%).
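
To confirm which google-auth version a CI environment actually installed, a check along these lines might help. It is a rough sketch using only the standard library: `version_tuple`, `meets_minimum`, and `installed_version` are hypothetical helper names, and the parsing is deliberately simplistic (it looks only at the first three numeric components and ignores suffixes such as `rc1`).

```python
import re
from importlib import metadata  # available in Python 3.8+


def version_tuple(version):
    """Turn '1.11.0' into (1, 11, 0), padding short versions with zeros."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numerically, so that '1.9.0' sorts below '1.11.0'."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("google-auth")
    if found is None:
        print("google-auth is not installed")
    elif meets_minimum(found, "1.11.0"):
        print(f"google-auth {found} satisfies >=1.11.0")
    else:
        print(f"google-auth {found} is older than 1.11.0")
```

The numeric comparison matters here because a plain string comparison would rank 1.9.0 above 1.11.0.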

plamut commented Mar 17, 2020

How often can the flakiness be observed, and how easy or difficult is it to reproduce locally?

Also, would it be feasible to set up a CI check that would run the tests with google-auth>=1.11.0 and the current development version of google-cloud-bigquery? Both of these contain a few changes to the timeout-related logic, which might have fixed this already.
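
One rough way to gauge how often the connection itself fails outside CI is to probe the endpoint repeatedly. The sketch below assumes the problem shows up at the TCP connection stage, as the traceback in the original report suggests. `can_connect` and `failure_rate` are hypothetical helpers built on the standard library's `socket.create_connection`, and they do not exercise the BigQuery API itself.

```python
import socket


def can_connect(host, port, timeout=60.0):
    """Return True if a TCP connection opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


def failure_rate(host, port, attempts=5, timeout=60.0):
    """Fraction of connection attempts that failed, between 0.0 and 1.0."""
    failures = sum(
        not can_connect(host, port, timeout) for _ in range(attempts)
    )
    return failures / attempts
```

Calling `failure_rate("bigquery.googleapis.com", 443)` from a local machine and from a CI worker would give a crude comparison of the two networks, using the same address and timeout reported in the traceback.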

product-auto-label bot added the api: bigquery label Jul 17, 2021

Collaborator

tswast commented Nov 30, 2021

Marking as a duplicate of #418

tswast closed this as completed Nov 30, 2021