Refactor `to_dataframe` to deterministically update progress bar. #8303

Merged
merged 5 commits into from Jun 18, 2019

Contributor

tswast commented Jun 13, 2019

Previously, a background thread collected progress bar updates from
worker threads. So that progress bar updates would not block downloads,
`put_nowait` was used to enqueue them, and any update that could not be
written was ignored. This caused non-deterministic progress bar updates
and test flakiness.
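For illustration only, here is a minimal sketch of how non-blocking writes can lose updates. It is not the library's code: the `report_progress` helper and the queue size of 2 are hypothetical, chosen so that the effect is visible without any threads.

```python
import queue


def report_progress(progress_queue, rows_downloaded):
    """Hypothetical helper: report progress without blocking the download.

    Returns True if the update was enqueued, False if it was dropped.
    """
    try:
        progress_queue.put_nowait(rows_downloaded)
        return True
    except queue.Full:
        # The update is lost, so the progress bar undercounts.
        return False


# Assumed small bound; nothing consumes the queue in this sketch.
progress_queue = queue.Queue(maxsize=2)
results = [report_progress(progress_queue, 100) for _ in range(5)]
dropped = results.count(False)
```

With real worker threads, how many updates are dropped depends on how quickly the consumer drains the queue, which is why the totals shown by the progress bar could vary from run to run.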

Now, worker threads push DataFrames to the queue, and
`download_dataframe_bqstorage` and `download_dataframe_tabledata_list`
return an iterable of pandas DataFrame objects instead of a single
DataFrame. This allows progress bar updates to be done the same way
regardless of which underlying API is used to download the DataFrames.
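The general pattern can be sketched as follows. This is an assumption-laden illustration, not the merged implementation: `download_frames`, `_worker`, and the `_DONE` sentinel are hypothetical names, plain lists stand in for pandas DataFrames, and the sentinel-based shutdown is one possible way to signal completion.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of one worker


def _worker(stream, frame_queue):
    """Push every frame from one stream, then signal completion."""
    try:
        for frame in stream:
            frame_queue.put(frame)  # blocking put: no frame is dropped
    finally:
        frame_queue.put(_DONE)


def download_frames(streams, maxsize=2):
    """Yield frames from all streams as worker threads produce them."""
    frame_queue = queue.Queue(maxsize=maxsize)
    workers = [
        threading.Thread(target=_worker, args=(stream, frame_queue), daemon=True)
        for stream in streams
    ]
    for worker in workers:
        worker.start()

    remaining = len(workers)
    while remaining:
        item = frame_queue.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item

    for worker in workers:
        worker.join()


# Each inner list stands in for one DataFrame (one page of rows).
streams = [[[1, 2], [3]], [[4, 5, 6]]]
rows_seen = 0
frames = []
for frame in download_frames(streams):
    rows_seen += len(frame)  # the progress bar update happens here
    frames.append(frame)
```

Because the caller updates progress once per yielded frame, every frame is counted exactly once, whatever order the workers finish in.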

Also, the logic for working with pandas has been moved to the
`_pandas_helpers` module.

Closes #8175

@tswast tswast requested a review from googleapis/api-bigquery as a code owner Jun 13, 2019
@googlebot googlebot added the cla: yes label Jun 13, 2019
@tswast tswast requested a review from tseaver Jun 13, 2019
@tswast tswast requested review from plamut and shollyman Jun 13, 2019
Contributor

plamut left a comment

The changes generally seem fine to me, but I did leave a few comments that might be relevant - please check, just in case.

@plamut
plamut approved these changes Jun 17, 2019
Contributor

plamut left a comment

Looks good, and thanks for the additional explanation of the design decisions!

Will wait to see if the other reviewers have anything else to add.

tswast added 2 commits Jun 18, 2019
…ge-flake
@tswast tswast merged commit fd24b4b into googleapis:master Jun 18, 2019
50 checks passed
@tswast tswast deleted the tswast:issue8175-bqstorage-flake branch Jun 18, 2019
# prevents the queue from filling up, because the main thread
# has smaller gaps in time between calls to the queue's get
# method. For a detailed explanation, see:
# https://friendliness.dev/2019/06/18/python-nowait/

@plamut

plamut Jun 19, 2019

Contributor

This is great. 👍

@googleapis googleapis deleted a comment Jul 8, 2019