Refactor to_dataframe to deterministically update progress bar. #8303

Merged
merged 5 commits into from Jun 18, 2019


@tswast
Contributor

@tswast tswast commented Jun 13, 2019

Previously, a background thread was used to collect progress bar updates
from worker threads. So as not to block downloads for progress bar
updates, put_nowait was used to make progress bar updates. Missed
writes to the progress bar were ignored. This caused non-deterministic
progress bar updates and test flakiness.
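The lost updates described above can be reproduced in isolation. This is a minimal sketch, not the library's code: `report_progress` and the bounded `progress_queue` are hypothetical names chosen for illustration, and the queue size of 2 is an assumption. It shows that `put_nowait` on a full `queue.Queue` raises `queue.Full`, so an update that is ignored at that point never reaches the progress total.

```python
import queue


def report_progress(progress_queue, rows_downloaded):
    """Try to record a progress update without blocking.

    Hypothetical helper: returns False when the update was dropped.
    """
    try:
        progress_queue.put_nowait(rows_downloaded)
        return True
    except queue.Full:
        # Nothing is draining the queue fast enough, so this update is lost
        # and the progress total will undercount.
        return False


# Assumed bounded queue; no consumer runs here, so it fills after two puts.
progress_queue = queue.Queue(maxsize=2)
results = [report_progress(progress_queue, n) for n in (10, 10, 10, 10)]

# Sum whatever updates actually made it into the queue.
recorded = 0
while not progress_queue.empty():
    recorded += progress_queue.get_nowait()
```

Here four updates of 10 rows are attempted but only the first two are recorded. In a real run, how many updates are dropped depends on thread timing, which is why the resulting progress values vary between runs.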

Now, worker threads push dataframes to the queue, and the return values
for download_dataframe_bqstorage and
download_dataframe_tabledata_list have been updated to return an
iterable of pandas DataFrame objects instead of a single DataFrame. This
allows progress bar updates to be done independently of which underlying
API is used to download the DataFrames.
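The design described above might be sketched as follows. This is an illustration under stated assumptions rather than the code merged in this PR: plain Python lists stand in for pandas DataFrame objects so the example stays self-contained, and `download_frames`, `_download_stream`, `to_rows`, and the `_DONE` sentinel are hypothetical names. Worker threads push whole frames onto the queue with a blocking `put`, and the consumer counts progress as it iterates over the frames, so no update can be missed.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel: one is queued per finished worker


def _download_stream(stream, frame_queue):
    """Worker: push every frame from one stream onto the shared queue."""
    for frame in stream:
        frame_queue.put(frame)  # blocking put, so no frame is dropped
    frame_queue.put(_DONE)


def download_frames(streams):
    """Yield frames as workers produce them (an iterable, not one big frame)."""
    frame_queue = queue.Queue()
    workers = [
        threading.Thread(target=_download_stream, args=(stream, frame_queue))
        for stream in streams
    ]
    for worker in workers:
        worker.start()

    remaining = len(workers)
    while remaining:
        item = frame_queue.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item

    for worker in workers:
        worker.join()


def to_rows(streams):
    """Consume the iterable; progress is updated once per frame received."""
    rows = []
    progress = 0
    for frame in download_frames(streams):
        progress += len(frame)  # stand-in for a progress bar update
        rows.extend(frame)
    return rows, progress


# Two assumed streams; each inner list stands in for one DataFrame.
streams = [[[1, 2], [3]], [[4, 5, 6]]]
rows, progress = to_rows(streams)
```

Because progress is derived from the frames themselves, the final count always equals the number of rows received, regardless of how the threads are scheduled. The order of rows across streams can still vary, so only the total is deterministic in this sketch.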

Also, the logic for working with pandas has been moved to the
_pandas_helpers module.

Closes #8175

@tswast tswast requested a review from as a code owner Jun 13, 2019
@tswast tswast requested a review from tseaver Jun 13, 2019
@tswast tswast requested review from plamut and shollyman Jun 13, 2019
Contributor

@plamut plamut left a comment

The changes generally seem fine to me, but I did leave a few comments that might be relevant - please check, just in case.

plamut approved these changes Jun 17, 2019
Contributor

@plamut plamut left a comment

Looks good, and thanks for the additional explanation of the design decisions!

Will wait to see if the other reviewers have something else to add.

@tswast tswast merged commit fd24b4b into googleapis:master Jun 18, 2019
50 checks passed
@tswast tswast deleted the issue8175-bqstorage-flake branch Jun 18, 2019
# prevents the queue from filling up, because the main thread
# has smaller gaps in time between calls to the queue's get
# method. For a detailed explanation, see:
# https://friendliness.dev/2019/06/18/python-nowait/
Contributor

@plamut plamut Jun 19, 2019


This is great. 👍

@googleapis googleapis deleted a comment Jul 8, 2019