
feat: add max_queue_size argument to RowIterator.to_dataframe_iterable #575

Merged
merged 5 commits into googleapis:master from plamut:iss-561 on Apr 14, 2021

Conversation


@plamut plamut commented Mar 24, 2021

Closes #561.

This PR limits the size of the internal queue that stores result pages when streaming data over the BQ Storage API. It also makes the limit configurable.

Still need to add a few additional unit tests, but that should be it.

Note:
The new parameter is not exposed to the bigquery Jupyter cell magic - I presume that's fine? I don't think cell magic needs such fine-grained control, since it's not really meant to fetch huge query results into a Jupyter notebook session where any performance difference could actually matter.
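The mechanism described above can be sketched with a stdlib producer/consumer pipeline: a download worker puts result pages onto a bounded `queue.Queue`, and the consumer pulls pages off as it iterates, so the producer can never run more than `max_queue_size` pages ahead. This is a minimal illustration of the idea, not the library's actual internals; `stream_pages` and `_SENTINEL` are hypothetical names.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the page stream


def stream_pages(pages, max_queue_size=2):
    """Yield pages through a bounded queue (hypothetical sketch).

    The producer blocks on ``q.put`` once the queue is full, which is
    what caps the number of pages buffered in memory at any one time.
    """
    q = queue.Queue(maxsize=max_queue_size)

    def producer():
        for page in pages:
            q.put(page)      # blocks while the queue is full
        q.put(_SENTINEL)     # signal that no more pages are coming

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item


# Consume ten "pages"; at most max_queue_size + 1 are in flight at once.
result = list(stream_pages(range(10), max_queue_size=2))
```

All pages still arrive in order; the bound only limits how far ahead the download side may buffer.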

PR checklist:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
@google-cla google-cla bot added the cla: yes label Mar 24, 2021
@plamut plamut force-pushed the iss-561 branch 2 times, most recently from 30ffe85 to 680f952 Mar 24, 2021
@plamut plamut requested a review from tswast Mar 24, 2021
@plamut plamut changed the title feat: add configurable max size for the queue holding the result pages streamed over the BQ Stroage API feat: add configurable max size for the queue holding the result pages streamed over the BQ Storage API Mar 29, 2021
plamut added 2 commits Mar 29, 2021
The new parameter allows configuring the maximum size of the internal
queue used to hold result pages when query data is streamed over the
BigQuery Storage API.
@plamut plamut marked this pull request as ready for review Mar 29, 2021
@plamut plamut requested a review from a code owner Mar 29, 2021

@tswast tswast commented Mar 30, 2021

Looks like some tests are timing out now. I suspect that 1 is not the right default.


@tswast tswast commented Mar 30, 2021

How about we only add the argument to to_dataframe_iterable, since that is where it is most relevant? I think None, or perhaps a value equal to the number of workers, is probably the right default.

In the other methods we are expected to download the whole table/query results at once anyway, so conserving memory isn't as important.


@plamut plamut commented Mar 30, 2021

I'm fine with that; I'll remove the parameter from the other methods where query results are expected to be downloaded in full. I will also look into the timeouts and what a better default could be.


By default, the max queue size is set to the number of BQ Storage streams
created by the server. If ``max_queue_size`` is :data:`None`, the queue
size is infinite.

@plamut plamut Mar 31, 2021


Just in case somebody really wants the old behavior, I added it as an option.
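The documented behavior quoted above (default bounded to the number of BQ Storage streams, `None` meaning an unlimited queue) maps naturally onto stdlib `queue.Queue`, where a `maxsize` of zero or less means unbounded. A hedged sketch of that translation follows; `make_page_queue` and the `_DEFAULT` sentinel are illustrative names, not the library's API.

```python
import queue

_DEFAULT = object()  # stand-in for an internal "use the default" sentinel


def make_page_queue(max_queue_size=_DEFAULT, num_streams=4):
    """Build the page queue per the documented semantics (sketch).

    - default: bound the queue to the number of streams
    - None: unbounded queue (queue.Queue treats maxsize <= 0 as infinite)
    - int: use the caller's explicit bound
    """
    if max_queue_size is _DEFAULT:
        max_queue_size = num_streams  # one buffered page slot per stream
    elif max_queue_size is None:
        max_queue_size = 0            # 0 means "no limit" for queue.Queue
    return queue.Queue(maxsize=max_queue_size)


bounded = make_page_queue(num_streams=4)          # maxsize == 4
unbounded = make_page_queue(max_queue_size=None)  # maxsize == 0 (infinite)
explicit = make_page_queue(max_queue_size=10)     # maxsize == 10
```

Keeping `None` as the "unbounded" spelling preserves the pre-change behavior as an opt-in, which is exactly the escape hatch this comment describes.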



@plamut plamut commented Apr 14, 2021

@tswast ping :)


tswast approved these changes Apr 14, 2021

@tswast tswast left a comment

Wonderful!


@tswast tswast merged commit f95f415 into googleapis:master Apr 14, 2021
10 checks passed
@tswast tswast changed the title feat: add configurable max size for the queue holding the result pages streamed over the BQ Storage API feat: add max_queue_size argument to RowIterator.to_dataframe_iterable Apr 14, 2021
@plamut plamut deleted the iss-561 branch Apr 14, 2021
gcf-merge-on-green bot pushed a commit that referenced this issue Apr 26, 2021
🤖 I have created a release *beep* *boop*
---
## [2.14.0](https://www.github.com/googleapis/python-bigquery/compare/v2.13.1...v2.14.0) (2021-04-26)


### Features

* accept DatasetListItem where DatasetReference is accepted ([#597](https://www.github.com/googleapis/python-bigquery/issues/597)) ([c8b5581](https://www.github.com/googleapis/python-bigquery/commit/c8b5581ea3c94005d69755c4a3b5a0d8900f3fe2))
* accept job object as argument to `get_job` and `cancel_job` ([#617](https://www.github.com/googleapis/python-bigquery/issues/617)) ([f75dcdf](https://www.github.com/googleapis/python-bigquery/commit/f75dcdf3943b87daba60011c9a3b42e34ff81910))
* add `Client.delete_job_metadata` method to remove job metadata ([#610](https://www.github.com/googleapis/python-bigquery/issues/610)) ([0abb566](https://www.github.com/googleapis/python-bigquery/commit/0abb56669c097c59fbffce007c702e7a55f2d9c1))
* add `max_queue_size` argument to `RowIterator.to_dataframe_iterable` ([#575](https://www.github.com/googleapis/python-bigquery/issues/575)) ([f95f415](https://www.github.com/googleapis/python-bigquery/commit/f95f415d3441b3928f6cc705cb8a75603d790fd6))
* add type hints for public methods ([#613](https://www.github.com/googleapis/python-bigquery/issues/613)) ([f8d4aaa](https://www.github.com/googleapis/python-bigquery/commit/f8d4aaa335a0eef915e73596fc9b43b11d11be9f))
* DB API cursors are now iterable ([#618](https://www.github.com/googleapis/python-bigquery/issues/618)) ([e0b373d](https://www.github.com/googleapis/python-bigquery/commit/e0b373d0e721a70656ed8faceb7f5c70f642d144))
* retry google.auth TransportError by default ([#624](https://www.github.com/googleapis/python-bigquery/issues/624)) ([34ecc3f](https://www.github.com/googleapis/python-bigquery/commit/34ecc3f1ca0ff073330c0c605673d89b43af7ed9))
* use pyarrow stream compression, if available ([#593](https://www.github.com/googleapis/python-bigquery/issues/593)) ([dde9dc5](https://www.github.com/googleapis/python-bigquery/commit/dde9dc5114c2311fb76fafc5b222fff561e8abf1))


### Bug Fixes

* consistent percents handling in DB API query ([#619](https://www.github.com/googleapis/python-bigquery/issues/619)) ([6502a60](https://www.github.com/googleapis/python-bigquery/commit/6502a602337ae562652a20b20270949f2c9d5073))
* missing license headers in new test files ([#604](https://www.github.com/googleapis/python-bigquery/issues/604)) ([df48cc5](https://www.github.com/googleapis/python-bigquery/commit/df48cc5a0be99ad39d5835652d1b7422209afc5d))
* unsetting clustering fields on Table is now possible ([#622](https://www.github.com/googleapis/python-bigquery/issues/622)) ([33a871f](https://www.github.com/googleapis/python-bigquery/commit/33a871f06329f9bf5a6a92fab9ead65bf2bee75d))


### Documentation

* add sample to run DML query ([#591](https://www.github.com/googleapis/python-bigquery/issues/591)) ([ff2ec3a](https://www.github.com/googleapis/python-bigquery/commit/ff2ec3abe418a443cd07751c08e654f94e8b3155))
* update the description of the return value of `_QueryResults.rows()` ([#594](https://www.github.com/googleapis/python-bigquery/issues/594)) ([8f4c0b8](https://www.github.com/googleapis/python-bigquery/commit/8f4c0b84dac3840532d7865247b8ad94b625b897))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).