
perf(bigquery): use query_and_wait for better performance on queries of small data with smaller result sets #9418

Merged: cpcloud merged 1 commit into ibis-project:main on Jun 21, 2024


cpcloud (Member) commented Jun 21, 2024

Moves BigQuery over to use query_and_wait to see if that reduces overhead.

Closes #8987.

Depends on #9425.
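The backend change itself is in ibis and is not shown here. As a rough sketch of the two google-cloud-bigquery calling patterns involved (assuming google-cloud-bigquery 3.14.0 or later, where Client.query_and_wait exists): the older pattern creates a QueryJob with Client.query and then blocks on QueryJob.result(), while Client.query_and_wait runs the query, waits for it, and returns a RowIterator in a single call. For small queries it can go through the jobs.query REST API and receive the first page of rows with that response, which is presumably where the saved overhead comes from. The helper names run_with_job and run_with_query_and_wait are hypothetical, not ibis APIs.

```python
"""Sketch: two ways to run a query with google-cloud-bigquery.

Assumes `client` is a google.cloud.bigquery.Client (3.14.0+ for
query_and_wait). The client is passed in, so this module does not import
the library itself. Helper names are hypothetical.
"""
from typing import Any, List


def run_with_job(client: Any, sql: str) -> List[Any]:
    # Older pattern: create a QueryJob, then block on .result() for the rows.
    job = client.query(sql)
    return list(job.result())


def run_with_query_and_wait(client: Any, sql: str) -> List[Any]:
    # Newer pattern: a single call that runs the query, waits for it,
    # and returns a RowIterator over the result rows.
    rows = client.query_and_wait(sql)
    return list(rows)
```

With real credentials, this would be used as `run_with_query_and_wait(bigquery.Client(), "SELECT 1 AS x")` after `from google.cloud import bigquery`. Materializing the rows with list() is only for illustration; large results are better consumed lazily or via to_dataframe()/to_arrow().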

cpcloud added this to the 9.2 milestone Jun 21, 2024
cpcloud added the refactor (Issues or PRs related to refactoring the codebase) and performance (Issues related to ibis's performance) labels Jun 21, 2024
cpcloud requested a review from tswast June 21, 2024 10:10
cpcloud added the ci-run-cloud label (Add this label to trigger a run of BigQuery and Snowflake in CI) Jun 21, 2024
ibis-docs-bot (bot) removed the ci-run-cloud label Jun 21, 2024
cpcloud added the ci-run-cloud label Jun 21, 2024
ibis-docs-bot (bot) removed the ci-run-cloud label Jun 21, 2024
cpcloud (Member, Author) commented Jun 21, 2024

This does seem to have helped! The combined duration of both BigQuery CI jobs was just under 2x faster than on main, i.e. they took a little over half as long.
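"Just under 2x faster" and "a little over half as long" describe the same ratio: speedup = t_main / t_branch, and the fraction of time saved is 1 - t_branch / t_main. A minimal sketch of that arithmetic; the function names and the durations in the note below are hypothetical, since the actual job timings are not given in this thread.

```python
def _check(baseline_seconds: float, new_seconds: float) -> None:
    # Both durations must be positive for the ratios to make sense.
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("durations must be positive")


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is: baseline / new."""
    _check(baseline_seconds, new_seconds)
    return baseline_seconds / new_seconds


def percent_time_saved(baseline_seconds: float, new_seconds: float) -> float:
    """Wall-clock time saved, as a percentage of the baseline."""
    _check(baseline_seconds, new_seconds)
    return 100.0 * (1.0 - new_seconds / baseline_seconds)
```

For hypothetical durations of 1000 s on main and 530 s on the branch, speedup(1000, 530) is about 1.89 ("just under 2x") and percent_time_saved(1000, 530) is about 47.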

cpcloud requested a review from gforsyth June 21, 2024 12:04
Review thread on ibis/backends/bigquery/__init__.py (outdated, resolved)
cpcloud (Member, Author) commented Jun 21, 2024

BigQuery looks good:

cloud in 🌐 falcon in …/ibis on query-and-wait-bigquery is 📦 v9.1.0 via 🐍 v3.10.14 via ❄️   impure (ibis-3.10.14-env)
❯ pytest -m bigquery -n auto --dist loadgroup
====================================================================== test session starts ======================================================================
platform linux -- Python 3.10.14, pytest-8.2.2, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
Using --randomly-seed=730130127
rootdir: ...
configfile: pyproject.toml
plugins: benchmark-4.0.0, timeout-2.3.1, xdist-3.6.1, hypothesis-6.103.2, repeat-0.9.3, randomly-3.15.0, clarity-1.0.1, anyio-4.4.0, cov-5.0.0, pytest_httpserver-1.0.10, mock-3.14.0, snapshot-0.9.0
32 workers [1937 items]
x.............x...x......xx......xx.......x.........x....x.....x............x.....x.....s.....x.........x...x..x....x...x..xxxx..x.........xx..x...xx.... [  7%]
.........x........x..x...x..xx.........x...x......xxx.........x...x.......x......x.......x....x......x.x......x..x..x...............x.x.x........x....... [ 15%]
.x..x..x...x.x.x............x...x..x..x.........x..x....x...x....ssssssssssssssssssssss........x.x...x.............x.x.........x......x....x........x..x. [ 23%]
.x........xx.......x...x.x..........x.x.............xx.xxx......x............x............x....x..x...x...x.........xx.xx.x.x.x..x..xxx.xxxxx.xxxx..xxxxx [ 31%]
xxxxxxxxxxxxx.xxxxxxxxxx.xx.xxx..xxxxxxxxxxxx.xxxxxxxx.xxxx.xxxx..xx.xxxx.xxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxx.x.xxx.xsx.xxxxsxxx..x..x.xxx.x.xxxs.xxxxxx...x [ 39%]
.x.....xx........x...x..............x...x..................x...xx....s.............................s.s...............x................................... [ 47%]
..x..s....................x........................................xx........................................................................x.x......... [ 55%]
.....x.....x...x.............................x..............xxx.....xx..................x..............x.......xx.x.x...x......x....xxx.........x........ [ 63%]
...x.x..x....xxx......x............x..x......x..x..........x....x.........................x..x.x.x.x.x...xx.......x.x........x...x.x....xx.........x.x.x. [ 71%]
....x.....x....x........x.............xx.......................................xx.x............................................x...........x...........x. [ 78%]
.....x.............x...........................................................................................................x......................... [ 86%]
.........x..........................................................................................x..x...x...x......................................... [ 94%]
..........x..x.......................................................................................                                                     [100%]
=================================================== 1570 passed, 30 skipped, 337 xfailed in 244.75s (0:04:04) ===================================================
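The summary accounts for every collected item: 1570 passed + 30 skipped + 337 xfailed = 1937, matching the "32 workers [1937 items]" line. A small sketch of that check; parse_pytest_summary is a hypothetical helper, not a pytest API, and it assumes the usual "N outcome, ... in S.SSs" summary format.

```python
import re
from typing import Dict, Tuple


def parse_pytest_summary(line: str) -> Tuple[Dict[str, int], float]:
    """Return ({outcome: count}, wall_seconds) from a pytest summary line."""
    # Everything before " in " lists outcomes such as "1570 passed, 30 skipped".
    counts_part, _, time_part = line.partition(" in ")
    counts = {outcome: int(n) for n, outcome in re.findall(r"(\d+) (\w+)", counts_part)}
    # The wall time follows " in ", e.g. "244.75s (0:04:04)".
    match = re.match(r"([\d.]+)s", time_part)
    if match is None:
        raise ValueError(f"no duration found in {line!r}")
    return counts, float(match.group(1))


# Final line of the run above (the "=" padding is optional).
SUMMARY = "1570 passed, 30 skipped, 337 xfailed in 244.75s (0:04:04)"
counts, seconds = parse_pytest_summary(SUMMARY)
print(counts, sum(counts.values()), seconds)
# → {'passed': 1570, 'skipped': 30, 'xfailed': 337} 1937 244.75
```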

cpcloud merged commit ad1e915 into ibis-project:main Jun 21, 2024
74 checks passed
cpcloud deleted the query-and-wait-bigquery branch June 21, 2024 18:38
tswast (Collaborator) commented Jun 26, 2024

> This does seem to have helped! The total job duration for both BigQuery jobs was just under 2x faster than main.

Woohoo! Thanks for seeing this through @cpcloud

Labels: performance, refactor

Linked issue closed by this pull request: ci(bigquery): bigquery ci is very slow

3 participants