
feat: make QueryJob.done() method more performant #544

Merged: 1 commit, Mar 10, 2021


plamut (contributor) commented on Mar 9, 2021

Fixes #534.

This PR removes the refreshing of query results from the QueryJob.done() method. QueryJob.done() is now simply the done() method inherited from the _AsyncJob base class, which at most reloads the job itself and checks whether its state is DONE.
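A minimal sketch of those done() semantics, assuming a job that reaches DONE after a fixed number of reloads. FakeJobsAPI and SimpleAsyncJob are hypothetical stand-ins, not classes from google-cloud-bigquery. The point is that each call makes at most one lightweight request for the job resource and never fetches query results.

```python
class FakeJobsAPI:
    """Hypothetical stand-in for a jobs.get endpoint; DONE after N calls."""

    def __init__(self, polls_until_done: int) -> None:
        self.polls_until_done = polls_until_done
        self.get_calls = 0

    def get_job_state(self) -> str:
        self.get_calls += 1
        return "DONE" if self.get_calls >= self.polls_until_done else "RUNNING"


class SimpleAsyncJob:
    """Simplified model of the inherited done() behaviour (not the real class)."""

    def __init__(self, api: FakeJobsAPI) -> None:
        self._api = api
        self.state = "PENDING"

    def reload(self) -> None:
        # One request that refreshes only the job resource itself.
        self.state = self._api.get_job_state()

    def done(self, reload: bool = True) -> bool:
        # At most one reload per call; once DONE, no further requests are made.
        if self.state != "DONE" and reload:
            self.reload()
        return self.state == "DONE"
```

With polls_until_done=3, the first two done() calls return False, the third returns True, and any later calls return True without issuing another request.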

Since the blocking poll in the PollingFuture base class repeatedly invokes done(), this change alone would cause too many job reload requests while waiting for the query results. The QueryJob class therefore overrides the _done_or_raise() method, which the blocking poll calls repeatedly, so that polling is performed by fetching the query results instead. That call can block for up to 10 seconds, so fewer polling requests are made than if the job were reloaded on every poll.
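The effect of overriding _done_or_raise() can be illustrated with a toy simulation on a virtual clock. Everything here is hypothetical: PollingFutureSketch and QueryJobSketch only mimic the structure described above, the fixed 1-second sleep between attempts is an assumption (the real PollingFuture uses its own retry/backoff settings), and the 10-second blocking limit is taken from the text. The request counts it produces describe this model, not measured library behaviour.

```python
class _OperationNotComplete(Exception):
    """Signals that the job has not finished yet."""


class FakeBackend:
    """Virtual clock plus a query that completes at `finish_at` seconds."""

    def __init__(self, finish_at: float) -> None:
        self.finish_at = finish_at
        self.now = 0.0
        self.requests = 0

    def get_job(self) -> str:
        # jobs.get-style request: answers immediately with the current state.
        self.requests += 1
        return "DONE" if self.now >= self.finish_at else "RUNNING"

    def get_query_results(self, timeout: float) -> bool:
        # getQueryResults-style request: held open until the query completes
        # or `timeout` seconds pass, whichever comes first.
        self.requests += 1
        self.now += min(timeout, max(0.0, self.finish_at - self.now))
        return self.now >= self.finish_at


class PollingFutureSketch:
    """Simplified stand-in for a PollingFuture-style base class."""

    def __init__(self, backend: FakeBackend, poll_interval: float = 1.0) -> None:
        self.backend = backend
        self.poll_interval = poll_interval  # assumed fixed sleep between attempts

    def done(self) -> bool:
        return self.backend.get_job() == "DONE"

    def _done_or_raise(self) -> None:
        if not self.done():
            raise _OperationNotComplete()

    def result(self) -> None:
        # Blocking poll: keep calling _done_or_raise() until it stops raising.
        while True:
            try:
                self._done_or_raise()
                return
            except _OperationNotComplete:
                self.backend.now += self.poll_interval  # simulated sleep


class QueryJobSketch(PollingFutureSketch):
    """Polls by fetching query results; done() is still the cheap job reload."""

    results_timeout = 10.0  # per the PR text, the call can block up to 10 s

    def _done_or_raise(self) -> None:
        if not self.backend.get_query_results(timeout=self.results_timeout):
            raise _OperationNotComplete()
```

For a query that finishes after 35 simulated seconds, the reload-based poll in this model issues 36 requests, while the results-based poll issues 4, and QueryJobSketch.done() on its own still costs a single non-blocking request.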

How to test

Set the logging level to DEBUG to see which HTTP requests are made. Then run a query job that normally takes more than 10 seconds to complete, for example:

```sql
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count, 1 AS foo
FROM `bigquery-public-data.stackoverflow.posts_questions`
ORDER BY view_count DESC
```
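A small helper for the setup step, assuming the client library uses its default requests/urllib3 HTTP transport, whose connection pool logs one DEBUG line per request. The function names are hypothetical; client.query() and query_job.result() are the google-cloud-bigquery calls the test exercises, and the helper only assumes a client object with that shape.

```python
import logging


def enable_http_debug_logging() -> None:
    """Log at DEBUG level so every outgoing HTTP request shows up."""
    logging.basicConfig(level=logging.DEBUG)
    # Assumes the default requests/urllib3 transport: urllib3's connection
    # pool logs one DEBUG line per request (method, path, status).
    logging.getLogger("urllib3").setLevel(logging.DEBUG)


def run_test_query(client, sql: str):
    """Start `sql` as a query job and block until its results are ready.

    `client` is expected to behave like google.cloud.bigquery.Client.
    """
    query_job = client.query(sql)
    rows = query_job.result()  # the polling requests appear in the DEBUG log
    return query_job, rows


# Example usage (needs google-cloud-bigquery and credentials; SQL holds the
# query above as a string):
#   from google.cloud import bigquery
#   enable_http_debug_logging()
#   job, rows = run_test_query(bigquery.Client(), SQL)
```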

Tip: if you run the query several times in a row, change 1 AS foo to a different value each time so that the query is actually re-run and cached query results are not used.
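One way to automate the tip: generate the query text with a fresh literal on every run, so the text differs and a cache hit is avoided. QUERY_TEMPLATE and make_uncached_query are hypothetical names. As an alternative, the client library's QueryJobConfig has a use_query_cache option that can be set to False and passed as job_config to client.query().

```python
import time

# The query from above with a placeholder for the cache-busting literal.
QUERY_TEMPLATE = """
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count, {marker} AS foo
FROM `bigquery-public-data.stackoverflow.posts_questions`
ORDER BY view_count DESC
"""


def make_uncached_query(marker=None) -> str:
    """Return the test query with a per-run literal in place of `1 AS foo`."""
    if marker is None:
        marker = time.time_ns()  # different on every run
    return QUERY_TEMPLATE.format(marker=marker)
```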

While the query is running, call query_job.result(). The query results should be fetched, but with a reasonable number of requests.
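To make "a reasonable number of requests" concrete, the urllib3 DEBUG lines can be tallied with a custom logging handler. This sketch assumes urllib3 logs each request as a line containing something like "GET /path HTTP/1.1", and relies on the BigQuery REST paths .../projects/{project}/queries/{jobId} (getQueryResults) and .../projects/{project}/jobs/{jobId} (jobs.get). RequestCounter is a hypothetical name.

```python
import logging
from collections import Counter


class RequestCounter(logging.Handler):
    """Tallies DEBUG log lines for BigQuery GET requests by endpoint."""

    def __init__(self) -> None:
        super().__init__(level=logging.DEBUG)
        self.counts: Counter = Counter()

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if '"GET ' not in message:
            return  # ignore connection-setup lines and non-GET requests
        if "/queries/" in message:
            self.counts["getQueryResults"] += 1  # .../projects/{p}/queries/{jobId}
        elif "/jobs/" in message:
            self.counts["jobs.get"] += 1  # .../projects/{p}/jobs/{jobId}
```

To use it, set the "urllib3.connectionpool" logger to DEBUG, attach a RequestCounter to it, call query_job.result(), and then inspect counter.counts.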

Besides testing the .result() method, also check the .done() method: if it is called repeatedly while the query is running, each call should return quickly, i.e. block for significantly less than 10 seconds, because all it needs to do is reload the job data itself.
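The .done() check can be scripted with a small timing loop. This is a hypothetical helper that works with any object exposing done(), such as a QueryJob; the pause between calls and the 10-second threshold you compare against are choices for the manual test, not library settings.

```python
import time
from typing import Callable, List


def time_done_calls(
    job,
    calls: int = 5,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Call job.done() up to `calls` times and return each call's duration."""
    durations: List[float] = []
    for _ in range(calls):
        start = time.perf_counter()
        finished = job.done()
        durations.append(time.perf_counter() - start)
        if finished:
            break  # the job completed; further calls would make no requests
        sleep(pause)
    return durations
```

While a long query is running, max(time_done_calls(query_job)) should stay well below 10 seconds.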

PR checklist:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

plamut marked this pull request as ready for review on Mar 10, 2021
plamut requested a review from a code owner on Mar 10, 2021
plamut requested reviews from stephaniewang526 and tswast on Mar 10, 2021
tswast approved these changes on Mar 10, 2021

tswast (contributor) left a comment:

Magnificent!


tswast merged commit a3ab9ef into googleapis:master on Mar 10, 2021 (11 checks passed)
plamut deleted the iss-534 branch on Mar 10, 2021