feat: make QueryJob.done() method more performant #544
This PR removes the refreshing of query results from the `QueryJob.done()` method. That method is now just the `done()` method inherited from the `_AsyncJob` base class, which at most reloads the job itself and checks whether its state is DONE.
Since the blocking poll in the `PollingFuture` base class repeatedly invokes `done()`, this change alone would cause too many job reload requests while waiting for the query results. The `QueryJob` class thus overrides the `_done_or_raise()` method, which the blocking poll calls repeatedly, so that the polling is actually performed by fetching the query results. That call can block for up to 10 seconds, meaning that fewer polling requests are made than if reloading the job were used.
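The effect on request counts can be illustrated with a small, self-contained simulation. This is only a sketch of the idea, not the library's code: `FakeBackend`, `AsyncJobSketch` and `QueryJobSketch` are hypothetical names, time is simulated rather than slept, the poll uses a fixed one-second interval instead of the real retry/backoff logic, and `_done_or_raise()` returns a boolean here instead of raising when the job is not finished.

```python
class FakeBackend:
    """Hypothetical stand-in for the API that counts incoming requests."""

    def __init__(self, finish_after_s):
        self.finish_after_s = finish_after_s
        self.clock = 0.0  # simulated seconds since the job started
        self.reload_requests = 0
        self.result_requests = 0

    def reload_job(self):
        # Cheap request: returns immediately with the current job state.
        self.reload_requests += 1
        return self.clock >= self.finish_after_s

    def get_query_results(self, timeout_s=10.0):
        # Long poll: blocks until the job finishes or the timeout elapses.
        self.result_requests += 1
        remaining = self.finish_after_s - self.clock
        self.clock += max(0.0, min(remaining, timeout_s))
        return self.clock >= self.finish_after_s


class AsyncJobSketch:
    def __init__(self, backend):
        self.backend = backend

    def done(self):
        # At most reloads the job and checks its state.
        return self.backend.reload_job()

    def _done_or_raise(self):
        return self.done()

    def wait(self, poll_interval_s=1.0):
        # Simplified blocking poll: call _done_or_raise() until it succeeds.
        while not self._done_or_raise():
            self.backend.clock += poll_interval_s


class QueryJobSketch(AsyncJobSketch):
    def _done_or_raise(self):
        # Poll by fetching results, which can block for up to 10 seconds.
        return self.backend.get_query_results(timeout_s=10.0)


if __name__ == "__main__":
    plain, query = FakeBackend(35.0), FakeBackend(35.0)
    AsyncJobSketch(plain).wait()
    QueryJobSketch(query).wait()
    print("reload-based polling requests:", plain.reload_requests)
    print("result-based polling requests:", query.result_requests)
```

In this toy model, `done()` stays a cheap single reload in both classes, while only the blocking wait path of the query job switches to the long-polling request.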
How to test
Set logging level to DEBUG to see what HTTP requests are made. Then run a query job that normally takes more than 10 seconds to complete, for example:
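A minimal sketch of that setup, assuming the client's HTTP traffic goes through urllib3 (whose `urllib3` logger writes a DEBUG line per request) and that `client` is an already constructed `google.cloud.bigquery.Client`. The helper names are hypothetical, and the SQL text is left as a parameter: any query that runs for more than 10 seconds will do.

```python
import logging


def enable_http_debug_logging():
    """Turn on DEBUG output so that individual HTTP requests become visible."""
    logging.basicConfig(level=logging.DEBUG)
    # basicConfig() is a no-op if handlers already exist, so set levels explicitly.
    logging.getLogger().setLevel(logging.DEBUG)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)


def start_query(client, sql):
    """Start a query job without waiting for it and return the job object.

    `client` is assumed to be a google.cloud.bigquery.Client; `sql` should be
    a query that normally takes more than 10 seconds to complete.
    """
    return client.query(sql)
```

Calling `enable_http_debug_logging()` first and then `query_job = start_query(client, sql)` leaves a running job in `query_job` for the checks described next.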
While the query is running, call `query_job.result()`. The query results should be fetched, but with a reasonable number of requests.
Besides that, the `.done()` method should be checked, too: if it is run repeatedly while the query is running, each call should finish "fast", i.e. it should block for significantly less time than 10 seconds, because all it needs to do is reload the job data itself.
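One way to check this is to time a few consecutive `done()` calls. The helper below is a hypothetical sketch, not part of the library: it only assumes that `job` has a `done()` method returning a boolean, as `QueryJob` does.

```python
import time


def time_done_calls(job, calls=5, pause_s=1.0):
    """Call job.done() up to `calls` times and return each call's duration in seconds."""
    durations = []
    for _ in range(calls):
        start = time.monotonic()
        finished = job.done()
        durations.append(time.monotonic() - start)
        if finished:
            break  # no point in polling a finished job
        time.sleep(pause_s)
    return durations
```

With a running query job, every value in `time_done_calls(query_job)` would be expected to be far below 10 seconds.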