# BigQuery - Using Asynchronous APIs

This notebook demonstrates how to use the asynchronous versions of the gcp.bigquery APIs.

### In this notebook you will
* Learn which Query, Table, and View APIs have \_async versions
* Learn how these APIs return immediately so you can continue with other work while jobs run
* Learn how to monitor the state of background async tasks and know when they are complete

Related Links:

* [BigQuery](https://cloud.google.com/bigquery/)

----

NOTE:

* If you're new to notebooks, or need an introduction to using BigQuery, check out the full [list](..) of notebooks.


In [1]:
import gcp.bigquery as bq

Many of the APIs on Query, Table, and View objects in the gcp.bigquery library have async forms.

These include:

* Query.extract
* Query.to_file
* Query.execute
* Table.load
* Table.extract
* Table.to_file
* View.execute

In each case the async version has exactly the same signature as its synchronous counterpart; the only differences are that its name has an \_async suffix and that it returns a Job object right away instead of waiting for the work to finish.
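To make the naming pattern concrete, here is a minimal sketch, in plain Python 3 rather than gcp.bigquery, of how a synchronous method and its \_async twin can share one signature. The Exporter and Job classes are hypothetical stand-ins, and running the work in a local thread pool is an assumption made for illustration; the real library submits the work as a BigQuery job.

```python
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

# Background worker pool; using local threads is an assumption of this sketch.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


class Job(object):
    """Hypothetical stand-in for a Job: wraps a concurrent.futures.Future."""

    def __init__(self, future):
        self._future = future

    @property
    def is_complete(self):
        # True once the background work has finished (successfully or not).
        return self._future.done()

    def wait(self):
        # Block until the background work finishes, without raising its errors.
        concurrent.futures.wait([self._future])
        return self


class Exporter(object):
    """Hypothetical object exposing a sync method and its _async twin."""

    def to_file(self, path, rows):
        # Synchronous form: writes each row as a comma-separated line
        # (no CSV quoting), then returns the path.
        with open(path, 'w') as f:
            for row in rows:
                f.write(','.join(str(value) for value in row) + '\n')
        return path

    def to_file_async(self, path, rows):
        # Identical signature; schedules to_file in the background and
        # returns a Job right away instead of the finished result.
        return Job(_EXECUTOR.submit(self.to_file, path, rows))
```

For example, `Exporter().to_file_async('/tmp/sample.csv', [(1, 'a')]).wait()` writes the file in the background and then blocks until it is done, since wait() returns the Job itself.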

For example, the code below attempts to write a 1000-row sample of the natality sample table to a temporary file:

In [2]:
t = bq.Table('publicdata:samples.natality')
j = t.sample(count=1000).to_file_async('/tmp/natality1000.csv')
j

Job 27b548fa-6fc0-446a-bda3-64809a606402 in progress

Notice that the to\_file\_async method returned a Job with a GUID ID and a status. If the job finishes very quickly you may see 'completed' as the status, but more likely you will see 'in progress'.

You can check on the state of a Job at any time by reading its is\_complete property:

In [3]:
j.is_complete

False
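Because is\_complete can be read at any time, you can keep working while a job runs and check on it between steps. The sketch below shows that polling pattern in plain Python 3, with a concurrent.futures.Future standing in for a Job (Future.done() plays the role of is\_complete). slow\_task, run\_while\_polling, and the 0.05-second poll interval are hypothetical choices for illustration, not part of gcp.bigquery.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(seconds):
    # Hypothetical stand-in for a long-running BigQuery job.
    time.sleep(seconds)
    return 'done'


def run_while_polling(job, poll_interval=0.05):
    """Do other work until job.done() is True; return the units of work done."""
    units_of_work = 0
    while not job.done():
        units_of_work += 1          # placeholder for useful foreground work
        time.sleep(poll_interval)   # pause between checks to avoid a busy-wait
    return units_of_work


with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(slow_task, 0.2)
    work = run_while_polling(job)
    print('Did %d units of work while waiting; result: %s' % (work, job.result()))
```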

To wait until a job completes we can call wait():

In [4]:
j.wait()

Job 27b548fa-6fc0-446a-bda3-64809a606402 completed

Once it is complete, the fatal\_error property will tell us if a job failed, while the errors property will inform us of any non-fatal errors that may have occurred:

In [5]:
print "Fatal: %s" % str(j.fatal_error)
print "Non-fatal: %s" % str(j.errors)

Fatal: None
Non-fatal: None


Similarly, the Job.failed property tells us whether the job failed:

In [6]:
j.failed

False
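These properties fit together: fatal\_error holds the error that made a job fail, errors lists problems that did not stop it, and failed summarizes the outcome. The sketch below encodes one plausible reading of that relationship in Python 3. The JobResult class and describe function are hypothetical, and the rule that failed is true only for a completed job with a fatal error is an assumption, not taken from the gcp.bigquery source.

```python
class JobResult(object):
    """Hypothetical record of a job's error state."""

    def __init__(self, is_complete, fatal_error=None, errors=None):
        self.is_complete = is_complete
        self.fatal_error = fatal_error   # the error that stopped the job, or None
        self.errors = errors             # list of non-fatal errors, or None

    @property
    def failed(self):
        # Assumed rule: only a completed job with a fatal error counts as failed;
        # a running job, or one with only non-fatal errors, does not.
        return self.is_complete and self.fatal_error is not None


def describe(job):
    """Classify a job's outcome from its error properties."""
    if not job.is_complete:
        return 'running'
    if job.failed:
        return 'failed: %s' % job.fatal_error
    if job.errors:
        return 'succeeded with %d non-fatal error(s)' % len(job.errors)
    return 'succeeded'
```

Checking failed first and errors second mirrors the order used in the cells above: a fatal error makes any non-fatal errors moot.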


To see what happens with a failing job, we can try a similar operation, this time an extract to an invalid Google Cloud Storage (GCS) destination:

In [7]:
try:
  t = bq.Table('publicdata:samples.natality')
  
  # Note 'natality:invalid.csv' is an invalid name; specifically the ':' in the name.
  j = t.sample(count=1000).extract_async('natality:invalid.csv')
except Exception as e:
  print "%s: %s" % (type(e).__name__, e)

JobError: Invalid extract destination URI 'natality:invalid.csv'. Must be a valid Google Storage path.


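Notice that in this case extract\_async did not return a Job at all: the invalid destination was rejected when the call was made, and a JobError exception was raised instead.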
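Raising an exception at call time, rather than returning a Job that fails later, is a fail-fast design: an argument that can be checked up front is rejected before any background work starts. The sketch below illustrates the idea in Python 3. JobError here is a local stand-in class, extract\_async is a hypothetical placeholder, and the validation rule (a destination of the form gs://<bucket>/<object>) is an assumption for illustration; the library's actual checks may differ.

```python
class JobError(Exception):
    """Hypothetical stand-in for the library's JobError."""


def check_gcs_uri(uri):
    # Assumed rule for this sketch: 'gs://<bucket>/<object>' with a non-empty
    # bucket and object name. The real library's validation may differ.
    message = ("Invalid extract destination URI '%s'. "
               "Must be a valid Google Storage path." % uri)
    prefix = 'gs://'
    if not uri.startswith(prefix):
        raise JobError(message)
    bucket, _, obj = uri[len(prefix):].partition('/')
    if not bucket or not obj:
        raise JobError(message)
    return uri


def extract_async(destination):
    # Validate before starting any background work, so a bad argument
    # surfaces as an immediate exception instead of a failed Job.
    check_gcs_uri(destination)
    return 'job started for %s' % destination  # placeholder for a real Job
```

Here extract\_async('natality:invalid.csv') raises JobError immediately, while a well-formed path such as gs://my-bucket/natality.csv (a hypothetical bucket) passes the check.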

There are two useful utility functions in gcp.bigquery for working with jobs: wait_any and wait_all. Each takes a single job or a list of jobs, plus an optional timeout. wait_any returns when at least one job has completed (or the timeout expires), while wait_all returns when all of the jobs have completed (or the timeout expires). In both cases the return value is the list of completed jobs. We can illustrate this with the following code:

In [8]:
q1 = bq.Query('SELECT * FROM [publicdata:samples.natality] LIMIT 200')
q2 = bq.Query('SELECT * FROM [publicdata:samples.natality] LIMIT 2000')
j1 = q1.execute_async(use_cache=False)
j2 = q2.execute_async(use_cache=False)

while True:
    completed = bq.wait_any([j1, j2])
    print str(completed)
    if len(completed) == 2:
        break


[Job job__TzbEl2Sw8dCUZxNN3G_ECAyJxU completed, Job job_Hf4F_hqPehit_ATvV55P-i4W5Iw completed]
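Note that once either job has finished, each later call to wait_any([j1, j2]) returns immediately, so if one query finishes well before the other, the loop above spins, printing on every pass, until the second job completes. One way to avoid this is to pass wait_any only the jobs that are still pending. The sketch below illustrates both the wait_any/wait_all semantics described earlier and that pending-list loop. It is written for Python 3 and uses concurrent.futures as a stand-in: futures play the role of Job objects, concurrent.futures.wait plays the role of the library's wait functions, and wait_any_sketch, wait_all_sketch, and process_as_completed are hypothetical names, not part of gcp.bigquery.

```python
import time
from concurrent.futures import (ALL_COMPLETED, FIRST_COMPLETED,
                                ThreadPoolExecutor, wait)


def _as_list(jobs):
    # Accept a single job or a list of jobs, as wait_any/wait_all do.
    return jobs if isinstance(jobs, list) else [jobs]


def wait_any_sketch(jobs, timeout=None):
    """Return the completed jobs once at least one is done or timeout expires."""
    jobs = _as_list(jobs)
    done, _ = wait(jobs, timeout=timeout, return_when=FIRST_COMPLETED)
    return [j for j in jobs if j in done]  # keep the caller's order


def wait_all_sketch(jobs, timeout=None):
    """Return the completed jobs once all are done or timeout expires."""
    jobs = _as_list(jobs)
    done, _ = wait(jobs, timeout=timeout, return_when=ALL_COMPLETED)
    return [j for j in jobs if j in done]


def process_as_completed(jobs, handle):
    """Call handle(job) exactly once per job, roughly in completion order."""
    pending = _as_list(jobs)
    while pending:
        # Only unhandled jobs are passed in, so this blocks until a new one
        # finishes instead of returning at once on an already-handled job.
        finished = wait_any_sketch(pending)
        for job in finished:
            handle(job)
        pending = [j for j in pending if j not in finished]


handled = []
with ThreadPoolExecutor(max_workers=2) as pool:
    slow = pool.submit(time.sleep, 0.2)
    fast = pool.submit(time.sleep, 0.01)
    process_as_completed([slow, fast], handled.append)
```

With the real library, the equivalent change would be to call bq.wait_any on the shrinking list of pending jobs rather than on [j1, j2] every time.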
