# BigQuery Magic Commands and API

The examples in this notebook introduce features of [BigQuery Standard SQL](https://cloud.google.com/bigquery/sql-reference/) and [BigQuery SQL Data Manipulation Language (beta)](https://cloud.google.com/bigquery/sql-reference/dml-syntax). BigQuery Standard SQL is compliant with the SQL 2011 standard. You've already seen the use of the magic command `%%bq` in the [Hello BigQuery](Hello BigQuery.ipynb) and [BigQuery Commands](BigQuery Commands.ipynb) notebooks. This command and others in the Datalab API support BigQuery Standard SQL.

This notebook shows some more uses of the BigQuery Python API.

## Using the BigQuery Magic command with Standard SQL

First, we will cover some more uses of the `%%bq` magic command. Let's define a query to work with:

In [None]:
%%bq query --name UniqueNames2013
WITH UniqueNames2013 AS
(SELECT DISTINCT name
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE Year = 2013)
SELECT * FROM UniqueNames2013

Now let's list all the commands available through `%%bq`:

In [None]:
%%bq -h

The `dryrun` command in `%%bq` is a quick way to confirm the syntax of a SQL query. Instead of executing the query, it returns only some statistics about it:

In [None]:
%%bq dryrun -q UniqueNames2013

Now, let's get a small sample of the results using the `sample` command in `%%bq`:

In [None]:
%%bq sample -q UniqueNames2013

Finally, we can use the `execute` command in `%%bq` to display the results of our query:

In [None]:
%%bq execute -q UniqueNames2013

## Using Standard SQL with the Datalab BigQuery API

The Cloud Datalab APIs are provided in the `datalab` Python library, and the BigQuery functionality is contained within the `google.datalab.bigquery` module. 

The most important BigQuery-related API is the one that allows you to execute a SQL query. The `google.datalab.bigquery.Query` class provides that functionality. To run a query using BigQuery Standard SQL, create a new `Query` object with the desired SQL string, or use an object that has already been defined by the `%%bq` command. Let's take a look at the object we created before: `UniqueNames2013`.

In [None]:
UniqueNames2013

Now let's see how we can recreate it using the API:

In [None]:
import google.datalab.bigquery as bq

In [None]:
UniqueNames2013 = bq.Query(sql='''
  WITH UniqueNames2013 AS
  (SELECT DISTINCT name
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE Year = 2013)
  SELECT * FROM UniqueNames2013
''')
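Because `Query` takes an ordinary SQL string, the text can be assembled in Python before the object is created. The following is a minimal sketch rather than part of the Datalab API: `unique_names_sql` and `make_query` are hypothetical helper names, and the `WITH` clause is dropped because it only wraps a single `SELECT`. The import of `google.datalab.bigquery` is deferred so that the string builder can be checked outside Datalab.

```python
def unique_names_sql(year):
    """Build the Standard SQL text for the distinct names recorded in `year`."""
    year = int(year)  # reject non-numeric input before it reaches the SQL text
    return (
        "SELECT DISTINCT name\n"
        "FROM `bigquery-public-data.usa_names.usa_1910_2013`\n"
        f"WHERE Year = {year}"
    )


def make_query(year):
    """Wrap the generated SQL in a Datalab Query object (hypothetical helper)."""
    # Imported lazily so that unique_names_sql can be used outside Datalab.
    import google.datalab.bigquery as bq
    return bq.Query(sql=unique_names_sql(year))


sql_2013 = unique_names_sql(2013)
print(sql_2013)
```

Inside Datalab, `make_query(2013)` would then return an object that can be executed in the same way as `UniqueNames2013`.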

To execute the query and view a sample of the result table, we pass a `Sampling` object to `execute()`:

In [None]:
sampling = bq.Sampling.random(percent=2)
job = UniqueNames2013.execute(sampling=sampling)

To display the result of the job as a table, use the following:

In [None]:
job.result()

Notice that every time we run the query above, we get a different set of results, because we chose a random sampling of 2%.
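The effect of random sampling can be reproduced locally with plain Python. This is only an illustration of why repeated runs differ; it makes no claim about how BigQuery or Datalab implement sampling, and `sample_rows` and its `seed` parameter are hypothetical names introduced here.

```python
import random


def sample_rows(rows, percent, seed=None):
    """Return about `percent`% of `rows`, chosen at random (local illustration only)."""
    rng = random.Random(seed)  # seed=None draws a fresh random state on every call
    k = max(1, round(len(rows) * percent / 100))
    return rng.sample(rows, k)


names = [f"name_{i}" for i in range(1000)]

# Two unseeded draws are independent, so they will usually contain different names.
first = sample_rows(names, percent=2)
second = sample_rows(names, percent=2)

# Fixing the seed makes the draw repeatable.
seeded_a = sample_rows(names, percent=2, seed=42)
seeded_b = sample_rows(names, percent=2, seed=42)

print(len(first), len(second))
print(seeded_a == seeded_b)
```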

We can also run the query and copy the result into a pandas DataFrame. For that, we use a `QueryOutput` object:

In [None]:
output_options = bq.QueryOutput.dataframe(max_rows=10)
job = UniqueNames2013.execute(output_options=output_options)

In [None]:
job.result()

In [None]:
type(job.result())

# Using Google BigQuery SQL Data Manipulation Language

Below, we will demonstrate how to use Google BigQuery SQL Data Manipulation Language (DML) in Google Cloud Datalab.

## Preparation

First, let's create a sample dataset and table to help demonstrate the features of Google BigQuery DML.

In [None]:
# Create a new dataset (this will be deleted later in the notebook)
sample_dataset = bq.Dataset('sampleDML')
if not sample_dataset.exists():
  sample_dataset.create(friendly_name = 'Sample Dataset for testing DML',
                        description = 'Created from Sample Notebook in Google Cloud Datalab')
# Confirm that the dataset now exists
sample_dataset.exists()

In [None]:
# To create a table, we also need to create a schema.
# It's easiest to create a schema from some existing data, so this
# example demonstrates using an example object
fruit_row = {
  'name': 'string value',
  'count': 0
}

sample_table1 = bq.Table("sampleDML.fruit_basket").create(schema = bq.Schema.from_data([fruit_row]), 
                                                          overwrite = True)
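To make the idea of deriving a schema from example data concrete, here is a rough sketch of type inference over a sample row. It is not the implementation of `bq.Schema.from_data`: `infer_schema` is a hypothetical helper, and the mapping from Python types to BigQuery type names is an assumption made for illustration.

```python
def infer_schema(row):
    """Map each field of a sample row to a BigQuery type name (illustration only)."""
    fields = []
    for name, value in row.items():
        # bool is tested before int because bool is a subclass of int in Python.
        if isinstance(value, bool):
            field_type = 'BOOLEAN'
        elif isinstance(value, int):
            field_type = 'INTEGER'
        elif isinstance(value, float):
            field_type = 'FLOAT'
        else:
            field_type = 'STRING'  # assumed fallback for everything else
        fields.append({'name': name, 'type': field_type})
    return fields


fruit_row = {'name': 'string value', 'count': 0}
schema = infer_schema(fruit_row)
print(schema)
```

The values in the sample row matter only for their types, which is why placeholders such as `'string value'` and `0` are enough.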

## Inserting Data

We can add rows to our newly created `fruit_basket` table by using an `INSERT` statement in our BigQuery Standard SQL query.

In [None]:
%%bq query -n insertFruit
INSERT sampleDML.fruit_basket (name, count)
VALUES('banana', 5),
      ('orange', 10),
      ('apple', 15),
      ('mango', 20)

In [None]:
%bq execute -q insertFruit

You can also write an `INSERT` that takes its rows from a `SELECT` over `UNNEST` instead of a `VALUES` list:

In [None]:
%%bq query -n insertFruit2
INSERT sampleDML.fruit_basket (name, count)
SELECT * 
FROM UNNEST([('peach', 25), ('watermelon', 30)])

In [None]:
%bq execute -q insertFruit2

You can also use a `WITH` clause with `INSERT` and `SELECT`.

In [None]:
%%bq query -n insertFruit3
INSERT sampleDML.fruit_basket(name, count)
WITH w AS (
  SELECT ARRAY<STRUCT<name string, count int64>>
      [('cherry', 35),
      ('cranberry', 40),
      ('pear', 45)] col
)
SELECT name, count FROM w, UNNEST(w.col)

In [None]:
%bq execute -q insertFruit3

Here is an example that copies one table's contents into another. First we will create a new table.

In [None]:
fruit_row_detailed = {
  'name': 'string value',
  'count': 0,
  'readytoeat': False
}
sample_table2 = bq.Table("sampleDML.fruit_basket_detailed").create(schema = bq.Schema.from_data([fruit_row_detailed]), 
                                                                   overwrite = True)

In [None]:
%%bq query -n insertFruitFromTable
INSERT sampleDML.fruit_basket_detailed (name, count, readytoeat)
SELECT name, count, false
FROM sampleDML.fruit_basket

In [None]:
%bq execute -q insertFruitFromTable
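The `INSERT ... SELECT` pattern can be tried without a BigQuery project by using Python's built-in `sqlite3` module. This is a local sketch under stated assumptions: SQLite's dialect differs from BigQuery Standard SQL (it requires `INSERT INTO` and has no boolean type, so `0` stands in for `false`), and the two-row table is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fruit_basket (name TEXT, count INTEGER)')
conn.execute(
    'CREATE TABLE fruit_basket_detailed (name TEXT, count INTEGER, readytoeat INTEGER)')
conn.executemany('INSERT INTO fruit_basket (name, count) VALUES (?, ?)',
                 [('banana', 5), ('orange', 10)])

# Copy every row and fill the extra column with a constant.
conn.execute('''
  INSERT INTO fruit_basket_detailed (name, count, readytoeat)
  SELECT name, count, 0
  FROM fruit_basket
''')

copied = conn.execute(
    'SELECT name, count, readytoeat FROM fruit_basket_detailed ORDER BY name').fetchall()
print(copied)
conn.close()
```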

## Updating Data

You can update rows in the `fruit_basket_detailed` table by using an `UPDATE` statement in the BigQuery Standard SQL query. This time we run the query through the Google Cloud Datalab BigQuery API by calling `execute()` on the query object.

In [None]:
%%bq query -n set_orange_ready_to_eat
UPDATE sampleDML.fruit_basket_detailed
SET readytoeat = True
WHERE name = 'orange'

In [None]:
set_orange_ready_to_eat.execute()

To view the contents of a table in BigQuery, use the `%%bq tables view` command:

In [None]:
%%bq tables view -n sampleDML.fruit_basket_detailed

## Deleting Data

You can delete rows in the `fruit_basket` table by using a `DELETE` statement in the BigQuery Standard SQL query.

In [None]:
%%bq query -n deleteFruit
DELETE sampleDML.fruit_basket
WHERE name in ('cherry', 'cranberry')

In [None]:
%bq execute -q deleteFruit

Use the following query to delete the entries in `sampleDML.fruit_basket_detailed` that no longer have a matching row in `sampleDML.fruit_basket`:

In [None]:
%%bq query -n deleteFruitDetailed
DELETE sampleDML.fruit_basket_detailed
WHERE NOT EXISTS
  (SELECT * FROM sampleDML.fruit_basket
  WHERE fruit_basket_detailed.name = fruit_basket.name)

In [None]:
%bq execute -q deleteFruitDetailed
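The correlated `NOT EXISTS` delete is easier to reason about on a small local copy. The sketch below uses Python's built-in `sqlite3` module, not BigQuery, so the dialect differs slightly (SQLite requires `DELETE FROM`), and the rows are made-up sample data chosen so that `cherry` and `cranberry` are missing from `fruit_basket`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fruit_basket (name TEXT)')
conn.execute('CREATE TABLE fruit_basket_detailed (name TEXT)')
conn.executemany('INSERT INTO fruit_basket VALUES (?)',
                 [('banana',), ('orange',)])
conn.executemany('INSERT INTO fruit_basket_detailed VALUES (?)',
                 [('banana',), ('orange',), ('cherry',), ('cranberry',)])

# Remove each detailed row whose name has no counterpart in fruit_basket.
cursor = conn.execute('''
  DELETE FROM fruit_basket_detailed
  WHERE NOT EXISTS
    (SELECT 1 FROM fruit_basket
     WHERE fruit_basket.name = fruit_basket_detailed.name)
''')
deleted = cursor.rowcount  # number of rows removed by the DELETE

remaining = [row[0] for row in conn.execute(
    'SELECT name FROM fruit_basket_detailed ORDER BY name')]
print(deleted, remaining)
conn.close()
```

The subquery is evaluated once per row of `fruit_basket_detailed`, so only rows without a match are deleted.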

## Deleting Resources

In [None]:
# Clear out sample resources
sample_dataset.delete(delete_contents = True)