# BigQuery Standard SQL

The examples in this notebook introduce features of [BigQuery Standard SQL](https://cloud.google.com/bigquery/sql-reference/) and [BigQuery SQL Data Manipulation Language (beta)](https://cloud.google.com/bigquery/sql-reference/dml-syntax). BigQuery Standard SQL is compliant with the SQL 2011 standard. You've already seen `%%sql` in the [Hello BigQuery](Hello BigQuery.ipynb) notebook and `%%bigquery` in the [BigQuery Commands](BigQuery Commands.ipynb) notebook. By default, both magic commands run queries using BigQuery's legacy SQL dialect, so Standard SQL has to be requested explicitly.

# Querying Data

First, we demonstrate how to use `%%sql` with the `dialect` option (`-d standard`), which is required to run queries using BigQuery Standard SQL.

In [1]:
%%sql -d standard
WITH UniqueNames2013 AS
(SELECT DISTINCT name
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE Year = 2013)
SELECT COUNT(*) FROM UniqueNames2013

f0_
9510
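To make the query's logic concrete, here is a minimal pure-Python sketch of what it computes. The rows are hypothetical stand-ins for the public `usa_1910_2013` table, which has more columns and many more rows. The `WITH` clause corresponds to the set comprehension and `COUNT(*)` to `len`.

```python
# Hypothetical sample rows; the real usa_1910_2013 table has more columns,
# and a name can appear in several rows for the same year (e.g. once per state).
rows = [
    {"name": "Mary", "year": 2013},
    {"name": "Mary", "year": 2013},
    {"name": "Anwar", "year": 2013},
    {"name": "John", "year": 1950},
]


def count_unique_names(rows, year):
    """Mimic WITH t AS (SELECT DISTINCT name ... WHERE year = ?) SELECT COUNT(*) FROM t."""
    # The CTE: keep the rows for the requested year, then de-duplicate the names.
    unique_names = {row["name"] for row in rows if row["year"] == year}
    # The outer query: count the rows produced by the CTE.
    return len(unique_names)


print(count_unique_names(rows, 2013))  # 2
```

Without `DISTINCT`, the duplicated `Mary` row would be counted twice.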


## SQL Modules

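In the example above, the SQL query was run directly via `%%sql`. In Cloud Datalab, you can also save a query in a SQL module, which defines the query without running it. The module below, `UniqueNames2013`, is a variant of the previous query: instead of counting the unique names from 2013, it returns up to 15 of them.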

In [2]:
%%sql --module UniqueNames2013
WITH UniqueNames AS
(SELECT DISTINCT name
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE Year = 2013)
SELECT * FROM UniqueNames
LIMIT 15

## Using the BigQuery Magic command with Standard SQL

Next, we show how to set the `dialect` option so that `%%bigquery` runs BigQuery Standard SQL. First, let's display the help text, which lists the available `%%bigquery` commands.

In [3]:
%%bigquery -h

usage: bigquery [-h]
                {sample,create,delete,dryrun,udf,execute,pipeline,table,schema,datasets,tables,extract,load}
                ...

Execute various BigQuery-related operations. Use "%bigquery <command> -h" for
help on a specific command.

positional arguments:
  {sample,create,delete,dryrun,udf,execute,pipeline,table,schema,datasets,tables,extract,load}
                        commands
    sample              Display a sample of the results of a BigQuery SQL
                        query. The cell can optionally contain arguments for
                        expanding variables in the query, if -q/--query was
                        used, or it can contain SQL for a query.
    create              Create a dataset or table.
    delete              Delete a dataset or table.
    dryrun              Execute a dry run of a BigQuery query and display
                        approximate usage statistics
    udf                 Create a named Javascript BigQuery UDF
    exec

The `dryrun` command of `%%bigquery` is a quick way to confirm the query's syntax; it reports approximate usage statistics without running the query:

In [4]:
%%bigquery dryrun -q UniqueNames2013 -d standard

Now, let's get a small sample of the results using the `sample` command of `%%bigquery`:

In [5]:
%%bigquery sample -q UniqueNames2013 -d standard

name
Coleton
Jadelyn
Anwar
Kennedy
Rainier
Joaquin
Harlan
Elienai
Myra
Jentry


Finally, we can use the `execute` command of `%%bigquery` to run the query and display its results:

In [6]:
%%bigquery execute -q UniqueNames2013 -d standard

name
Coleton
Amberlee
Anwar
Kennedy
Rainier
Harlan
Anum
Joaquin
Gisela
Breanne


## Using Standard SQL with the Datalab BigQuery API

The Cloud Datalab APIs are provided in the `datalab` Python library, and the BigQuery functionality is contained within the `datalab.bigquery` module. 

The most important BigQuery-related API is the one that executes a SQL query, which is provided by the `bq.Query` class. To run a query using BigQuery Standard SQL, pass `dialect='standard'` to the method that runs it, as in the `sample`, `results`, and `to_dataframe` calls below.

In [7]:
import datalab.bigquery as bq

First, let's view the SQL query that we're about to run.

In [8]:
bq.Query(UniqueNames2013).sql

u'WITH UniqueNames AS\n(SELECT DISTINCT name\n  FROM `bigquery-public-data.usa_names.usa_1910_2013`\n  WHERE Year = 2013)\nSELECT * FROM UniqueNames\nLIMIT 15'

To run the query and view a sample from the result set, use the following:

In [9]:
bq.Query(UniqueNames2013).sample(dialect='standard')

name
Coleton
Anwar
Rainier
Elienai
Myra


To run the query and display the entire result set in a table, use the following:

In [10]:
bq.Query(UniqueNames2013).results(dialect='standard')

name
Coleton
Amberlee
Anwar
Kennedy
Rainier
Harlan
Anum
Joaquin
Gisela
Breanne


Finally, to run the query and return the entire result set as a pandas DataFrame, use the following:

In [11]:
bq.Query(UniqueNames2013).to_dataframe(dialect='standard')

,name
0,Coleton
1,Amberlee
2,Anwar
3,Kennedy
4,Rainier
5,Harlan
6,Anum
7,Joaquin
8,Gisela
9,Breanne


# Using Google BigQuery SQL Data Manipulation Language

Below, we demonstrate how to use Google BigQuery SQL Data Manipulation Language (DML) in Google Cloud Datalab. DML statements require BigQuery Standard SQL, so every query in this section sets the dialect to `standard`.

## Preparation

First, let's create a sample dataset and table to help demonstrate the features of Google BigQuery DML.

In [12]:
# Create a new dataset (it is deleted at the end of this notebook)
sample_dataset = bq.Dataset('sampleDML')
if not sample_dataset.exists():
  sample_dataset.create(friendly_name='Sample Dataset for testing DML',
                        description='Created from Sample Notebook in Google Cloud Datalab')
# Confirm that the dataset now exists
sample_dataset.exists()

In [13]:
# To create a table, we also need a schema.
# It's easiest to create a schema from some existing data, so this
# example infers one from a sample object (a dict of column values)
fruit_row = {
  'name': 'string value',
  'count': 0
}

sample_table1 = bq.Table("sampleDML.fruit_basket").create(schema=bq.Schema.from_data([fruit_row]),
                                                          overwrite=True)
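`bq.Schema.from_data` derives the column names and types from the sample object; the sample values themselves are not written to the table. The sketch below illustrates the general idea in plain Python. The type mapping and the `infer_schema` helper are assumptions for illustration, not Datalab's actual implementation.

```python
# Hypothetical mapping from Python value types to BigQuery column types.
# bool is checked before int because bool is a subclass of int in Python.
TYPE_MAP = [
    (bool, "BOOLEAN"),
    (int, "INTEGER"),
    (float, "FLOAT"),
    (str, "STRING"),
]


def infer_schema(sample_row):
    """Return a list of {name, type} column definitions inferred from one sample row."""
    schema = []
    for column, value in sample_row.items():
        for py_type, bq_type in TYPE_MAP:
            if isinstance(value, py_type):
                schema.append({"name": column, "type": bq_type})
                break
        else:
            raise TypeError(f"unsupported type for {column!r}: {type(value).__name__}")
    return schema


fruit_row = {"name": "string value", "count": 0}
print(infer_schema(fruit_row))
# [{'name': 'name', 'type': 'STRING'}, {'name': 'count', 'type': 'INTEGER'}]
```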

## Inserting Data

We can add rows to our newly created `fruit_basket` table by using an `INSERT` statement in our BigQuery Standard SQL query.

In [14]:
%%sql -d standard
INSERT sampleDML.fruit_basket (name, count)
VALUES('banana', 5),
      ('orange', 10),
      ('apple', 15),
      ('mango', 20)

count,name
15,apple
5,banana
10,orange
20,mango


The same kind of insert can also be written as a `SELECT` over an `UNNEST`ed array of struct literals. This query adds two more rows:

In [15]:
%%sql -d standard
INSERT sampleDML.fruit_basket (name, count)
SELECT * 
FROM UNNEST([('peach', 25), ('watermelon', 30)])

count,name
15,apple
5,banana
10,orange
20,mango
25,peach
30,watermelon
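In this form, `UNNEST` turns each struct in the array literal into one row, `SELECT *` returns the struct's two fields, and `INSERT` matches them to `(name, count)` by position. A minimal Python sketch of those semantics, using a hypothetical list of tuples in place of the table:

```python
# Hypothetical in-memory stand-in for sampleDML.fruit_basket, as (name, count) rows.
fruit_basket = [("banana", 5), ("orange", 10), ("apple", 15), ("mango", 20)]


def insert_from_unnest(table, array_of_structs):
    """Mimic INSERT table (name, count) SELECT * FROM UNNEST([...])."""
    inserted = 0
    for struct in array_of_structs:      # UNNEST: one row per array element
        name, count = struct             # SELECT *: the struct's fields, in order
        table.append((name, count))      # INSERT: fields mapped to columns by position
        inserted += 1
    return inserted


print(insert_from_unnest(fruit_basket, [("peach", 25), ("watermelon", 30)]))  # 2
```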


You can also use a `WITH` clause with `INSERT` and `SELECT`.

In [16]:
%%sql -d standard
INSERT sampleDML.fruit_basket(name, count)
WITH w AS (
  SELECT ARRAY<STRUCT<name string, count int64>>
      [('cherry', 35),
      ('cranberry', 40),
      ('pear', 45)] col
)
SELECT name, count FROM w, UNNEST(w.col)

count,name
15,apple
5,banana
10,orange
20,mango
25,peach
30,watermelon
45,pear
35,cherry
40,cranberry
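Here the CTE `w` has a single row whose `col` column holds an array of `STRUCT<name, count>` values. The comma in `FROM w, UNNEST(w.col)` is a correlated cross join: each row of `w` is paired with every element of its own array, and `name` and `count` then refer to the struct fields. A plain-Python sketch of that flattening, where the dictionaries are a hypothetical representation of the rows:

```python
# Hypothetical representation of the CTE w: one row whose `col` is an array of structs.
w = [
    {"col": [
        {"name": "cherry", "count": 35},
        {"name": "cranberry", "count": 40},
        {"name": "pear", "count": 45},
    ]}
]


def flatten(w_rows):
    """Mimic SELECT name, count FROM w, UNNEST(w.col)."""
    out = []
    for row in w_rows:               # each row of w ...
        for element in row["col"]:   # ... joined with each element of its own array
            out.append((element["name"], element["count"]))
    return out


print(flatten(w))
# [('cherry', 35), ('cranberry', 40), ('pear', 45)]
```

As the table output above shows, BigQuery does not guarantee row order, so the new rows may be listed in any order.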


Here is an example that copies one table's contents into another. First, we create a new table with an additional `readytoeat` column.

In [17]:
fruit_row_detailed = {
  'name': 'string value',
  'count': 0,
  'readytoeat': False
}
sample_table2 = bq.Table("sampleDML.fruit_basket_detailed").create(schema=bq.Schema.from_data([fruit_row_detailed]),
                                                                   overwrite=True)

In [18]:
%%sql -d standard
INSERT sampleDML.fruit_basket_detailed (name, count, readytoeat)
SELECT name, count, false
FROM sampleDML.fruit_basket

count,readytoeat,name
20,False,mango
25,False,peach
5,False,banana
15,False,apple
30,False,watermelon
40,False,cranberry
10,False,orange
35,False,cherry
45,False,pear
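The `SELECT` list can mix source columns with constants: every copied row receives the literal `false` for `readytoeat`. A minimal Python sketch of the same copy, with hypothetical in-memory rows and a hypothetical `copy_with_constant` helper:

```python
# Hypothetical stand-in for a few rows of sampleDML.fruit_basket.
fruit_basket = [{"name": "banana", "count": 5}, {"name": "orange", "count": 10}]


def copy_with_constant(source_rows, ready=False):
    """Mimic INSERT detailed (name, count, readytoeat) SELECT name, count, false FROM source."""
    return [
        {"name": row["name"], "count": row["count"], "readytoeat": ready}
        for row in source_rows
    ]


fruit_basket_detailed = copy_with_constant(fruit_basket)
print(fruit_basket_detailed[0])
# {'name': 'banana', 'count': 5, 'readytoeat': False}
```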


## Updating Data

You can update rows in the `fruit_basket_detailed` table by using an `UPDATE` statement in a BigQuery Standard SQL query. This time, we save the statement as a SQL module and run it with the Google Cloud Datalab BigQuery API.

In [19]:
%%sql --module set_orange_ready_to_eat
UPDATE sampleDML.fruit_basket_detailed
SET readytoeat = true
WHERE name = 'orange'

In [20]:
bq.Query(set_orange_ready_to_eat).execute(dialect='standard')

Job x-x/job_R completed

To view the contents of a table in BigQuery, you can use `%%bigquery table`.

In [21]:
%%bigquery table sampleDML.fruit_basket_detailed

count,readytoeat,name
5,False,banana
10,True,orange
20,False,mango
45,False,pear
30,False,watermelon
35,False,cherry
25,False,peach
15,False,apple
40,False,cranberry
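An `UPDATE` changes only the rows matched by its `WHERE` clause and leaves the others untouched, which is why only `orange` now shows `True`. (BigQuery requires a `WHERE` clause in every `UPDATE`; `WHERE true` updates all rows.) A plain-Python sketch of those semantics; the `update` helper and the rows are hypothetical:

```python
# Hypothetical stand-in for a few rows of sampleDML.fruit_basket_detailed.
rows = [
    {"name": "banana", "count": 5, "readytoeat": False},
    {"name": "orange", "count": 10, "readytoeat": False},
    {"name": "mango", "count": 20, "readytoeat": False},
]


def update(rows, assignments, where):
    """Mimic UPDATE ... SET <assignments> WHERE <where>; return the number of rows changed."""
    changed = 0
    for row in rows:
        if where(row):
            row.update(assignments)
            changed += 1
    return changed


# UPDATE ... SET readytoeat = true WHERE name = 'orange'
print(update(rows, {"readytoeat": True}, lambda row: row["name"] == "orange"))  # 1
```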


## Deleting Data

You can delete rows in the `fruit_basket` table by using a `DELETE` statement in the BigQuery Standard SQL query.

In [22]:
%%sql -d standard
DELETE sampleDML.fruit_basket
WHERE name in ('cherry', 'cranberry')

count,name
15,apple
5,banana
10,orange
20,mango
25,peach
30,watermelon
45,pear
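`DELETE` removes every row for which the `WHERE` condition is true; here, `name IN ('cherry', 'cranberry')` matches two rows. A plain-Python sketch with hypothetical in-memory rows and a hypothetical `delete_where` helper:

```python
# Hypothetical stand-in for sampleDML.fruit_basket, as (name, count) rows.
fruit_basket = [("apple", 15), ("cherry", 35), ("cranberry", 40), ("pear", 45)]


def delete_where(rows, predicate):
    """Mimic DELETE ... WHERE <predicate>; modify rows in place, return the number deleted."""
    kept = [row for row in rows if not predicate(row)]
    deleted = len(rows) - len(kept)
    rows[:] = kept
    return deleted


# DELETE ... WHERE name IN ('cherry', 'cranberry')
print(delete_where(fruit_basket, lambda row: row[0] in {"cherry", "cranberry"}))  # 2
```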


To remove the same fruits from `sampleDML.fruit_basket_detailed`, delete every row whose name no longer appears in `sampleDML.fruit_basket`:

In [23]:
%%sql -d standard
DELETE sampleDML.fruit_basket_detailed
WHERE NOT EXISTS
  (SELECT * FROM sampleDML.fruit_basket
  WHERE fruit_basket_detailed.name = fruit_basket.name)

count,readytoeat,name
20,False,mango
15,False,apple
5,False,banana
30,False,watermelon
10,True,orange
45,False,pear
25,False,peach
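The correlated `NOT EXISTS` subquery makes this an anti-join: a row of `fruit_basket_detailed` is deleted when no row of `fruit_basket` has the same name. The table names in the subquery's `WHERE` clause act as implicit aliases for the two tables. A plain-Python sketch of the anti-join, assuming no `NULL` names; the rows and the `delete_unmatched` helper are hypothetical:

```python
# Hypothetical stand-ins for the two tables after the previous DELETE.
fruit_basket = [{"name": "apple"}, {"name": "pear"}]
fruit_basket_detailed = [
    {"name": "apple", "readytoeat": False},
    {"name": "cherry", "readytoeat": False},
    {"name": "pear", "readytoeat": False},
]


def delete_unmatched(target, source, key="name"):
    """Mimic DELETE target WHERE NOT EXISTS (SELECT * FROM source WHERE target.key = source.key).

    Assumes the key is never NULL (None); SQL's NULL comparison rules are not modelled.
    """
    source_keys = {row[key] for row in source}   # build the lookup set once
    before = len(target)
    target[:] = [row for row in target if row[key] in source_keys]
    return before - len(target)


print(delete_unmatched(fruit_basket_detailed, fruit_basket))  # 1
```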


## Deleting Resources

In [24]:
# Delete the sample dataset, including all of the tables it contains
sample_dataset.delete(delete_contents=True)