# SQL Parameters

Google BigQuery SQL does not currently support parameterization. Within a notebook, however, it is useful to pass Python variables defined in the notebook as parameter values to SQL.

Google Cloud Datalab introduces a pattern for declaring and using parameterized queries.

## Data Preview

In [1]:
%%bigquery sample --count 10
SELECT * FROM [cloud-datalab-samples:httplogs.logs_20140615]

timestamp,latency,status,method,endpoint
2014-06-15 07:00:00.003772,122,200,GET,Interact3
2014-06-15 07:00:00.428897,144,200,GET,Interact3
2014-06-15 07:00:00.536486,48,200,GET,Interact3
2014-06-15 07:00:00.652760,28,405,GET,Interact2
2014-06-15 07:00:00.670100,103,200,GET,Interact3
2014-06-15 07:00:00.834251,121,405,GET,Interact2
2014-06-15 07:00:00.943075,28,200,GET,Other
2014-06-15 07:00:01.000102,124,405,GET,Interact2
2014-06-15 07:00:01.071107,49,200,GET,Interact3
2014-06-15 07:00:01.159701,119,200,GET,Other


In [2]:
%%sql
SELECT endpoint FROM [cloud-datalab-samples:httplogs.logs_20140615] GROUP BY endpoint

endpoint
Interact3
Interact2
Other
Popular
Home
Create
Admin
Interact1
Warmup
Recent


# Parameterization via SQL Modules

Parameters are declared in SQL modules using a `name = default_value` syntax placed before the SQL, and are then referenced within the SQL as `$name`.

In [3]:
%%sql --module endpoint_stats
endpoint = 'Other'

SELECT endpoint, COUNT(latency) As requests, MIN(latency) AS min_latency, MAX(latency) AS max_latency
FROM [cloud-datalab-samples:httplogs.logs_20140615]
WHERE endpoint = $endpoint
GROUP BY endpoint

This defines a SQL query with a string parameter named `endpoint`, which defaults to the value `Other` (as you can see when the query is executed without specifying a value).
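
To make the declaration syntax concrete, here is a minimal sketch of how a module body could be split into parameter defaults and SQL text. The `split_module` helper and its regular expression are hypothetical and written only for illustration; they are not Datalab's actual parser, and the SQL is a shortened version of the query above.

```python
import ast
import re

# Hypothetical pattern for a "name = default_value" declaration line.
DECLARATION = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.+?)\s*$")


def split_module(body):
    """Split a module body into (defaults, sql). Illustrative only."""
    defaults = {}
    lines = body.strip().splitlines()
    index = 0
    while index < len(lines):
        line = lines[index]
        match = DECLARATION.match(line)
        if match:
            # literal_eval safely converts "'Other'" into the string 'Other'.
            defaults[match.group(1)] = ast.literal_eval(match.group(2))
        elif line.strip():
            # The first non-blank line that is not a declaration starts the SQL.
            break
        index += 1
    return defaults, "\n".join(lines[index:])


module_body = """
endpoint = 'Other'

SELECT endpoint, COUNT(latency) AS requests
FROM [cloud-datalab-samples:httplogs.logs_20140615]
WHERE endpoint = $endpoint
GROUP BY endpoint
"""

defaults, sql = split_module(module_body)
```

In this sketch, `defaults` holds the declared default for `endpoint`, and `sql` still contains the `$endpoint` placeholder, ready to be filled in when the query is executed.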

In [4]:
%%bigquery execute --query endpoint_stats

endpoint,requests,min_latency,max_latency
Other,326889,1,121277


## Declarative Query Execution

Parameter values can be specified with a `%%bigquery execute` command as follows (parameter values are defined in a YAML block):

In [5]:
%%bigquery execute --query endpoint_stats
endpoint: Recent

endpoint,requests,min_latency,max_latency
Recent,734,2,18715


The YAML text can reference values defined in the notebook as well, again using the `$variable` syntax.

In [6]:
interesting_endpoint = 'Popular'

In [7]:
%%bigquery execute --query endpoint_stats
endpoint: $interesting_endpoint

endpoint,requests,min_latency,max_latency
Popular,7658,2,6443
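
The lookup of `$variable` references can be pictured with the following sketch. It assumes the YAML block has already been parsed into a plain dictionary, and that a reference always occupies a whole value. The `resolve_parameters` function and the `notebook_namespace` dictionary are hypothetical stand-ins, not part of the Datalab API.

```python
def resolve_parameters(params, namespace):
    """Replace "$variable" values with the matching notebook variable."""
    resolved = {}
    for name, value in params.items():
        if isinstance(value, str) and value.startswith("$"):
            variable = value[1:]
            if variable not in namespace:
                raise KeyError("undefined notebook variable: %s" % variable)
            resolved[name] = namespace[variable]
        else:
            # Literal values, such as "Recent", pass through unchanged.
            resolved[name] = value
    return resolved


# Stand-in for the variables defined in earlier notebook cells.
notebook_namespace = {"interesting_endpoint": "Popular"}

literal = resolve_parameters({"endpoint": "Recent"}, notebook_namespace)
referenced = resolve_parameters(
    {"endpoint": "$interesting_endpoint"}, notebook_namespace
)
```

Raising an error for an undefined variable is a design choice in this sketch: it surfaces a typo in the YAML block before any query is run.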


## Imperative Query Execution

Parameter values can also be passed as keyword arguments when constructing a `Query` object with the BigQuery Python API.

In [8]:
import datalab.bigquery as bq

In [9]:
stats_query = bq.Query(endpoint_stats, endpoint=interesting_endpoint)
print(stats_query.sql)

SELECT endpoint, COUNT(latency) As requests, MIN(latency) AS min_latency, MAX(latency) AS max_latency
FROM [cloud-datalab-samples:httplogs.logs_20140615]
WHERE endpoint = "Popular"
GROUP BY endpoint


From the above SQL, you can see that the value for the `$endpoint` variable was expanded. This parameter replacement happens locally, before the resulting SQL is sent to BigQuery.
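
As a rough illustration of local expansion, the sketch below uses Python's standard `string.Template` to substitute a quoted literal for `$endpoint`. The `to_sql_literal` and `expand` helpers are hypothetical, and the quoting rules shown are an assumption for illustration; Datalab's own substitution logic may differ.

```python
from string import Template


def to_sql_literal(value):
    """Render a Python string or number as a SQL literal (illustrative)."""
    if isinstance(value, str):
        # Escape backslashes and double quotes, then wrap in double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    return str(value)


def expand(sql, defaults, **overrides):
    """Fill $name placeholders, letting overrides win over defaults."""
    values = dict(defaults)
    values.update(overrides)
    literals = {name: to_sql_literal(value) for name, value in values.items()}
    return Template(sql).substitute(literals)


sql = (
    "SELECT endpoint, COUNT(latency) AS requests\n"
    "FROM [cloud-datalab-samples:httplogs.logs_20140615]\n"
    "WHERE endpoint = $endpoint"
)

with_default = expand(sql, {"endpoint": "Other"})
with_override = expand(sql, {"endpoint": "Other"}, endpoint="Popular")
```

Because the substitution is plain text replacement, values should be rendered as literals (as `to_sql_literal` attempts here) rather than pasted in raw, so that a string containing quotes cannot change the structure of the query.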

In [10]:
stats_query.results()

endpoint,requests,min_latency,max_latency
Popular,7658,2,6443


# Looking Ahead

Parameterization enables one part of the SQL and Python integration: taking values from Python code in the notebook and passing them in as part of the query when retrieving data from BigQuery.

The next notebook will cover the other part of the SQL and Python integration: retrieving query results into the notebook for use with Python code.

Parameterization is also a building block for creating complex queries that use whole queries as parameter values.