# Practicing SQL Queries Using BigQuery

We will be looking at the "covid19_google_mobility" dataset located [here](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=covid19_open_data&page=dataset&project=cool-monolith-286222&ws=!1m4!1m3!3m2!1sbigquery-public-data!2scovid19_open_data).


## Setup

In [1]:
from google.cloud import bigquery
from google.colab import auth
import pandas as pd
import plotly.express as px

auth.authenticate_user()

In [2]:
# assign the project ID for BILLING purposes, i.e. who is going to pay for the query?
project_id = 'cool-monolith-286222'

# Create client object
client = bigquery.Client(project=project_id)

## Initial Exploration

### List the data sets


https://cloud.google.com/bigquery/docs/listing-datasets#python_1



In [3]:
# assign the project ID that OWNS the data set
owner_project_id = "bigquery-public-data"

datasets = list(client.list_datasets(project=owner_project_id))  # Make an API request.

len(datasets)

332

In [4]:
if datasets:
  print(f"Datasets in project {owner_project_id}:")
  print("\n".join( f"\t{d.dataset_id}" for d in datasets[:10] ) )
else:
  print(f"{owner_project_id} project does not contain any datasets.")


Datasets in project bigquery-public-data:
	america_health_rankings
	austin_311
	austin_bikeshare
	austin_crime
	austin_incidents
	austin_waste
	baseball
	bbc_news
	bigqueryml_ncaa
	bitcoin_blockchain


### Data set properties

https://cloud.google.com/bigquery/docs/listing-datasets#get_information_about_datasets

#### Friendly name

In [5]:
for dataset in datasets[:10]:
  full_dataset_id = f"{dataset.project}.{dataset.dataset_id}"
  friendly_name = dataset.friendly_name
  print(
    f"Got dataset '{full_dataset_id}' with friendly_name '{friendly_name}'."
  )


Got dataset 'bigquery-public-data.america_health_rankings' with friendly_name 'None'.
Got dataset 'bigquery-public-data.austin_311' with friendly_name 'None'.
Got dataset 'bigquery-public-data.austin_bikeshare' with friendly_name 'None'.
Got dataset 'bigquery-public-data.austin_crime' with friendly_name 'None'.
Got dataset 'bigquery-public-data.austin_incidents' with friendly_name 'None'.
Got dataset 'bigquery-public-data.austin_waste' with friendly_name 'None'.
Got dataset 'bigquery-public-data.baseball' with friendly_name 'None'.
Got dataset 'bigquery-public-data.bbc_news' with friendly_name 'None'.
Got dataset 'bigquery-public-data.bigqueryml_ncaa' with friendly_name 'None'.
Got dataset 'bigquery-public-data.bitcoin_blockchain' with friendly_name 'None'.


#### Description

In [6]:
import textwrap
for dataset in datasets[:10]:
  print("==> " + dataset.full_dataset_id)
  fdi = f"{dataset.project}.{dataset.dataset_id}"
  dataset = client.get_dataset(fdi)
  # list_datasets() returns lightweight items, so the full dataset is fetched above to read its description
  desc = f"Description: {dataset.description}".split("\n")
  desc = "\n".join( [ f"\t{t}" for l in desc for t in textwrap.fill(l, 100).split("\n") ] )
  print(desc)
  print()



==> bigquery-public-data:america_health_rankings
	Description: America Health Rankings

==> bigquery-public-data:austin_311
	Description: None

==> bigquery-public-data:austin_bikeshare
	Description: Austin Bikeshare dataset

==> bigquery-public-data:austin_crime
	Description: Austin Crime dataset

==> bigquery-public-data:austin_incidents
	Description: None

==> bigquery-public-data:austin_waste
	Description: austin waste and diversion

==> bigquery-public-data:baseball
	Description: Overview: This public data includes pitch-by-pitch data for Major League Baseball (MLB)
	games in 2016. This dataset contains the following tables: games_wide (every pitch, steal, or lineup
	event for each at bat in the 2016 regular season), games_post_wide(every pitch, steal, or lineup
	event for each at-bat in the 2016 post season), and schedules ( the schedule for every team in the
	regular season). The schemas for the games_wide and games_post_wide tables are identical. With this
	data you can effecti

#### Labels

In [7]:
for dataset in datasets[:10]:
  print("==> " + dataset.full_dataset_id)
  fdi = f"{dataset.project}.{dataset.dataset_id}"
  labels = client.get_dataset(fdi).labels
  if labels:
    print("Labels:")
    print( "\n".join( f"\t{k}: {v}" for k, v in labels.items() ) )
  print()

==> bigquery-public-data:america_health_rankings

==> bigquery-public-data:austin_311

==> bigquery-public-data:austin_bikeshare

==> bigquery-public-data:austin_crime

==> bigquery-public-data:austin_incidents

==> bigquery-public-data:austin_waste

==> bigquery-public-data:baseball

==> bigquery-public-data:bbc_news

==> bigquery-public-data:bigqueryml_ncaa

==> bigquery-public-data:bitcoin_blockchain



### List the tables

#### Version 1


In [8]:
for dataset in datasets[:10]:
  print("==> " + dataset.full_dataset_id)
  # list tables
  print( "\n".join( "\t" + t.table_id \
                   for t in list(client.list_tables(dataset)) ))


==> bigquery-public-data:america_health_rankings
	ahr
	america_health_rankings
==> bigquery-public-data:austin_311
	311_service_requests
	test_view_3
==> bigquery-public-data:austin_bikeshare
	bikeshare_stations
	bikeshare_trips
==> bigquery-public-data:austin_crime
	crime
==> bigquery-public-data:austin_incidents
	incidents_2008
	incidents_2009
	incidents_2010
	incidents_2011
	incidents_2016
==> bigquery-public-data:austin_waste
	waste_and_diversion
==> bigquery-public-data:baseball
	games_post_wide
	games_wide
	schedules
==> bigquery-public-data:bbc_news
	fulltext
==> bigquery-public-data:bigqueryml_ncaa
	cume_games_view
==> bigquery-public-data:bitcoin_blockchain
	blocks
	transactions


#### Version 2

In [11]:
# assign the project ID that OWNS the data set
owner_project_id = "bigquery-public-data"

# Construct a reference to the "covid19_google_mobility" dataset
project_dataset = "covid19_google_mobility"
# project_dataset = "bitcoin_blockchain"

# client.dataset() is deprecated; build the reference directly instead
dataset_ref = bigquery.DatasetReference(owner_project_id, project_dataset)

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

# Get all the tables in the dataset
tables = list(client.list_tables(dataset))

# Print names of all tables in the dataset
for table in tables:
  print(table.table_id)

mobility_report


### Look at the table schema

In [12]:
# Construct a reference to the "mobility_report" table
table_ref = dataset.table("mobility_report")

# API request - fetch the table
table = client.get_table(table_ref)

# See the table's schema - name, field type, mode, description
table.schema

[SchemaField('country_region_code', 'STRING', 'NULLABLE', None, '2 letter alpha code for the country/region in which changes are measured relative to the baseline. These values correspond with the ISO 3166-1 alpha-2 codes', (), None),
 SchemaField('country_region', 'STRING', 'NULLABLE', None, 'The country/region in which changes are measured relative to the baseline', (), None),
 SchemaField('sub_region_1', 'STRING', 'NULLABLE', None, 'First geographic sub-region in which the data is aggregated. This varies by country/region to ensure privacy and public health value in consultation with local public health authorities', (), None),
 SchemaField('sub_region_2', 'STRING', 'NULLABLE', None, 'Second geographic sub-region in which the data is aggregated. This varies by country/region to ensure privacy and public health value in consultation with local public health authorities', (), None),
 SchemaField('metro_area', 'STRING', 'NULLABLE', None, 'A specific metro area to measure mobility withi

#### Schema in a dataframe

In [13]:
fields = pd.DataFrame( [ x.to_api_repr() for x in table.schema ] )
fields.shape


(17, 4)

In [14]:
fields


Unnamed: 0,name,type,mode,description
0,country_region_code,STRING,NULLABLE,2 letter alpha code for the country/region in ...
1,country_region,STRING,NULLABLE,The country/region in which changes are measur...
2,sub_region_1,STRING,NULLABLE,First geographic sub-region in which the data ...
3,sub_region_2,STRING,NULLABLE,Second geographic sub-region in which the data...
4,metro_area,STRING,NULLABLE,A specific metro area to measure mobility with...
5,iso_3166_2_code,STRING,NULLABLE,Unique identifier for the geographic region as...
6,census_fips_code,STRING,NULLABLE,Unique identifier for each US county as define...
7,place_id,STRING,NULLABLE,A textual identifier that uniquely identifies ...
8,date,DATE,NULLABLE,Changes for a given date as compared to baseli...
9,retail_and_recreation_percent_change_from_base...,INTEGER,NULLABLE,Mobility trends for places like restaurants ca...


In [15]:
# Preview the first five lines of the table
client.list_rows(table, max_results=5).to_dataframe()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline,source_url,etl_timestamp
0,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-03-09,-3,4,4,-10,-1,4,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
1,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-03-10,-4,6,3,-11,-2,4,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
2,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-05-14,-53,-20,-67,-61,-43,29,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
3,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-08-12,-22,-6,-41,-43,-22,11,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
4,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-09-01,-22,-3,-40,-40,-21,10,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00


## Add safe config settings

BigQuery's free tier lets you query up to 1 TB of data per month. You can reach this limit quickly if you are not careful. Luckily, there are ways to estimate and cap the amount of data a query scans.

### Set constants for sizes

In [16]:
ONE_MB = 1_000*1_000
ONE_GB = 1_000*ONE_MB

### Sample Query 1 - Covid - Dry Run

You can use a 'dry run' to estimate how many bytes a query will process before running it.

In [17]:
query = """
        SELECT *
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        LIMIT 5
        """

dry_run_config = bigquery.QueryJobConfig(dry_run = True)
dry_run_query_job = client.query(query, job_config= dry_run_config)
size = dry_run_query_job.total_bytes_processed
print(f"{size:_}")

2_215_181_051
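
The raw byte count is easier to judge against a budget once it is converted to larger units. Below is a minimal sketch that reuses the decimal size constants from this notebook. `describe_size` and `fits_budget` are hypothetical helper names, not part of the BigQuery client.

```python
# Decimal (SI) size constants, matching the ones defined earlier in this notebook.
ONE_MB = 1_000 * 1_000
ONE_GB = 1_000 * ONE_MB


def describe_size(num_bytes: int) -> str:
    """Return a short human-readable size such as '2.22 GB' (hypothetical helper)."""
    if num_bytes >= ONE_GB:
        return f"{num_bytes / ONE_GB:.2f} GB"
    if num_bytes >= ONE_MB:
        return f"{num_bytes / ONE_MB:.2f} MB"
    return f"{num_bytes} bytes"


def fits_budget(num_bytes: int, budget_bytes: int) -> bool:
    """True when an estimate is within the byte budget (hypothetical helper)."""
    return num_bytes <= budget_bytes


estimate = 2_215_181_051  # the dry-run estimate printed above
print(describe_size(estimate))
print(fits_budget(estimate, ONE_GB))
```

Keep in mind that the dry-run figure is an estimate of bytes processed. The amount actually billed can be lower, for example when results are served from cache.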


### Sample Query 1 - Covid - Safe Config

You can also cap how many bytes a query is allowed to bill. A query that would exceed the cap fails instead of running.

In [18]:
# This line should be included every time
# It seems like you should be able to set it and reuse it, but that doesn't work
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=ONE_GB)

safe_query_job = client.query(query, job_config=safe_config)
df = safe_query_job.to_dataframe()
df.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline,source_url,etl_timestamp
0,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-03-04,-1,7,-2,-5,3,2,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
1,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-05-07,-53,-23,-68,-63,-44,30,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
2,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-05-13,-51,-21,-66,-61,-43,28,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
3,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-05-15,-62,-32,-80,-70,-35,21,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
4,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-06-04,-41,-14,-60,-52,-31,23,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00


In [19]:
df.shape

(5, 17)

## What do a couple of entries look like?

In [20]:
query = """
        SELECT *
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        LIMIT 5
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline,source_url,etl_timestamp
0,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-04-25,-62,-32,-77,-74,-41,25,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
1,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-05-23,-52,-17,-73,-66,-47,25,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
2,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-07-13,-26,-7,-45,-45,-23,14,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
3,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-07-25,-25,-8,-46,-46,-11,10,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
4,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-09-06,-18,1,-35,-39,-21,9,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00


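## What do the next 5 entries look like?
5 just wasn't enough! Note that without an `ORDER BY`, BigQuery does not guarantee row order, so these are simply five more rows rather than strictly the "next" five.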

In [21]:
query = """
        SELECT *
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        LIMIT 5 OFFSET 5
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline,source_url,etl_timestamp
0,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-11-05,-20,0,-29,-35,-19,9,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
1,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-12-08,-18,-1,-29,-34,-17,8,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
2,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-12-20,-11,4,-15,-28,-21,8,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
3,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-12-30,-7,9,-18,-29,-21,8,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
4,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2021-01-06,-16,2,-29,-31,-18,9,https://www.gstatic.com/covid19/mobility/Globa...,2024-06-24 00:03:25.988446+00:00
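
SQL makes no promise about row order unless the query has an `ORDER BY`, so `LIMIT 5 OFFSET 5` is only a true "next page" when the rows are sorted. Below is a minimal sketch of sorted paging. It uses Python's built-in `sqlite3` as an offline stand-in for BigQuery, and the `report` table, its columns, and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (day INTEGER, pct_change INTEGER)")
# Insert ten made-up rows in shuffled order.
rows = [(7, -3), (2, 4), (9, -8), (1, 0), (5, -2),
        (10, -1), (3, 6), (8, -5), (4, 1), (6, -7)]
conn.executemany("INSERT INTO report VALUES (?, ?)", rows)


def page(page_number: int, page_size: int = 5) -> list:
    """Return one page of rows ordered by day (page_number starts at 0)."""
    sql = "SELECT day, pct_change FROM report ORDER BY day LIMIT ? OFFSET ?"
    return conn.execute(sql, (page_size, page_number * page_size)).fetchall()


first_page = page(0)
second_page = page(1)
print(first_page)
print(second_page)
```

Because every page uses the same sort key, the second page starts exactly where the first one ends.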


## How many records are there?

In [23]:
query = """
        SELECT COUNT(1) as record_count
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,record_count
0,11730025


## What countries are represented in this dataset?

In [24]:
data_prefix = 'bigquery-public-data.covid19_google_mobility'
data_table = 'mobility_report'

In [32]:
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=ONE_MB)
query = f"""
        SELECT DISTINCT country_region
        FROM `{data_prefix}.{data_table}`
        """
df = client.query(query, job_config = safe_config).to_dataframe()
df.head()

Unnamed: 0,country_region
0,Australia
1,Bangladesh
2,Guatemala
3,Portugal
4,Qatar


In [33]:
df.shape


(135, 1)

In [34]:
countries = df['country_region']
print(f"There are {countries.count()} countries")

There are 135 countries


**There are 193 to 195 countries in the world, depending on how you count, so this dataset is missing roughly 60 of them!**

## What are the subregions in the US?

In [35]:
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=ONE_MB)
query = """
        SELECT sub_region_1, sub_region_2
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE country_region = 'United States'
        LIMIT 10
        """
df = client.query(query, job_config = safe_config).to_dataframe()
df.head()

Unnamed: 0,sub_region_1,sub_region_2
0,Indiana,Dubois County
1,Indiana,Elkhart County
2,Indiana,Elkhart County
3,Indiana,Elkhart County
4,Indiana,Elkhart County


This isn't very informative: the same county repeats across rows. Let's look at the distinct pairs where sub_region_2 is not NULL.

In [36]:
query = """
        SELECT DISTINCT sub_region_1, sub_region_2
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE country_region = 'United States' AND sub_region_2 IS NOT NULL
        """
df = client.query(query).to_dataframe()
df.head(10)

Unnamed: 0,sub_region_1,sub_region_2
0,Indiana,Greene County
1,Indiana,Harrison County
2,Indiana,Howard County
3,Indiana,Jennings County
4,Indiana,Lake County
5,Indiana,Parke County
6,Indiana,Porter County
7,Indiana,Pulaski County
8,Indiana,Putnam County
9,Indiana,St. Joseph County


Sub_region_1 appears to be the state and sub_region_2 appears to be the county.
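
Missing sub-regions in this table appear to be SQL `NULL` (shown as `None` in pandas) rather than the text `'None'`. In SQL, comparing anything with `NULL` yields `NULL`, so a filter such as `!= 'None'` drops those rows only as a side effect, while `IS NOT NULL` states the intent directly. Below is a small sketch of that behavior, using Python's built-in `sqlite3` as an offline stand-in for BigQuery, with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (sub_region_1 TEXT, sub_region_2 TEXT)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [
        ("Indiana", "Greene County"),
        ("Indiana", None),  # state-level row: the county is NULL
        ("Indiana", "Lake County"),
    ],
)

# Comparing NULL with a string yields NULL, so the NULL row is dropped as a side effect.
not_equal = conn.execute(
    "SELECT sub_region_2 FROM report WHERE sub_region_2 != 'None' ORDER BY sub_region_2"
).fetchall()

# IS NOT NULL expresses the same filter explicitly.
not_null = conn.execute(
    "SELECT sub_region_2 FROM report WHERE sub_region_2 IS NOT NULL ORDER BY sub_region_2"
).fetchall()

# The opposite test must use IS NULL; "= NULL" never matches any row.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM report WHERE sub_region_2 = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM report WHERE sub_region_2 IS NULL"
).fetchone()[0]

print(not_equal, not_null, equals_null, is_null)
```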

## What dates does this cover?

In [37]:
query = """
        SELECT MIN(date) AS min_date, MAX(date) AS max_date
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE country_region = 'United States'
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,min_date,max_date
0,2020-02-15,2022-10-15


For the United States, we have data from mid-February 2020 to mid-October 2022.


## On average have retail and recreation trips decreased in Bernalillo County?

In [38]:
query = """
        SELECT AVG(retail_and_recreation_percent_change_from_baseline) as mean
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 = "Bernalillo County" AND sub_region_1 = "New Mexico"
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,mean
0,-15.294661


## Are there any Bernalillo Counties in other states?

In [39]:
query = """
        SELECT DISTINCT sub_region_1
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 = "Bernalillo County"
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,sub_region_1
0,New Mexico


We're the only one!!!

## How many states have a subregion 2 that is Lincoln County or similar?

In [40]:
query = """
        SELECT DISTINCT sub_region_1, sub_region_2, country_region
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 LIKE "Lincoln%"
        """
df = client.query(query).to_dataframe()
df.head(50)

Unnamed: 0,sub_region_1,sub_region_2,country_region
0,New Mexico,Lincoln County,United States
1,Wyoming,Lincoln County,United States
2,Oregon,Lincoln County,United States
3,Idaho,Lincoln County,United States
4,Missouri,Lincoln County,United States
5,North Carolina,Lincoln County,United States
6,Wisconsin,Lincoln County,United States
7,Arkansas,Lincoln County,United States
8,Washington,Lincoln County,United States
9,Louisiana,Lincoln Parish,United States


## What was the lowest level of retail & recreation in Bernalillo County and when was that?

In [41]:
query = """
        SELECT MIN(retail_and_recreation_percent_change_from_baseline)
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 = "Bernalillo County"
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,f0_
0,-86


In [42]:
query = """
        SELECT date
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 = "Bernalillo County" AND sub_region_1 = "New Mexico"
              AND retail_and_recreation_percent_change_from_baseline = -86
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,date
0,2020-12-25


Christmas! We probably want to exclude that day (and Thanksgiving Day) as holiday outliers.

In [43]:
query = """
        SELECT MIN(retail_and_recreation_percent_change_from_baseline)
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 = "Bernalillo County"
          AND date not in
          ('2020-12-25','2020-11-26','2021-12-25','2021-11-25','2022-11-24','2022-12-25')
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,f0_
0,-61
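
The holiday list in the query above is hard-coded. US Thanksgiving falls on the fourth Thursday of November, so the dates can be generated instead. Below is a minimal sketch using only the standard library. `thanksgiving` and `holiday_dates` are hypothetical helper names.

```python
from datetime import date, timedelta


def thanksgiving(year: int) -> date:
    """Return US Thanksgiving: the fourth Thursday of November."""
    nov_first = date(year, 11, 1)
    # weekday(): Monday is 0, so Thursday is 3.
    days_to_first_thursday = (3 - nov_first.weekday()) % 7
    return nov_first + timedelta(days=days_to_first_thursday + 21)


def holiday_dates(years) -> list:
    """Return ISO date strings for Thanksgiving and Christmas in each year."""
    dates = []
    for year in years:
        dates.append(thanksgiving(year).isoformat())
        dates.append(date(year, 12, 25).isoformat())
    return dates


holidays = holiday_dates([2020, 2021, 2022])
# Build the quoted list used inside a NOT IN (...) clause.
not_in_list = ", ".join(f"'{d}'" for d in holidays)
print(not_in_list)
```

The resulting string can be dropped into the `NOT IN (...)` clause with an f-string, which keeps the list correct if more years are added.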


In [44]:
query = """
        SELECT date
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 = "Bernalillo County" AND sub_region_1 = "New Mexico"
              AND retail_and_recreation_percent_change_from_baseline = -61
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,date
0,2020-04-12
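
Finding the minimum and then looking up its date takes two queries. Sorting by the value and keeping the first row returns both at once. Below is a sketch of that pattern, using Python's built-in `sqlite3` as an offline stand-in for BigQuery. The `report` table and its rows are made up, loosely echoing the values seen above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (day TEXT, retail INTEGER)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [
        ("2020-04-10", -48),
        ("2020-04-11", -52),
        ("2020-04-12", -61),
        ("2020-04-13", -45),
        ("2020-12-25", -86),  # holiday outlier to exclude
    ],
)

sql = """
    SELECT day, retail
    FROM report
    WHERE day NOT IN ('2020-12-25')
    ORDER BY retail ASC, day ASC   -- lowest value first; earliest day breaks ties
    LIMIT 1
"""
lowest_day, lowest_value = conn.execute(sql).fetchone()
print(lowest_day, lowest_value)
```

The same `ORDER BY ... LIMIT 1` shape should carry over to BigQuery, though it is untested against the real table here.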


## Was that in a period of low retail and recreation activity or just noise in the data?

In [45]:
query = """
        SELECT date, retail_and_recreation_percent_change_from_baseline
        -- SELECT COUNT(*)
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 = "Bernalillo County"
              AND retail_and_recreation_percent_change_from_baseline < -40
        -- ORDER BY retail_and_recreation_percent_change_from_baseline
        ORDER BY date
        """
df = client.query(query).to_dataframe()
df.head(25)

Unnamed: 0,date,retail_and_recreation_percent_change_from_baseline
0,2020-03-21,-42
1,2020-03-24,-43
2,2020-03-25,-44
3,2020-03-26,-45
4,2020-03-27,-45
5,2020-03-28,-51
6,2020-03-29,-46
7,2020-04-04,-47
8,2020-04-05,-44
9,2020-04-08,-41


## Which country has decreased retail and recreation activity the most?

In [46]:
query = """
        SELECT country_region, ROUND(AVG(retail_and_recreation_percent_change_from_baseline), 1) as mean
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        GROUP BY country_region
        ORDER BY mean
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,country_region,mean
0,Myanmar (Burma),-38.2
1,Liechtenstein,-33.9
2,Panama,-28.6
3,Guinea-Bissau,-25.3
4,Kuwait,-24.3


In [47]:
df.tail(5)

Unnamed: 0,country_region,mean
130,Mongolia,43.1
131,Burkina Faso,46.8
132,Niger,57.6
133,Yemen,61.5
134,Libya,69.7


## How does New Mexico compare to similar states?


In [48]:
query = """
        SELECT sub_region_1, AVG(retail_and_recreation_percent_change_from_baseline) as mean
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_1 IN ( "New Mexico", "Colorado", "Arizona", "Oklahoma", "Texas", "Utah" )
        GROUP BY sub_region_1
        ORDER BY mean
        """
df = client.query(query).to_dataframe()
df.head()

Unnamed: 0,sub_region_1,mean
0,New Mexico,-11.332305
1,Arizona,-7.709368
2,Colorado,-6.410616
3,Texas,-5.194934
4,Utah,0.878987


## What does all the data for Bernalillo County look like?

In [49]:
table.schema

[SchemaField('country_region_code', 'STRING', 'NULLABLE', None, '2 letter alpha code for the country/region in which changes are measured relative to the baseline. These values correspond with the ISO 3166-1 alpha-2 codes', (), None),
 SchemaField('country_region', 'STRING', 'NULLABLE', None, 'The country/region in which changes are measured relative to the baseline', (), None),
 SchemaField('sub_region_1', 'STRING', 'NULLABLE', None, 'First geographic sub-region in which the data is aggregated. This varies by country/region to ensure privacy and public health value in consultation with local public health authorities', (), None),
 SchemaField('sub_region_2', 'STRING', 'NULLABLE', None, 'Second geographic sub-region in which the data is aggregated. This varies by country/region to ensure privacy and public health value in consultation with local public health authorities', (), None),
 SchemaField('metro_area', 'STRING', 'NULLABLE', None, 'A specific metro area to measure mobility withi

In [50]:
query = """
        SELECT date,
               retail_and_recreation_percent_change_from_baseline AS retail_recreation,
               grocery_and_pharmacy_percent_change_from_baseline AS grocery,
               parks_percent_change_from_baseline AS parks,
               transit_stations_percent_change_from_baseline AS transit,
               workplaces_percent_change_from_baseline AS work,
               residential_percent_change_from_baseline AS residential
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_1 ="New Mexico" AND sub_region_2 = "Bernalillo County"
        ORDER BY date
        """
df = client.query(query).to_dataframe()
df.head(40)

Unnamed: 0,date,retail_recreation,grocery,parks,transit,work,residential
0,2020-02-15,4,10,25,0,0,-1
1,2020-02-16,12,10,28,4,2,-1
2,2020-02-17,10,6,35,2,-24,4
3,2020-02-18,5,8,17,6,3,0
4,2020-02-19,1,8,6,3,1,0
5,2020-02-20,5,4,9,3,1,0
6,2020-02-21,3,6,4,1,4,-1
7,2020-02-22,1,3,-12,-7,0,1
8,2020-02-23,3,1,-32,-5,0,1
9,2020-02-24,6,5,9,5,3,-1


## How many counties in the U.S. have a name that starts with a B?

In [51]:
query = """
        SELECT DISTINCT sub_region_2
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_2 LIKE "B%" AND country_region = "United States"
        """
df = client.query(query).to_dataframe()
df.head(6)

Unnamed: 0,sub_region_2
0,Bastrop County
1,Blaine County
2,Buffalo County
3,Belknap County
4,Burlington County
5,Bullock County


In [52]:
df.count()

sub_region_2    143
dtype: int64
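
`SELECT DISTINCT sub_region_2` counts distinct county *names*, so a name shared by several states (as Lincoln County was earlier) is counted once. Counting distinct (state, county) pairs gives the number of counties instead. Below is a sketch of the difference, using Python's built-in `sqlite3` as an offline stand-in for BigQuery, with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (sub_region_1 TEXT, sub_region_2 TEXT)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [
        ("Idaho", "Blaine County"),
        ("Montana", "Blaine County"),
        ("Texas", "Bastrop County"),
        ("Texas", "Bastrop County"),  # repeated row, e.g. another date
        ("New Mexico", "Lincoln County"),
    ],
)

# Distinct names: a name shared by two states counts once.
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT sub_region_2) FROM report WHERE sub_region_2 LIKE 'B%'"
).fetchone()[0]

# Distinct (state, county) pairs: each county counts once.
distinct_counties = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT DISTINCT sub_region_1, sub_region_2
        FROM report
        WHERE sub_region_2 LIKE 'B%'
    )
    """
).fetchone()[0]

print(distinct_names, distinct_counties)
```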

## What does all the data for Bernalillo County look like for June 2021?

In [53]:
query = """
        SELECT date,
               retail_and_recreation_percent_change_from_baseline AS retail_recreation,
               grocery_and_pharmacy_percent_change_from_baseline AS grocery,
               parks_percent_change_from_baseline AS parks,
               transit_stations_percent_change_from_baseline AS transit,
               workplaces_percent_change_from_baseline AS work,
               residential_percent_change_from_baseline AS residential
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE sub_region_1 ="New Mexico" AND sub_region_2 = "Bernalillo County"
              AND date BETWEEN '2021-06-01' AND '2021-06-30'
        ORDER BY date
        """
df = client.query(query).to_dataframe()
df.head(30)

Unnamed: 0,date,retail_recreation,grocery,parks,transit,work,residential
0,2021-06-01,-2,8,34,-14,-37,7
1,2021-06-02,-7,1,28,-14,-37,8
2,2021-06-03,-3,1,40,-16,-37,7
3,2021-06-04,-8,1,29,-11,-33,6
4,2021-06-05,-14,-4,31,-6,-17,2
5,2021-06-06,-5,-3,26,-6,-13,1
6,2021-06-07,-5,-1,36,-12,-37,7
7,2021-06-08,-4,1,28,-8,-36,8
8,2021-06-09,-5,2,26,-11,-36,8
9,2021-06-10,-5,0,36,-14,-37,8


## Your Turn
You will now practice using queries with Kaggle's Intro to SQL, located [here](https://www.kaggle.com/learn/intro-to-sql).