# BigQuery Pandas and Magic Integration

In this notebook I walk through integrating BigQuery with pandas, plus using the %%bigquery cell magic to simplify running queries and produce more visually appealing output.

## Install Packages

Building on the packages we already installed in the GoogleConnectTests notebook, we need three more dependencies:

In [None]:
#pip install pandas #Hopefully obvious...

In [None]:
#pip install --upgrade 'google-cloud-bigquery[pandas]' --user #New package requirement

In [None]:
#pip install --upgrade google-cloud-bigquery-storage --user  #New package requirement
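
To confirm the installs above took effect before going further, something like the following can help. It is only a sketch: missing_modules is a hypothetical helper name, and the module names checked are the ones imported later in this notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is itself absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Module names used later in this notebook.
REQUIRED = ["pandas", "google.cloud.bigquery", "google.cloud.bigquery_storage"]

print("Missing modules:", missing_modules(REQUIRED))
```

An empty list means all three are importable; anything listed still needs a pip install (and possibly a kernel restart).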

## Enable BigQuery Cell Magic

This is what makes the %%bigquery cell magic work:

In [None]:
%load_ext google.cloud.bigquery

## Point to Service Account Credentials

Point to your service account key:

In [18]:
# Must be run again after restarting the kernel
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/<path>/<key>.json"
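
Because everything downstream fails without the key, a quick sanity check at this point can save some confusion. The helper below is a minimal sketch: check_credentials is a hypothetical name, not part of any Google library, and it only checks that the variable is set and the file exists, not that the key itself is valid.

```python
import os
from pathlib import Path


def check_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the key path if `env_var` is set and points to an existing file."""
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError(f"{env_var} is not set")
    if not Path(key_path).is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {key_path}")
    return key_path
```

Calling check_credentials() right after setting the environment variable should return the path you set, or raise with a message saying what is wrong.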

## Import BigQuery Modules (note the plural!)

This is similar to before, but there is now a dependency on the bigquery_storage package we installed earlier. I think the google-cloud-bigquery[pandas] package will try to import it on its own if you haven't already loaded it, but just in case:

In [19]:
from google.cloud import bigquery  #enable BigQuery - rerun at kernel restart
from google.cloud import bigquery_storage #NOTE THE ADDITION HERE - this is a requirement for the pandas inclusion

## Confirm Things from Before Still Work

Now let's make sure what we did earlier still works. This will fail if you forgot to point to the service account key, so it's a good check.

In [20]:
dataset_id = 'bigquery-public-data.covid19_aha'  # Dataset to work with, located in the bigquery-public-data project

client = bigquery.Client()

tables = client.list_tables(dataset_id)  # Make an API request.

print("Tables contained in '{}':".format(dataset_id))
for table in tables:
    print("{}.{}.{}".format(table.project, table.dataset_id, table.table_id))

Tables contained in 'bigquery-public-data.covid19_aha':
bigquery-public-data.covid19_aha.hospital_beds
bigquery-public-data.covid19_aha.staffing


In [22]:
query_job = client.query(
    """
    SELECT county_name, state_name, total_hospital_beds
    FROM `bigquery-public-data.covid19_aha.hospital_beds`
    LIMIT 10
    """
)
results = query_job.result()  # Waits for job to complete.

for row in results:
    print("{} : {} : {} ".format(row.county_name, row.state_name, row.total_hospital_beds))

Windsor County : Vermont : 70 
Apache County : Arizona : 143 
Iberville Parish : Louisiana : 8 
Adair County : Oklahoma : 34 
Graves County : Kentucky : 227 
Platte County : Nebraska : 51 
Lincoln Parish : Louisiana : 177 
Lawrence County : Tennessee : 99 
Geneva County : Alabama : 147 
Hamilton County : Illinois : 25 


## Do the Same Thing, Only Easier / Prettier / Better (?)

OK, now that we've confirmed our previous methods still work, let's run the same query as above using our fancy new toolkit.

In [23]:
%%bigquery
SELECT
    county_name,
    state_name,
    total_hospital_beds
FROM `bigquery-public-data.covid19_aha.hospital_beds`
LIMIT 10

Query complete after 0.00s: 100%|██████████| 1/1 [00:00<00:00, 257.78query/s]
Downloading: 100%|██████████| 10/10 [00:01<00:00,  5.46rows/s]


,county_name,state_name,total_hospital_beds
0,Windsor County,Vermont,70
1,Apache County,Arizona,143
2,Iberville Parish,Louisiana,8
3,Adair County,Oklahoma,34
4,Graves County,Kentucky,227
5,Platte County,Nebraska,51
6,Lincoln Parish,Louisiana,177
7,Lawrence County,Tennessee,99
8,Geneva County,Alabama,147
9,Hamilton County,Illinois,25
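
The table above is a pandas DataFrame. For cases where you want that DataFrame in regular Python code rather than a magic cell, here is a rough sketch of the equivalent using the client's to_dataframe() method. The function name query_to_dataframe and the SQL constant are my own illustrative names; the function just expects a bigquery.Client (or anything with the same query() method).

```python
SQL = """
SELECT
    county_name,
    state_name,
    total_hospital_beds
FROM `bigquery-public-data.covid19_aha.hospital_beds`
LIMIT 10
"""


def query_to_dataframe(client, sql):
    """Run `sql` through a bigquery.Client-like object and return a DataFrame."""
    query_job = client.query(sql)    # Starts the query job.
    return query_job.to_dataframe()  # Waits for the job, then downloads the rows.
```

With the client created earlier, df = query_to_dataframe(client, SQL) should give the same rows as the magic cell, ready for the usual pandas operations. The magic can also store its result directly if you give it a variable name, as in %%bigquery df on the first line of the cell.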


## Ooooh, pretty! And definitely easier to code and interact with vs. the previous method.