# Sample Notebook to Query BigQuery Data
In this notebook, we demonstrate how to query data in BigQuery, using BigQuery public datasets.
To query BigQuery, you need a Google Cloud account and working authentication already in place.
This tutorial doesn't cover that setup. There is an excellent tutorial at https://codelabs.developers.google.com/codelabs/cloud-bigquery-python, and we followed its steps: we created a service account, gave the service account the appropriate permissions, and created a key. The key should be stored as key.json in the home directory (not shown in this tutorial). The first query we run comes from that tutorial.

Step 1: Import the BigQuery client library

```python
from google.cloud import bigquery
```

Step 2: Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to `/home/jovyan/key.json`. This variable _must_ be set to the path of the file containing the service account key you created earlier, and it must be set before a BigQuery `Client` can be created.

```python
import os

# Must be set before bigquery.Client() is created
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/jovyan/key.json'
```
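If the variable points at a missing or wrong file, `bigquery.Client()` fails with a credentials error. A small check up front can give a clearer message. The sketch below is a hypothetical helper, `set_credentials`, not part of the BigQuery library; it assumes the key is a JSON service account key, which carries a `"type": "service_account"` field.

```python
import json
import os
from pathlib import Path


def set_credentials(key_path: str) -> str:
    """Point GOOGLE_APPLICATION_CREDENTIALS at a service account key file.

    Hypothetical helper: fails early if the key file is missing or does not
    look like a service account key, instead of failing inside bigquery.Client().
    """
    path = Path(key_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Service account key not found: {path}")
    with path.open() as f:
        key = json.load(f)
    # Assumption: service account key files contain "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

In the notebook you would call `set_credentials('/home/jovyan/key.json')` in place of the `os.environ` assignment, before creating the client.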

Step 3: Create the BigQuery client

```python
# The client reads its credentials from GOOGLE_APPLICATION_CREDENTIALS
client = bigquery.Client()
```

Write the query as a SQL statement, then use the client to execute it and store the results in `results`.

```python
query = """
    SELECT corpus AS title, COUNT(word) AS unique_words
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY title
    ORDER BY unique_words DESC
    LIMIT 10
"""
```


```python
# Start the query job; result() waits for it to finish and returns the rows
results = client.query(query).result()
```

Iterate over the results and print them out. The table we queried, `bigquery-public-data.samples.shakespeare`, holds word counts for William Shakespeare's works, so this query lists the ten works with the most unique words.

```python
for row in results:
    title = row['title']
    unique_words = row['unique_words']
    print(f'{title:<20} | {unique_words}')
```


```
hamlet               | 5318
kinghenryv           | 5104
cymbeline            | 4875
troilusandcressida   | 4795
kinglear             | 4784
kingrichardiii       | 4713
2kinghenryvi         | 4683
coriolanus           | 4653
2kinghenryiv         | 4605
antonyandcleopatra   | 4582
```
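The loop above pads every title to a fixed 20 characters, so a longer title would break the alignment. A hypothetical helper, `format_rows`, can compute the width from the data instead. Plain dicts stand in for BigQuery rows here, since both support `row['title']` lookups; the two sample values are taken from the output above.

```python
def format_rows(rows, key_col="title", value_col="unique_words"):
    """Return one aligned 'key | value' line per row (hypothetical helper).

    `rows` can be any iterable of mappings that support row[column] lookups.
    """
    rows = list(rows)  # materialize so the rows can be scanned twice
    # Pad to the longest key instead of a fixed width of 20 characters.
    width = max((len(str(row[key_col])) for row in rows), default=0)
    return [f"{str(row[key_col]):<{width}} | {row[value_col]}" for row in rows]


# Plain dicts stand in for BigQuery rows; values are from the output above.
sample = [
    {"title": "hamlet", "unique_words": 5318},
    {"title": "kinghenryv", "unique_words": 5104},
]
for line in format_rows(sample):
    print(line)
```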


We can also read results into a pandas DataFrame, which is convenient for graphing with Galyleo and for other analytics. To do this, import the pandas library (the `to_dataframe()` method requires pandas to be installed).

```python
import pandas as pd
```
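`to_dataframe()` turns query results into a DataFrame with one column per selected field and one row per result row. The sketch below builds the same shape by hand from a list of records, using values from the first query's output. Plain dicts stand in for BigQuery rows; with real rows you could build each record with `dict(row.items())`.

```python
import pandas as pd

# Plain dicts stand in for BigQuery rows; the values are from the first
# query's output above.
records = [
    {"title": "hamlet", "unique_words": 5318},
    {"title": "kinghenryv", "unique_words": 5104},
    {"title": "cymbeline", "unique_words": 4875},
]

# One column per field, one row per record.
df = pd.DataFrame.from_records(records)
print(df.shape)                    # (3, 2)
print(df["unique_words"].mean())   # 5099.0
```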

Run a new query. This time, we'll use the NOAA tsunami dataset (`noaa_tsunami`) that Google hosts among its BigQuery public datasets, and look for tsunami runups where the water height exceeded 40 meters.

```python
query = '''
    SELECT *
    FROM `bigquery-public-data.noaa_tsunami.historical_runups`
    WHERE water_ht > 40
'''
```
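The `WHERE water_ht > 40` clause runs inside BigQuery, so only matching rows are downloaded. For comparison, the sketch below applies the same condition in pandas to a small, made-up DataFrame; filtering client-side like this would require downloading the whole table first. Rows with a missing `water_ht` are dropped in both cases: in SQL, `NULL > 40` is not true, and pandas evaluates `NaN > 40` as `False`.

```python
import pandas as pd

# Made-up runup records for illustration; only water_ht matters for the filter.
runups = pd.DataFrame({
    "location_name": ["SITE A", "SITE B", "SITE C", "SITE D"],
    "water_ht": [12.0, 45.5, None, 60.0],  # None becomes NaN (missing height)
})

# Same condition as the SQL WHERE clause, applied client-side.
tall = runups[runups["water_ht"] > 40]
print(tall["location_name"].tolist())  # ['SITE B', 'SITE D']
```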

```python
dataframe = client.query(query).to_dataframe()
```

```python
dataframe
```

| | id | tsevent_id | year | month | day | timestamp | doubtful | country | state | location_name | ... | deaths | deaths_description | injuries | injuries_description | damage_millions_dollars | damage_description | houses_damaged | houses_damaged_description | houses_destroyed | houses_destroyed_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 203 | 315 | 1674 | 2 | 17 | 1674-02-17 11:30:00 |  | INDONESIA | MALUKU | LIMA, AMBON ISLAND | ... | 127 | 3 | 6 | 1 |  | 2 |  |  |  | 2 |
| 1 | 200 | 315 | 1674 | 2 | 17 | 1674-02-17 11:30:00 |  | INDONESIA | MALUKU | HILA, AMBON ISLAND | ... | 1461 | 4 |  |  |  | 3 |  |  |  | 3 |
| 2 | 10532 | 315 | 1674 | 2 | 17 | 1674-02-17 11:30:00 |  | INDONESIA | MALUKU | SEITH (CEYT), AMBON ISLAND | ... | 619 | 3 |  |  |  | 3 |  |  |  | 3 |
| 3 | 430 | 503 | 1771 | 4 | 24 | 1771-04-24 00:00:00 |  | JAPAN | OKINAWA | MIYARA, ISHIGAKI ISLAND | ... | 13486 | 4 |  |  |  | 4 |  |  | 3237 | 4 |
| 4 | 3701 | 1880 | 1958 | 7 | 10 | 1958-07-10 06:15:59 |  | USA | AK | LITUYA BAY, AK | ... | 2 | 1 |  |  |  | 1 |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | 15838 | 2439 | 2004 | 12 | 26 | 2004-12-26 00:58:53 |  | INDONESIA | ACEH | ACEH, SUMATRA | ... |  |  |  |  |  |  |  |  |  |  |
| 76 | 22321 | 5413 | 2011 | 3 | 11 | 2011-03-11 05:46:24 | ? | JAPAN | IWATE | IWATE PREFECTURE, TOHOKU REGION | ... |  |  |  |  |  |  |  |  |  |  |
| 77 | 23251 | 5413 | 2011 | 3 | 11 | 2011-03-11 05:46:24 | ? | JAPAN | IWATE | IWATE PREFECTURE, TOHOKU REGION | ... |  |  |  |  |  |  |  |  |  |  |
| 78 | 30477 | 1954 | 1964 | 3 | 28 | 1964-03-28 03:36:14 |  | USA | AK | CLIFF MINE, VALDEZ INLET | ... |  |  |  |  |  |  |  |  |  |  |
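The display above elides the middle columns (the `...` column), including `water_ht`, the column the query filtered on. Selecting a few columns and sorting is one way to bring the values of interest into view; the sketch uses a small, made-up DataFrame that reuses column names from the query and the display above. In the notebook, the same line would be `dataframe[cols].sort_values('water_ht', ascending=False)`. Alternatively, `pd.set_option('display.max_columns', None)` turns off column elision for the whole notebook.

```python
import pandas as pd

# Made-up rows that reuse column names from the query and the display above.
df = pd.DataFrame({
    "year": [1700, 1800, 1900],
    "country": ["COUNTRY A", "COUNTRY B", "COUNTRY C"],
    "location_name": ["SITE A", "SITE B", "SITE C"],
    "water_ht": [50.0, 120.0, 75.0],
})

# Keep a few columns of interest and put the highest runups first.
cols = ["year", "country", "location_name", "water_ht"]
top = df[cols].sort_values("water_ht", ascending=False).head(2)
print(top["location_name"].tolist())  # ['SITE B', 'SITE C']
```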


Now load the DataFrame into a Galyleo table, as usual.

```python
from galyleo.galyleo_table import GalyleoTable
table = GalyleoTable('tsunami')
table.load_from_dataframe(dataframe)
```
