<h1 style = "text-align: center"> SQL Tutorial </h1>

In this tutorial, we'll learn how to use SQL to query the data we need from BigQuery. It introduces the following basic SQL commands:

1. SELECT FROM
2. WHERE
3. GROUP BY and JOIN

You will be working with two different tables in the project `ironhacks-data`: `ironhacks-data.ironhacks_training.weather_data` and `ironhacks-data.ironhacks_training.covid19_cases`.

You can find out more about the schema of those tables [here](https://docs.google.com/spreadsheets/d/1IowaQ8bDQA7xvc92TzpJ252KsHPDL6zbi2mdXNr3irs/edit?usp=drive_web&ouid=111649936971597408311). Make yourself familiar with this schema before you start the tutorial.

In this tutorial we will use two different libraries: the `bigquery` library and the `pandas` library. The latter will be used to display the tables as a dataframe.


## Step 1: Setting up the connection
As a first step, you have to set up your connection (see tutorial #2).

In [9]:
import os
import pandas
from google.cloud import bigquery
from google.oauth2 import service_account
from google.cloud.bigquery import magics

In [10]:
BIGQUERY_PROJECT = 'ironhacks-data'

bigquery_client = bigquery.Client(project=BIGQUERY_PROJECT)
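
Once the client exists, it can also be used to check a table's schema from code rather than from the spreadsheet. The following is a minimal sketch, assuming the `bigquery_client` created above and read access to the table. `list_columns` is a hypothetical helper name, not part of the `bigquery` library. It relies on `Client.get_table`, which returns a table object whose `schema` is a list of fields with `name` and `field_type` attributes.

```python
def list_columns(client, table_id):
    """Return (column name, column type) pairs for a BigQuery table.

    `client` is expected to be a google.cloud.bigquery.Client and
    `table_id` a fully qualified "project.dataset.table" string.
    `list_columns` itself is a hypothetical helper for this tutorial.
    """
    table = client.get_table(table_id)  # fetches table metadata, not the rows
    return [(field.name, field.field_type) for field in table.schema]
```

With the client from this step, a call might look like `list_columns(bigquery_client, "ironhacks-data.ironhacks_training.weather_data")`.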

## Step 2: `SELECT FROM` command
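Now, we will use the `SELECT FROM` command to query the table with the weather data in our training dataset. `SELECT *` returns every column of the table, and `data.head()` displays the first five rows of the result.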


In [11]:
query = """
SELECT * FROM `ironhacks-data.ironhacks_training.weather_data`
"""

query_job = bigquery_client.query(query)
data = query_job.to_dataframe()
data.head()

| | date | max_rel_humidity | max_temperature | mean_temperature | min_rel_humidity | min_temperature | potential_water_deficit | precipitation_data | wind_speed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-10-12 | 69.2646 | 13.5804 | 7.6987 | 28.2524 | 1.817 | -3.0055 | 0.0 | 5.124 |
| 1 | 2019-12-22 | 93.9565 | 11.8324 | 4.6358 | 34.5712 | -2.5607 | -1.2796 | 0.0 | 2.8557 |
| 2 | 2019-08-24 | 91.3571 | 24.6652 | 18.6607 | 44.4712 | 12.6562 | -4.7381 | 0.0 | 4.4706 |
| 3 | 2020-07-14 | 90.5395 | 29.7732 | 23.158 | 43.2159 | 16.5427 | -5.8112 | 0.0 | 2.3504 |
| 4 | 2019-12-07 | 98.1103 | 6.4609 | 0.5011 | 42.769 | -5.4587 | -0.9686 | 0.0 | 3.2087 |
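
To experiment with `SELECT` without BigQuery credentials, the same clause can be tried locally. This is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for BigQuery, and the in-memory `weather_data` table holds just three rows and four columns copied from the output above. SQLite's dialect differs from BigQuery's in places, but `SELECT ... FROM` behaves the same way here.

```python
import sqlite3

# Three rows (and only four of the columns) copied from the output above.
# The in-memory table is a local stand-in, not the real BigQuery table.
rows = [
    ("2019-10-12", 69.2646, 13.5804, 7.6987),
    ("2019-12-22", 93.9565, 11.8324, 4.6358),
    ("2019-08-24", 91.3571, 24.6652, 18.6607),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_data ("
    "date TEXT, max_rel_humidity REAL, max_temperature REAL, mean_temperature REAL)"
)
conn.executemany("INSERT INTO weather_data VALUES (?, ?, ?, ?)", rows)

# SELECT * returns every column of every row.
all_columns = conn.execute("SELECT * FROM weather_data").fetchall()

# Naming columns returns only those columns; LIMIT caps the number of rows.
some_columns = conn.execute(
    "SELECT date, mean_temperature FROM weather_data LIMIT 2"
).fetchall()

print(all_columns[0])
print(some_columns)
```

Listing only the columns you need is also worth considering on BigQuery itself, since it reads data column by column and `SELECT *` scans all of them.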


## Step 3: `WHERE` command
Next, we wish to extract the date and the maximum relative humidity from the table for the date 2020-06-16.
This is where we use the `WHERE` command.
We combine this with pandas syntax to display the results (the first few rows of the table).

In [12]:
query = """
SELECT date, max_rel_humidity
FROM `ironhacks-data.ironhacks_training.weather_data`
WHERE date = '2020-06-16'
"""

# pandas was already imported in Step 1, so it is not installed or imported again here.
query_job = bigquery_client.query(query)
data = query_job.to_dataframe()
data.head()



| | date | max_rel_humidity |
|---|---|---|
| 0 | 2020-06-16 | 82.2795 |
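
The filter value does not have to be typed into the SQL text. The sketch below shows the same `WHERE` filter with the date passed in as a parameter, plus a second filter on a numeric column. It is a local illustration only: it assumes Python's built-in `sqlite3` as a stand-in for BigQuery, a three-row table built from values shown in the outputs above, and two hypothetical helper names, `humidity_on` and `humidity_above`.

```python
import sqlite3

# Rows copied from the outputs shown earlier in this tutorial.
rows = [
    ("2019-10-12", 69.2646),
    ("2019-12-22", 93.9565),
    ("2020-06-16", 82.2795),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_data (date TEXT, max_rel_humidity REAL)")
conn.executemany("INSERT INTO weather_data VALUES (?, ?)", rows)


def humidity_on(connection, day):
    """Return (date, max_rel_humidity) rows for one day (hypothetical helper)."""
    # The ? placeholder passes `day` as data instead of pasting it into the SQL.
    cursor = connection.execute(
        "SELECT date, max_rel_humidity FROM weather_data WHERE date = ?",
        (day,),
    )
    return cursor.fetchall()


def humidity_above(connection, threshold):
    """Return rows whose max_rel_humidity exceeds `threshold`, oldest first."""
    cursor = connection.execute(
        "SELECT date, max_rel_humidity FROM weather_data "
        "WHERE max_rel_humidity > ? ORDER BY date",
        (threshold,),
    )
    return cursor.fetchall()


print(humidity_on(conn, "2020-06-16"))
print(humidity_above(conn, 80))
```

A date with no matching row simply returns an empty result rather than an error.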


## Step 4: `GROUP BY` and `JOIN` commands

Now, you want to build a single table that contains information about both COVID-19 cases and the weather. This information is stored in two different tables, namely `ironhacks-data.ironhacks_training.weather_data` and `ironhacks-data.ironhacks_training.covid19_cases`. Specifically, you want to build a table with the following columns: `mean_temperature`, `wind_speed`, and `cases`.

To do so, we first have to bring the data to the same level of granularity: the weather data has one row per date, whereas the COVID-19 cases are joined on `week_number`. The steps are:

1. Aggregate the `weather_data` to weekly values using the `GROUP BY` command.
2. Join the resulting temporary table with `covid19_cases` using the `JOIN` command.
3. Order the results by `week_number` using the `ORDER BY` command.
4. Display the results with pandas.

In [13]:
query = """
SELECT
  a.*,
  b.cases
FROM (
  -- Note: grouping by start_date as well as week_number keeps one row per date.
  SELECT
    EXTRACT(WEEK(MONDAY) FROM date) AS week_number,
    AVG(mean_temperature) AS mean_temperature_week,
    date AS start_date,
    AVG(wind_speed) AS mean_wind_speed_week
  FROM `ironhacks-data.ironhacks_training.weather_data`
  GROUP BY week_number, start_date
) a
JOIN `ironhacks-data.ironhacks_training.covid19_cases` b
  ON a.week_number = b.week_number
ORDER BY week_number
"""

# pandas was already imported in Step 1, so it is not installed or imported again here.
query_job = bigquery_client.query(query)
data = query_job.to_dataframe()
data.head()



| | week_number | mean_temperature_week | start_date | mean_wind_speed_week | cases |
|---|---|---|---|---|---|
| 0 | 1 | 8.5567 | 2020-01-06 | 2.8754 | 5289 |
| 1 | 1 | 13.467 | 2020-01-07 | 4.0858 | 5289 |
| 2 | 1 | 8.436 | 2020-01-08 | 4.18 | 5289 |
| 3 | 1 | -1.1626 | 2020-01-09 | 7.8843 | 5289 |
| 4 | 1 | 1.2549 | 2020-01-11 | 7.3401 | 5289 |
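
Because the query above groups by `start_date` as well as `week_number`, the output still has one row per date. A possible variant that returns one row per week is to group by `week_number` only and take `MIN(date)` as the start date. The sketch below illustrates that idea locally and is not the tutorial's BigQuery query: it assumes Python's built-in `sqlite3` as a stand-in, uses only the five rows and the case count shown in the output above, and replaces BigQuery's `EXTRACT(WEEK(MONDAY) FROM date)` with SQLite's `strftime('%W', date)`, which also counts weeks starting on Monday.

```python
import sqlite3

# Values copied from the five output rows above (date, temperature, wind speed).
daily_weather = [
    ("2020-01-06", 8.5567, 2.8754),
    ("2020-01-07", 13.467, 4.0858),
    ("2020-01-08", 8.436, 4.18),
    ("2020-01-09", -1.1626, 7.8843),
    ("2020-01-11", 1.2549, 7.3401),
]
weekly_cases = [(1, 5289)]  # week_number, cases (from the same output)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_data (date TEXT, mean_temperature REAL, wind_speed REAL)"
)
conn.execute("CREATE TABLE covid19_cases (week_number INTEGER, cases INTEGER)")
conn.executemany("INSERT INTO weather_data VALUES (?, ?, ?)", daily_weather)
conn.executemany("INSERT INTO covid19_cases VALUES (?, ?)", weekly_cases)

weekly_query = """
SELECT
  a.week_number,
  a.start_date,
  a.mean_temperature_week,
  a.mean_wind_speed_week,
  b.cases
FROM (
  SELECT
    CAST(strftime('%W', date) AS INTEGER) AS week_number,  -- Monday-based week
    MIN(date) AS start_date,                               -- earliest date seen in the week
    AVG(mean_temperature) AS mean_temperature_week,
    AVG(wind_speed) AS mean_wind_speed_week
  FROM weather_data
  GROUP BY week_number                                     -- one row per week
) AS a
JOIN covid19_cases AS b
  ON a.week_number = b.week_number
ORDER BY a.week_number
"""

weekly_rows = conn.execute(weekly_query).fetchall()
print(weekly_rows)
```

The five sample rows collapse into a single row for week 1. The averages cover only the dates present in the sample, so they are not the true weekly means of the full table.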


> We are at the end. This was a very simple introduction to SQL using the `bigquery` library. If you would like a more advanced tutorial, please let the IronHacks team know.