<h1 style = "text-align: center"> SQL Tutorial </h1>

#### In this tutorial, we'll learn how to use SQL to query the data that we need from BigQuery. Below are some examples for your reference. Feel free to explore more from the links that we have provided. In the first example, we query the entire "weather_data" table and load it into a pandas DataFrame so that we can investigate it further.

In [1]:
import os
import pandas
from google.cloud import bigquery
from google.oauth2 import service_account
from google.cloud.bigquery import magics

In [2]:
BIGQUERY_PROJECT = 'ironhacks-covid19-data'
BIGQUERY_KEYPATH = '/home/jovyan/keys/ironhacks-covid19-data-f5b44e38bce9.json'

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = BIGQUERY_KEYPATH
bigquery_client = bigquery.Client(project=BIGQUERY_PROJECT)

In [5]:
query = """
SELECT * FROM `ironhacks-covid19-data.ironhacks_covid19_training.weather_data`
"""

query_job = bigquery_client.query(query)
data = query_job.to_dataframe()
data.head()

Unnamed: 0,date,max_rel_humidity,max_temperature,mean_temperature,min_rel_humidity,min_temperature,potential_water_deficit,precipitation_data,wind_speed
0,2019-10-12,69.2646,13.5804,7.6987,28.2524,1.817,-3.0055,0.0,5.124
1,2019-12-22,93.9565,11.8324,4.6358,34.5712,-2.5607,-1.2796,0.0,2.8557
2,2019-08-24,91.3571,24.6652,18.6607,44.4712,12.6562,-4.7381,0.0,4.4706
3,2020-07-14,90.5395,29.7732,23.158,43.2159,16.5427,-5.8112,0.0,2.3504
4,2019-12-07,98.1103,6.4609,0.5011,42.769,-5.4587,-0.9686,0.0,3.2087
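
Fetching a whole table with `SELECT *` is fine for a small table, but for a first look a handful of rows is often enough. The sketch below wraps the same `client.query(...).to_dataframe()` pattern used above in a small helper. The names `build_preview_query`, `preview_table`, and `WEATHER_TABLE` are hypothetical and are not part of the BigQuery client library. `LIMIT` caps the number of rows returned; without an `ORDER BY`, which rows come back is not guaranteed.

```python
def build_preview_query(table_id, n_rows=5):
    """Return a SELECT * query capped at n_rows rows (hypothetical helper)."""
    n_rows = int(n_rows)
    if n_rows < 1:
        raise ValueError("n_rows must be a positive integer")
    # Backticks quote the full table path, as in the query above.
    return f"SELECT * FROM `{table_id}` LIMIT {n_rows}"


def preview_table(client, table_id, n_rows=5):
    """Run the preview query with a client such as bigquery_client above."""
    query_job = client.query(build_preview_query(table_id, n_rows))
    return query_job.to_dataframe()


# Table path taken from the query above.
WEATHER_TABLE = "ironhacks-covid19-data.ironhacks_covid19_training.weather_data"
print(build_preview_query(WEATHER_TABLE))
```

In the notebook, `preview_table(bigquery_client, WEATHER_TABLE)` would then return at most five rows as a DataFrame.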


#### The command below lists the column names of the "weather_data" table that we loaded into `data`.

In [6]:
data.columns.tolist()

['date',
 'max_rel_humidity',
 'max_temperature',
 'mean_temperature',
 'min_rel_humidity',
 'min_temperature',
 'potential_water_deficit',
 'precipitation_data',
 'wind_speed']

#### Next, we extract the date and the maximum relative humidity from the table for the date 2020-06-16. The "WHERE" clause comes into play here; it restricts the result to rows that satisfy a condition, in this case a specific date.

In [7]:
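# Select two columns and keep only the row for one date.
# The table path is quoted with backticks, as in the first query.
query = """
SELECT date, max_rel_humidity
FROM `ironhacks-covid19-data.ironhacks_covid19_training.weather_data`
WHERE date = '2020-06-16'
"""

# pandas was already imported in the first cell, so no reinstall is needed.
query_job = bigquery_client.query(query)
data = query_job.to_dataframe()
data.head()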



Unnamed: 0,date,max_rel_humidity
0,2020-06-16,82.2795
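
To experiment with `WHERE` without running a BigQuery job, the same filter can be tried on a small local table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for BigQuery. This is an assumption made purely for illustration, and SQLite's SQL dialect differs from BigQuery's in places. The rows are the dates and `max_rel_humidity` values shown in the outputs above. The date is passed as a query parameter rather than pasted into the SQL string.

```python
import sqlite3

# Rows copied from the outputs shown earlier in this tutorial.
rows = [
    ("2019-10-12", 69.2646),
    ("2019-12-22", 93.9565),
    ("2019-08-24", 91.3571),
    ("2020-07-14", 90.5395),
    ("2019-12-07", 98.1103),
    ("2020-06-16", 82.2795),
]

# In-memory SQLite database standing in for the BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_data (date TEXT, max_rel_humidity REAL)")
conn.executemany("INSERT INTO weather_data VALUES (?, ?)", rows)

# Equality filter: keep only the row for one date.
result = conn.execute(
    "SELECT date, max_rel_humidity FROM weather_data WHERE date = ?",
    ("2020-06-16",),
).fetchall()
print(result)  # [('2020-06-16', 82.2795)]

# Range filter: ISO-formatted date strings sort chronologically.
in_2020 = conn.execute(
    "SELECT date FROM weather_data WHERE date >= ? ORDER BY date",
    ("2020-01-01",),
).fetchall()
print(in_2020)  # [('2020-06-16',), ('2020-07-14',)]
```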


#### Now, let's learn how to combine two tables from Google BigQuery: "weather_data" and "covid19_tests_cases_deaths_IN". "JOIN" is used here; it matches each row of COVID-19 test counts in one table with the maximum relative humidity from the other table for the same date.

In [None]:
# The dataset names below carry no project prefix, so they resolve
# against the client's default project (BIGQUERY_PROJECT).
QUERY = """
SELECT
  t1.DATE,
  t1.COVID_TEST,
  t2.max_rel_humidity
FROM ironhacks_covid19_training.covid19_tests_cases_deaths_IN t1
JOIN ironhacks_covid19_training.weather_data t2 ON t1.DATE = t2.date
"""

# pandas was already imported in the first cell, so no reinstall is needed.
query_job = bigquery_client.query(QUERY)
data = query_job.to_dataframe()
data.head()
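
The join cell above is shown without output, so a small local sketch may help make the matching behaviour concrete. It uses Python's built-in `sqlite3` module as a stand-in for BigQuery, an assumption made only for illustration. The weather rows come from the outputs earlier in this tutorial, but the `COVID_TEST` values are made-up placeholders, not real data. An inner `JOIN` keeps only the dates that appear in both tables.

```python
import sqlite3

# In-memory SQLite database standing in for the two BigQuery tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_data (date TEXT, max_rel_humidity REAL)")
conn.execute(
    "CREATE TABLE covid19_tests_cases_deaths_IN (DATE TEXT, COVID_TEST INTEGER)"
)

# Weather rows copied from the outputs shown earlier.
conn.executemany(
    "INSERT INTO weather_data VALUES (?, ?)",
    [("2020-06-16", 82.2795), ("2020-07-14", 90.5395), ("2019-12-07", 98.1103)],
)
# COVID_TEST values are placeholders invented for this sketch.
conn.executemany(
    "INSERT INTO covid19_tests_cases_deaths_IN VALUES (?, ?)",
    [("2020-06-16", 100), ("2020-07-14", 200), ("2020-08-01", 300)],
)

# Same join shape as the BigQuery query above, with an explicit ordering.
joined = conn.execute(
    """
    SELECT t1.DATE, t1.COVID_TEST, t2.max_rel_humidity
    FROM covid19_tests_cases_deaths_IN t1
    JOIN weather_data t2 ON t1.DATE = t2.date
    ORDER BY t1.DATE
    """
).fetchall()
print(joined)
```

Only 2020-06-16 and 2020-07-14 appear in the result: 2020-08-01 has no weather row and 2019-12-07 has no test row, so the inner join drops both.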