# Linear Regression with BigQuery ML
## What's this

This is a demo of BigQuery ML.

BigQuery is not only for analyzing big data; it can also create machine learning models and make predictions with them.

## What is this demo

This demo uses a dataset of weather conditions recorded during World War II.

I use this data to predict the maximum temperature (the `MaxTemp` column).

## Dataset

[Kaggle - Weather Conditions in World War Two](https://www.kaggle.com/account/authenticate/facebook?isModal=true)

## Code
### Connect BigQuery and SELECT

In [1]:
import pandas as pd
query = 'SELECT * FROM weather_at_ww2.weather'
data_frame = pd.read_gbq(query, 'my-labs-187013', dialect='standard')
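Every cell in this notebook repeats the same project ID and dialect, so a small helper can keep them in one place. The sketch below is a convenience and not part of the original notebook: `run_query` and its `reader` parameter are hypothetical names introduced here. It falls back to `pandas_gbq.read_gbq`, which pandas recommends in place of `pd.read_gbq` since pandas 2.2, and it assumes the pandas-gbq package is installed.

```python
PROJECT_ID = "my-labs-187013"  # project ID used throughout this notebook


def run_query(sql, project_id=PROJECT_ID, reader=None):
    """Run a standard-SQL query and return whatever the reader returns.

    `reader` is a hypothetical hook for injecting a different query
    function (useful for offline testing). When it is omitted, the
    pandas-gbq package is imported lazily and its read_gbq is used.
    """
    if reader is None:
        import pandas_gbq  # assumption: pandas-gbq is installed

        reader = pandas_gbq.read_gbq
    return reader(sql, project_id=project_id, dialect="standard")
```

With this in place, the first cell could be written as `data_frame = run_query('SELECT * FROM weather_at_ww2.weather')`.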

In [2]:
data_frame.head()

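|   | STA | Date | Precip | WindGustSpd | MaxTemp | MinTemp | MeanTemp | Snowfall | PoorWeather | YR | ... | FB | FTI | ITH | PGT | TSHDSBRSGF | SD3 | RHX | RHN | RVG | WTE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10705 | 1941-10-09 | 0 |  | -17.777778 | -17.777778 | -17.777778 | 0 |  | 41 | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | 10705 | 1941-10-18 | 0 |  | -17.777778 | -17.777778 | -17.777778 | 0 |  | 41 | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | 10705 | 1941-10-19 | 0 |  | -17.777778 | -17.777778 | -17.777778 | 0 |  | 41 | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | 10705 | 1941-10-22 | 0 |  | -17.777778 | -17.777778 | -17.777778 | 0 |  | 41 | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | 10705 | 1941-10-24 | 0 |  | -17.777778 | -17.777778 | -17.777778 | 0 |  | 41 | ... |  |  |  |  |  |  |  |  |  |  |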


### Create a Model(1)

Create a model that predicts MaxTemp from MinTemp.

For training, I use the data between 1940/1/1 and 1943/1/1.
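The training window and the prediction window used later in this notebook (1943/1/2 to 1945/1/1) are adjacent and do not overlap, because SQL `BETWEEN` includes both endpoints. A minimal sketch of that split in plain Python, where `split_for` is a hypothetical helper name used only for illustration:

```python
from datetime import date

# Windows taken from the queries in this notebook.
TRAIN_START, TRAIN_END = date(1940, 1, 1), date(1943, 1, 1)
TEST_START, TEST_END = date(1943, 1, 2), date(1945, 1, 1)


def split_for(day):
    """Return 'train', 'test', or None for an observation date.

    Mirrors SQL BETWEEN, which includes both endpoints.
    """
    if TRAIN_START <= day <= TRAIN_END:
        return "train"
    if TEST_START <= day <= TEST_END:
        return "test"
    return None


print(split_for(date(1943, 1, 1)))  # prints "train" (last training day)
print(split_for(date(1943, 1, 2)))  # prints "test" (first prediction day)
```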

In [3]:
create_model01_query = """CREATE OR REPLACE MODEL `weather_at_ww2.regression01`
OPTIONS(model_type='linear_reg', labels = ['MaxTemp']) AS
SELECT
  * EXCEPT(Date, STA)
FROM (
  SELECT
    Date,
    STA,
    MaxTemp,
    MinTemp 
  FROM weather_at_ww2.weather
  WHERE Date BETWEEN CAST('1940-01-01' AS date) AND CAST('1943-01-01' as date)
)
"""
data_frame = pd.read_gbq(create_model01_query, 'my-labs-187013', dialect='standard')
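A linear regression model is fully described by its weights, so it can be useful to look at what was learned. BigQuery ML exposes them through `ML.WEIGHTS`. The helper name `weights_query` below is hypothetical; it only builds the SQL string, which could then be passed to `pd.read_gbq` in the same way as the other queries.

```python
def weights_query(model_name):
    """Build a query that lists the learned weights of a BigQuery ML model."""
    return (
        "SELECT processed_input, weight "
        f"FROM ML.WEIGHTS(MODEL `{model_name}`)"
    )


weights01_query = weights_query("weather_at_ww2.regression01")
print(weights01_query)

# To run it (requires BigQuery access):
# pd.read_gbq(weights01_query, 'my-labs-187013', dialect='standard')
```

For a model with a single numeric feature, the result should contain one row for `MinTemp` and one for the intercept.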

### Predict with Model(1)

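Then, predict MaxTemp for the data between 1943/1/2 and 1945/1/1 and compare the predictions with the actual values.

The results do not look bad: in the rows shown below, most predictions are within about two degrees of the actual MaxTemp.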

In [4]:
predict01_query="""SELECT
  actual.STA,
  actual.Date,
  actual.MinTemp,
  actual.MaxTemp,
  predicted.predicted_MaxTemp
FROM (
  SELECT STA, Date, MinTemp, predicted_MaxTemp
  FROM ML.PREDICT(MODEL `weather_at_ww2.regression01`, (
    SELECT STA, Date, MinTemp
    FROM weather_at_ww2.weather WHERE Date BETWEEN CAST('1943-01-02' AS date) AND CAST('1945-01-01' AS date)
  ))
) AS predicted
JOIN (
  SELECT STA, Date, MinTemp, MaxTemp FROM weather_at_ww2.weather
) AS actual
ON predicted.STA = actual.STA AND predicted.Date = actual.Date
ORDER BY predicted.Date;
"""
data_frame = pd.read_gbq(predict01_query, 'my-labs-187013', dialect='standard')
data_frame.head()

|   | STA | Date | MinTemp | MaxTemp | predicted_MaxTemp |
|---|---|---|---|---|---|
| 0 | 10502 | 1943-01-02 | 22.222222 | 30.555556 | 29.996381 |
| 1 | 12701 | 1943-01-02 | 19.444444 | 28.888889 | 27.121711 |
| 2 | 10001 | 1943-01-02 | 23.333333 | 30.555556 | 31.146249 |
| 3 | 31701 | 1943-01-02 | 16.666667 | 26.111111 | 24.247042 |
| 4 | 11701 | 1943-01-02 | 19.444444 | 31.666667 | 27.121711 |
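The five rows displayed above are enough for a quick sanity check. Because the model has a single feature, two predictions with different MinTemp values determine the fitted line, and the prediction errors can be summarized as a mean absolute error (MAE) and a root mean squared error (RMSE). This is only a sketch on five rows, not an evaluation of the model; the variable names are introduced here for illustration.

```python
# (MinTemp, MaxTemp, predicted_MaxTemp) copied from the five rows above.
rows = [
    (22.222222, 30.555556, 29.996381),
    (19.444444, 28.888889, 27.121711),
    (23.333333, 30.555556, 31.146249),
    (16.666667, 26.111111, 24.247042),
    (19.444444, 31.666667, 27.121711),
]

# Two predictions with different MinTemp determine the fitted line.
(x0, _, p0), (x1, _, p1) = rows[0], rows[1]
slope = (p0 - p1) / (x0 - x1)
intercept = p0 - slope * x0

# Error of each prediction, then MAE and RMSE over the displayed rows.
errors = [actual - predicted for _, actual, predicted in rows]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```

On these rows the line comes out close to `predicted_MaxTemp = 1.03 * MinTemp + 7.0`, and the mean absolute error is a little under two degrees, with the last row contributing the largest miss.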


### Create a Model(2)

This time, the model uses both MinTemp and MeanTemp as features.

In [5]:
create_model02_query = """CREATE OR REPLACE MODEL `weather_at_ww2.regression02`
OPTIONS(model_type='linear_reg', labels = ['MaxTemp']) AS
SELECT
  * EXCEPT(Date, STA)
FROM (
  SELECT
    Date,
    STA,
    MaxTemp,
    MeanTemp,
    MinTemp 
  FROM weather_at_ww2.weather
  WHERE Date BETWEEN CAST('1940-01-01' AS date) AND CAST('1943-01-01' as date)
)
"""
data_frame = pd.read_gbq(create_model02_query, 'my-labs-187013', dialect='standard')

### Predict with Model(2)

With MeanTemp added, the predictions seem to be better.

In [None]:
predict02_query="""SELECT
  actual.STA,
  actual.Date,
  actual.MinTemp,
  actual.MaxTemp,
  predicted.predicted_MaxTemp01,
  predicted.predicted_MaxTemp02
FROM (
  SELECT predicted01.STA, predicted01.Date, predicted01.predicted_MaxTemp AS predicted_MaxTemp01, predicted02.predicted_MaxTemp AS predicted_MaxTemp02
  FROM (
    SELECT STA, Date,MinTemp, predicted_MaxTemp
    FROM ml.predict(model `weather_at_ww2.regression01`, (
      SELECT STA, Date, MinTemp
      FROM weather_at_ww2.weather WHERE Date BETWEEN CAST('1943-01-02' AS date) AND CAST('1945-01-01' as date)
    ))
  ) AS predicted01
  JOIN (
    SELECT STA, Date,MinTemp, predicted_MaxTemp
    FROM ml.predict(model `weather_at_ww2.regression02`, (
      SELECT STA, Date, MinTemp, MeanTemp
      FROM weather_at_ww2.weather WHERE Date BETWEEN CAST('1943-01-02' AS date) AND CAST('1945-01-01' as date)
    ))
   )  AS predicted02 ON predicted01.STA = predicted02.STA AND predicted01.Date = predicted02.Date
) AS predicted
JOIN (
  SELECT STA, Date, MinTemp, MaxTemp FROM weather_at_ww2.weather
) AS actual
ON predicted.STA = actual.STA AND predicted.Date = actual.Date
ORDER BY predicted.Date;
"""
data_frame = pd.read_gbq(predict02_query, 'my-labs-187013', dialect='standard')
data_frame.head()
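Comparing a few rows by eye only goes so far. BigQuery ML also provides `ML.EVALUATE`, which for a linear regression model returns metrics such as `mean_absolute_error` and `r2_score`. The sketch below builds one evaluation query per model over the same 1943-01-02 to 1945-01-01 window; `evaluate_query` is a hypothetical helper name, and the queries are not executed here.

```python
def evaluate_query(model_name, feature_cols):
    """Build an ML.EVALUATE query over the prediction window.

    The inner SELECT must include the label column (MaxTemp) together
    with the feature columns the model was trained on.
    """
    columns = ", ".join(["MaxTemp"] + list(feature_cols))
    return f"""SELECT *
FROM ML.EVALUATE(MODEL `{model_name}`, (
  SELECT {columns}
  FROM weather_at_ww2.weather
  WHERE Date BETWEEN CAST('1943-01-02' AS date) AND CAST('1945-01-01' AS date)
))"""


evaluate01_query = evaluate_query("weather_at_ww2.regression01", ["MinTemp"])
evaluate02_query = evaluate_query(
    "weather_at_ww2.regression02", ["MinTemp", "MeanTemp"]
)
print(evaluate02_query)

# To run them (requires BigQuery access):
# pd.read_gbq(evaluate01_query, 'my-labs-187013', dialect='standard')
# pd.read_gbq(evaluate02_query, 'my-labs-187013', dialect='standard')
```

A lower `mean_absolute_error` or a higher `r2_score` for `regression02` would back up the impression that the second model is better.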