# Learning Objectives

### In this module, we will:
* Demonstrate the idea behind **linear regression**

* Understand the role of **parameters** in a model

## Regression

**Regression** is one of the simplest supervised learning approaches for learning the relationship between input variables (`features`) and output variables (`labels`).

### Motivating Example: Height vs. Weight

Suppose we want to understand the relationship between height and weight, or, equivalently, whether we can **predict** a person's weight from their height.

<img src="Datasets/Regression_Line.jpg" alt="Drawing" style="width: 700px;"/>

* If we can find such a line, we can use it to make **predictions** (i.e., estimate a person's weight given their height)
* How do we **formulate** the problem of finding a line?
* If no line fits the data exactly, how do we **approximate** it?
* What is the "best" line?
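The usual answer to "best", and the one the least-squares fitting later in this module relies on, is the line with the smallest sum of squared vertical errors. A minimal sketch with made-up (height, weight) pairs; `sum_squared_errors` is a hypothetical helper, not a library function:

```python
# Toy (height in cm, weight in kg) pairs: hypothetical values for illustration only
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 66, 77, 85]

def sum_squared_errors(m, b, xs, ys):
    """Total squared vertical distance between the points and the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Two candidate lines: the one with the smaller error fits the data better
print(sum_squared_errors(0.9, -86, heights, weights))
print(sum_squared_errors(0.5, -20, heights, weights))
```

The first line misses each point by at most 1 kg, while the second misses by up to 10 kg, so by this criterion the first line is the better fit.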

### Recap: Equation for a Line

What is the formula describing the line?
$$y = m x + b$$
or
$$\mbox{weight} = m \times \mbox{height} + b$$

<img src="Datasets/Formula_Line.jpg" alt="Drawing" style="width: 700px;"/>

<img src="Datasets/Formula_Line_mul.jpg" alt="Drawing" style="width: 700px;"/>

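More generally, we can describe the prediction as an inner product. With a second feature such as age, the line becomes a plane:

$$ \mbox{weight} = (\mbox{height}, \mbox{age}, 1) \cdot (m_1, m_2, b) = m_1 \times \mbox{height} + m_2 \times \mbox{age} + b$$

Namely,
$$y = (x_1, x_2, 1) \cdot (m_1, m_2, b)$$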
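A small NumPy sketch of the inner-product form; the feature values and parameters are made up purely for illustration. Appending the constant 1 to the features lets the intercept `b` be treated as just another parameter.

```python
import numpy as np

# Hypothetical features for one person: height (cm), age (years), plus a constant 1
x = np.array([170.0, 30.0, 1.0])
# Hypothetical parameters (m1, m2, b); the trailing 1 in x multiplies the intercept b
theta = np.array([0.8, 0.1, -70.0])

weight = x @ theta  # inner product = m1*height + m2*age + b
print(weight)
```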



## Linear Regression

In general, 

<img src="Datasets/Linear_Regression.jpg" alt="Drawing" style="width: 400px;"/>

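But here we have **many** observations. Each observation $i$ satisfies
$$y_i = X_i \cdot \theta$$
where $X_i$ is the $i$-th row of the feature matrix $X$ (including the constant 1) and $\theta$ is the parameter vector, so all observations together can be written in matrix form as $y = X\theta$: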
<img src="Datasets/Linear_Regression_matrix.jpg" alt="Drawing" style="width: 400px;"/>

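### Solving for $\theta$

In general no $\theta$ satisfies $X\theta = y$ exactly, so we choose the $\theta$ that minimizes the squared error $\|X\theta - y\|^2$. Setting its gradient $2X^T(X\theta - y)$ to zero gives the *normal equations* $X^TX\theta = X^Ty$, and, when $X^TX$ is invertible:

$$
\begin{aligned}
X\theta &\approx y \\
X^TX\theta &= X^Ty \\
\theta &= (X^TX)^{-1}X^Ty
\end{aligned}
$$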
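To check the derivation numerically, here is a sketch with synthetic data (the true parameters and noise level are arbitrary choices). Solving the normal equations, applying the closed form with an explicit inverse, and calling NumPy's least-squares solver should all give the same $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 observations of 2 features, plus a column of ones for the intercept
n = 100
features = rng.normal(size=(n, 2))
X = np.column_stack([features, np.ones(n)])
true_theta = np.array([2.0, -1.0, 0.5])  # (m1, m2, b)
# Add a little noise so that X theta = y has no exact solution
y = X @ true_theta + rng.normal(scale=0.1, size=n)

# Normal equations: solve (X^T X) theta = X^T y without forming the inverse
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Closed form exactly as written above, with the explicit inverse
theta_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

# NumPy's least-squares solver minimizes ||X theta - y||^2 directly
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_normal)
print(np.allclose(theta_normal, theta_lstsq), np.allclose(theta_normal, theta_inverse))
```

In practice `np.linalg.solve` or `np.linalg.lstsq` is preferred over forming $(X^TX)^{-1}$ explicitly, since the explicit inverse is slower and less numerically stable.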

## Summary of concepts

* **Regression** is one of the simplest forms of supervised learning
* **Linear regression** amounts to finding the line (or, with several features, the hyperplane) that best fits the data
* We can express this as a least-squares problem over a system of linear equations in matrix form

## Regression in Python

* Explore how to express linear regression equations in terms of Python data structures
* Work through a (simple) **real-world regression example**
* Compare a "manual" implementation of linear regression to a library function

### Example - Air quality prediction

We'll look at the problem of predicting **air quality** in Beijing, measured by pm2.5 (the concentration of fine particulate matter, i.e., particles 2.5 micrometers or smaller in diameter)

* This is a "simpler" dataset than some of the others we've been working with, as the features we use here are all real-valued

<img src="Datasets/Beijing_PM25.jpg" alt="Drawing" style="width: 600px;"/>


#### What are we trying to predict?

$$ \mbox{pm2.5} = \theta_0 + \theta_1 \times \mbox{temp}$$


```python
import pandas as pd

df = pd.read_csv("Datasets/Beijing_PM25_air_data.csv", sep=',', header=0)
df.head()
```

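| | No | year | month | day | hour | pm2.5 | DEWP | TEMP | PRES | cbwd | Iws | Is | Ir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010 | 1 | 1 | 0 | NaN | -21 | -11.0 | 1021.0 | NW | 1.79 | 0 | 0 |
| 1 | 2 | 2010 | 1 | 1 | 1 | NaN | -21 | -12.0 | 1020.0 | NW | 4.92 | 0 | 0 |
| 2 | 3 | 2010 | 1 | 1 | 2 | NaN | -21 | -11.0 | 1019.0 | NW | 6.71 | 0 | 0 |
| 3 | 4 | 2010 | 1 | 1 | 3 | NaN | -21 | -14.0 | 1019.0 | NW | 9.84 | 0 | 0 |
| 4 | 5 | 2010 | 1 | 1 | 4 | NaN | -20 | -12.0 | 1018.0 | NW | 12.97 | 0 | 0 |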


### Code: Extracting features and labels

```python
# Drop hours with a missing pm2.5 reading, then separate the label from the feature
dataset = df.dropna(subset=['pm2.5'])
y = dataset['pm2.5']
X = dataset['TEMP']

print(len(X), len(y))
```

```
41757 41757
```


```python
from sklearn import linear_model
```

### Code: Finding the parameters

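Fitting the model with scikit-learn's `LinearRegression` in the cell below gives an intercept of about 107.1 and a temperature coefficient of about -0.68, so we can predict as:
$$
\mbox{pm2.5} = 107.1 - 0.68 \times \mbox{temp}
$$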
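As a quick worked example of this rule (using the rounded parameters, and assuming `TEMP` is in degrees Celsius): at 10 degrees the prediction is $107.1 - 0.68 \times 10 = 100.3$, and at -10 degrees it is $107.1 + 6.8 = 113.9$, so lower temperatures go with higher predicted pm2.5. The same arithmetic as a tiny function (`predict_pm25` is a hypothetical name):

```python
def predict_pm25(temp):
    """Predicted pm2.5 from temperature, using the rounded fitted parameters."""
    return 107.1 - 0.68 * temp

print(round(predict_pm25(10.0), 2))   # 100.3
print(round(predict_pm25(-10.0), 2))  # 113.9
```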

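```python
# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features),
# so reshape the 1-D temperature Series into a single column
X = X.to_numpy().reshape(-1, 1)

regr = linear_model.LinearRegression()
regr.fit(X, y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
```

```
Intercept: 
 107.10183392374576
Coefficients: 
 [-0.68447989]
```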
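The module promises a comparison with a "manual" implementation. A minimal sketch of one, using the normal equations from the derivation earlier with a column of ones prepended for the intercept (placed first here, so `theta[0]` is the intercept). `manual_linear_regression` is a hypothetical helper, and the toy data lie exactly on a known line so the answer is easy to check:

```python
import numpy as np

def manual_linear_regression(X, y):
    """Least-squares fit via the normal equations; returns (intercept, coefficients)
    laid out like sklearn's regr.intercept_ and regr.coef_."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single feature becomes one column
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so theta[0] plays the role of the intercept
    X1 = np.column_stack([np.ones(len(X)), X])
    theta = np.linalg.solve(X1.T @ X1, X1.T @ y)
    return theta[0], theta[1:]

# Hypothetical toy data lying exactly on y = 100 - 0.5 * x
temps = np.array([-10.0, 0.0, 10.0, 20.0, 30.0])
pm25 = 100 - 0.5 * temps
intercept, coef = manual_linear_regression(temps, pm25)
print(intercept, coef)
```

Called as `manual_linear_regression(X, y)` on the `X` and `y` above, it should match `regr.intercept_` and `regr.coef_` up to floating-point error, since both solve the same least-squares problem.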


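Adding the hour of the day as a second feature gives one coefficient per feature (`TEMP`, then `hour`):

```python
# Selecting a list of column names yields a 2-D DataFrame, so no reshape is needed
X = dataset[['TEMP', 'hour']]

regr = linear_model.LinearRegression()
regr.fit(X, y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
```

```
Intercept: 
 108.4637133461681
Coefficients: 
 [-0.67340075 -0.13034581]
```

So the fitted model is approximately
$$
\mbox{pm2.5} = 108.5 - 0.67 \times \mbox{temp} - 0.13 \times \mbox{hour}
$$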


## Time-Series Regression

* Explore options to apply regression to time-series data
* Consider the merits of alternative approaches
* Introduce the "autoregression" framework for time-series regression


### Method 1: "Moving Average"

Predict the next value as the average of the previous observations within a window of fixed length.
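A minimal sketch of the moving-average forecast, assuming the prediction for the next step is the plain (unweighted) mean of the last `window` observations; `moving_average_forecast` is a hypothetical helper name. The example series uses the first five pm2.5 values from the dataset loaded later in this notebook.

```python
def moving_average_forecast(series, window):
    """Predict the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

history = [129.0, 148.0, 159.0, 181.0, 138.0]
# Mean of the last three readings: (159 + 181 + 138) / 3
print(moving_average_forecast(history, 3))
```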

<img src="Datasets/Moving_AVG.jpg" alt="Drawing" style="width: 500px;"/>

### Method 2: "Exponential Smoothing"

Weight the most recent points more heavily, with weights that decay exponentially with each step back in time.
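A minimal sketch of simple exponential smoothing, assuming the common recursion $s_t = \alpha x_t + (1-\alpha) s_{t-1}$ with $s_0 = x_0$ and smoothing factor $0 < \alpha \le 1$ (the helper name and the initialization are our assumptions). Unrolling the recursion gives the point $k$ steps back a weight of $\alpha(1-\alpha)^k$, which is the exponential decay described above.

```python
def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    The last smoothed value is used as the forecast for the next step."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = series[0]
    for x in series[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

print(exponential_smoothing_forecast([1.0, 2.0, 3.0], alpha=0.5))  # 2.25
```

A larger `alpha` reacts faster to recent changes; `alpha=1` simply repeats the last observation.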

<img src="Datasets/Exponential_Smooth.jpg" alt="Drawing" style="width: 500px;"/>

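### Time-series Regression

Both methods above predict the next value as a weighted sum of previous values, with weights fixed in advance (equal for the moving average, exponentially decaying for exponential smoothing). Why not just **learn** the weights? Using the previous $k$ observations as features gives

$$y_t = \theta_0 + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \dots + \theta_k y_{t-k}$$

* This is an ordinary linear regression, so we can fit it using least squares
* This procedure is known as **autoregression**
* Using this model, we can capture **periodic** effects, e.g., that the traffic of a website is most similar to its traffic 7 days ago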
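The autoregression model can be sketched with NumPy alone (the notebook itself uses scikit-learn below). The series is a hypothetical stand-in for daily website traffic that repeats a 7-day pattern, and the helper names `make_lag_matrix`, `fit_autoregression`, and `forecast_next` are illustrative, not part of any library. Lag columns are ordered oldest first, like the `feature` function used later.

```python
import numpy as np

def make_lag_matrix(series, window):
    """Rows are [1, y[t-window], ..., y[t-1]]; targets are y[t]."""
    series = np.asarray(series, dtype=float)
    rows = [series[t - window:t] for t in range(window, len(series))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    y = series[window:]
    return X, y

def fit_autoregression(series, window):
    """Least-squares fit; theta[0] is the intercept, theta[1:] are lag weights (oldest first)."""
    X, y = make_lag_matrix(series, window)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def forecast_next(series, theta, window):
    """Predict the value right after the end of the series."""
    last = np.asarray(series[-window:], dtype=float)
    return theta[0] + last @ theta[1:]

# Hypothetical daily traffic with a repeating weekly pattern (8 weeks)
weekly_pattern = [120, 135, 140, 138, 150, 90, 80]
traffic = weekly_pattern * 8
theta = fit_autoregression(traffic, window=7)
print(forecast_next(traffic, theta, window=7), traffic[-7])
```

Because every week repeats exactly, the fitted model forecasts the next day as the value from 7 days earlier. With perfectly periodic data the individual weights are not unique (`np.linalg.lstsq` returns the minimum-norm solution), but the predictions are.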

### Example: Air quality prediction

* The air quality data is made up of sequential (hourly) measurements
* So, we can predict the next air quality measurement from the previous ones

```python
import pandas as pd
import numpy as np

df = pd.read_csv("Datasets/Beijing_PM25_air_data.csv", sep=',', header=0)
# Drop rows with a missing pm2.5 value and convert to a NumPy array (one row per hourly record)
dataset = df.dropna(subset=['pm2.5']).to_numpy()
dataset[0]
```

```
array([25, 2010, 1, 2, 0, 129.0, -16, -4.0, 1020.0, 'SE', 1.79, 0, 0],
      dtype=object)
```

### Extract the autoregressive features

For a target at position `ind`, `dataset[ind-windowSize:ind]` holds the previous `windowSize` observations, and column 5 of each row is pm2.5:

* The feature vector is made up of the previous pm2.5 observations
* The number of previous observations (`windowSize`) is a configurable parameter


```python
def feature(dataset, ind, windowSize):
    # pm2.5 values (column 5) of the windowSize rows immediately before row ind
    previousValues = [float(d[5]) for d in dataset[ind-windowSize:ind]]
    return previousValues
```

```python
windowSize = 10
N = len(dataset)
```

```python
# One feature vector per target, starting at the first index with a full window
X = [feature(dataset, ind, windowSize) for ind in range(windowSize, N)]
X[:10]
```

```
[[129.0, 148.0, 159.0, 181.0, 138.0, 109.0, 105.0, 124.0, 120.0, 132.0],
 [148.0, 159.0, 181.0, 138.0, 109.0, 105.0, 124.0, 120.0, 132.0, 140.0],
 [159.0, 181.0, 138.0, 109.0, 105.0, 124.0, 120.0, 132.0, 140.0, 152.0],
 [181.0, 138.0, 109.0, 105.0, 124.0, 120.0, 132.0, 140.0, 152.0, 148.0],
 [138.0, 109.0, 105.0, 124.0, 120.0, 132.0, 140.0, 152.0, 148.0, 164.0],
 [109.0, 105.0, 124.0, 120.0, 132.0, 140.0, 152.0, 148.0, 164.0, 158.0],
 [105.0, 124.0, 120.0, 132.0, 140.0, 152.0, 148.0, 164.0, 158.0, 154.0],
 [124.0, 120.0, 132.0, 140.0, 152.0, 148.0, 164.0, 158.0, 154.0, 159.0],
 [120.0, 132.0, 140.0, 152.0, 148.0, 164.0, 158.0, 154.0, 159.0, 164.0],
 [132.0, 140.0, 152.0, 148.0, 164.0, 158.0, 154.0, 159.0, 164.0, 170.0]]
```

```python
# Target for each feature vector: the pm2.5 value at position ind
y = [float(d[5]) for d in dataset[windowSize:]]
```
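The two list comprehensions above loop over the data in Python. As an optional alternative (our suggestion, not the notebook's method), NumPy's `sliding_window_view` (available in NumPy 1.20 and later) builds the same windows without an explicit loop; `autoregressive_matrix` is a hypothetical helper, demonstrated here on the first five pm2.5 values:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def autoregressive_matrix(values, window_size):
    """Return (X, y): row i of X is values[i:i+window_size], and y[i] = values[i+window_size]."""
    values = np.asarray(values, dtype=float)
    X = sliding_window_view(values, window_size)[:-1]  # drop the last window, which has no target
    y = values[window_size:]
    return X, y

X_demo, y_demo = autoregressive_matrix([129.0, 148.0, 159.0, 181.0, 138.0], window_size=2)
print(X_demo)
print(y_demo)
```

With the notebook's data, `autoregressive_matrix(dataset[:, 5].astype(float), windowSize)` should produce the same `X` and `y` as above, as NumPy arrays rather than lists (the returned `X` is a read-only view).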

```python
from sklearn import linear_model

regr = linear_model.LinearRegression()
regr.fit(X, y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
```

```
Intercept: 
 3.923543065080068
Coefficients: 
 [ 0.01296079  0.00414674 -0.00925187  0.00794837 -0.01458271 -0.0164066
  0.00626613  0.04355583 -0.22994129  1.15548897]
```


### Code: Mixing autoregression and regular regression

Add features for temperature (`TEMP`, column 7), pressure (`PRES`, column 8), and wind speed (`Iws`, column 10) at the current position:
`feat = [float(dataset[ind][7]), float(dataset[ind][8]), float(dataset[ind][10])]`

* Note that we don't need to use autoregression **or** regular regression exclusively
* We can include both types of features simultaneously

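```python
def feature(dataset, ind, windowSize):
    # Regular features at row ind: TEMP (column 7), PRES (column 8), Iws (column 10)
    feat = [float(dataset[ind][7]), float(dataset[ind][8]), float(dataset[ind][10])]
    # Autoregressive features: the previous windowSize pm2.5 values (column 5)
    previousValues = [float(d[5]) for d in dataset[ind-windowSize:ind]]
    return feat + previousValues
```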

```python
X = [feature(dataset, ind, windowSize) for ind in range(windowSize, N)]
y = [float(d[5]) for d in dataset[windowSize:]]
```

```python
from sklearn import linear_model

regr = linear_model.LinearRegression()
regr.fit(X, y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
```

```
Intercept: 
 166.26145135274538
Coefficients: 
 [-0.17185574 -0.15644142 -0.02504713  0.01507651  0.00366639 -0.00937757
  0.00743749 -0.01508783 -0.01678953  0.00571177  0.04276405 -0.22955893
  1.15031328]
```


## Summary of concepts

* Demonstrated how to perform autoregression in Python, and how to combine autoregressive features with regular features

### On your own...

* Experiment with different sliding-window sizes and their impact on prediction error
* Try the alternative approaches above (moving average, exponential smoothing) and compare them to autoregression
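One way to run the window-size experiment is sketched below with NumPy. `holdout_mse` is a hypothetical helper, and the demo series is synthetic (a 24-step cycle plus noise, loosely imitating hourly data), not the Beijing measurements. The train/test split is chronological, so the model is always evaluated on observations that come after its training data.

```python
import numpy as np

def holdout_mse(series, window, train_fraction=0.8):
    """Fit an autoregressive model (intercept + `window` lags) by least squares on the
    first part of the series and return the mean squared error on the remainder."""
    series = np.asarray(series, dtype=float)
    rows = np.array([series[t - window:t] for t in range(window, len(series))])
    X = np.column_stack([np.ones(len(rows)), rows])
    y = series[window:]
    split = int(train_fraction * len(y))  # chronological split: no shuffling for time series
    theta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    residuals = y[split:] - X[split:] @ theta
    return float(np.mean(residuals ** 2))

# Hypothetical hourly-like series: daily cycle plus noise
rng = np.random.default_rng(1)
t = np.arange(500)
demo_series = 50 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=2.0, size=t.size)
for window in (1, 6, 24):
    print(window, round(holdout_mse(demo_series, window), 2))
```

To try it on the notebook's data, one could pass `dataset[:, 5].astype(float)`. Keep in mind that rows with missing pm2.5 were dropped, so consecutive rows are not always consecutive hours.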

### Illustration
