# Linear regression

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/insop/ML_crash_course/blob/main/2_linear_regression.ipynb)

## Overview
- recap from previous [notebook](https://github.com/insop/ML_crash_course/blob/main/1_ml_overview.ipynb)
- linear regression framework
- type: which predictor?
- loss function: how good is the predictor?
- optimization: how to compute the best predictor?
- sample example
***

## Recap

- ML is a way to predict target values or categories from a training dataset.
- Classification tasks predict categories.
- Regression tasks predict real numbers.

- Supervised learning trains on a labeled dataset
- Unsupervised learning tries to cluster, or find implicit structure in, an unlabeled dataset
- A simple regression example

- `numpy` and `pandas`.
***

Previously, we reviewed ML as a way to predict target values or categories from a training dataset. Classification tasks predict categories and regression tasks predict real numbers.

For the type of training, supervised learning trains on a labeled dataset, while unsupervised learning tries to cluster or find implicit information in an unlabeled dataset.
We also reviewed a simple regression task that predicts life expectancy from GDP per capita.

For the tools, we reviewed `numpy` and `pandas`.
***

## Linear regression

- When we have these point pairs, can we predict $y$ for a new $x$, such as 3?
- $x$: [1, 2, 4], $y$: [1, 3, 3]

<div>
<img src="figures/linear_regression_example_1.png" width="500"/>
</div>

***

## Linear regression
- If we can learn this line, then we can predict $y$ for a new $x$

<div>
<img src="figures/linear_regression_example_2.png" width="500"/>
</div>

***

## Linear regression

- Predicts real values from input data
- $x$ (input) $\rightarrow$ $f$ (predictor) $\rightarrow$ $\hat{y}$ (output)
- How do we design the predictor?
    - what type of predictor to use?
    - **loss function**: how do we measure how good a predictor is?
    - **optimization**: how do we compute the best predictor?
***

## Linear regression

- **hypothesis class**: linear predictors $f_w$ with weights $W \in \mathbb{R}^2$
- we want to find $f_w(x) = w_1 x + w_2$
- **weight vector**: $W = [w_1, w_2]$
- **feature vector**: $\phi(x) = [x, 1]$
- $f_w(x) = W \cdot \phi(x) = w_1 x + w_2 \cdot 1$

<div>
<img src="figures/linear_regression_example_3.png" width="500"/>
</div>
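
The predictor defined above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original notebook: the helper name `predict` and the weights `[0.7, 1.0]` are assumptions chosen for the example.

```python
import numpy as np

def phi(x):
    """Feature vector [x, 1]; the constant 1 pairs with the intercept w_2."""
    return np.array([x, 1.0])

def predict(w, x):
    """Evaluate f_w(x) = w . phi(x) (hypothetical helper name)."""
    return float(np.dot(w, phi(x)))

# Illustrative weights only (assumption): slope w_1 = 0.7, intercept w_2 = 1.0
w = np.array([0.7, 1.0])

# Predictions for the three training inputs x = 1, 2, 4
predictions = [predict(w, x) for x in [1, 2, 4]]
print(predictions)
```

Appending the constant 1 to the feature vector lets the intercept be treated as just another weight, so a single dot product covers both slope and offset.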

***

## Loss function: how good is a predictor?

<div>
<img src="figures/linear_regression_example_4.png" width="500"/>
</div>

***

## Loss function: how good is a predictor?

- $Loss(x, y, W)$ = $(f_w(x) - y)^2$, **squared loss**, ($f_w(x) = w_1 x + w_2$)
- Train_loss($W$) = $\frac{1}{|D_{train}|}\sum_{{(x,y)}\in D_{train}} Loss(x, y, W)$
- Example with $W = [0.7,1]$
    - Loss($1,1,W$) = $(0.7\times1 + 1 - 1)^2 = 0.49$
    - Loss($2,3,W$) = $(0.7\times2 + 1 - 3)^2 = 0.36$
    - Loss($4,3,W$) = $(0.7\times4 + 1 - 3)^2 = 0.64$
    - Train_loss($W$) = $\frac{1}{3}$(Loss($1,1,W$) + Loss($2,3,W$) + Loss($4,3,W$)) = $\frac{1}{3}(0.49 + 0.36 + 0.64) \approx 0.497$
    - See the example code below


***

In [1]:
# Loss calculation example

import numpy as np

training_data = {
    'x':[1,2,4], 
    'y':[1,3,3]}

# [0.7, 1]: green
# [0.3, 2]: red
# [-1, -1]: other
Ws = [[0.7, 1], [0.3, 2], [-1,-1]]

def phi(x):
    """Get feature of x"""
    return np.array([x, 1])

def dot(X, Y):
    """ Do the dot product """
    return sum([x*y for x, y in zip(X, Y)])

for w in Ws:
    # Evaluate the training loss of w: the mean of the per-example squared losses
    print()
    losses = []
    for x, y in zip(training_data['x'], training_data['y']):
        phi_x = phi(x)

        loss = (dot(w, phi_x) - y)**2
        losses.append(loss)

    # The value printed as "Total loss" is the mean over the training set, i.e. Train_loss(W)
    print("Total loss for {}: {}".format(w, sum(losses)/len(losses)))
    


Total loss for [0.7, 1]: 0.4966666666666666

Total loss for [0.3, 2]: 0.6299999999999998

Total loss for [-1, -1]: 36.333333333333336
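
The same training loss can also be computed without explicit Python loops. The sketch below is an assumption-laden alternative, not the notebook's own code: it stacks every $\phi(x)$ into a matrix named `Phi` and uses a helper named `train_loss`, both introduced here for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 3.0])

# Design matrix (hypothetical name): one row phi(x) = [x, 1] per training point
Phi = np.column_stack([x, np.ones_like(x)])

def train_loss(w):
    """Mean squared loss of weight vector w over the training set."""
    residuals = Phi @ np.asarray(w, dtype=float) - y
    return float(np.mean(residuals ** 2))

# Evaluate the same three candidate weight vectors as the loop-based cell
vectorized_losses = {tuple(w): train_loss(w) for w in [[0.7, 1], [0.3, 2], [-1, -1]]}
for w, loss in vectorized_losses.items():
    print(w, loss)
```

Up to floating-point rounding, the values should agree with the loop-based results printed above.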


## Visualizing loss function

- Train_loss($W$) = $\frac{1}{|D_{train}|}\sum_{{(x,y)}\in D_{train}} Loss(x, y, W)$

<div>
<img src="figures/loss_visualization.png" width="500"/>
</div>

Figure is from Ref[2]
***

## Optimization: how to find the best $W$?
- **Goal**: find the $W$ that *minimizes* Train_loss($W$)
    - $\min_W$ Train_loss($W$)
- **gradient**: the gradient $\nabla_w$ Train_loss($W$) is the direction that *increases* the training loss the most, so we step in the opposite direction
- **Gradient descent** algorithm
    - initialize $W$ = [0, ..., 0]
    - set $\eta$, the step size (learning rate)
    - For $t = 1, \dots, T$ (each iteration is called an epoch):
        - $W \leftarrow W - \eta \nabla_w$ Train_loss($W$)
***

## Computing the gradient

- Train_loss($W$) = $\frac{1}{|D_{train}|}\sum_{{(x,y)}\in D_{train}} (W \cdot \phi(x) - y)^2$

- Gradient
    - $\nabla_w$ Train_loss($W$) = $\frac{1}{|D_{train}|}\sum_{{(x,y)}\in D_{train}} 2(W \cdot \phi(x) - y)\phi(x)$
***

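We take the gradient with respect to $W$, so $x$ and $y$ are treated as constants. Applying the chain rule to each term gives $\nabla_w (W \cdot \phi(x) - y)^2 = 2(W \cdot \phi(x) - y)\,\phi(x)$, because $\nabla_w (W \cdot \phi(x)) = \phi(x)$.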
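
One way to sanity-check the gradient formula is to compare it against a finite-difference approximation. The following is a rough sketch under stated assumptions: the function names (`analytic_gradient`, `numeric_gradient`), the test point `[0.7, 1.0]`, and the step `eps` are illustrative choices, not part of the original material.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 3.0])
Phi = np.column_stack([x, np.ones_like(x)])  # rows are phi(x) = [x, 1]

def train_loss(w):
    """Mean squared loss over the training set."""
    return float(np.mean((Phi @ w - y) ** 2))

def analytic_gradient(w):
    """(1/|D_train|) * sum of 2 (w . phi(x) - y) phi(x)."""
    residuals = Phi @ w - y
    return 2.0 * Phi.T @ residuals / len(y)

def numeric_gradient(w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (train_loss(w + step) - train_loss(w - step)) / (2 * eps)
    return grad

w_test = np.array([0.7, 1.0])  # arbitrary test point (assumption)
print("analytic:", analytic_gradient(w_test))
print("numeric: ", numeric_gradient(w_test))
```

If the two vectors agree to several decimal places, the analytic expression is very likely implemented correctly.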

In [2]:
# Gradient descent example

import numpy as np

training_data = {
    'x':[1,2,4], 
    'y':[1,3,3]}

def phi(x):
    """Get feature of x"""
    return np.array([x, 1])

def dot(X, Y):
    """ Do the dot product """
    return sum([x*y for x, y in zip(X, Y)])

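debug = False
eta = 0.1                  # step size (learning rate)
w = np.array([0.0, 0.0])   # initial weights [w_1, w_2]

for t in range(500):       # T = 500 epochs
    # Per-example gradients of the squared loss: 2 (w . phi(x) - y) phi(x)
    gradients = []
    for x, y in zip(training_data['x'], training_data['y']):
        phi_x = phi(x)
        gradient = 2 * (dot(w, phi_x) - y) * phi_x
        gradients.append(gradient)
    # Step against the mean gradient over the training set
    mean_gradient = sum(gradients) / len(gradients)
    w = w - eta * mean_gradient
    if debug:
        print(mean_gradient, w)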

print()
print("Final weight: {}".format(w))

    


Final weight: [0.57142857 1.        ]
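
As a cross-check, the minimizer of the squared loss can also be obtained directly with a least-squares solver. Note that this is a different technique from gradient descent and is not part of the original notebook. The sketch uses NumPy's `np.linalg.lstsq`, and the names `Phi`, `w_closed`, and `y_new` are illustrative. It also answers the earlier question of what $y$ to predict for $x = 3$.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 3.0])
Phi = np.column_stack([x, np.ones_like(x)])  # rows are phi(x) = [x, 1]

# Least-squares solution of Phi @ w ~= y (direct solver, not gradient descent)
w_closed, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("Least-squares weight:", w_closed)

# Predict y for the new input x = 3 using f_w(x) = w . phi(x)
y_new = float(w_closed @ np.array([3.0, 1.0]))
print("Prediction for x = 3:", y_new)
```

The solver's weights should be close to the gradient descent result above, which suggests the 500 epochs were enough for this small problem.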


## (?) Features and vectors

## Summary
- review linear regression
- how to form linear regression
- loss function, how to measure goodness of the hypothesis
- optimization, how to find the best parameters (weights)

## Credits

This notebook uses content from the following materials:

1. [cs221 ML linear regression](https://stanford-cs221.github.io/autumn2021-extra/modules/machine-learning/linear-regression.pdf)


## Further readings

1. Chapter 1 from Book [Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)
1. [cs221 ML linear regression](https://stanford-cs221.github.io/autumn2021-extra/modules/machine-learning/linear-regression.pdf)
1. [Jovian's Linear Regression with Scikit Learn](https://jovian.ai/learn/machine-learning-with-python-zero-to-gbms/lesson/linear-regression-with-scikit-learn)