
# MIT 6.036: HW02 Fall 2021





## Setup
First, download the code distribution for this homework that contains test cases and helper functions.

Run the next code block to download and import the code for this lab.

In [1]:
import numpy as np
!rm -rf code_for_hw02*
!wget --quiet https://go.odl.mit.edu/subject/6.036/_static/catsoop/homework/hw02/code_for_hw02.py --no-check-certificate
from code_for_hw02 import *

## 4) Regularization and Cross Validation

We will now synthesize what we've learned to perform ridge regression on the datacommons obesity <a href="https://docs.google.com/spreadsheets/d/1E7AWa69QUoybnAsBrXYR93Y44isph6-1/edit?usp=sharing&ouid=115847065627713602338&rtpof=true&sd=true">dataset</a>. In lab02 we fit some simple linear regressions; here we employ and explore regularization, with the goal of building a model that generalizes better to unseen data than an unregularized one.

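The metric we will use to measure the quality of our learned predictors is **Mean Square Error (MSE)**. This is a useful metric because it penalizes large deviations more heavily than small ones. Note that MSE is expressed in the squared units of the target; its square root (RMSE) gives the deviation in the target's natural units. MSE is defined as: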

$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^n \left( y^{(i)} - f(x^{(i)}) \right)^2 $$

where $f$ is our learned predictor: in this case, $f(x) = \theta \cdot x + \theta_0$. This gives a measure of how far away the true values are from the predicted values.
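
As a small worked example of the formula above, the sketch below computes the MSE by hand for three made-up data points. The arrays `y_true` and `y_pred` are illustrative values, not taken from the dataset.

```python
import numpy as np

# Illustrative values only (not from the obesity dataset).
y_true = np.array([[1.0, 2.0, 4.0]])   # 1 x n row of true values y^(i)
y_pred = np.array([[1.5, 2.0, 3.0]])   # 1 x n row of predictions f(x^(i))

# Squared deviation for each point, then the average over the n points.
squared_errors = (y_true - y_pred) ** 2
mse = squared_errors.mean()

print(squared_errors)
print(mse)
```

The per-point squared errors are 0.25, 0, and 1, so the MSE is $1.25 / 3 \approx 0.417$.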

We will use **ridge regression**, which adds a regularization term of this form to the mean squared error:

$$ \lambda || \theta ||^2 $$

where $\lambda$ is the regularization parameter. The overall objective function is thus

$$ J_\text{ridge}(\theta,\theta_0)=\frac{1}{n}\sum_{i=1}^n\left(\theta^Tx^{(i)}+\theta_0-y^{(i)}\right)^2+\lambda||\theta||^2 $$

Remarkably, there is a closed-form expression for the $\Theta = (\theta, \theta_0)$ that minimizes this objective, given $X$, $Y$, and $\lambda$. But how should we choose $\lambda$?

To choose a good $\lambda$, we can use the following approach. Each value of $\lambda$ gives us a different linear regression model, and we want the best one: a model that balances good predictions (fitting the training data well) with good generalization (avoiding over-fitting the training data). As we see in the lecture notes, we can employ **cross-validation** to evaluate and compare different models.
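
Returning to the analytic minimizer of $J_\text{ridge}$ mentioned above, one way to compute it is sketched below. The sketch assumes the offset $\theta_0$ is not regularized, as in the objective, and handles it by centering the data. The function name `ridge_closed_form` is hypothetical, and this is not necessarily how the course's own helper is implemented.

```python
import numpy as np

def ridge_closed_form(X, Y, lam):
    """Hypothetical sketch: minimize (1/n) * sum((th.T x + th0 - y)^2) + lam * ||th||^2.

    X : d x n array, Y : 1 x n array, lam : float >= 0.
    Returns th (d x 1) and th0 (1 x 1).
    """
    d, n = X.shape
    x_mean = X.mean(axis=1, keepdims=True)   # d x 1
    y_mean = Y.mean(axis=1, keepdims=True)   # 1 x 1
    Xc, Yc = X - x_mean, Y - y_mean          # centered data
    # Setting the gradient to zero gives (Xc Xc^T + n*lam*I) th = Xc Yc^T.
    th = np.linalg.solve(Xc @ Xc.T + n * lam * np.eye(d), Xc @ Yc.T)
    th0 = y_mean - th.T @ x_mean             # optimal offset given th
    return th, th0

# Noise-free data generated from y = 2x + 1.
X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = 2 * X + 1
th, th0 = ridge_closed_form(X, Y, lam=0.0)     # lam = 0 reduces to least squares
th_reg, _ = ridge_closed_form(X, Y, lam=1.0)   # lam > 0 shrinks th toward zero
print(th, th0, th_reg)
```

With `lam=0.0` the sketch recovers the slope 2 and offset 1 exactly; with `lam=1.0` the slope shrinks to $10/9$, which illustrates how the penalty trades training fit for smaller coefficients.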

### Implementation of cross-validation algorithm

Let us begin by implementing this algorithm for cross-validation:

<img src="https://go.odl.mit.edu/subject/6.036/_static/catsoop/homework/hw02/IMAGE-cross-validation-algorithm-from-notes.png">

We'll split this into a few parts, and have you implement three short helper functions (`lin_reg`, `square_loss`, and `mean_square_loss`) that build up to `cross_validate`, an implementation of the above algorithm.

#### 4A) `lin_reg`

We will first implement a generic linear regression function, `lin_reg`, that has the following input arguments:
* `x`: the list of data points ($d\times n$)
* `th`: the coefficients of the regression ($d\times1$)
* `th0`: the offset ($1\times1$)

Our function `lin_reg` returns a $1\times n$ matrix:
* `y`: the result of applying the regression on `x`

In [2]:
def lin_reg(x, th, th0):
    return th.T@x+th0

test_lin_reg(lin_reg)

All test cases passed!
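
To see how the shapes line up, here is a small check on made-up inputs. It restates the one-line `lin_reg` from the cell above so that the snippet runs on its own; the numbers are illustrative.

```python
import numpy as np

def lin_reg(x, th, th0):
    # (1 x d) @ (d x n) gives 1 x n; the 1 x 1 offset broadcasts across columns.
    return th.T @ x + th0

x = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])     # d = 2 features, n = 3 points
th = np.array([[2.0], [-1.0]])      # d x 1 coefficients
th0 = np.array([[0.5]])             # 1 x 1 offset

y_hat = lin_reg(x, th, th0)
print(y_hat.shape)  # (1, 3)
print(y_hat)
```

The three predictions work out to 2.5, 3.5, and 6.5, one per column of `x`.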


#### 4B) `square_loss`

Next, we will implement a function that calculates the squared loss of linear regression, `square_loss`, that has the following input arguments:
* `x`: the list of data points ($d\times n$)
* `y`: the true values of the responses ($1\times n$)
* `th`: the coefficients of the regression ($d\times1$)
* `th0`: the offset ($1\times1$)

Our function `square_loss` returns a $1\times n$ matrix, the squared loss of this linear regression on each data point. A working implementation of `lin_reg` will be available to you on catsoop.

In [3]:
def square_loss(x, y, th, th0):
    return np.square(lin_reg(x, th, th0) - y)

test_square_loss(square_loss)

All test cases passed!


#### 4C) `mean_square_loss`

Now, we will implement a function that calculates the mean squared loss of linear regression, `mean_square_loss`, that has the following input arguments:
* `x`: the list of data points ($d\times n$)
* `y`: the true values of the responses ($1\times n$)
* `th`: the coefficients of the regression ($d\times1$)
* `th0`: the offset ($1\times1$)

Our function `mean_square_loss` returns a $1\times1$ matrix, the mean squared loss of this linear regression on the list of data points. Working implementations of both `lin_reg` and `square_loss` will be available to you on catsoop.

In [4]:
def mean_square_loss(x, y, th, th0):
    return (1/y.shape[1])*np.sum(square_loss(x, y, th, th0))

test_mean_square_loss(mean_square_loss)

All test cases passed!


#### 4D) `cross_validate`

Last, we will implement the above cross-validation algorithm in `cross_validate`, which has the following input arguments:
* `X`: the list of data points ($d\times n$)
* `Y`: the true values of the responses ($1\times n$)
* `n_splits`: the number of chunks to divide the dataset into
* `lam`: the regularization parameter

Our function `cross_validate` returns a scalar, the cross-validation error of applying ridge regression with regularization parameter `lam` to the list of data points. Working implementations of `lin_reg`, `square_loss`, and `mean_square_loss` will be available to you on catsoop, along with the following functions:

```python
def make_splits(X, Y, n_splits):
    '''Splits the dataset into n_splits chunks, creating n_splits sets of cross validation data.
    Returns a list of n tuples (X_train, Y_train, X_test, Y_test)
    
    X : d x n numpy array (d = # features, n = # data points)
    Y : 1 x n numpy array
    n_splits: int'''

def ridge_analytic(X_train, Y_train, lam):
    '''Applies analytic ridge regression on the given training data.
    Returns th, th0

    X_train : d x n numpy array (d = # features, n = # data points)
    Y_train : 1 x n numpy array
    lam : (float) regularization strength parameter
    th : d x 1 numpy array
    th0: 1 x 1 numpy array'''

```
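
The body of `make_splits` is not shown here. A minimal sketch of one way such a helper could work is given below, assuming the columns are split into contiguous chunks with `np.array_split` and each chunk serves as the test set exactly once. The name `make_splits_sketch` is hypothetical, and the course's version may differ (for example, it may shuffle the data first).

```python
import numpy as np

def make_splits_sketch(X, Y, n_splits):
    """Hypothetical stand-in for make_splits: returns a list of n_splits
    tuples (X_train, Y_train, X_test, Y_test), splitting along columns.
    Assumes n_splits >= 2 so that every training set is non-empty."""
    X_chunks = np.array_split(X, n_splits, axis=1)
    Y_chunks = np.array_split(Y, n_splits, axis=1)
    splits = []
    for i in range(n_splits):
        # Chunk i is held out; the remaining chunks form the training set.
        X_train = np.concatenate(X_chunks[:i] + X_chunks[i + 1:], axis=1)
        Y_train = np.concatenate(Y_chunks[:i] + Y_chunks[i + 1:], axis=1)
        splits.append((X_train, Y_train, X_chunks[i], Y_chunks[i]))
    return splits

X = np.arange(12.0).reshape(2, 6)   # d = 2, n = 6
Y = np.arange(6.0).reshape(1, 6)
splits = make_splits_sketch(X, Y, n_splits=3)
for X_train, Y_train, X_test, Y_test in splits:
    print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
```

With 6 points and 3 splits, each tuple holds 4 training columns and 2 test columns, and every point appears in a test set exactly once.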

In [6]:
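def cross_validate(X, Y, n_splits, lam):
    test_errors = []
    # Fit on each training split and score on the corresponding held-out split.
    for (X_train, Y_train, X_test, Y_test) in make_splits(X, Y, n_splits):
        th, th0 = ridge_analytic(X_train, Y_train, lam=lam)
        test_errors.append(mean_square_loss(X_test, Y_test, th, th0))
    # The cross-validation error is the average of the per-split test errors.
    return np.array(test_errors).mean()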

test_cross_validate(cross_validate)

All test cases passed!
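
To connect this back to the question of choosing $\lambda$, the sketch below evaluates a handful of candidate values by cross-validation and keeps the one with the lowest error. It is self-contained, so it uses hypothetical stand-ins (`ridge_fit`, `cv_error`) rather than the course helpers, runs on synthetic data, and uses an arbitrary candidate grid; none of these choices come from the assignment.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    # Hypothetical stand-in for ridge_analytic (centered closed-form solution).
    n = X.shape[1]
    xm, ym = X.mean(axis=1, keepdims=True), Y.mean(axis=1, keepdims=True)
    Xc, Yc = X - xm, Y - ym
    th = np.linalg.solve(Xc @ Xc.T + n * lam * np.eye(X.shape[0]), Xc @ Yc.T)
    return th, ym - th.T @ xm

def cv_error(X, Y, n_splits, lam):
    # Same structure as cross_validate, with contiguous column chunks as folds.
    folds = np.array_split(np.arange(X.shape[1]), n_splits)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(X.shape[1]), test_idx)
        th, th0 = ridge_fit(X[:, train_idx], Y[:, train_idx], lam)
        pred = th.T @ X[:, test_idx] + th0
        errors.append(np.mean((pred - Y[:, test_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic data: 5 features, 30 points, only the first feature matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 30))
Y = 3.0 * X[:1, :] + rng.normal(scale=0.5, size=(1, 30))

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]   # arbitrary candidate grid
errors = [cv_error(X, Y, n_splits=5, lam=lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(errors))]
for lam, err in zip(lambdas, errors):
    print(f"lam={lam:<5} cv_mse={err:.4f}")
print("best lambda:", best_lam)
```

In practice the candidate grid is often spaced logarithmically, since the useful range of $\lambda$ can span several orders of magnitude.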
