# Grade: /100 points

# Assignment 01: Supervised learning, Linear models, and Loss functions

In this assignment, you're going to write your own methods to fit a linear model using either an OLS or LAD cost function.  

## Data set 

We will examine a data set containing 200 observations. The data set has 3 variables:

1. $y$: the outcome of interest.
2. $x1$: first predictor.
3. $x2$: second predictor.

## Follow These Steps Before Submitting
Once you are finished, be sure to complete the following steps.

1.  Restart your kernel by clicking 'Kernel' > 'Restart & Run All'.

2.  Fix any errors which result from this.

3.  Repeat steps 1. and 2. until your notebook runs without errors.

4.  Submit your completed notebook to OWL by the deadline.


## Preliminaries

In [None]:
# Import all the necessary packages: 
import matplotlib.pyplot as plt 
import numpy as np
import pandas as pd
import scipy.stats as ss 
import scipy.optimize as so
from sklearn import linear_model

%matplotlib inline

#### Before you start...

Recall the L1 loss function (sum of absolute residuals, used for the LAD model):

$$L_1(\theta) = \sum_{i=1}^{n} \lvert y_i-\hat{y}_i \rvert$$

The L2 loss function (RSS, residual sum of squares, used for the OLS model):

$$L_2(\theta) = \sum_{i=1}^{n} (y_i-\hat{y}_i)^2$$
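As a quick numerical check of the two formulas, the sketch below evaluates both losses on a made-up pair of arrays. The names `y_toy` and `y_hat_toy` and their values are illustrative assumptions, not part of the assignment data.

```python
import numpy as np

# Toy values chosen for illustration only (not from the assignment data set).
y_toy = np.array([1.0, 2.0, 4.0])
y_hat_toy = np.array([1.5, 2.0, 3.0])

residuals = y_toy - y_hat_toy          # [-0.5, 0.0, 1.0]
l1_loss = np.sum(np.abs(residuals))    # sum of absolute residuals
l2_loss = np.sum(residuals ** 2)       # residual sum of squares

print(l1_loss, l2_loss)  # 1.5 1.25
```

Note how the largest residual contributes proportionally more to the L2 loss than to the L1 loss.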


## Part 1
### Question 1.1:  /10 points


Read the `my_dataset.csv` file as a `pandas.DataFrame` and complete the following steps:

1. Change the column names $x1$ and $x2$ to pred_1 and pred_2, respectively, and print the first 10 rows.
2. Remove the 5th observation, since its pred_1 value is an outlier.
3. In the resulting data set, replace the pred_1 value of the 5th observation (i.e., NaN) with 9.5, and print the first 10 rows of the final data set.
4. Plot a scatterplot for every pairwise combination of variables (scatterplot matrix), and briefly discuss the relationships between the variables.
5. Based on your findings, discuss whether a simple linear regression is appropriate here.
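The `pandas` operations involved (renaming columns, dropping a row, overwriting a single cell) can be sketched on a toy frame. The column names `a`, `b`, `alpha` and all values below are hypothetical and unrelated to `my_dataset.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names and values are illustrative only.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0], "b": [10, 20, 30, 40]})

df = df.rename(columns={"a": "alpha"})        # rename a column
df = df.drop(index=1).reset_index(drop=True)  # drop the 2nd row, renumber rows
df.loc[1, "alpha"] = 2.5                      # overwrite a single cell by label

print(df.head(10))
```

For the scatterplot matrix, one option is `pandas.plotting.scatter_matrix(df)`; whether to reset the index after dropping a row is a design choice that affects which row counts as the "5th" afterwards.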

In [None]:
# Your code here.

**Written answer: What do you see here? Discuss your findings**

In [None]:
# Your answer here.

### Question 1.2: /5 points

Recall that in the linear model, we obtain predictions by computing 

$$ \hat{\mathbf{y}} = \mathbf{X} \hat{\beta} $$

Here, $\mathbf{X}$ is a design matrix which includes a column of ones, $\hat{\beta}$ are the coefficients, and $\hat{\mathbf{y}}$ are the predicted outcomes.  Write a function `linearModelPredict` to compute linear model predictions given data and a coefficient vector.  The function should take as its arguments a 1d-array of coefficients `b` and the design matrix `X` as a 2d-array and return linear model predictions `yp`.

Test the function by setting 

```
X = np.array([[1,0],[1,-1],[1,2]])
b = np.array([0.1,0.3])
```

Call your function using these values. 

Report $\hat{\mathbf{y}}$. 

What is the dimensionality of the numpy-array that you get back? 

Hint:  Read the documentation for `np.dot` or the `@` operator in `numpy`.
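To see how the hint applies, here is a minimal sketch of a matrix-vector product with both `@` and `np.dot`. The arrays `A` and `v` are hypothetical and deliberately differ from the test values above.

```python
import numpy as np

# Hypothetical values for illustration; they differ from the test values above.
A = np.array([[1, 2], [3, 4], [5, 6]])  # 2d-array, shape (3, 2)
v = np.array([1, -1])                   # 1d-array, shape (2,)

prod_at = A @ v          # matrix-vector product via the @ operator
prod_dot = np.dot(A, v)  # same product via np.dot

print(prod_at)        # [-1 -1 -1]
print(prod_at.shape)  # (3,)
```

Multiplying a 2d-array by a 1d-array returns a 1d-array rather than a column matrix.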

In [None]:
# Your code here.

### Question 1.3: /15 points

Write a function `linearModelLossRSS` which computes and returns the loss function for an OLS model parameterized by $\beta$, as well as the gradient of the loss.  The function should take as its first argument a 1d-array `beta` of coefficients for the linear model, as its second argument the design matrix `X` as a 2d-array, and as its third argument a 1d-array `y` of observed outcomes.

Test the function with the values 

```
X = np.array([[1,0,1.5],[1,-1,0.5],[1,2,2.5]])
b = np.array([0.1,0.3,0.85])
y = np.array([1.3,0.3,2]) 
```

Report the loss and the gradient. 
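
For reference, a sketch of the gradient in matrix form, assuming $\hat{\mathbf{y}} = \mathbf{X}\beta$ and the L2 loss defined in the preliminaries:

$$L_2(\beta) = (\mathbf{y} - \mathbf{X}\beta)^\top (\mathbf{y} - \mathbf{X}\beta), \qquad \nabla_\beta L_2(\beta) = -2\,\mathbf{X}^\top (\mathbf{y} - \mathbf{X}\beta)$$

The gradient has one component per coefficient, so it has the same length as `beta`.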


In [None]:
# Your code here.

**Written answer**: To minimize the loss, do you need to increase or decrease the value of the parameters? 

In [None]:
# Your answer here.

### Question 1.4: /15 points

Now that you've implemented a loss function in question 1.3, it is time to minimize it!

Write a function `linearModelFit` to fit a linear model.  The function should take as its first argument the design matrix `X` as a 2d-array, as its second argument a 1d-array `y` of outcomes, and as its third argument a function `lossfcn` that returns a tuple containing the value of the loss and the gradient of the loss. The function should return the estimated betas and the $R^2$. 

Test the function with the values: 
```
X = np.array([[1,0,1.5],[1,-1,0.5],[1,2,2.5]])
y = np.array([1.3,0.3,2])
```

Report the best parameters and the fitted $R^2$.
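
One possible way to minimize a function that returns a `(loss, gradient)` tuple is `scipy.optimize.minimize` with `jac=True`. The sketch below uses a hypothetical quadratic `toy_loss` rather than a regression loss, and the helper `r_squared` assumes the usual definition $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$; both names are illustrative.

```python
import numpy as np
import scipy.optimize as so

def toy_loss(z):
    """Hypothetical quadratic loss; returns (value, gradient) as a tuple."""
    target = np.array([1.0, -2.0])
    diff = z - target
    return np.sum(diff ** 2), 2 * diff

# jac=True tells minimize that the objective returns (loss, gradient).
result = so.minimize(toy_loss, x0=np.zeros(2), jac=True)
z_hat = result.x  # close to [1, -2]

def r_squared(y, yp):
    """Usual coefficient of determination: 1 - RSS / TSS."""
    rss = np.sum((y - yp) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return 1 - rss / tss
```

Extra inputs such as a design matrix and an outcome vector can be forwarded to the objective through the `args` parameter of `minimize`.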


In [None]:
# Your code here.

### Question 1.5: /15 points

Use the above functions to fit your model to the data from `my_dataset.csv`. Use $y$ as the target variable and pred_1 as the predictor. Then use your model and the fitted parameters to make predictions along a grid of equally spaced values for the pred_1 variable. Note that these new values must be within the original range of the pred_1 variable.  

Plot the data and add a line for the predicted values. You can get these by generating a new X-matrix with 100 equally spaced values (using for example [```np.linspace```](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)). Also report the $R^2$ value for the fit. You can do this by either printing out the $R^2$ of the fit or putting it on your plot via the `annotate` function in matplotlib.
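
A minimal sketch of building such a grid and its design matrix follows. The array `x_obs` is a hypothetical stand-in for the pred_1 column, and `x_grid` and `X_grid` are illustrative names.

```python
import numpy as np

# Hypothetical predictor values standing in for the pred_1 column.
x_obs = np.array([2.0, 3.5, 5.0, 7.5, 9.0])

# 100 equally spaced points spanning the observed range (endpoints included).
x_grid = np.linspace(x_obs.min(), x_obs.max(), 100)

# Design matrix for the grid: a column of ones, then the grid values.
X_grid = np.column_stack([np.ones_like(x_grid), x_grid])

print(X_grid.shape)  # (100, 2)
```

Using the observed minimum and maximum as endpoints keeps every grid value inside the original range of the predictor.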


In [None]:
# Your code here.

## Part 2: LAD Regression

### Question 2.1:  /15 points

In the previous section, we worked with the squared loss.  Now, we'll implement a linear model with least absolute deviation loss.

Write a function `linearModelLossLAD` which computes the least absolute deviation loss function for a linear model  parameterized by $\beta$, as well as the gradient of the loss.  The function should take as its first argument a 1d-array `beta` of coefficients for the linear model, as its second argument the design matrix `X` as a 2d-array, and as its third argument a 1d-array `y` of observed outcomes.

Test the function with the values 

```
X = np.array([[1,0,1.5],[1,-1,0.5],[1,2,2.5]])
b = np.array([0.1,0.3,0.85])
y = np.array([1.3,0.3,2]) 
```

Report the loss and the gradient. 
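
For reference, a sketch of the gradient in matrix form, assuming $\hat{\mathbf{y}} = \mathbf{X}\beta$ and the L1 loss defined in the preliminaries:

$$\nabla_\beta L_1(\beta) = -\mathbf{X}^\top \operatorname{sign}(\mathbf{y} - \mathbf{X}\beta)$$

This expression holds wherever no residual is exactly zero. At a zero residual the loss is not differentiable, and any value in $[-1, 1]$ may stand in for the sign of that residual (a subgradient); `np.sign` returns 0 in that case.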

In [None]:
# Your code here.

### Question 2.2: /10 points


Use the above functions to fit your LAD model. Use your model to make predictions along a grid of 100 equally spaced values for pred_1.  Once fit, add the fitted line to the scatter plot as in question 1.5.  Also report the $R^2$-value. 

**Written answer**: What is the difference in the fit obtained with an L1 as compared to the L2 cost function? How do their $R^2$ values compare? Why?  

Note: If you receive an error from the optimizer, it may be because the loss function for the LAD model is not differentiable at its minimum.  This can cause some gradient-based optimizers to fail to converge.  If this happens, pass `method="Powell"` to `scipy.optimize.minimize`.
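
To illustrate the derivative-free option, the sketch below minimizes a sum of absolute deviations from a single constant, whose minimizer is the sample median. The `data` values and the function `abs_dev` are hypothetical and are not the regression loss from this assignment.

```python
import numpy as np
import scipy.optimize as so

# Toy data for illustration; the minimizer of the sum of absolute
# deviations from a constant m is the sample median (3.0 here).
data = np.array([1.0, 2.0, 3.0, 10.0, 20.0])

def abs_dev(m):
    # The loss has a kink at every data point, including its minimum.
    return np.sum(np.abs(data - m))

# Powell is derivative-free, so no gradient is passed.
res = so.minimize(abs_dev, x0=np.array([0.0]), method="Powell")
m_hat = np.atleast_1d(res.x)[0]
```

Because Powell's method never evaluates a gradient, an objective that returns a `(loss, gradient)` tuple would need to be wrapped so that only the loss value is passed to the optimizer.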



In [None]:
# Your code here

Written answer: Your answer here.

### Question 2.3: /15 points

Now we will use all data for the fit. Fit an OLS model to the data from `my_dataset.csv` with the `linear_model` module from the `sklearn` package by using the [`LinearRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) class.  In no more than two sentences, comment on the $R^2$ values from `sklearn` and the $R^2$ values from your models. Are they similar?
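
A minimal sketch of the `LinearRegression` workflow on toy data follows. The arrays `x_toy` and `y_toy` are made up (generated exactly from $y = 1 + 2x$) and are not the assignment data.

```python
import numpy as np
from sklearn import linear_model

# Toy data generated exactly from y = 1 + 2 * x (illustration only).
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = 1.0 + 2.0 * x_toy

# sklearn expects a 2d feature array and adds the intercept itself
# (fit_intercept=True by default), so no column of ones is needed.
model = linear_model.LinearRegression()
model.fit(x_toy.reshape(-1, 1), y_toy)

r2 = model.score(x_toy.reshape(-1, 1), y_toy)  # R^2 on the given data
```

The fitted intercept and slope are available as `model.intercept_` and `model.coef_`.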

In [None]:
# Your code here

Written answer: Your answer here.