## Homework 5

In [9]:
import pandas as pd
import numpy as np

For some dimension $d$, suppose that we have data $\mathcal S = \{(\textbf{x}_i, y_i)\}_{i=1}^n$, with each $\textbf{x}_i\in\mathbb R^d$ and $y_i\in\mathbb R$. For each $i$, write the vector $\textbf{x}_i$ as 
$$\textbf{x}_i = (x_{i,1},x_{i,2},\ldots,x_{i,d}).$$
To do ridge regression (linear regression with $L_2$ regularization), we would minimize the loss function 
$$\mathcal L_{\mathcal S}(\textbf{w}, b) = \lambda|\textbf{w}|^2 + \frac{1}{n}\sum_{i=1}^n(\textbf{w}\cdot\textbf{x}_i + b - y_i)^2.$$
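
As a sanity check on the notation, here is a minimal NumPy sketch of the loss $\mathcal L_{\mathcal S}$. The function name `ridge_loss`, its argument names, and the toy numbers are assumptions made for illustration; they are not part of the assignment.

```python
import numpy as np

def ridge_loss(w, b, X, y, lam):
    """Ridge loss: lam * |w|^2 plus the mean squared residual."""
    residuals = X @ w + b - y            # one residual per data point, shape (n,)
    return lam * np.dot(w, w) + np.mean(residuals ** 2)

# Tiny hand-checkable example (made-up numbers): the fit is exact,
# so only the penalty term 0.5 * (1^2 + 2^2) contributes.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 2.0])
loss = ridge_loss(w, b=0.0, X=X, y=y, lam=0.5)
print(loss)
```

Note that the penalty involves only $\textbf{w}$; the intercept $b$ enters the loss through the residuals alone.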

1. (a) Writing the vector of coefficients as $\textbf{w} = (w_1,w_2,\ldots,w_d)$, use the notation above to write the partial derivatives $\frac{\partial}{\partial w_j}\mathcal L_{\mathcal S}$, for $1\le j\le d$, as well as the partial derivative $\frac{\partial}{\partial b}\mathcal L_{\mathcal S}$.
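
One way to check an answer to (a): applying the chain rule term by term to the loss above should give
$$\frac{\partial}{\partial w_j}\mathcal L_{\mathcal S} = 2\lambda w_j + \frac{2}{n}\sum_{i=1}^n(\textbf{w}\cdot\textbf{x}_i + b - y_i)\,x_{i,j},\qquad \frac{\partial}{\partial b}\mathcal L_{\mathcal S} = \frac{2}{n}\sum_{i=1}^n(\textbf{w}\cdot\textbf{x}_i + b - y_i).$$
The sketch below compares these formulas with central finite differences on random data. The helper names (`ridge_loss`, `ridge_grad`), the random data, and the step size `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def ridge_loss(w, b, X, y, lam):
    r = X @ w + b - y
    return lam * np.dot(w, w) + np.mean(r ** 2)

def ridge_grad(w, b, X, y, lam):
    """Analytic partial derivatives of the ridge loss in w and b."""
    r = X @ w + b - y                                # residuals, shape (n,)
    grad_w = 2 * lam * w + 2 * (X.T @ r) / len(y)    # entry j is dL/dw_j
    grad_b = 2 * np.mean(r)                          # dL/db
    return grad_w, grad_b

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)
b, lam, eps = 0.3, 0.1, 1e-6

grad_w, grad_b = ridge_grad(w, b, X, y, lam)

# Central finite differences, perturbing one coordinate of w at a time.
num_w = np.array([
    (ridge_loss(w + eps * e, b, X, y, lam) - ridge_loss(w - eps * e, b, X, y, lam)) / (2 * eps)
    for e in np.eye(3)
])
num_b = (ridge_loss(w, b + eps, X, y, lam) - ridge_loss(w, b - eps, X, y, lam)) / (2 * eps)

max_err = max(np.max(np.abs(grad_w - num_w)), abs(grad_b - num_b))
print("largest gap between analytic and numeric partials:", max_err)
```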

1. (b) Let $I$ be the $d\times d$ identity matrix. Given $j$, with $1\le j\le d$, observe that $2\lambda w_j$ is equal to the $j^{th}$ entry in the vector $(2\lambda I)\textbf{w}$.

1. (c) Let $A$ denote the $n\times(d+1)$ "feature" matrix which has entries $(x_{i,1}, \ldots, x_{i,d}, 1)$ in row $i$. Thinking about entries in $A^TA$ and $A^T\textbf{y}$, write a matrix equation that represents the system of equations 
$$\frac{\partial}{\partial w_1}\mathcal L_{\mathcal S} = 0,\quad 
\frac{\partial}{\partial w_2}\mathcal L_{\mathcal S} = 0, \  \ldots,\  
\frac{\partial}{\partial w_d}\mathcal L_{\mathcal S}=0,\quad 
\textrm{and}\quad \frac{\partial}{\partial b}\mathcal L_{\mathcal S}=0.$$
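
A possible form for the matrix equation in (c), assuming the partial derivatives are $2\lambda w_j + \frac{2}{n}\sum_i(\textbf{w}\cdot\textbf{x}_i + b - y_i)x_{i,j}$ and $\frac{2}{n}\sum_i(\textbf{w}\cdot\textbf{x}_i + b - y_i)$, is
$$(A^TA + n\lambda\tilde I)\,\theta = A^T\textbf{y},$$
where $\theta = (w_1,\ldots,w_d,b)$ and $\tilde I$ is the $(d+1)\times(d+1)$ identity matrix with its last diagonal entry set to $0$, since $b$ is not penalized. The symbols $\theta$ and $\tilde I$ are notation introduced here, not taken from the assignment. The sketch below solves this system on random data and confirms that all $d+1$ partial derivatives vanish at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 30, 4, 0.05
X = rng.normal(size=(n, d))              # made-up data for illustration
y = rng.normal(size=n)

A = np.hstack([X, np.ones((n, 1))])      # n x (d+1) feature matrix, last column all ones
I_tilde = np.eye(d + 1)
I_tilde[-1, -1] = 0.0                    # no penalty on the intercept b

# Solve (A^T A + n*lam*I_tilde) theta = A^T y for theta = (w, b).
theta = np.linalg.solve(A.T @ A + n * lam * I_tilde, A.T @ y)
w, b = theta[:-1], theta[-1]

# Evaluate the partial derivatives at the solution; they should all be ~0.
r = X @ w + b - y
grad_w = 2 * lam * w + 2 * (X.T @ r) / n
grad_b = 2 * np.mean(r)
grad_norm = np.sqrt(np.sum(grad_w ** 2) + grad_b ** 2)
print("gradient norm at the solution:", grad_norm)
```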

Included with the homework notebook are two CSV files: `'train_HW5data.csv'` and `'test_HW5data.csv'`. These CSV files have a column 'x' and a column 'y'. Read each of the data sets into Python.

2. (a) If `x_train` is an array containing the data in column 'x' from the train data, make an array with 12 columns, so that the columns are `x_train**12`, `x_train**11`, ..., `x_train**1`, in that order. The shape of the resulting array should be `(40, 12)`.

In [21]:
train = pd.read_csv('train_HW5data.csv')
test = pd.read_csv('test_HW5data.csv')

x_train = np.array(train['x'])
y_train = np.array(train['y'])
x_test = np.array(test['x'])
y_test = np.array(test['y'])

# Columns are x**12, x**11, ..., x**1 (highest power first); one row per sample.
x_train12 = np.array([x_train**i for i in range(12,0,-1)]).T
x_test12 = np.array([x_test**i for i in range(12,0,-1)]).T
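
As an aside, the same array of powers can also be built with `np.vander`, which returns the columns `x**(N-1), ..., x**0`; dropping the last (constant) column leaves the twelve powers wanted here. The sketch below uses a short made-up array `x` as a stand-in for `x_train` and checks that both constructions agree.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 5)            # stand-in for x_train (made-up values)

# Construction used in the cell above: one row per power, then transpose.
by_hand = np.array([x**i for i in range(12, 0, -1)]).T

# Vandermonde matrix with columns x**12, ..., x**1, x**0; drop the x**0 column.
via_vander = np.vander(x, 13)[:, :-1]

same = np.allclose(by_hand, via_vander)
print(via_vander.shape, same)
```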


2. (b) Import the class `Ridge` from the scikit-learn submodule `sklearn.linear_model`. When initializing the class, set the `alpha` (hyper)parameter equal to 0.01 (this is what we called $\lambda$ when describing regularization in class). Read about the `solver` options for this class on [the documentation page](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge). Set your class instance to use a solver that performs a type of gradient descent.

Next, train your instance of `Ridge` (use the method `.fit()`) on the array you made in (a), with `y_train` (the data from column 'y') as the labels. Print out the coefficients of the resulting model and compute the mean squared error on the test data.

In [31]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error 

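# 'sag' (stochastic average gradient) is a gradient-descent-type solver;
# it is randomized, so results can vary slightly from run to run.
model = Ridge(alpha=0.01, solver='sag', max_iter=10000)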
model.fit(x_train12, y_train)
print('coefficients:', model.coef_)

y_pred = model.predict(x_test12)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

coefficients: [ 0.11675336  0.04255559 -0.02180575 -0.06889048 -0.0900023  -0.07683982
 -0.02546185  0.05638509  0.14086825  0.18839607  0.2252151   0.36029699]
Mean Squared Error: 0.016457457155559636
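
A note on scaling, offered as a hedged reading of the scikit-learn documentation: `Ridge` minimizes $\|\textbf{y} - X\textbf{w}\|^2 + \alpha\|\textbf{w}\|^2$, a sum of squared residuals rather than a mean. Under that reading, `alpha` corresponds to $n\lambda$ in the loss $\mathcal L_{\mathcal S}$ defined at the top, not to $\lambda$ itself. The sketch below checks this on made-up data by comparing a direct (non-iterative) `Ridge` solver with a closed-form solve; the variable names and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n, d, alpha = 40, 3, 0.01
X = rng.normal(size=(n, d))              # made-up data, not the homework CSVs
y = rng.normal(size=n)

# Direct solver, so the comparison is not affected by iteration tolerance.
model = Ridge(alpha=alpha, solver='cholesky').fit(X, y)

# Closed form for sum-of-squares loss + alpha * |w|^2, intercept unpenalized.
A = np.hstack([X, np.ones((n, 1))])
P = np.eye(d + 1)
P[-1, -1] = 0.0
theta = np.linalg.solve(A.T @ A + alpha * P, A.T @ y)

match = bool(np.allclose(model.coef_, theta[:-1]) and np.isclose(model.intercept_, theta[-1]))
print("Ridge agrees with the sum-of-squares closed form:", match)
```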


3. Import the class `Lasso` from the scikit-learn submodule `sklearn.linear_model` (here is the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso)) and create an instance of the class with `alpha = 0.005`. As in problem 2, use the train and test data sets to fit a Lasso model for a degree-12 polynomial in 'x', with the 'y' column as the response. For this model, which powers of 'x' have non-zero coefficients?

In [32]:
from sklearn.linear_model import Lasso 

model2 = Lasso(alpha = 0.005)
model2.fit(x_train12, y_train)
print("Coefficients:", model2.coef_)

nonzero_powers = [12 - i for i, coef in enumerate(model2.coef_) if coef != 0]
print("Non-zero coefficient powers of x:", nonzero_powers)

Coefficients: [0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.58805528 0.17271755]
Non-zero coefficient powers of x: [2, 1]
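
The index-to-power bookkeeping can also be done with a boolean mask. The sketch below reuses the coefficient values printed above (copied by hand) and a small tolerance `tol`, which is an assumption added here in case a solver returns tiny non-zero values instead of exact zeros.

```python
import numpy as np

# Coefficients copied from the Lasso output above, ordered from x**12 down to x**1.
coef = np.array([0.0] * 10 + [0.58805528, 0.17271755])

powers = np.arange(12, 0, -1)            # power attached to each column
tol = 1e-10                              # hypothetical threshold for "non-zero"
nonzero_powers = powers[np.abs(coef) > tol].tolist()
print(nonzero_powers)  # [2, 1]
```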
