# <font color=darkcyan> Multivariate linear regression - Lasso </font>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import colors 
from sklearn.metrics import mean_squared_error

It is assumed that for all $1\leqslant i \leqslant n$, 

$$
Y_i = X^\top_i \beta_{\star} + \varepsilon_i\,,
$$

where the $(\varepsilon_i)_{1\leqslant i\leqslant n}$ are i.i.d. random variables in $\mathbb{R}$, $X_i\in\mathbb{R}^d$, and $\beta_{\star}$ is an unknown vector in $\mathbb{R}^d$. Let $Y\in\mathbb{R}^n$ (resp. $\varepsilon\in\mathbb{R}^n$) be the random vector whose $i$-th component is $Y_i$ (resp. $\varepsilon_i$) for all $1\leqslant i \leqslant n$, and let $X\in\mathbb{R}^{n\times d}$ be the matrix whose $i$-th row is $X^\top_i$. The model can then be written

$$
Y = X \beta_{\star} + \varepsilon\,.
$$

In this section, it is assumed that $\mathbb{E}[\varepsilon] = 0$ and $\mathbb{E}[\varepsilon \varepsilon^\top] = \sigma_{\star}^2 I_n$. The Lasso estimate of $\beta_{\star}$ is defined as a solution to

$$
\widehat \beta_n\in  \mathrm{argmin}_{\beta\in\mathbb{R}^d}\,\left( n^{-1}\|Y - X\beta\|_2^2 + \lambda \|\beta\|_1\right)\,,
$$

where $\lambda>0$.
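As a concrete reference point, here is a minimal sketch of the criterion minimized by $\widehat\beta_n$, evaluated on data simulated from the model above. The helper name ``lasso_criterion`` and the simulated sizes, $\beta_\star$ and $\sigma_\star$ are hypothetical choices made for illustration; the noise is taken Gaussian, which is one instance of the assumptions on $\varepsilon$.

```python
import numpy as np


def lasso_criterion(beta, X, Y, lam):
    """Hypothetical helper: n^{-1} ||Y - X beta||_2^2 + lam * ||beta||_1."""
    n = X.shape[0]
    return np.sum((Y - X @ beta) ** 2) / n + lam * np.sum(np.abs(beta))


# Simulated data from Y = X beta_star + eps (hypothetical sizes and values)
rng = np.random.default_rng(0)
n, d, sigma_star = 100, 5, 0.5
beta_star = np.array([1.0, 0.0, -2.0, 0.0, 0.0])
X = rng.normal(size=(n, d))
Y = X @ beta_star + sigma_star * rng.normal(size=n)

print(lasso_criterion(np.zeros(d), X, Y, lam=0.1))  # equals the mean of Y**2 at beta = 0
print(lasso_criterion(beta_star, X, Y, lam=0.1))
```

Tracking this quantity across iterations is a simple way to check that an optimization procedure for the Lasso is making progress.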

<font color=darkred> Explain the coordinate-wise optimization procedure </font>
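One way to derive the procedure, sketched under the extra assumption that no column of $X$ is identically zero. The notation $X_{\cdot j}\in\mathbb{R}^n$ for the $j$-th column of $X$ and $r^{(j)}$ for the partial residual is introduced here for the derivation only. Coordinate-wise optimization minimizes the criterion in a single coordinate $\beta_j$, all other coordinates being held fixed. Writing $r^{(j)} = Y - \sum_{k\neq j} X_{\cdot k}\beta_k$, so that $Y - X\beta = r^{(j)} - X_{\cdot j}\beta_j$,

$$
n^{-1}\|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 = n^{-1}\|r^{(j)} - X_{\cdot j}\beta_j\|_2^2 + \lambda|\beta_j| + \lambda\sum_{k\neq j}|\beta_k|\,.
$$

Multiplying by $n/2$ (which does not change the minimizers), setting $\alpha = \lambda n/2$ and $z_j = X_{\cdot j}^\top r^{(j)}$, the function of $\beta_j$ to minimize is, up to an additive constant,

$$
\beta_j \mapsto \frac{1}{2}\|X_{\cdot j}\|_2^2\,\beta_j^2 - z_j\beta_j + \alpha|\beta_j|\,.
$$

It is convex, and $0$ belongs to its subdifferential if and only if $\|X_{\cdot j}\|_2^2\,\beta_j - z_j + \alpha s = 0$ for some $s\in\partial|\beta_j|$. Solving this condition gives the closed-form update

$$
\beta_j \leftarrow \frac{1}{\|X_{\cdot j}\|_2^2}\begin{cases}
z_j+\alpha & \text{if } z_j<-\alpha\,, \\
z_j-\alpha & \text{if } z_j>\alpha\,, \\
0 & \text{otherwise}\,.
\end{cases}
$$

The procedure cycles through $j = 1,\dots,d$, applying this update with the current values of the other coordinates, and repeats the cycle. Each update exactly minimizes the criterion in one coordinate, so the Lasso criterion cannot increase along the iterations.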

#### Import data

In [None]:
import pandas as pd

pandas loads tabular data into a ``DataFrame``, a two-dimensional, size-mutable structure whose columns may have different types.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

<font color=darkred>
Import the data in the file ``BRinf.txt`` using ``read_csv``, display the first rows with ``head`` and the shape of the data frame with ``shape``.
</font>

In [None]:
# In this section, multivariate linear regression is used to predict Brazilian inflation from
# many observed variables, see https://github.com/gabrielrvsc/HDeconometrics/
df = pd.read_csv('BRinf.txt')
df.head()

In [None]:
# number of observations, number of variables
df.shape

<font color=darkred>
Use scikit-learn's ``StandardScaler`` to preprocess the input variables.
</font>

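``StandardScaler`` standardizes each input variable by removing its mean and scaling it to unit variance.
We will not study standardization in detail in this course. However, it is often very useful (and in some cases mandatory) for the numerical stability of learning procedures. For the Lasso in particular, the penalty $\lambda\|\beta\|_1$ treats all coefficients alike, so without standardization the amount of shrinkage applied to a variable would depend on its unit of measurement.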
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
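A minimal sketch of the preprocessing step. Since the layout of ``BRinf.txt`` is not described here, the sketch uses a synthetic stand-in data frame, and the choice of the first column as the response is a hypothetical assumption to adapt to the real file.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the BRinf data frame (hypothetical: 160 rows, 1 response + 5 inputs)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(loc=5.0, scale=3.0, size=(160, 6)),
                  columns=["target"] + [f"x{k}" for k in range(1, 6)])

# Hypothetical column choice: the response is assumed to be the first column
Y = df.iloc[:, 0].to_numpy()
X_raw = df.iloc[:, 1:].to_numpy()

scaler = StandardScaler()
X = scaler.fit_transform(X_raw)  # each column now has mean 0 and variance 1

print(X.mean(axis=0).round(6))
print(X.std(axis=0).round(6))
```

Here the scaler is fitted on all rows, as in the exercise. A common alternative is to call ``fit`` on the training rows only and ``transform`` on the test rows, so that no information from the test set leaks into the preprocessing.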

<font color=darkred>
Build two datasets:
``X_train`` and ``Y_train`` contain the first 140 input vectors and observations; ``X_test`` and ``Y_test`` contain the remaining ones. We train a linear regression model on ``X_train`` and ``Y_train`` and assess its performance on ``X_test`` and ``Y_test``.
</font>

https://pandas.pydata.org/docs/reference/frame.html
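A minimal sketch of the split with positional indexing. It again uses a synthetic stand-in frame with the response assumed in the first column (hypothetical layout). The rows are kept in their original order rather than shuffled, since this macroeconomic data is presumably ordered in time.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 160 rows, response assumed in the first column (hypothetical layout)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(160, 6)),
                  columns=["target"] + [f"x{k}" for k in range(1, 6)])

n_train = 140                     # the first 140 observations are used for training
X_all = df.iloc[:, 1:].to_numpy()
Y_all = df.iloc[:, 0].to_numpy()

# Split by position, without shuffling
X_train, Y_train = X_all[:n_train], Y_all[:n_train]
X_test, Y_test = X_all[n_train:], Y_all[n_train:]
print(X_train.shape, X_test.shape)  # (140, 5) (20, 5)
```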

#### Lasso Regression from scratch

<font color=darkred>
Write a ``threshold_function`` function with arguments a real number ``z`` and a positive number ``alpha`` ($\alpha$), which returns
$$
\begin{cases}
z+\alpha & \text{if } z<-\alpha, \\
z-\alpha & \text{if } z>\alpha, \\
0 & \text{otherwise}. \\
\end{cases}
$$
</font>
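A direct transcription of the three cases above, given as one possible answer (scalar version):

```python
def threshold_function(z, alpha):
    """Soft-thresholding: z + alpha if z < -alpha, z - alpha if z > alpha, 0 otherwise."""
    if z < -alpha:
        return z + alpha
    if z > alpha:
        return z - alpha
    return 0.0


print(threshold_function(3.0, 1.0))   # 2.0
print(threshold_function(-3.0, 1.0))  # -2.0
print(threshold_function(0.5, 1.0))   # 0.0
```

For arrays, the same function can be written without branches as ``np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)``.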

<font color=darkred>
Write a ``coordinate_descent_lasso`` function with arguments an initial estimate ``beta`` ($\beta$), the data ``X`` and ``y``, a penalty parameter ``alpha`` ($\alpha$) and a number of iterations ``n_iter``. The function returns the parameter estimate after ``n_iter`` iterations of the coordinate-wise optimization procedure.
    </font>

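To simplify the code, you can write the function in terms of $\alpha = \lambda n /2$, where $\lambda$ is the penalty parameter in the mathematical derivation above (multiplying the Lasso criterion by $n/2$ does not change its minimizers).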
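A minimal sketch of cyclic coordinate descent, assuming that one "iteration" means one full pass over the $d$ coordinates and that no intercept is fitted. Instead of recomputing the partial residual from scratch, it maintains the full residual $y - X\beta$ and updates it after each coordinate change. The data in the usage lines are simulated and hypothetical.

```python
import numpy as np


def threshold_function(z, alpha):
    """Soft-thresholding: z + alpha if z < -alpha, z - alpha if z > alpha, 0 otherwise."""
    if z < -alpha:
        return z + alpha
    if z > alpha:
        return z - alpha
    return 0.0


def coordinate_descent_lasso(beta, X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for 0.5 * ||y - X beta||_2^2 + alpha * ||beta||_1.

    This is the Lasso criterion multiplied by n/2, with alpha = lambda * n / 2.
    Assumptions: one iteration is one full pass over the d coordinates, and
    no intercept is fitted (center y beforehand if needed).
    """
    beta = np.array(beta, dtype=float)      # work on a copy of the initial estimate
    col_sq_norms = np.sum(X ** 2, axis=0)   # ||X_{.j}||_2^2 for every column j
    residual = y - X @ beta                 # current full residual y - X beta
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq_norms[j] == 0.0:
                beta[j] = 0.0               # zero column: only the penalty depends on beta_j
                continue
            # z_j = X_{.j}^T r^(j), with r^(j) = residual + X_{.j} beta_j
            z_j = X[:, j] @ residual + col_sq_norms[j] * beta[j]
            new_beta_j = threshold_function(z_j, alpha) / col_sq_norms[j]
            residual -= X[:, j] * (new_beta_j - beta[j])  # keep residual = y - X beta
            beta[j] = new_beta_j
    return beta


# Usage on simulated data (hypothetical sizes, coefficients and alpha)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)
beta_hat = coordinate_descent_lasso(np.zeros(5), X_demo, y_demo, alpha=5.0, n_iter=200)
print(beta_hat)
```

A convenient sanity check is the optimality condition: at a solution, $X_{\cdot j}^\top(y - X\beta) = \alpha\,\mathrm{sign}(\beta_j)$ for every nonzero $\beta_j$, and $|X_{\cdot j}^\top(y - X\beta)|\leqslant\alpha$ for every zero one.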

<font color=darkred>
Run the algorithm on ``X_train`` and ``Y_train`` with several values of $\alpha$, and display the number of zero coefficients of the parameter estimate and the MSE obtained on the test set.
    </font>
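A sketch of this experiment on a synthetic stand-in for the standardized BRinf split. The real data are not loaded here, and the sizes, true coefficients and $\alpha$ grid are hypothetical. The cell repeats a compact version of ``coordinate_descent_lasso`` (with the soft-thresholding inlined) so that it runs on its own. Because that solver fits no intercept, the sketch centers ``Y_train`` and adds its mean back in the predictions, which is one possible way to handle the intercept.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler


def coordinate_descent_lasso(beta, X, y, alpha, n_iter=100):
    # Compact cyclic coordinate descent for 0.5 * ||y - X beta||^2 + alpha * ||beta||_1
    beta = np.array(beta, dtype=float)
    col_sq_norms = np.sum(X ** 2, axis=0)
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq_norms[j] == 0.0:
                beta[j] = 0.0
                continue
            z_j = X[:, j] @ residual + col_sq_norms[j] * beta[j]
            new_beta_j = np.sign(z_j) * max(abs(z_j) - alpha, 0.0) / col_sq_norms[j]
            residual -= X[:, j] * (new_beta_j - beta[j])
            beta[j] = new_beta_j
    return beta


# Synthetic stand-in for the standardized data (hypothetical: 160 rows, 20 inputs)
rng = np.random.default_rng(1)
beta_true = np.zeros(20)
beta_true[:4] = [1.5, -2.0, 1.0, 0.5]
X_all = StandardScaler().fit_transform(rng.normal(size=(160, 20)))
Y_all = X_all @ beta_true + 0.5 * rng.normal(size=160)
X_train, Y_train = X_all[:140], Y_all[:140]
X_test, Y_test = X_all[140:], Y_all[140:]

y_mean = Y_train.mean()  # the solver has no intercept: center the response instead
alphas = [0.1, 1.0, 10.0, 50.0, 200.0]
n_zeros, test_mse = [], []
for alpha in alphas:
    beta_hat = coordinate_descent_lasso(np.zeros(X_train.shape[1]), X_train,
                                        Y_train - y_mean, alpha, n_iter=200)
    n_zeros.append(int(np.sum(beta_hat == 0.0)))
    test_mse.append(mean_squared_error(Y_test, y_mean + X_test @ beta_hat))
    print(f"alpha={alpha:7.1f}  zeros={n_zeros[-1]:2d}  test MSE={test_mse[-1]:.3f}")
```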

#### Lasso Regression with Sklearn

In [None]:
from sklearn.linear_model import Lasso, LassoCV, Ridge, RidgeCV

<font color=darkred>
Create a NumPy array with several values of the penalty parameter (called ``alpha`` in scikit-learn).
</font>
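A log-spaced grid is a common choice because the effect of the penalty varies over several orders of magnitude; the bounds and number of points below are arbitrary. Note that scikit-learn's ``Lasso`` minimizes $(2n)^{-1}\|y - Xw\|_2^2 + \alpha_{\text{sk}}\|w\|_1$, which is half the criterion defining $\widehat\beta_n$ with $\lambda = 2\alpha_{\text{sk}}$. The sketch records this correspondence, assuming $n = 140$ training rows.

```python
import numpy as np

# 50 penalty values, log-spaced between 1e-3 and 1e1 (an illustrative grid)
alphas = np.logspace(-3, 1, 50)

# Correspondence with the notation of this notebook (assumption: n = 140 training rows)
n_train = 140
lambdas = 2 * alphas                    # lambda in the definition of beta_hat_n
alphas_from_scratch = n_train * alphas  # alpha = lambda * n / 2 in coordinate_descent_lasso
print(alphas[:3], alphas[-1])
```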

<font color=darkred>
Use the ``fit`` method of scikit-learn's ``Lasso`` to fit a Lasso model for each value of $\alpha$.
    
Store the estimated parameter, the number of zeros in the estimated parameter and the MSE on the test set after each training.
</font>
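A sketch of the fitting loop, run on a synthetic stand-in for the standardized training and test sets (hypothetical sizes and coefficients). ``Lasso`` fits an intercept by default, and ``max_iter`` is raised from its default so that small penalties converge.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (hypothetical): 160 rows, 20 standardized inputs, first 140 for training
rng = np.random.default_rng(1)
beta_true = np.zeros(20)
beta_true[:4] = [1.5, -2.0, 1.0, 0.5]
X_all = StandardScaler().fit_transform(rng.normal(size=(160, 20)))
Y_all = X_all @ beta_true + 0.5 * rng.normal(size=160)
X_train, Y_train = X_all[:140], Y_all[:140]
X_test, Y_test = X_all[140:], Y_all[140:]

alphas = np.logspace(-3, 1, 50)
coefs, n_zeros, test_mse = [], [], []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000)
    model.fit(X_train, Y_train)
    coefs.append(model.coef_.copy())                 # estimated parameter
    n_zeros.append(int(np.sum(model.coef_ == 0.0)))  # number of exact zeros
    test_mse.append(mean_squared_error(Y_test, model.predict(X_test)))

coefs = np.array(coefs)  # shape (n_alphas, n_features)
```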

<font color=darkred>
Display the estimated parameters as a function of the penalty parameter.
</font>
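A sketch of the regularization path plot: one curve per coefficient, with a logarithmic horizontal axis. To run on its own, the cell rebuilds the hypothetical synthetic data and the fitting loop.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (hypothetical): 160 rows, 20 standardized inputs, first 140 for training
rng = np.random.default_rng(1)
beta_true = np.zeros(20)
beta_true[:4] = [1.5, -2.0, 1.0, 0.5]
X_all = StandardScaler().fit_transform(rng.normal(size=(160, 20)))
Y_all = X_all @ beta_true + 0.5 * rng.normal(size=160)
X_train, Y_train = X_all[:140], Y_all[:140]

alphas = np.logspace(-3, 1, 50)
coefs = np.array([Lasso(alpha=a, max_iter=10_000).fit(X_train, Y_train).coef_ for a in alphas])

fig, ax = plt.subplots(figsize=(7, 4))
ax.semilogx(alphas, coefs)  # each column of coefs (one feature) becomes one line
ax.set_xlabel("penalty parameter alpha (log scale)")
ax.set_ylabel("estimated coefficient")
ax.set_title("Lasso regularization path")
plt.show()
```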

<font color=darkred>
Display the number of zero coefficients of the estimated parameter as a function of the penalty parameter.
</font>
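A sketch of this plot on the same hypothetical synthetic data, rebuilt so that the cell runs on its own. Exact zeros are counted with ``coef_ == 0``, which is meaningful here because coordinate descent sets coefficients exactly to zero.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (hypothetical): 160 rows, 20 standardized inputs, first 140 for training
rng = np.random.default_rng(1)
beta_true = np.zeros(20)
beta_true[:4] = [1.5, -2.0, 1.0, 0.5]
X_all = StandardScaler().fit_transform(rng.normal(size=(160, 20)))
Y_all = X_all @ beta_true + 0.5 * rng.normal(size=160)
X_train, Y_train = X_all[:140], Y_all[:140]

alphas = np.logspace(-3, 1, 50)
n_zeros = [int(np.sum(Lasso(alpha=a, max_iter=10_000).fit(X_train, Y_train).coef_ == 0.0))
           for a in alphas]

fig, ax = plt.subplots(figsize=(7, 4))
ax.semilogx(alphas, n_zeros, marker="o")
ax.set_xlabel("penalty parameter alpha (log scale)")
ax.set_ylabel("number of zero coefficients")
plt.show()
```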

<font color=darkred>
Display the MSE on the test set as a function of the penalty parameter.
</font>
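A sketch of the test MSE curve on the hypothetical synthetic data, rebuilt so that the cell runs on its own. A vertical line marks the value of $\alpha$ with the smallest test MSE.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (hypothetical): 160 rows, 20 standardized inputs, first 140 for training
rng = np.random.default_rng(1)
beta_true = np.zeros(20)
beta_true[:4] = [1.5, -2.0, 1.0, 0.5]
X_all = StandardScaler().fit_transform(rng.normal(size=(160, 20)))
Y_all = X_all @ beta_true + 0.5 * rng.normal(size=160)
X_train, Y_train = X_all[:140], Y_all[:140]
X_test, Y_test = X_all[140:], Y_all[140:]

alphas = np.logspace(-3, 1, 50)
test_mse = [mean_squared_error(Y_test, Lasso(alpha=a, max_iter=10_000)
                               .fit(X_train, Y_train).predict(X_test))
            for a in alphas]
best_alpha = alphas[int(np.argmin(test_mse))]

fig, ax = plt.subplots(figsize=(7, 4))
ax.semilogx(alphas, test_mse)
ax.axvline(best_alpha, linestyle="--", color="gray")  # alpha with the smallest test MSE
ax.set_xlabel("penalty parameter alpha (log scale)")
ax.set_ylabel("test MSE")
plt.show()
```

Choosing $\alpha$ by minimizing the test MSE makes that MSE an optimistic estimate of the performance on new data. ``LassoCV``, imported above, instead selects $\alpha$ by cross-validation on the training set.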