## Homework 2
## Practice Linear Regression and Hyperparameter Search


This assignment aims to give you more hands-on experience with [linear models](https://scikit-learn.org/stable/modules/linear_model.html) (especially linear regression) and [hyperparameter search](https://scikit-learn.org/stable/model_selection.html) in the scikit-learn library.

In [None]:
import numpy as np

import matplotlib.pyplot as plt

# Today's data

400 photos of human faces. Each face is a 2D array of 64x64 pixel brightness values (floats in [0, 1]).

In [None]:
from sklearn.datasets import fetch_olivetti_faces
data = fetch_olivetti_faces().images

Let's look at some faces:

In [None]:
# this code showcases matplotlib subplots.
fig, ax = plt.subplots(2, 2, figsize=(12, 12))
ax = ax.flatten()

for i in range(4):
    ax[i].imshow(data[i],cmap='gray')

plt.show()

# Face reconstruction problem

Let's solve the face reconstruction problem: given the left halves of faces __(X)__, our algorithm should predict the right halves __(y)__. The idea is that the left half of a face contains enough information to reconstruct the right half, at least partially. We will also see that scikit-learn linear models can predict multiple targets (here, every pixel of the right half) for a single example.

Our first step is to split the photos into X and y using slicing.
__Slices in numpy:__
* In regular Python, a slice looks like this: `a[2:5]` _(selects the elements with indices 2, 3 and 4; the end index is excluded)_
* NumPy lets you slice N-dimensional arrays along each dimension: [image_index, height, width]
  * `data[:10]` - select the first 10 images
  * `data[:, :10]` - for all images, select a horizontal stripe 10 pixels high at the top of the image
  * `data[10:20, :, -25:-15]` - take images [10, 11, ..., 19] and, for each image, select a _vertical stripe_ 10 pixels wide that ends 15 pixels from the _right_ side.
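The slicing examples above can be checked on a zeros array with the same shape as `data`. The array here is only a stand-in, not the real images:

```python
import numpy as np

# Plain Python list slicing: the end index is excluded
a = [0, 1, 2, 3, 4, 5, 6]
print(a[2:5])  # [2, 3, 4]

# Dummy stand-in with the same shape as `data`: 400 images, 64 rows, 64 columns
dummy = np.zeros((400, 64, 64))

first_ten = dummy[:10]               # first 10 images
top_stripe = dummy[:, :10]           # top 10 rows of every image
vertical = dummy[10:20, :, -25:-15]  # images 10..19, columns 39..48 of each

print(first_ten.shape)   # (10, 64, 64)
print(top_stripe.shape)  # (400, 10, 64)
print(vertical.shape)    # (10, 64, 10)
```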

__Your task:__

Let's use slices to select all __left image halves as X__ and all __right halves as y__.

In [None]:
# select the left half of each face as X and the right half as y
X = <Slice left half-images>
y = <Slice right half-images>

In [None]:
# If you did everything right, you should see the left and right half-images drawn side by side, in their natural order
plt.subplot(1,2,1)
plt.imshow(X[0],cmap='gray')
plt.subplot(1,2,2)
plt.imshow(y[0],cmap='gray')

assert X.shape == y.shape == (len(data), 64, 32), "Please slice exactly the left half-face into X and the right half-face into y"

In [None]:
def glue(left_half,right_half):
    # merge photos back together
    left_half = left_half.reshape([-1, 64, 32])
    right_half = right_half.reshape([-1, 64, 32])
    return np.concatenate([left_half, right_half],axis=-1)


# if you did everything right, you should see a complete face
plt.imshow(glue(X, y)[99], cmap='gray')
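`glue` reshapes both inputs to `[-1, 64, 32]` before concatenating them along the last (width) axis, so it accepts flattened half-images just as well as 3D ones. This matters later, when model predictions come back as rows of 2048 values. A minimal check with random, hypothetical halves (the helper is repeated so the snippet runs on its own):

```python
import numpy as np

def glue(left_half, right_half):
    # same helper as above: reshape both halves to (n, 64, 32) and join along the width axis
    left_half = left_half.reshape([-1, 64, 32])
    right_half = right_half.reshape([-1, 64, 32])
    return np.concatenate([left_half, right_half], axis=-1)

rng = np.random.default_rng(0)
# Hypothetical flattened halves, as a model would output them: 5 rows of 64 * 32 = 2048 values
left_flat = rng.random((5, 64 * 32))
right_flat = rng.random((5, 64 * 32))

faces = glue(left_flat, right_flat)
print(faces.shape)  # (5, 64, 64)

# Gluing the same halves in 3D form gives an identical result
faces_from_3d = glue(left_flat.reshape(5, 64, 32), right_flat.reshape(5, 64, 32))
print(np.array_equal(faces, faces_from_3d))  # True
```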

# Machine learning stuff

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X.reshape([len(X), -1]),
                                                    y.reshape([len(y), -1]),
                                                    test_size=0.05, random_state=42)

print(X_test.shape)
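A quick shape check with dummy arrays of the same size as `X` and `y`: 5% of 400 images gives 20 test images, and flattening each 64x32 half-image gives 2048 columns, so every pixel of the left half is a feature and every pixel of the right half is a target. The zero and one arrays below are placeholders, not the real faces:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy halves with the same shapes as X and y: 400 half-images of 64 x 32 pixels
X_dummy = np.zeros((400, 64, 32))
y_dummy = np.ones((400, 64, 32))

# Flatten each half-image into one row of 64 * 32 = 2048 values
X_flat = X_dummy.reshape([len(X_dummy), -1])
y_flat = y_dummy.reshape([len(y_dummy), -1])

Xtr, Xte, Ytr, Yte = train_test_split(X_flat, y_flat, test_size=0.05, random_state=42)
print(Xtr.shape, Xte.shape)  # (380, 2048) (20, 2048)
print(Ytr.shape, Yte.shape)  # (380, 2048) (20, 2048)
```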

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, Y_train)
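With a 2D target, `LinearRegression` fits one set of weights per output column, so `coef_` has shape `(n_targets, n_features)`; for the faces that is 2048 x 2048. A minimal sketch on a small synthetic, noiseless problem (the sizes are arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical small problem: 50 samples, 6 features, 3 targets per sample
X_small = rng.normal(size=(50, 6))
true_W = rng.normal(size=(6, 3))
Y_small = X_small @ true_W + 0.5  # noiseless, with an intercept of 0.5 for every target

reg = LinearRegression().fit(X_small, Y_small)
print(reg.coef_.shape)             # (3, 6): one row of weights per target
print(reg.intercept_.shape)        # (3,)
print(reg.predict(X_small).shape)  # (50, 3)
```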

Measure the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) of the predictions on the train and test sets:

$$MSE(y, \widehat{y}) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \widehat{y}_i)^2$$
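For several targets at once, `mean_squared_error` (with its default `multioutput='uniform_average'`) averages the per-target MSEs; because every target has the same number of samples, that equals the plain mean of all squared differences. A small check on random, hypothetical arrays:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Hypothetical targets and predictions: 20 samples, 4 outputs each
Y_true = rng.random((20, 4))
Y_pred = Y_true + rng.normal(scale=0.1, size=(20, 4))

# Empirical MSE: average squared error over every sample and every output
manual_mse = np.mean((Y_true - Y_pred) ** 2)
sk_mse = mean_squared_error(Y_true, Y_pred)  # default multioutput='uniform_average'

print(np.isclose(manual_mse, sk_mse))  # True
```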

In [None]:
from sklearn.metrics import mean_squared_error
mse_train = mean_squared_error(Y_train, model.predict(X_train))
mse_test = mean_squared_error(Y_test, model.predict(X_test))

print(f"Train MSE: {mse_train:.3f}")
print(f"Test MSE: {mse_test:.3f}")

---

## Why is the train error much smaller than the test error?

In [None]:
# Train predictions
pics = <YOUR CODE> # reconstruct and glue together X and Y for the train dataset
plt.figure(figsize=[16, 12])
for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.imshow(pics[i], cmap='gray')

In [None]:
# Test predictions
pics = <YOUR CODE> # reconstruct and glue together X and Y for the test dataset
plt.figure(figsize=[16, 12])
for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.imshow(pics[i], cmap='gray')

---

Remember regularization? That is exactly what we need. There are many linear models in the sklearn package, all listed [here](https://scikit-learn.org/stable/modules/linear_model.html). We will focus on three of them: Ridge regression, Lasso and ElasticNet.
The idea behind all of them is simple: add a penalty on the model weights to the loss function to prevent overfitting.
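For reference, these are the single-target objectives as given in the scikit-learn user guide, with $n$ samples, $\alpha$ = `alpha` and $\rho$ = `l1_ratio` (the intercept is not penalized). Note the $\frac{1}{2n}$ factor in Lasso and ElasticNet: the same numeric `alpha` does not mean the same penalty strength as in Ridge, so their grids may need different ranges.

$$\begin{aligned}
\text{Ridge:}\quad & \min_{w}\; \|Xw - y\|_2^2 + \alpha \|w\|_2^2 \\
\text{Lasso:}\quad & \min_{w}\; \frac{1}{2n}\|Xw - y\|_2^2 + \alpha \|w\|_1 \\
\text{ElasticNet:}\quad & \min_{w}\; \frac{1}{2n}\|Xw - y\|_2^2 + \alpha\rho\|w\|_1 + \frac{\alpha(1-\rho)}{2}\|w\|_2^2
\end{aligned}$$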

# Ridge regression
[Ridge regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html) is just linear regression with L2 regularization: the loss gets an extra penalty term $\alpha \cdot \sum_i w_i^2$.
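Because the L2 penalty is quadratic, ridge regression has a closed-form solution, $w = (X^\top X + \alpha I)^{-1} X^\top y$, where, when an intercept is fitted, $X$ and $y$ are first centered (scikit-learn does not penalize the intercept). The sketch below checks this on small random data against scikit-learn's `Ridge` with its default `fit_intercept=True`; the data and sizes are arbitrary assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Hypothetical data: 30 samples, 8 features, 2 targets
X_r = rng.normal(size=(30, 8))
Y_r = rng.normal(size=(30, 2))
alpha = 0.5

# Center X and Y so the (unpenalized) intercept drops out of the problem
X_mean, Y_mean = X_r.mean(axis=0), Y_r.mean(axis=0)
Xc, Yc = X_r - X_mean, Y_r - Y_mean

# Closed-form ridge weights: solve (Xc^T Xc + alpha * I) W = Xc^T Yc, W has shape (8, 2)
W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_r.shape[1]), Xc.T @ Yc)
b = Y_mean - X_mean @ W  # recover the intercept from the means

ridge_sk = Ridge(alpha=alpha).fit(X_r, Y_r)
print(np.allclose(W.T, ridge_sk.coef_))     # True
print(np.allclose(b, ridge_sk.intercept_))  # True
```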

Let's train such a model with `alpha=0.5`.

In [None]:
from <YOUR CODE> import <YOUR CODE>

ridge = <YOUR CODE>

<YOUR CODE: fit the model on training set>

In [None]:
<YOUR CODE: predict and measure MSE on train and test>

In [None]:
# Test predictions
pics = <YOUR CODE> # reconstruct and glue together X and Y for the test dataset
plt.figure(figsize=[16, 12])
for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.imshow(pics[i], cmap='gray')

---

# Grid search

Train models with different values of $\alpha$ and find the one with the lowest MSE on held-out data. It's okay to use loops or any other Python tools here.
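One way to do this with a plain loop is sketched below on synthetic data. It uses `cross_val_score`, so $\alpha$ is chosen on validation folds of the training data and the test set stays untouched for the final check. scikit-learn scorers follow a "greater is better" convention, so MSE is reported as `neg_mean_squared_error` (hence the "negative MSE" axis label in `train_and_plot`); the sketch flips the sign back. The data and the alpha grid are assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical regression data: 100 samples, 20 features, single noisy target
X_g = rng.normal(size=(100, 20))
y_g = X_g @ rng.normal(size=20) + rng.normal(scale=2.0, size=100)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_mse = {}
for a in alphas:
    # 5-fold cross-validation; scores are negated MSE values
    scores = cross_val_score(Ridge(alpha=a), X_g, y_g,
                             scoring="neg_mean_squared_error", cv=5)
    cv_mse[a] = -scores.mean()

best_alpha = min(cv_mse, key=cv_mse.get)
print(best_alpha, cv_mse[best_alpha])
```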

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
def train_and_plot(model, parameter_dict):
    """Run a grid search for `model` over `parameter_dict` and plot
    the cross-validated negative MSE against the value of alpha."""
    # use GridSearchCV to do the grid search; the plot below expects negative MSE scores
    gscv = GridSearchCV(<Your code>)
    <Fit your model>
    plt.errorbar(gscv.param_grid['alpha'],
                 gscv.cv_results_['mean_test_score'],
                 gscv.cv_results_['std_test_score'],
                 capsize=5, label=type(model).__name__)
    # `nonpositive` replaced the old `nonposx` keyword in matplotlib 3.3
    plt.xscale("log", nonpositive='clip')
    plt.xlabel("alpha")
    plt.ylabel("negative MSE")
    plt.grid()
    plt.legend()

In [None]:
plt.figure(figsize=(12, 6))

models = <YOUR CODE> # Start from Ridge regression, but feel free to add 
                     # Lasso and ElasticNet. Note that the latter two cannot
                     # be solved analytically and typically are much slower
                     # to fit than Ridge regression (so you may want to limit
                     # the number of grid points).

parameters_dicts = <YOUR CODE> # It should be a list of dicts:
                               # one parameters dict for each model
for model, parameters_dict in zip(models, parameters_dicts):
    train_and_plot(model, parameters_dict)

---

In [None]:
# Test predictions
pics = glue(X_test, <predict with your best model>)
plt.figure(figsize=[16, 12])
for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.imshow(pics[i], cmap='gray')

In [None]:
from sklearn.linear_model import Lasso, ElasticNet

# Use the code you have just done to do GridSearch for Lasso and/or ElasticNet
# models (if you haven't already). Note that Lasso and ElasticNet are much
# slower to fit, compared to Ridge.
<YOUR CODE>
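One practical difference between the penalties: the L1 term in Lasso tends to set many weights exactly to zero, while the L2 term in Ridge only shrinks them. A small sketch on synthetic data where only 3 of 30 features matter (the data, `alpha` values and feature counts are arbitrary assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
# Hypothetical data where only the first 3 of 30 features affect the target
X_s = rng.normal(size=(200, 30))
w_true = np.zeros(30)
w_true[:3] = [3.0, -2.0, 1.5]
y_s = X_s @ w_true + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X_s, y_s)
ridge_s = Ridge(alpha=1.0).fit(X_s, y_s)

# Count weights that are exactly zero under each penalty
print("zero weights, Lasso:", np.sum(lasso.coef_ == 0))
print("zero weights, Ridge:", np.sum(ridge_s.coef_ == 0))
```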

---

## Bonus part

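Try using `sklearn.linear_model.SGDRegressor` with `huber` loss instead of `LinearRegression` in the code above. Is it better in this case? Note that `SGDRegressor` fits a single target only, so here it has to be wrapped, for example in `sklearn.multioutput.MultiOutputRegressor`.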
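A minimal sketch of a multi-target `SGDRegressor` with Huber loss, wrapped in `MultiOutputRegressor` because `SGDRegressor` itself handles one target at a time. The data are small and synthetic; with the 2048 face targets the wrapper trains 2048 separate models, and its `n_jobs` parameter can run them in parallel:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
# Hypothetical small stand-in for the face data: 100 samples, 10 features, 3 targets
X_b = rng.random((100, 10))
Y_b = X_b @ rng.normal(size=(10, 3))

# MultiOutputRegressor trains one SGDRegressor per column of Y_b
sgd = MultiOutputRegressor(
    SGDRegressor(loss="huber", max_iter=1000, tol=1e-3, random_state=0))
sgd.fit(X_b, Y_b)
print(len(sgd.estimators_))    # 3
print(sgd.predict(X_b).shape)  # (100, 3)
```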

In [None]:
<Your code for bonus part>

P.S. This assignment is inspired by [YSDA materials](https://github.com/yandexdataschool).