# Exercises

There are three exercises in this notebook:

1. Use cross-validation to evaluate the linear regression with at least three different $\alpha$ values.
2. Implement an SGD method that trains the Lasso regression for 10 epochs.
3. Extend Fisher's classifier to work with two features. Use the class as $y$.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.datasets import load_iris

## 1. Cross-validation linear regression

Change the variable ``alpha`` into a list of alphas, loop over that list, and compare the results for each value.

In [2]:
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

x = np.c_[np.ones((15,1)),x]

I = np.identity(2)

alphas = [0.05, 0.1, 0.2, 0.4, 0.8]
results = []

for alpha in alphas:
    w_scratch = np.linalg.inv(x.T.dot(x) + alpha * I).dot(x.T).dot(y)
    w_scratch = w_scratch.ravel()

    ridge = Ridge(alpha=alpha, fit_intercept=False)
    ridge.fit(x, y)
    w_sklearn = ridge.coef_.ravel()

    if w_sklearn.shape[0] > 1:
        results.append([alpha, w_scratch[0], w_scratch[1], w_sklearn[0], w_sklearn[1]])
    else:
        results.append([alpha, w_scratch[0], w_scratch[1], w_sklearn[0], np.nan])

df = pd.DataFrame(results, columns=['Alpha', 'Scratch Intercept', 'Scratch Coefficient', 'Sklearn Intercept', 'Sklearn Coefficient'])

print(df)

   Alpha  Scratch Intercept  Scratch Coefficient  Sklearn Intercept  \
0   0.05        -130.228040             1.331150        -130.228040   
1   0.10        -101.723971             1.169788        -101.723971   
2   0.20         -70.751422             0.994451         -70.751422   
3   0.40         -43.972861             0.842856         -43.972861   
4   0.80         -25.026648             0.735600         -25.026648   

   Sklearn Coefficient  
0             1.331150  
1             1.169788  
2             0.994451  
3             0.842856  
4             0.735600  
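
The loop above fits on all 15 points and compares weights, so it does not yet hold out any data. As a minimal sketch of the cross-validation step, the block below splits the same data into folds and reports the mean validation error per alpha. The helper names `ridge_fit` and `cv_mse`, the choice of 5 folds, and the fixed shuffle seed are assumptions made for illustration, not part of the original exercise.

```python
import numpy as np

# Same data as in the cell above.
x_raw = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y_raw = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)

X = np.c_[np.ones(len(x_raw)), x_raw]  # bias column + feature


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights, same formula as in the loop above."""
    I = np.identity(X.shape[1])
    return np.linalg.solve(X.T @ X + alpha * I, X.T @ y)


def cv_mse(X, y, alpha, n_folds=5, seed=0):
    """Mean validation MSE over n_folds shuffled folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = ridge_fit(X[train_idx], y[train_idx], alpha)  # fit on the other folds
        residuals = X[val_idx] @ w - y[val_idx]           # evaluate on the held-out fold
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))


alphas = [0.05, 0.1, 0.2, 0.4, 0.8]
cv_results = {alpha: cv_mse(X, y_raw, alpha) for alpha in alphas}
for alpha, mse in cv_results.items():
    print(f"alpha={alpha:<5} mean validation MSE={mse:.2f}")

best_alpha = min(cv_results, key=cv_results.get)
print("best alpha:", best_alpha)
```

With only 15 samples the fold errors are noisy, so the ranking of alphas may change with a different seed or fold count.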


## 2. Implement the Lasso regression based on the Ridge regression example

Please implement the SGD method and compare the results with the sklearn Lasso regression results.

In [3]:
def sgd():
    # your code goes here
    pass

In [4]:
x = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176]).reshape(-1, 1)
y = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121]).reshape(-1, 1)

x = np.asmatrix(np.c_[np.ones((15,1)),x])

I = np.identity(2)
alpha = 0.1 


w = np.linalg.inv(x.T*x + alpha * I)*x.T*y # update this line
w=w.ravel()
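
One possible shape for the SGD exercise is sketched below. It is not the reference solution: the function name `sgd_lasso`, the learning rate, the fixed seed, the decision to leave the bias unpenalized, and the standardization of the feature are all assumptions chosen so that 10 epochs are enough to get close to a reasonable fit. The per-sample loss is taken to be $\frac{1}{2}(x_i^T w - y_i)^2 + \alpha \lVert w \rVert_1$, and the L1 term is handled with a plain subgradient step.

```python
import numpy as np


def sgd_lasso(X, y, alpha=0.1, lr=0.05, epochs=10, seed=0):
    """Subgradient SGD on 0.5 * (x_i @ w - y_i)**2 + alpha * |w|_1 (bias excluded)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    penalty_mask = np.ones_like(w)
    penalty_mask[0] = 0.0  # assumption: the bias term is not penalized
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # one pass over the shuffled samples
            error = X[i] @ w - y[i]
            # np.sign(0) == 0, which is a valid subgradient of |w| at zero
            grad = error * X[i] + alpha * penalty_mask * np.sign(w)
            w -= lr * grad
    return w


x_raw = np.array([188, 181, 197, 168, 167, 187, 178, 194, 140, 176, 168, 192, 173, 142, 176], dtype=float)
y_raw = np.array([141, 106, 149, 59, 79, 136, 65, 136, 52, 87, 115, 140, 82, 69, 121], dtype=float)

# Assumption: standardize the feature so a single learning rate suits both weights.
x_std = (x_raw - x_raw.mean()) / x_raw.std()
X = np.c_[np.ones(len(x_std)), x_std]

w_sgd = sgd_lasso(X, y_raw, alpha=0.1)
mse_start = np.mean(y_raw ** 2)              # error of the all-zero starting weights
mse_end = np.mean((X @ w_sgd - y_raw) ** 2)  # error after 10 epochs
print("weights:", w_sgd)
print(f"MSE at w=0: {mse_start:.1f}, after 10 epochs: {mse_end:.1f}")
```

For the comparison step, `sklearn.linear_model.Lasso` could be fitted on the same standardized feature. Its weights will not match exactly, because plain subgradient SGD stays noisy after 10 epochs and does not drive weights to exact zeros.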


## 3. Extend the Fisher's classifier

Please extend the code below so that it works with two features, and use the targets of the ``iris_data`` variable as $y$.

In [5]:
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data,columns=iris_data.feature_names)
iris_df.head()

x = iris_df.loc[:, ['sepal width (cm)', 'sepal length (cm)']].values # change here
y = iris_data.target.reshape(-1, 1) # change here

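# Note: np.size counts every element (rows * columns), not the number of samples.
dataset_size = np.size(x)

# Note: np.mean(x) without an axis averages over both feature columns together.
mean_x, mean_y = np.mean(x), np.mean(y)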

SS_xy = np.sum(y * x) - dataset_size * mean_y * mean_x
SS_xx = np.sum(x * x) - dataset_size * mean_x * mean_x

a = SS_xy / SS_xx
b = mean_y - a * mean_x


y_pred = a * x + b

df = pd.DataFrame(y_pred, columns=["",""])
print(df)

                       
0    0.924785  1.051418
1    0.885212  1.035589
2    0.901042  1.019760
3    0.893127  1.011845
4    0.932700  1.043504
..        ...       ...
145  0.885212  1.178051
146  0.845640  1.146393
147  0.885212  1.162222
148  0.916871  1.138479
149  0.885212  1.114735

[150 rows x 2 columns]
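
The cell above still applies the one-feature least-squares formulas to both columns at once. As a hedged illustration of what a two-feature Fisher's classifier could look like, the sketch below computes the Fisher direction $w \propto S_w^{-1}(m_1 - m_0)$ for two classes. To stay dependency-free it uses synthetic two-feature data as a stand-in for two iris columns and a binary class label. The class means, the spread, and the midpoint threshold are assumptions, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for two features of two classes (assumed means and spread).
class0 = rng.normal(loc=[3.4, 5.0], scale=0.3, size=(50, 2))
class1 = rng.normal(loc=[2.8, 6.0], scale=0.3, size=(50, 2))
x = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

# Per-class mean vectors.
m0 = x[y == 0].mean(axis=0)
m1 = x[y == 1].mean(axis=0)

# Within-class scatter matrix: sum of the two per-class scatter matrices.
d0 = x[y == 0] - m0
d1 = x[y == 1] - m1
S_w = d0.T @ d0 + d1.T @ d1

# Fisher direction, proportional to S_w^{-1} (m1 - m0).
w = np.linalg.solve(S_w, m1 - m0)

# Threshold halfway between the projected class means (assumes similar class spread).
threshold = 0.5 * (m0 @ w + m1 @ w)

y_pred = (x @ w > threshold).astype(int)
accuracy = np.mean(y_pred == y)
print("direction:", w)
print(f"training accuracy: {accuracy:.2f}")
```

To try this on the iris data, `x` could be replaced by the two selected columns and `y` by the targets restricted to two classes. Handling all three classes would need a multi-class extension that this sketch does not cover.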
