# Q3
## Q3.1
We saw regularization using L1 and L2 penalties. A third commonly used penalty is ElasticNet, a weighted combination of the two. With squared error as the loss, the objective would look like:
\begin{equation}
    Err(Y, \hat{Y}) = \sum (Y-\hat{Y})^2 + \alpha ( \beta L_1 + (1-\beta) L_2 )
\end{equation}  
Here, $\alpha$ behaves as it did before: it balances the focus between the prediction error and the regularization term. The additional parameter, $\beta$, sets the relative weight of the L1 and L2 penalties. Setting $\beta = 1$ recovers pure L1 regularization, and $\beta = 0$ recovers pure L2 regularization.  
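
To make the equation concrete, here is a minimal sketch that evaluates it with NumPy. It assumes $L_1$ is the sum of absolute coefficient values and $L_2$ is the sum of squared coefficient values; the function name `elastic_net_error` and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def elastic_net_error(y, y_hat, coef, alpha, beta):
    """Squared error plus the ElasticNet penalty, following the equation above."""
    l1 = np.sum(np.abs(coef))   # assumed L1 term: sum of absolute coefficients
    l2 = np.sum(coef ** 2)      # assumed L2 term: sum of squared coefficients
    return np.sum((y - y_hat) ** 2) + alpha * (beta * l1 + (1 - beta) * l2)

# Toy values (hypothetical): squared error = 1, L1 = 7, L2 = 25
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 2.0])
coef = np.array([3.0, -4.0])

err_l1 = elastic_net_error(y, y_hat, coef, alpha=0.1, beta=1.0)   # pure L1 penalty
err_l2 = elastic_net_error(y, y_hat, coef, alpha=0.1, beta=0.0)   # pure L2 penalty
err_mix = elastic_net_error(y, y_hat, coef, alpha=0.1, beta=0.5)  # equal mix
print(err_l1, err_l2, err_mix)
```

Note that `sklearn`'s `ElasticNet` scales the terms somewhat differently (the squared error is divided by twice the number of samples, and the L2 term carries a factor of one half), so `alpha` values are not directly interchangeable with this formula.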
  
For the noisy quadratic data that we saw in Section 3, include `sklearn`'s ElasticNet regression model in the comparison and examine the behaviour of your model with different values of $\alpha$ and $\beta$.  
Between L1, L2, and ElasticNet, which one performs best? (You may need to increase the range of x or the amount of noise to see an appreciable difference.)  

Note: `sklearn` uses `l1_ratio` as the parameter name for $\beta$. 
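
As a quick sanity check on the role of `l1_ratio`, the sketch below fits `ElasticNet` with `l1_ratio=1.0` and `Lasso` with the same `alpha` and compares the coefficients. The seed, the polynomial degree, and `max_iter` are assumptions made for the illustration and are not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.preprocessing import PolynomialFeatures

# Noisy quadratic data, same shape as in the exercise (seed is an assumption)
rng = np.random.default_rng(0)
x = rng.random((200, 1))
y = ((x - 0.2) * (x - 0.6)).ravel() + 0.10 * rng.standard_normal(200)
feats = PolynomialFeatures(degree=4).fit_transform(x)

# l1_ratio=1.0 switches the L2 part of the penalty off entirely
enet = ElasticNet(alpha=0.001, l1_ratio=1.0, fit_intercept=False, max_iter=10000)
lasso = Lasso(alpha=0.001, fit_intercept=False, max_iter=10000)
enet.fit(feats, y)
lasso.fit(feats, y)

same = np.allclose(enet.coef_, lasso.coef_)
print(same)
```

The other extreme is less direct: `ElasticNet` with `l1_ratio=0` applies an L2 penalty, but because `sklearn` scales its objective differently from `Ridge`, the same `alpha` does not give the same fit in the two models.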

In [None]:
# Generate noisy quadratic data and plot it against the noise-free curve
import numpy as np
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import PolynomialFeatures

x_data = np.random.random((200,1))
y_label = (x_data-0.2)*(x_data-0.6) + np.random.randn(x_data.shape[0], 1)*0.10
x_curve = np.linspace(0,1,200)
y_curve = (x_curve-0.2)*(x_curve-0.6)
x_train, x_test, y_train, y_test = train_test_split(x_data, y_label, train_size=0.4)
plt.plot(x_data, y_label,'.')
plt.plot(x_curve, y_curve, '.-')

poly_feat = PolynomialFeatures(degree=4)
feat_train = poly_feat.fit_transform(x_train)
feat_test = poly_feat.transform(x_test)  # reuse the transformer fitted on the training data

elastic_test_scr_list = []
elastic_train_scr_list = []
elastic_coef_mag_list = []  # Squared coefficient magnitude, i.e. the L2 regularization term

ridge_test_scr_list = []
ridge_train_scr_list = []
ridge_coef_mag_list = []

lass_test_scr_list = []
lass_train_scr_list = []
lass_coef_mag_list = []

for i in range(20):  ############ You can modify this value here
    poly_feat = PolynomialFeatures(degree=i)
    feat_train = poly_feat.fit_transform(x_train)
    
    # Elastic
    elastic_mdl = ElasticNet(fit_intercept=False, l1_ratio=0.5, alpha=0.001)
    elastic_mdl.fit(feat_train, y_train)
    
    # Ridge
    ridge_mdl = Ridge(fit_intercept=False, alpha=0.1)     
    ridge_mdl.fit(feat_train, y_train)
    
    # Lasso
    lass_mdl = Lasso(fit_intercept=False, alpha=0.001)
    lass_mdl.fit(feat_train, y_train)
    
    
    # Evaluation
    # Reuse the transformer fitted on the training data
    feat_test = poly_feat.transform(x_test)
    
    elastic_train_scr_list.append(elastic_mdl.score(feat_train, y_train))
    elastic_test_scr_list.append(elastic_mdl.score(feat_test, y_test))
    elastic_coef_mag_list.append(np.squeeze(elastic_mdl.coef_ @ elastic_mdl.coef_.T))
    
    ridge_train_scr_list.append(ridge_mdl.score(feat_train, y_train))
    ridge_test_scr_list.append(ridge_mdl.score(feat_test, y_test))
    ridge_coef_mag_list.append(np.squeeze(ridge_mdl.coef_ @ ridge_mdl.coef_.T))

    lass_train_scr_list.append(lass_mdl.score(feat_train, y_train))
    lass_test_scr_list.append(lass_mdl.score(feat_test, y_test))
    lass_coef_mag_list.append(np.squeeze(lass_mdl.coef_ @ lass_mdl.coef_.T))
    
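# Compare test-set R^2 scores across polynomial degrees
plt.figure()
plt.plot(elastic_test_scr_list)
plt.plot(ridge_test_scr_list)
plt.plot(lass_test_scr_list)
plt.xlabel('Polynomial degree')
plt.ylabel('Test score (R^2)')
plt.legend(['Elastic','Ridge','Lasso'])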

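Ridge appears to give the best test score here. Note, however, that the comparison is not like-for-like: Ridge uses `alpha=0.1` while Lasso and ElasticNet use `alpha=0.001`, and the data are randomly generated, so the ranking can change between runs. We would need to perform cross validation across values of `alpha` and `l1_ratio` to verify.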
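
One way to run that cross validation for the ElasticNet family is `sklearn.linear_model.ElasticNetCV`, which searches over `alpha` and `l1_ratio` together. The following is a minimal sketch: the candidate values, the seed, the fixed degree of 4, and `max_iter` are assumptions, and Ridge would still need its own search since its `alpha` is scaled differently.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import PolynomialFeatures

# Noisy quadratic data, same shape as in the exercise (seed is an assumption)
rng = np.random.default_rng(0)
x = rng.random((200, 1))
y = ((x - 0.2) * (x - 0.6)).ravel() + 0.10 * rng.standard_normal(200)
feats = PolynomialFeatures(degree=4).fit_transform(x)

# Candidate values are illustrative, not tuned
l1_ratios = [0.1, 0.5, 0.9, 1.0]
alphas = [1e-4, 1e-3, 1e-2, 1e-1]

cv_mdl = ElasticNetCV(l1_ratio=l1_ratios, alphas=alphas, cv=5,
                      fit_intercept=False, max_iter=10000)
cv_mdl.fit(feats, y)

# Selected hyperparameters after 5-fold cross validation
print(cv_mdl.alpha_, cv_mdl.l1_ratio_)
```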

## Q3.2
This question requires using a Python dictionary. A brief explanation is provided.  
  
Fit the following data using one of the regularized linear regression models:

In [None]:
import numpy as np
from matplotlib import pyplot as plt
x = (np.random.rand(500,1)-0.5)*5
y = (x-3)*(x-0.1)*(x+2)*(4*x+4) + 15*np.random.randn(*x.shape)
plt.plot(x,y,'.')

You can (should) use `sklearn.model_selection.GridSearchCV` to get the optimal value of your regularization weight.  
It expects a dictionary for its `param_grid` parameter. Python uses curly braces, `{}`, to declare a dictionary. Values are stored in a dictionary under a key. For example, `my_dict = {'chicken': 4}` would store the data `4` under the key `chicken` in the `my_dict` dictionary. You can get to the stored data using square brackets, like you would access an array with indices:  
`my_dict['chicken']`  
  
The dictionary you need to define must use the name of the parameter you're optimizing as a key, followed by a list of values across which you would like to optimize. For example, if you supplied:  
`my_param_dict = {'alpha': [0.1, 0.5, 1, 10]}`  
The `GridSearchCV.fit` method would fit your model to your data using each of those values separately. The results are then stored in your fitted object under `.cv_results_`. See the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html?highlight=gridsearchcv#sklearn.model_selection.GridSearchCV) for another example.  

In [None]:
# Dictionary example
my_dict = {'chicken': 4}
print(my_dict['chicken'])

In [None]:
# Example use of GridSearchCV
from sklearn.model_selection import GridSearchCV

my_param_dict = {'l1_ratio':[0,0.1,0.4,0.8,1]}
# pick a model
# mdl = SomeModel()

# gsv = GridSearchCV(mdl, param_grid=my_param_dict)
# [something with gsv]
# print(gsv.cv_results_)
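
The commented skeleton above could be filled in along the following lines. This is a sketch only: `Ridge`, the synthetic data, and the seed are stand-ins chosen for illustration, and any regularized model with an `alpha` parameter could take their place.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

# Key = parameter name, value = list of candidates to try
my_param_dict = {'alpha': [0.1, 0.5, 1, 10]}

gsv = GridSearchCV(Ridge(), param_grid=my_param_dict, cv=5)
gsv.fit(X, y)  # cross-validates each candidate, then refits the best one

print(gsv.best_params_)                     # the winning parameter set
print(gsv.cv_results_['mean_test_score'])   # one mean score per candidate
```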

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import PolynomialFeatures
# Data:
x = (np.random.rand(500,1)-0.5)*5
y = (x-3)*(x-0.1)*(x+2)*(4*x+4) + 15*np.random.randn(*x.shape)

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.6)

# Candidate values for the grid search (duplicate alpha entry removed)
hyperparam_search = {'l1_ratio': np.linspace(0,1,10), 'alpha': [1e-3, 1e-2, 0.1, 1]}

mdl = ElasticNet(fit_intercept=False)
poly = PolynomialFeatures(degree=4)  #### Change this value
x_feat_train = poly.fit_transform(x_train)

gsv = GridSearchCV(mdl, param_grid=hyperparam_search)
gsv.fit(x_feat_train, y_train)
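
Rather than changing the polynomial degree by hand, the degree can be included in the search by wrapping both steps in a `sklearn.pipeline.Pipeline`. The sketch below assumes the step names `poly` and `enet` (hypothetical labels) and an illustrative grid; parameters of a pipeline step are addressed as `<step>__<parameter>`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same generating function as the exercise (seed is an assumption)
rng = np.random.default_rng(0)
x = (rng.random((500, 1)) - 0.5) * 5
y = ((x - 3) * (x - 0.1) * (x + 2) * (4 * x + 4)).ravel() + 15 * rng.standard_normal(500)

pipe = Pipeline([
    ('poly', PolynomialFeatures()),
    ('enet', ElasticNet(fit_intercept=False, max_iter=10000)),
])

# Illustrative grid: 4 degrees x 4 alphas x 3 l1_ratios = 48 candidates
grid = {
    'poly__degree': [2, 3, 4, 5],
    'enet__alpha': [1e-3, 1e-2, 0.1, 1],
    'enet__l1_ratio': [0.1, 0.5, 0.9],
}

search = GridSearchCV(pipe, param_grid=grid, cv=5)
search.fit(x, y)  # the pipeline expands the features inside each fold
print(search.best_params_)
```

A side benefit of the pipeline is that the feature transformer is fitted only on the training portion of each fold.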

In [None]:
# plot our data:
plt.figure()
plt.plot(x,y,'.')
x_curve = np.linspace(np.min(x), np.max(x), np.shape(x)[0])
y_curve = (x_curve-3)*(x_curve-0.1)*(x_curve+2)*(4*x_curve+4)
plt.plot(x_curve, y_curve)
plt.legend(['Data','Generating Curve'])

In [None]:
# best split:
# 'rank_test_score' holds the rank of each parameter set; the minimum rank (1) marks the best set.
best_ind = np.argmin(gsv.cv_results_['rank_test_score'])
print(best_ind)

In [None]:
alpha = gsv.cv_results_['param_alpha'][best_ind]
l1_ratio = gsv.cv_results_['param_l1_ratio'][best_ind]
score = gsv.cv_results_['mean_test_score'][best_ind]

print(f'alpha: {alpha}')
print(f'l1_ratio: {l1_ratio}')
print(f'score: {score}')

The optimal values for `alpha` and `l1_ratio` are displayed above. These values will change depending on the degree of the fitted polynomial, and the available data.
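
The cross-validated score is computed on the training split only, so a final check on the held-out test split is a natural next step. The following sketch repeats the setup in a self-contained form; the seed, `random_state`, the reduced grid, and `max_iter` are assumptions. It also shows the `best_index_`, `best_params_`, and `best_score_` attributes, which give the same information as indexing `cv_results_` by hand.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = (rng.random((500, 1)) - 0.5) * 5
y = ((x - 3) * (x - 0.1) * (x + 2) * (4 * x + 4)).ravel() + 15 * rng.standard_normal(500)
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.6, random_state=0)

poly = PolynomialFeatures(degree=4)
x_feat_train = poly.fit_transform(x_train)
x_feat_test = poly.transform(x_test)  # reuse the fitted transformer

grid = {'l1_ratio': [0.1, 0.5, 0.9], 'alpha': [1e-3, 1e-2, 0.1, 1]}
gsv = GridSearchCV(ElasticNet(fit_intercept=False, max_iter=10000), param_grid=grid)
gsv.fit(x_feat_train, y_train)

# best_index_ points at the same entry as the argmin over the ranks
best_ind = np.argmin(gsv.cv_results_['rank_test_score'])
print(gsv.best_index_ == best_ind)
print(gsv.best_params_, gsv.best_score_)

# R^2 of the refitted best model on data not used during the search
test_r2 = gsv.score(x_feat_test, y_test)
print(test_r2)
```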