# Week 5 Assignment - Cross-Validation.

For this assignment, you will use a generated dataset to practice cross-validation and parameter optimization.

In the first component, you will run linear regression using cross-validation.

In the second component, you will run linear regression using Lasso and cross-validation.

In the third component, you will run linear regression with nested cross-validation.

All the exercises are designed so that the solutions will need only one or a few lines of code.

Do not hesitate to contact the instructors and TAs via the #week5 channel on Slack if you get stuck. Join the channel first by clicking on Channels.

## Part A. Run linear regression with cross-validation.

In this component you will run linear regression with 5-fold cross-validation.

We have provided you with features X and response y. We've also provided a premade cross-validation iterator.
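To make the role of the cross-validation iterator concrete, here is a minimal sketch of what a 5-fold KFold does with 100 rows. The names X_demo, cv_demo, fold_sizes and seen_test_rows are made up for this illustration and are not part of the assignment.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy feature matrix with 100 rows (hypothetical data, only the row count matters here)
X_demo = np.arange(200).reshape(100, 2)

# Same kind of iterator as the one provided in the assignment
cv_demo = KFold(n_splits=5, shuffle=True, random_state=10)

fold_sizes = []       # (train size, test size) for each fold
seen_test_rows = []   # every row index that was held out at some point
for train_idx, test_idx in cv_demo.split(X_demo):
    fold_sizes.append((len(train_idx), len(test_idx)))
    seen_test_rows.extend(test_idx.tolist())

print(fold_sizes)
```

Each fold trains on 80 rows and holds out the remaining 20, and every row is held out exactly once across the five folds.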

Build a linear model and use scikit-learn's cross_val_score function to assess how well your model generalizes. Save the array returned by cross_val_score to a variable named cv_score.

When you run cross_val_score, make sure to:
* Use the cross-validation iterator that we provide (cv_iterator)
* Set the scoring function to "neg_mean_squared_error"
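The "neg_mean_squared_error" scorer returns the mean squared error with its sign flipped, so that larger values are always better. The following sketch, on a small made-up dataset (X_demo, y_demo and cv_demo are illustrative names, not the assignment data), shows one way to recover the ordinary MSE and RMSE from those scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Small synthetic problem, chosen only for this illustration
X_demo, y_demo = make_regression(n_samples=60, n_features=5, noise=10.0, random_state=0)
cv_demo = KFold(n_splits=5, shuffle=True, random_state=0)

# One negated MSE per fold: values are zero or negative
neg_mse = cross_val_score(
    LinearRegression(), X_demo, y_demo, cv=cv_demo, scoring="neg_mean_squared_error"
)

mse = -neg_mse        # flip the sign to get the usual MSE
rmse = np.sqrt(mse)   # same units as the response
print(mse.mean(), rmse.mean())
```

A score closer to zero therefore means a smaller error on the held-out fold.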


In [1]:
import pandas as pd
import numpy as np

from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# Generate data
X, y = make_regression(n_samples=100, n_features=100, n_informative=10, random_state=10)

# Iterator setup
cv_iterator = KFold(n_splits=5, shuffle=True, random_state=10)

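# YOUR CODE HERE
# Ordinary least squares, scored on each of the 5 folds
lr = LinearRegression()
cv_score = cross_val_score(lr, X=X, y=y, cv=cv_iterator, scoring="neg_mean_squared_error")
print(cv_score)           # one negated MSE per fold
print(np.mean(cv_score))  # average over the folds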

[ -1609.45778961  -2461.33495653  -9346.72302374 -10590.87569734
  -7253.35475707]
-6252.34924486




In [2]:
assert isinstance(cv_score, np.ndarray)
assert np.isclose(cv_score.mean(), -6327.32948027)

AssertionError: 

## Part B. Run linear regression with Lasso and cross-validation.

As in Part A, run 5-fold cross-validation, but this time use Lasso regression with the hyperparameter alpha = 0.5.

Again, save the array from cross_val_score to a variable named cv_score.
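As a rough illustration of what alpha controls, the sketch below fits Lasso on a small made-up dataset with two values of alpha and counts the nonzero coefficients. The dataset, the variable names and the deliberately extreme value 1e6 are assumptions chosen for this example only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data for illustration: 20 features, 5 of which carry signal
X_demo, y_demo = make_regression(
    n_samples=100, n_features=20, n_informative=5, noise=5.0, random_state=0
)

nonzero_counts = {}
for alpha in [0.5, 1e6]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    # The L1 penalty can set coefficients exactly to zero
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))

print(nonzero_counts)
```

With a very large alpha the penalty dominates and every coefficient is shrunk to zero, while a moderate alpha keeps some features in the model.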

In [None]:
from sklearn.linear_model import Lasso

cv_iterator = KFold(n_splits=5, shuffle=True, random_state=10)

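# YOUR CODE HERE
# Lasso with a fixed regularization strength, scored on each of the 5 folds
llr = Lasso(alpha=0.5)
cv_score = cross_val_score(llr, X, y, cv=cv_iterator, scoring="neg_mean_squared_error")
print(cv_score)           # one negated MSE per fold
print(np.mean(cv_score))  # average over the folds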

In [None]:
assert isinstance(cv_score, np.ndarray)
assert np.isclose(cv_score.mean(), -4.12944552338)

## Part C. Run nested cross-validation.

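Run nested cross-validation while optimizing the parameter alpha: an inner loop selects alpha on each training split, and an outer loop estimates how well the tuned model generalizes. We have provided the grid for alpha and both CV iterators. Save the final array from cross_val_score into a variable named cv_score.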
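One way to picture nested cross-validation is to write the outer loop by hand and compare it with passing a GridSearchCV object to cross_val_score. The sketch below does this on a small made-up dataset; all names ending in _demo, the fold counts, and the choice to score the inner search with "neg_mean_squared_error" are assumptions made for this illustration. When scoring is omitted, GridSearchCV falls back to the estimator's own score method, which is R^2 for regressors.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Small synthetic problem, chosen only for this illustration
X_demo, y_demo = make_regression(
    n_samples=60, n_features=10, n_informative=5, noise=5.0, random_state=0
)
grid_demo = {"alpha": [0.1, 0.5, 1, 1.5]}
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Manual version: tune alpha on each outer training split, then score on the held-out split
manual_scores = []
for train_idx, test_idx in outer_cv.split(X_demo):
    search = GridSearchCV(Lasso(), grid_demo, cv=inner_cv, scoring="neg_mean_squared_error")
    search.fit(X_demo[train_idx], y_demo[train_idx])  # inner CV sees only the training rows
    preds = search.predict(X_demo[test_idx])          # prediction uses the refit best model
    manual_scores.append(-mean_squared_error(y_demo[test_idx], preds))
manual_scores = np.array(manual_scores)

# Compact version: cross_val_score runs the same outer loop for us
search = GridSearchCV(Lasso(), grid_demo, cv=inner_cv, scoring="neg_mean_squared_error")
nested_scores = cross_val_score(
    search, X_demo, y_demo, cv=outer_cv, scoring="neg_mean_squared_error"
)

print(manual_scores)
print(nested_scores)
```

The two arrays should agree, which shows that the held-out rows of each outer fold never take part in choosing alpha.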

In [None]:
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score, KFold, GridSearchCV
from sklearn.metrics import mean_squared_error

# Parameter grid
p_grid = {
    "alpha": [0.1, 0.5, 1, 1.5]
}

# CV iterators
inner_cv_iterator = KFold(n_splits=5, shuffle=True, random_state=10)
outer_cv_iterator = KFold(n_splits=5, shuffle=True, random_state=10)

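# YOUR CODE HERE
lasso = Lasso()
# Inner loop: grid search over alpha (no scoring is passed, so the estimator's default R^2 score is used)
llr = GridSearchCV(estimator=lasso, param_grid=p_grid, cv=inner_cv_iterator)
# Outer loop: score the tuned model on data not used for the search
cv_score = cross_val_score(llr, X=X, y=y, cv=outer_cv_iterator, scoring="neg_mean_squared_error")
print(cv_score)           # one negated MSE per outer fold
print(np.mean(cv_score))  # average over the outer folds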

In [None]:
assert isinstance(cv_score, np.ndarray)
assert np.isclose(-0.166766708646, cv_score.mean())