# Introduction

The point of this Jupyter notebook is to provide toy code for experimenting with the concepts in Sections 5.1-5.4 of Goodfellow's book.

As usual, Shift + Enter runs a block of code.

# 5.1 Learning Algorithms

$\textbf{5.1.4}$ Example: Linear Regression

Learner: $$\pmb{y} = \beta \pmb{1_m} + w_1 \pmb{x_1} + \dots + w_n \pmb{x_n}$$
where $\beta$ (the bias) and the $w_i$ are scalar parameters to be fit, and $\pmb{y}, \pmb{x_i} \in \mathbb{R}^{m \times 1}$ for $i = 1, 2, \dots, n$ are the target and feature columns, respectively, of the given data.
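To make the notation concrete, here is a minimal NumPy sketch of the learner on a hypothetical toy matrix (the numbers and the names `X`, `w`, `beta` are made up for illustration and are not taken from the Boston data). It evaluates the prediction once as a matrix-vector product and once column by column, as in the formula.

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, n = 2 features (not from the Boston set)
X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 0.5],
              [2.0, 2.0]])
w = np.array([0.5, -1.0])   # one weight per feature column
beta = 2.0                  # scalar bias

# y = beta * 1_m + w_1 x_1 + ... + w_n x_n, written as a matrix-vector product
y_hat = beta * np.ones(X.shape[0]) + X @ w

# The same prediction, summing the weighted feature columns one at a time
y_cols = beta + sum(w[i] * X[:, i] for i in range(X.shape[1]))

print(y_hat)
print(y_cols)
```

Both expressions give the same vector; the matrix form is the one used in the derivation further down.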

The following will build a model for predicting housing prices in the Boston area. 

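We first import the necessary Python modules under their usual aliases. Note that `load_boston` and the `normalize` argument used below were removed in scikit-learn 1.2, so the code assumes an older release: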

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

Now we load our dataset of housing prices in Boston, which has 13 predictors. As an example, we print the second data entry (remember, Python is 0-indexed). Notice that it is an array of 13 values, so our data is 13-dimensional.

In [3]:
boston = load_boston()
boston.data[1]

array([2.7310e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,
       6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,
       1.7800e+01, 3.9690e+02, 9.1400e+00])

We can also view the second target value (i.e. the home price):

In [4]:
boston.target[1]

21.6

Now we assign the data and target to shorter variable names, and use a built-in `sklearn` function to split the data into training and test sets:

In [5]:
X = boston.data
y = boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

Now that we've loaded and split our data, we need to define our model. Luckily, `sklearn` has an easy syntax for this.

In [6]:
# we create an instance of a Linear Regressor
linreg = linear_model.LinearRegression(fit_intercept=True, normalize=True)

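Training a linear regressor means minimizing the mean squared error on the training set. Setting its gradient with respect to the weights to zero gives the normal equations:

$$ \nabla_{\pmb{w}} \text{MSE}_\text{train} = 0 $$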

$$ \Rightarrow \nabla_{\pmb{w}} \, \frac{1}{m} \, || \, \widehat{\pmb{y}}^\text{(train)} - \pmb{y}^\text{(train)} \, ||_2^2 = 0 $$

$$ \Rightarrow \pmb{w} = (\pmb{X}^{\text{(train)} \, \top} \pmb{X}^\text{(train)})^{-1} \pmb{X}^{\text{(train)} \, \top} \pmb{y}^\text{(train)} $$

The `linreg` model defined above is still untrained (i.e. it is just a theoretical model with arbitrary parameters). We now train it on our specific training dataset. Note: if you are using the online version of the Jupyter notebook, this fitting might take a couple of seconds.

In [8]:
#fit the model using the training data
linreg.fit(X_train,y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)
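The closed-form solution from the normal equations above can also be tried directly in NumPy. The following is a minimal sketch on synthetic data (the names `X_toy`, `y_toy`, `true_w`, and `true_beta` are assumptions for illustration, not the Boston set), with the bias absorbed by appending a column of ones. It is not a claim about which solver `sklearn` uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a training set (assumption: not the Boston data)
m, n = 200, 3
X_toy = rng.normal(size=(m, n))
true_w = np.array([1.5, -2.0, 0.5])
true_beta = 4.0
y_toy = true_beta + X_toy @ true_w + 0.01 * rng.normal(size=m)

# Absorb the bias into the weight vector by appending a column of ones
X_aug = np.hstack([X_toy, np.ones((m, 1))])

# Normal equations: (X^T X) w = X^T y; solve() avoids forming the inverse explicitly
w_hat = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_toy)

# lstsq solves the same least-squares problem and is numerically safer
w_lstsq, *_ = np.linalg.lstsq(X_aug, y_toy, rcond=None)

print("normal equations:", w_hat)
print("lstsq:           ", w_lstsq)
```

The last entry of `w_hat` plays the role of the bias $\beta$. Forming $X^\top X$ squares the condition number of the problem, which is why a least-squares routine is usually preferred over the explicit formula in practice.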

So now we have our machine learning algorithm! Given a new 13-dimensional data point, we can predict its target. But as with any algorithm, it helps to check how accurate we expect it to be. Run the following code to compute the mean squared error on both the training data (usually the lower of the two, since the model was fit to this data) and the test data (which can be much higher if the model is overfitted).

In [9]:
y_train_pred = linreg.predict(X_train)
y_test_pred = linreg.predict(X_test)
print("MSE between prediction and target")
print("(Mean Squared) Train Error: ", mean_squared_error(y_train, y_train_pred))
print("(Mean Squared) Test Error: ", mean_squared_error(y_test, y_test_pred), " (estimated Generalization Error)")

MSE between prediction and target
(Mean Squared) Train Error:  21.555648194527308
(Mean Squared) Test Error:  24.31823830917051  (estimated Generalization Error)
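For reference, the quantity reported by `mean_squared_error` in its default form is just the average squared residual. A minimal sketch of the same computation by hand, on made-up numbers (the helper `mse` and the toy lists are hypothetical):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/m) * ||y_pred - y_true||_2^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# Made-up targets and predictions, only to exercise the formula
y_true_toy = [3.0, -0.5, 2.0, 7.0]
y_pred_toy = [2.5, 0.0, 2.0, 8.0]

toy_err = mse(y_true_toy, y_pred_toy)
print(toy_err)  # 0.375
```

Because the error is squared, taking its square root puts it back in the units of the target: the test MSE of about 24.3 above corresponds to a typical error of roughly 4.9 target units.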


# 5.2 Capacity, Overfitting, and Underfitting

Increased Capacity (e.g. quadratic features): $$\pmb{y} = \beta \pmb{1_m} + w_1 \pmb{x_1} + \dots + w_n \pmb{x_n} + \sum_{i=1}^{n} \sum_{j=1}^{n} w_{in + j} \, \pmb{x_i} \odot \pmb{x_j}$$
where $\odot$ denotes the element-wise product of two feature columns.
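Since a product of two columns does not depend on their order, the double sum contains each cross term twice, and a feature expansion only needs the products with $i \le j$. Here is a minimal NumPy sketch of such an expansion (the helper `quadratic_features` is hypothetical; it mimics a degree-2 expansion without a bias column, under the assumption that duplicate products are dropped).

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_features(X):
    """Return [x_1..x_n, x_i * x_j for i <= j], without a bias column."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    pairs = list(combinations_with_replacement(range(n), 2))
    products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, products])

# Two examples with two features each
X_demo = np.array([[2.0, 3.0],
                   [1.0, -1.0]])
Q = quadratic_features(X_demo)
print(Q)  # columns: x1, x2, x1^2, x1*x2, x2^2

# With 13 input features: 13 linear terms + 13*14/2 = 91 quadratic terms
n_cols_13 = quadratic_features(np.ones((1, 13))).shape[1]
print(n_cols_13)  # 104
```

So moving from linear to quadratic features on 13-dimensional data raises the number of fitted weights from 13 to 104, which is the sense in which capacity increases.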

In [None]:
from sklearn.preprocessing import PolynomialFeatures

In [None]:
n = 2
poly2 = PolynomialFeatures(n,include_bias=False)

In [None]:
poly2_X_train = poly2.fit_transform(X_train)
poly2_X_test = poly2.transform(X_test)  # reuse the transformer fitted on the training data

In [None]:
linreg.fit(poly2_X_train, y_train)

In [None]:
y_train_pred = linreg.predict(poly2_X_train)
y_test_pred = linreg.predict(poly2_X_test)
print("MSE between prediction and target")
print("Train Error: ", mean_squared_error(y_train, y_train_pred))
print("Test Error: ", mean_squared_error(y_test, y_test_pred), " (estimated Generalization Error)")

In [None]:
poly_n_err = np.empty(shape=(0,3))

poly_order = 8
for i in range(1,poly_order):
    poly = PolynomialFeatures(i,include_bias=False)

    poly_X_train = poly.fit_transform(X_train)
    poly_X_test = poly.transform(X_test)  # reuse the transformer fitted on the training data

    linreg.fit(poly_X_train, y_train)
    
    y_train_pred = linreg.predict(poly_X_train)
    y_test_pred = linreg.predict(poly_X_test)

    train_err = mean_squared_error(y_train, y_train_pred)
    test_err = mean_squared_error(y_test, y_test_pred)
    score = r2_score(y_test, y_test_pred)

    poly_n_err = np.append(poly_n_err, np.array([[train_err, test_err, score]]), axis=0)

In [None]:
print(poly_n_err[:, :2])

fig = plt.figure()
ax = fig.add_subplot(111)

ax.spines['top'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
ax.spines['right'].set_color('none')
ax.tick_params(labelcolor='w', top=False, bottom=False, left=False, right=False)

plt.subplots_adjust(wspace=0.7)

ax1 = fig.add_subplot(131)
ax1.plot(range(1,poly_order), poly_n_err[:,0])
plt.xticks(range(1,poly_order))
plt.title('Train')
plt.ylim([0, 25])

ax2 = fig.add_subplot(132)
ax2.plot(range(1,poly_order), poly_n_err[:,1],'orange')
plt.xticks(range(1,poly_order))
plt.yticks(range(0,23,5))
plt.title('Test')
plt.ylim([0, 25])

ax3 = fig.add_subplot(133)
ax3.plot(range(1,poly_order), poly_n_err[:,1],'orange')
plt.xticks(range(1,poly_order))
plt.title('Test (full range)')

ax.set_xlabel('polynomial order')
ax.set_ylabel('mean-squared error')

plt.show()
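The same capacity experiment is easier to reason about in one dimension. The sketch below is an illustration under stated assumptions, not part of the Boston pipeline: it draws noisy samples of a hypothetical quadratic, fits polynomials of degree 1, 2, and 9 by least squares, and reports train and test MSE for each.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(m):
    """Hypothetical 1-D problem: noisy samples of a quadratic on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, size=m)
    y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.2 * rng.normal(size=m)
    return x, y

x_tr, y_tr = make_data(15)    # small training set, so overfitting is possible
x_te, y_te = make_data(200)   # larger held-out set

results = {}
for degree in (1, 2, 9):
    # Design matrix with columns 1, x, ..., x^degree
    A_tr = np.vander(x_tr, degree + 1, increasing=True)
    A_te = np.vander(x_te, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    train_mse = np.mean((A_tr @ coef - y_tr) ** 2)
    test_mse = np.mean((A_te @ coef - y_te) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Because each model contains the previous one, the training error cannot increase with the degree; the test error carries no such guarantee. In the 13-feature setting of this notebook the effect is far more extreme: a full degree-7 expansion has $\binom{20}{7} - 1 = 77519$ columns, against only a few hundred training rows.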

# 5.3 Hyperparameters and Validation Sets

Regularization: Minimize $$J(\pmb{w}) = \text{MSE}_\text{train} + \lambda \pmb{w}^\top \pmb{w} $$
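Like ordinary least squares, this objective has a closed-form minimizer: setting $\nabla_{\pmb{w}} J = 0$ gives $(\pmb{X}^\top \pmb{X} + m \lambda \pmb{I}) \, \pmb{w} = \pmb{X}^\top \pmb{y}$, where $m$ is the number of training examples. A minimal NumPy sketch on synthetic data follows (the helper `ridge_fit` and the data are hypothetical, and the bias term is omitted for brevity).

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/m) * ||X w - y||^2 + lam * w^T w (no bias term, for brevity)."""
    m, n = X.shape
    # Zero gradient: (X^T X + m * lam * I) w = X^T y
    return np.linalg.solve(X.T @ X + m * lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(2)
X_r = rng.normal(size=(50, 4))
y_r = X_r @ np.array([2.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=50)

w_ols = ridge_fit(X_r, y_r, lam=0.0)   # no penalty: ordinary least squares
w_reg = ridge_fit(X_r, y_r, lam=1.0)   # penalized: weights shrink toward zero

print("lam = 0:", w_ols, "norm", np.linalg.norm(w_ols))
print("lam = 1:", w_reg, "norm", np.linalg.norm(w_reg))
```

One caveat when comparing with `sklearn`: according to its documentation, `Ridge` minimizes $||\pmb{y} - \pmb{X}\pmb{w}||_2^2 + \alpha ||\pmb{w}||_2^2$ without the $1/m$ factor, so under the objective written here $\alpha$ corresponds to $m\lambda$ rather than to $\lambda$ itself.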

In [None]:
# import linear regressor with l2 regularization
from sklearn.linear_model import Ridge

In [None]:
# alpha = lambda = 1.0
linreg_Ridge = linear_model.Ridge(alpha=1.0, fit_intercept=True, normalize=True, tol=1e-4)

In [None]:
# fit on the 7th-order polynomial features (poly_X_train is left over from the last iteration of the loop above)
linreg_Ridge.fit(poly_X_train,y_train)

In [None]:
y_train_pred = linreg_Ridge.predict(poly_X_train)
y_test_pred = linreg_Ridge.predict(poly_X_test)
print("MSE between prediction and target for 7th degree features")
print("Train Error: ", mean_squared_error(y_train, y_train_pred))
print("Test Error: ", mean_squared_error(y_test, y_test_pred), " (estimated Generalization Error)")

In [None]:
reg_err = np.empty(shape=(0,3))

reg_max = 3.0
for i in np.arange(0,reg_max,0.1):
    
    linreg_Ridge = linear_model.Ridge(alpha=i, fit_intercept=True, normalize=True, tol=1e-4)

    linreg_Ridge.fit(poly_X_train, y_train)
    
    y_train_pred = linreg_Ridge.predict(poly_X_train)
    y_test_pred = linreg_Ridge.predict(poly_X_test)

    train_err = mean_squared_error(y_train, y_train_pred)
    test_err = mean_squared_error(y_test, y_test_pred)

    reg_err = np.append(reg_err, np.array([[i,train_err, test_err]]), axis=0)
    
min_test_err_id = np.argmin(reg_err[:,2])  # index of the smallest test error
alpha_min = reg_err[min_test_err_id,0]
print(reg_err)
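Note that the loop above picks `alpha_min` by looking at the test error, so the test error at that value is no longer an unbiased estimate of the generalization error. The usual remedy is to choose the hyperparameter on a validation set carved out of the training data. The following is a minimal sketch of that workflow on synthetic data; the data, the 60/20/20 split, and the helpers `ridge_fit` and `mse` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in data (assumption): 120 examples, 10 features, 3 informative
m, n = 120, 10
X_all = rng.normal(size=(m, n))
w_true = np.zeros(n)
w_true[:3] = [2.0, -1.0, 0.5]
y_all = X_all @ w_true + rng.normal(size=m)

# Shuffle once, then carve out train / validation / test index sets
idx = rng.permutation(m)
tr, va, te = idx[:72], idx[72:96], idx[96:]

def ridge_fit(X, y, lam):
    """Closed-form minimizer of (1/m) * ||X w - y||^2 + lam * w^T w."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + len(y) * lam * np.eye(k), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

# Choose lambda using the validation set only
lams = np.arange(0.0, 3.0, 0.1)
val_errs = [mse(X_all[va], y_all[va], ridge_fit(X_all[tr], y_all[tr], lam))
            for lam in lams]
best_lam = lams[int(np.argmin(val_errs))]

# Refit on train + validation with the chosen value, then use the test set once
trva = np.concatenate([tr, va])
w_final = ridge_fit(X_all[trva], y_all[trva], best_lam)
test_err = mse(X_all[te], y_all[te], w_final)

print("chosen lambda:", best_lam)
print("test MSE:", test_err)
```

The test indices `te` are never consulted while choosing `best_lam`, which is what keeps the final number an honest estimate.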

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111)

ax.spines['top'].set_color('none')
ax.spines['bottom'].set_color('none')
ax.spines['left'].set_color('none')
ax.spines['right'].set_color('none')
ax.tick_params(labelcolor='w', top=False, bottom=False, left=False, right=False)

plt.subplots_adjust(wspace=0.7)

ax1 = fig.add_subplot(131)
ax1.plot(reg_err[:,0], reg_err[:,1])
plt.xticks([0, 1, 2, 3])
plt.yticks(range(5,30,2))
plt.title('Train')

ax2 = fig.add_subplot(132)
ax2.plot(reg_err[:,0], reg_err[:,2],'orange')
plt.xticks([0, 1, 2, 3])
#plt.yticks(range(5,30,2))
plt.title('Test')
plt.ylim([0, 29])

ax3 = fig.add_subplot(133)
ax3.plot(reg_err[:,0], reg_err[:,2],'orange')
plt.xticks([0, 1, 2, 3])
#plt.yticks(range(5,30,2))
plt.title('Test (full range)')

ax.set_xlabel('regularization parameter')
ax.set_ylabel('mean-squared error')

plt.show()

K-fold Cross-Validation

In [None]:
from sklearn.model_selection import KFold

In [None]:
split_num = 10
n_errs = np.empty(shape=(1,split_num))

kf = KFold(n_splits = split_num)
kf.get_n_splits(poly_X_train)

linreg_Ridge_opt = linear_model.Ridge(alpha = alpha_min, fit_intercept=True, normalize=True, tol=1e-4)

i = 0
for train_index, test_index in kf.split(poly_X_train, y_train):
    linreg_Ridge_opt.fit(poly_X_train[train_index],y_train[train_index])
    n_errs[0,i] = mean_squared_error(y_train[test_index], linreg_Ridge_opt.predict(poly_X_train[test_index]))
    i = i+1

In [None]:
print(n_errs)
print(np.mean(n_errs))  # average MSE across the folds
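The fold bookkeeping itself is simple: the example indices are cut into $k$ contiguous blocks, and each block takes one turn as the held-out set while the rest are used for fitting. A minimal NumPy sketch of that partitioning, assuming no shuffling (the generator `kfold_indices` is a hypothetical stand-in, not the `sklearn` implementation):

```python
import numpy as np

def kfold_indices(m, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    folds = np.array_split(np.arange(m), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# 10 examples in 3 folds: the sizes cannot all be equal
splits = list(kfold_indices(10, 3))
fold_sizes = [len(val) for _, val in splits]
print(fold_sizes)  # [4, 3, 3]

for train_idx, val_idx in splits:
    print("fit on", train_idx, "evaluate on", val_idx)
```

Every example is held out exactly once, so averaging the per-fold errors uses all of the training data for evaluation without ever scoring a model on points it was fit to.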