# Learning Curves

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize as op
import scipy.io as sio

We will perform some bias and variance testing using the data below, which relates the flow rate of a water leak to the change in water level in a tank.

First, we will access the data in the cell below.

In [None]:
mat_contents = sio.loadmat("LC.mat")

#print(mat_contents)

#Assign values to matrices

#Training set
X = mat_contents['X']
y = mat_contents['y'].flatten()

#Validation set
X_val = mat_contents['Xval']
y_val = mat_contents['yval'].flatten()

#Test set
X_test = mat_contents['Xtest']
y_test = mat_contents['ytest'].flatten()

print(len(y), len(y_val))

It is always a good idea to plot the data.

In [None]:
#Plot the training data

plt.scatter(X,y)
plt.xlabel('Change in water level (L)')
plt.ylabel('Water flowing out (L)')
plt.show()

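A good place to start is linear regression. Code the cost function for regularized linear regression below. Name it `cost` and give it the signature `cost(theta, X, y, l)`, since the optimization cell later calls it that way.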
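For reference, one common form of the regularized linear regression cost is written out below. The symbols $m$ (number of training examples), $n$ (number of features), and $\lambda$ (the regularization parameter, `l` in the code), as well as the choice to leave the intercept $\theta_0$ unpenalized, are assumptions based on the usual convention; the notebook itself does not fix them.

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2}
          + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}
```

With $\lambda = 0$ this reduces to the ordinary least-squares cost.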

In [None]:
#Regularized Linear Regression

###BEGIN SOLUTION

###END SOLUTION

Now code a function for the gradient.
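One way to gain confidence in a hand-written gradient is to compare it against a finite-difference estimate. The sketch below is only an illustration: the names `cost_sketch` and `gradient_sketch`, the made-up data, and the convention that the intercept `theta[0]` is not penalized are all assumptions, not the notebook's reference solution.

```python
import numpy as np
from scipy.optimize import check_grad


def cost_sketch(theta, X, y, l):
    """Regularized squared-error cost (assumes theta[0] is not penalized)."""
    m = len(y)
    residual = X @ theta - y
    penalty = l * np.sum(theta[1:] ** 2)
    return (residual @ residual + penalty) / (2 * m)


def gradient_sketch(theta, X, y, l):
    """Analytic gradient of cost_sketch with respect to theta."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m
    grad[1:] += (l / m) * theta[1:]  # regularization skips the intercept
    return grad


# Small made-up dataset, used only to compare the two functions.
rng = np.random.default_rng(0)
X_demo = np.column_stack((np.ones(5), rng.normal(size=5)))
y_demo = rng.normal(size=5)
theta_demo = np.array([0.5, -1.0])

# check_grad returns the norm of (analytic gradient - finite-difference gradient).
mismatch = check_grad(cost_sketch, gradient_sketch, theta_demo, X_demo, y_demo, 1.0)
print(mismatch)
```

A mismatch close to zero suggests the two functions agree; a large value usually points to a sign error or a penalized intercept in one of them.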

In [None]:
def Gradient(theta, X, y, l):
  '''Gradient of the cost function
  Inputs:
  theta = parameters
  X = features
  y = training targets
  l = regularization parameter
  Output:
  grad = gradient of cost function
  '''
###BEGIN SOLUTION

###END SOLUTION

With our cost and gradient functions, let's use optimization tools to find the best parameters for a simple line on our data.

In [None]:
#Optimize
m = len(y)
ones = np.ones(m)

#X_new is our feature matrix with the added column of ones.
X_new = np.vstack((ones,X.T)).T
l = 0
initial_theta = np.array([0,0])
Result = op.minimize(fun = cost, x0=initial_theta, args = (X_new,y,l), method='TNC', jac = Gradient)

Plot the hypothesis along with the data.

In [None]:
x_line = np.linspace(np.min(X),np.max(X),50)

theta_opt = Result.x
y_line = theta_opt[0] + theta_opt[1]*x_line

plt.plot(x_line,y_line)
plt.scatter(X,y)
plt.xlabel('Change in water level (L)')
plt.ylabel('Water flowing out (L)')
plt.show()

Clearly, a simple line is not good enough for the data. What model should we use? To start answering that question, we need to use learning curves. Finish the function below, which takes in the training and validation sets along with the regularization parameter and returns the training and validation errors as a function of the number of training examples.
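The idea behind a learning curve can be sketched as follows: fit the model on the first `i` training examples, record the error on those `i` examples and on the full validation set, then repeat for increasing `i`. This sketch swaps in a closed-form (normal equation) fit instead of `op.minimize` to stay self-contained. The helper names, the synthetic data, and the choice to report errors without the regularization term are assumptions made for illustration.

```python
import numpy as np


def fit_closed_form(X, y, l):
    """Ridge fit via the normal equations; the intercept column is not penalized."""
    penalty = l * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    # pinv copes with the singular systems that arise for very small subsets.
    return np.linalg.pinv(X.T @ X + penalty) @ X.T @ y


def squared_error(theta, X, y):
    """Unregularized squared error, scaled by 1 / (2 * number of examples)."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))


def learning_curve_sketch(X, y, X_val, y_val, l):
    """Training and validation error for training subsets of size 1..m."""
    m = len(y)
    error_train = np.zeros(m)
    error_val = np.zeros(m)
    for i in range(1, m + 1):
        theta = fit_closed_form(X[:i], y[:i], l)
        error_train[i - 1] = squared_error(theta, X[:i], y[:i])
        error_val[i - 1] = squared_error(theta, X_val, y_val)
    return error_train, error_val


# Made-up, exactly linear data: y = 2 + 3x.
x_demo = np.linspace(0, 1, 6)
X_demo = np.column_stack((np.ones(6), x_demo))
y_demo = 2 + 3 * x_demo
x_val_demo = np.array([0.25, 0.75])
X_val_demo = np.column_stack((np.ones(2), x_val_demo))
y_val_demo = 2 + 3 * x_val_demo

err_tr, err_va = learning_curve_sketch(X_demo, y_demo, X_val_demo, y_val_demo, 0.0)
print(err_tr, err_va)
```

Because the synthetic data are exactly linear, the training error stays near zero and the validation error drops to near zero once two or more points are used. Real data with a poorly matched model would instead show both curves leveling off at a high error.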

In [None]:
#Learning Curves

def learningcurve(X,y,X_val,y_val,l):
  '''
  Function to calculate training and validation error
  Inputs:
  X = training set features (assumes the column of ones is already included)
  y = training set data
  X_val = validation features (assumes the column of ones is already included)
  y_val = validation data
  l = regularization parameter

  Output:
  [error_train, error_val] = array with training and validation errors as a function of the number of data points
  '''
###BEGIN SOLUTION

###END SOLUTION

With our learning curve function coded up, we can see the results for our system. What kind of issues do the learning curves reveal for this model and data?

In [None]:
#Test learning curve function

m_val = len(y_val)
ones = np.ones(m_val)

#X_new is our feature matrix with the added column of ones.
X_val_new = np.vstack((ones,X_val.T)).T


error_train,error_val = learningcurve(X_new,y,X_val_new,y_val,0)

plt.plot(error_train, label='Training error')
plt.plot(error_val, label='Validation error')
plt.ylabel('Error')
plt.legend()
plt.show()

Clearly a simple line is not enough. Adding polynomial features is something we could try, but that means we need to alter our feature matrix accordingly. What is the function below doing? Add the appropriate comments.

In [None]:
#Say we want to try a polynomial of degree 8.

#Need to build the appropriate feature matrices.
def build_feat(X,p):
  X_feat = X.copy()
  for i in range(2,p+1):
    X_feat = np.vstack((X_feat.T,X_feat[:,1]**i)).T
  return X_feat

X_feat = build_feat(X_new,8)

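Adding polynomial features also changes the scale of the features: high powers of x can be many orders of magnitude larger than x itself. Standardization therefore becomes necessary if we want the optimization algorithm to converge reliably. Note that the first column of `X_feat` is the column of ones, whose standard deviation is zero, so it should be left unscaled. Complete the function below.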
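As an illustration of standardization, the sketch below z-scores every column except the first, which it assumes is the column of ones. The name `standardize_sketch`, its optional `mu` and `sigma` arguments, and the toy data are assumptions for this example and differ from the `featureNormalize` stub. Returning the statistics allows the same transformation to be applied later to validation or test features.

```python
import numpy as np


def standardize_sketch(X, mu=None, sigma=None):
    """Z-score every column except column 0 (assumed to be the column of ones)."""
    X_norm = X.astype(float)  # astype returns a copy, so X is left unchanged
    if mu is None:
        mu = X_norm[:, 1:].mean(axis=0)
    if sigma is None:
        sigma = X_norm[:, 1:].std(axis=0)
    X_norm[:, 1:] = (X_norm[:, 1:] - mu) / sigma
    return X_norm, mu, sigma


# Toy feature matrix: ones, x, x**2.
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
X_demo = np.column_stack((np.ones(4), x_demo, x_demo ** 2))
X_demo_norm, mu_demo, sigma_demo = standardize_sketch(X_demo)

# Reuse the training statistics for any other set (here a made-up extra row).
X_other = np.array([[1.0, 2.5, 6.25]])
X_other_norm, _, _ = standardize_sketch(X_other, mu_demo, sigma_demo)

print(X_demo_norm.mean(axis=0), X_demo_norm.std(axis=0))
```

Scaling other sets with the training mean and standard deviation, rather than their own, keeps all sets in the same feature space as the fitted parameters.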

In [None]:
#Now need to normalize (standardize) the data

## Write function to standardize data

def featureNormalize(X):
  '''Function that takes as input an array of data and outputs a standardized dataset
  '''
  X_norm = X.copy()
  ###BEGIN SOLUTION

  ###END SOLUTION
  return X_norm

norm = featureNormalize(X_feat)

print(norm, np.mean(norm,0),np.std(norm,0))

Now use the cell below to find the optimal thetas for our new hypothesis (set regularization to 0). Plot the data with your optimal hypothesis.

In [None]:
###BEGIN SOLUTION


###END SOLUTION

With our data and model plotted, plot the learning curves for the model.

In [None]:
###BEGIN SOLUTION

###END SOLUTION

Use a regularization value of 1 and determine the optimal parameters. Plot the data along with the final hypothesis.

In [None]:
###BEGIN SOLUTION

###END SOLUTION

Plot the learning curves for this regularized model.

In [None]:
###BEGIN SOLUTION

###END SOLUTION

Repeat the process (finding optimal thetas and plotting the learning curves) but using a regularization value of 100.

In [None]:
###BEGIN SOLUTION

###END SOLUTION

In [None]:
###BEGIN SOLUTION

###END SOLUTION

How should we pick lambda? Using the concepts that we have discussed in lecture (learning curves), approximately what value of lambda should we choose?
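One possible selection procedure is to fit the model once per candidate value on the training set, measure the unregularized error on the validation set, and keep the value with the lowest validation error. The sketch below illustrates this on made-up data with a closed-form fit standing in for `op.minimize`. The helper names, the degree-5 polynomial features, and the synthetic sine data are all assumptions for illustration.

```python
import numpy as np


def fit_closed_form(X, y, l):
    """Ridge fit via the normal equations; the intercept column is not penalized."""
    penalty = l * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.pinv(X.T @ X + penalty) @ X.T @ y


def squared_error(theta, X, y):
    """Unregularized squared error, scaled by 1 / (2 * number of examples)."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))


rng = np.random.default_rng(0)


def make_set(n):
    """Made-up noisy data with polynomial features x**0 .. x**5."""
    x = rng.uniform(-1, 1, size=n)
    X = np.column_stack([x ** p for p in range(6)])
    y = np.sin(3 * x) + 0.1 * rng.normal(size=n)
    return X, y


X_tr, y_tr = make_set(12)
X_va, y_va = make_set(20)

l_candidates = np.array([0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10])

# Fit on the training set with each candidate, score on the validation set.
val_errors = np.array([
    squared_error(fit_closed_form(X_tr, y_tr, l), X_va, y_va)
    for l in l_candidates
])
best_l = l_candidates[np.argmin(val_errors)]
print(best_l, val_errors)
```

Plotting `val_errors` against `l_candidates` alongside the corresponding training errors makes the trade-off visible: very small values tend to overfit, very large values tend to underfit.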

In [None]:
#How to pick lambda?

l_vec = np.array([0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10])

###BEGIN SOLUTION

###END SOLUTION