Underfitting vs. Overfitting
============================

Adapted from: http://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html

This example demonstrates the problems of underfitting and overfitting and
how we can use linear regression with polynomial features to approximate
nonlinear functions. The plot shows the function that we want to approximate,
which is a part of the cosine function. In addition, the samples from the
real function and the approximations of different models are displayed. The
models have polynomial features of different degrees. We can see that a
linear function (polynomial with degree 1) is not sufficient to fit the
training samples. This is called **underfitting**. A polynomial of degree 4
approximates the true function almost perfectly. However, for higher degrees
the model will **overfit** the training data, i.e. it learns the noise of the
training data.
We can evaluate **overfitting** and **underfitting** quantitatively by using
cross-validation: we calculate the mean squared error (MSE) on the validation
set, and the higher it is, the less likely the model is to generalize correctly
from the training data.
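
The cross-validation step described above could be sketched as follows. This is a minimal illustration, assuming scikit-learn's `Pipeline` and `cross_val_score` with 10 folds; the fold count and the `cv_mse` helper are illustrative choices, not taken from this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def true_fun(X):
    return np.cos(1.5 * np.pi * X)


def cv_mse(degree, X, y, folds=10):
    """Mean cross-validated MSE of a polynomial regression (hypothetical helper)."""
    model = Pipeline([
        ("polynomial_features", PolynomialFeatures(degree=degree, include_bias=False)),
        ("linear_regression", LinearRegression()),
    ])
    # scikit-learn scorers follow "higher is better", so the MSE comes back negated
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=folds)
    return -scores.mean()


rng = np.random.RandomState(0)
X = rng.rand(30, 1)
y = true_fun(X).ravel() + rng.randn(30) * 0.1  # second term is noise

cv_results = {degree: cv_mse(degree, X, y) for degree in (1, 4, 15)}
for degree, mse in cv_results.items():
    print("Degree {:2d}: validation MSE = {:.3g}".format(degree, mse))
```

A lower validation MSE suggests better generalization, so the degree with the smallest value is the most reasonable choice among the candidates tried.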



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge

In [None]:
def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)

n = 30

# generate a noisy training dataset
X = np.random.rand(n, 1)
y = true_fun(X) + np.random.randn(n, 1) * 0.1  # second term is Gaussian noise

plt.scatter(X,y)
plt.show()

X

In [None]:
polynomial_features = PolynomialFeatures(degree=2, include_bias=False)
polynomial_features.fit_transform(X)

In [None]:
# try 1, 4, 15
degree = 15

polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)
linear_regression = LinearRegression()
ridge_regression = Ridge(alpha=.05)  # regression with L2 regularization;
                                     # alpha is the "lambda" used in the slides

X_poly = polynomial_features.fit_transform(X)
linear_regression.fit(X_poly, y)
ridge_regression.fit(X_poly, y)

X_test = np.linspace(0, 1, 100).reshape(100, 1)  # 100 evenly spaced numbers from 0 to 1
X_test_poly = polynomial_features.transform(X_test)  # reuse the already-fitted transformer
y_test = true_fun(X_test)

y_test_regression_pred = linear_regression.predict(X_test_poly)
y_test_ridge_pred = ridge_regression.predict(X_test_poly)

# (1)
plt.plot(X_test, y_test_regression_pred, "b") # Blue: model prediction on test set
# (2)
# plt.plot(X_test, y_test_ridge_pred, "b")

plt.plot(X_test, y_test, "r")      # Red: test set generated using true function
plt.scatter(X, y)             # Blue dots: training instances

plt.xlabel("x")
plt.ylabel("y")
plt.xlim(0,1)
plt.ylim(-1.5,1.5)
plt.title("Degree {}".format(degree))
plt.show()

What we observe
---------------

When we set the highest degree of the polynomial features to 15, linear regression produces an overfitted curve: it fits the training data (blue dots) well but approximates the true function (red line) poorly.
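
One way to check this observation numerically is to compare the error on the training samples with the error against the true function. The sketch below assumes the same data-generating setup as the code above; `fit_and_score` is a hypothetical helper introduced only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures


def true_fun(X):
    return np.cos(1.5 * np.pi * X)


def fit_and_score(degree, X, y, X_test, y_test):
    """Return (training MSE, test MSE) for a polynomial fit of the given degree."""
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(X), y)
    train_mse = mean_squared_error(y, model.predict(poly.transform(X)))
    test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    return train_mse, test_mse


np.random.seed(0)
X = np.random.rand(30, 1)
y = true_fun(X) + np.random.randn(30, 1) * 0.1  # noisy training targets
X_test = np.linspace(0, 1, 100).reshape(100, 1)
y_test = true_fun(X_test)                        # noise-free reference values

errors = {d: fit_and_score(d, X, y, X_test, y_test) for d in (1, 4, 15)}
for d, (train_mse, test_mse) in errors.items():
    print("Degree {:2d}: train MSE = {:.3g}, test MSE = {:.3g}".format(d, train_mse, test_mse))
```

A training error that keeps shrinking while the test error grows is the usual signature of overfitting.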

Now, comment out (1) and uncomment (2) in the Python code, so that the ridge regression prediction is plotted instead. Ridge regression is linear regression with regularization, as explained in class.
The plot now shows that overfitting is almost eliminated.
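
A rough way to see why regularization helps is to compare the size of the learned coefficients. The sketch below assumes the same data, degree 15, and `alpha=.05` as in the code above; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
X = np.random.rand(30, 1)
y = np.cos(1.5 * np.pi * X) + np.random.randn(30, 1) * 0.1

X_poly = PolynomialFeatures(degree=15, include_bias=False).fit_transform(X)

linear = LinearRegression().fit(X_poly, y)
ridge = Ridge(alpha=.05).fit(X_poly, y)

# Ridge penalizes large weights, so its coefficient vector should be smaller
linear_norm = np.linalg.norm(linear.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print("LinearRegression coefficient norm: {:.3g}".format(linear_norm))
print("Ridge coefficient norm:            {:.3g}".format(ridge_norm))
```

Very large coefficients let the degree-15 polynomial oscillate between training points; the penalty term keeps them small, which tends to yield a smoother curve.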

Experiments
-----------

You can experiment in the above code with different degrees for the polynomial features and with other alpha (lambda) values, such as the default (1.0), 0.1, 0.01, or 0.0001.
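
Such an experiment could be scripted as a small loop. The sketch below assumes the degree-15 setup from above and measures the MSE against the true function on 100 evenly spaced points; the alpha values are the ones suggested above, with 1.0 being scikit-learn's default for `Ridge`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures


def true_fun(X):
    return np.cos(1.5 * np.pi * X)


np.random.seed(0)
X = np.random.rand(30, 1)
y = true_fun(X) + np.random.randn(30, 1) * 0.1
X_test = np.linspace(0, 1, 100).reshape(100, 1)
y_test = true_fun(X_test)

degree = 15  # try other degrees here as well
poly = PolynomialFeatures(degree=degree, include_bias=False)
X_poly = poly.fit_transform(X)
X_test_poly = poly.transform(X_test)

alphas = [1.0, 0.1, 0.01, 0.0001]
test_mse = {}
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_poly, y)
    test_mse[alpha] = mean_squared_error(y_test, model.predict(X_test_poly))
    print("alpha = {:<7g} test MSE = {:.3g}".format(alpha, test_mse[alpha]))
```

Larger alpha values regularize more strongly and may underfit, while very small values approach the unregularized fit, so the useful range has to be found empirically.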