# Overfitting and Underfitting in Linear Regression

Let's start by importing *numpy*.

In [1]:
import numpy as np

### 1. Function for fitting $w$ using closed form

$w = (X^T X)^{-1} X^T y$


In [2]:
def fit(X, y):
    w = np.linalg.inv(X.T@X)@X.T@y
    return w
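
The closed form relies on $X^T X$ being invertible. As a hedged alternative that is not part of the original exercise, the sketch below solves the same least-squares problem with `np.linalg.lstsq`, which avoids forming an explicit inverse. The name `fit_lstsq` is hypothetical. When $X^T X$ is singular, `lstsq` returns the minimum-norm solution, so its results can differ from those of `fit`.

```python
import numpy as np

def fit_lstsq(X, y):
    # Solve min_w ||Xw - y||^2 without explicitly inverting X^T X.
    # rcond=None uses NumPy's default cutoff for small singular values.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Small full-rank example (y = 1 + 2x): both approaches should agree.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])
w_closed = np.linalg.inv(X.T @ X) @ X.T @ y
w_lstsq = fit_lstsq(X, y)
print(np.allclose(w_closed, w_lstsq))
```

On a full-rank design matrix such as this one, the two solutions coincide up to floating-point error.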

### 2. Function for predicting given $w$

$\hat{y} = Xw$


In [3]:
def predict(X, w):
    return X@w

### 3. Function for computing Mean Squared Error

$MSE = \frac{1}{M}\sum^{M}_{i=1} {(y_i - \hat{y}_i)^2}$

In [4]:
def mse(y_pred, y_true):
    return np.mean((y_pred-y_true)**2)
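
One thing to watch with this implementation is array shape. The sketch below uses made-up numbers (not the exercise data) to show that the result is only the intended MSE when `y_pred` and `y_true` have the same shape. If one is a flat array of shape `(3,)` and the other a column of shape `(3, 1)`, NumPy broadcasts the difference to a `(3, 3)` array and the mean is taken over all nine entries.

```python
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([[1.0], [2.0], [5.0]])      # shape (3, 1), like y_train
y_pred_col = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1)
y_pred_flat = y_pred_col.ravel()              # shape (3,)

mse_ok = mse(y_pred_col, y_true)    # (0 + 0 + 4) / 3
mse_bad = mse(y_pred_flat, y_true)  # difference is broadcast to shape (3, 3)
print(mse_ok, mse_bad)
```

In this notebook both `y_train` and the output of `predict` are column vectors, so the shapes match.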

### 4. Defining the training data

We use the provided data and we add a column of *ones* that will allow us to learn the bias term.

In [5]:
X_train = np.array([
    [1, -0.8, 2.8],
    [1, 0.3, -2.2],
    [1, 1.5, 1.1]
])
y_train = np.array([
    [-8.5],
    [12.8],
    [3.8]
])

In [6]:
X_test = np.array([
    [1, -2, 2],
    [1, -4, 15]
])
y_test = np.array([
    [-7],
    [-63]
])
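
The column of ones is typed by hand in the arrays above. As a minimal sketch, it could also be generated from the raw two-feature data. The helper name `add_bias_column` and the variable `X_raw` are hypothetical and do not appear in the original exercise.

```python
import numpy as np

def add_bias_column(X_raw):
    # Prepend a column of ones so that w[0] acts as the bias term.
    ones = np.ones((X_raw.shape[0], 1))
    return np.hstack([ones, X_raw])

# Raw training features (x1, x2) without the bias column.
X_raw = np.array([[-0.8, 2.8], [0.3, -2.2], [1.5, 1.1]])
X_with_bias = add_bias_column(X_raw)
print(X_with_bias)
```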

### 5. Model that overfits:

We need a complex model, in this case one with many parameters (input features). We can create new features by multiplying $x_1$ and $x_2$ (the second and third columns of `X`, after the column of ones) and by squaring each of them. The resulting features are:

- $x_1 = x_1$
- $x_2 = x_2$
- $x_3 = x_1 \cdot x_2$
- $x_4 = x_1^2$
- $x_5 = x_2^2$

Therefore, we need to find the weights of a polynomial model ($w_0$ is the bias):

$\hat{y} = w_0 +w_1\cdot x_1+w_2\cdot x_2+w_3\cdot x_3+w_4\cdot x_4+w_5\cdot x_5 $

$\hat{y} = w_0 +w_1\cdot x_1+w_2\cdot x_2+w_3\cdot x_1 \cdot x_2+w_4\cdot x_1^2+w_5\cdot x_2^2 $


In [7]:
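def transform_features(X):
    # Build the extra polynomial features from columns 1 and 2 (column 0 is the bias)
    x3 = np.multiply(X[:, 1], X[:, 2])  # element-wise product x1 * x2
    x4 = X[:, 1] ** 2                   # x1 squared
    x5 = X[:, 2] ** 2                   # x2 squared

    # Reshape from 1-D arrays of shape (n_samples,) to column vectors of shape (n_samples, 1)
    n_samples = X.shape[0]
    x3 = x3.reshape(n_samples, 1)
    x4 = x4.reshape(n_samples, 1)
    x5 = x5.reshape(n_samples, 1)

    # Append the new columns to the original features
    new_X = np.concatenate((X, x3, x4, x5), axis=1)
    return new_X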

In [8]:
X_train_new = transform_features(X_train)

print("Matrix size before transformation:", X_train.shape)
print("Matrix size after transformation:", X_train_new.shape)

Matrix size before transformation: (3, 3)
Matrix size after transformation: (3, 6)
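
With 3 samples and 6 columns, $X^T X$ is a $6 \times 6$ matrix whose rank cannot exceed 3, so it is singular. The check below is a sketch that rebuilds the transformed training matrix inline (the names `X_poly` and `gram` are illustrative) and confirms the rank with `np.linalg.matrix_rank`. Results obtained by passing such a matrix to `np.linalg.inv` should therefore be read with care, since they may be sensitive to floating-point error.

```python
import numpy as np

X_train = np.array([[1, -0.8, 2.8], [1, 0.3, -2.2], [1, 1.5, 1.1]])
x1, x2 = X_train[:, 1], X_train[:, 2]

# Same columns as transform_features: [1, x1, x2, x1*x2, x1^2, x2^2]
X_poly = np.column_stack([X_train, x1 * x2, x1 ** 2, x2 ** 2])

gram = X_poly.T @ X_poly               # the matrix that the closed form inverts
rank = np.linalg.matrix_rank(X_poly)   # rank of gram cannot exceed this value
print(gram.shape, rank)
```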


Now we find $w$, predict, and compute the MSE on the training set:

In [9]:
w = fit(X_train_new, y_train)
y_pred = predict(X_train_new, w)
mse_train = mse(y_pred, y_train)
print("MSE for train:", mse_train)

MSE for train: 4.633360052177744


Compute the MSE for test:

In [10]:
X_test_new = transform_features(X_test)
y_pred = predict(X_test_new, w)
mse_test = mse(y_pred, y_test)
print("MSE for test:", mse_test)

MSE for test: 1500.3313803053643


### 6. Model that underfits:

We need a simple model. In this case, we choose a model that has only one variable $x_1$.


Therefore, we need to find the weights of a linear model in $x_1$ only:

$\hat{y} = w_0 + w_1\cdot x_1$

In [11]:
def transform_features (X):

    #transform
    new_X = X[:,:2] # selects the first two columns
    return new_X

In [12]:
X_train_new = transform_features(X_train)
w = fit(X_train_new, y_train)
y_pred = predict(X_train_new, w)
mse_train = mse(y_pred, y_train)
print("MSE for train:", mse_train)

X_test_new = transform_features(X_test)
y_pred = predict(X_test_new, w)
mse_test = mse(y_pred, y_test)
print("MSE for test:", mse_test)

MSE for train: 52.78806045340051
MSE for test: 943.0735529696908


### 7. Model that fits well:

We need a model that is neither too simple nor too complex. In this case, we simply do not apply any transformation.


Therefore, we need to find the weights of a linear model:

$\hat{y} = w_0 + w_1\cdot x_1 + w_2\cdot x_2$

In [13]:
def transform_features (X):

    #transform
    new_X = X
    return new_X

In [14]:
X_train_new = transform_features(X_train)
w = fit(X_train_new, y_train)
y_pred = predict(X_train_new, w)
mse_train = mse(y_pred, y_train)
print("MSE for train:", mse_train)

X_test_new = transform_features(X_test)
y_pred = predict(X_test_new, w)
mse_test = mse(y_pred, y_test)
print("MSE for test:", mse_test)

MSE for train: 5.850718380389171e-30
MSE for test: 2.1423923486767538


In [15]:
print(w)

[[ 3.91121495]
 [ 2.62616822]
 [-3.68224299]]


### 8. Analysis

We now summarize the results in the following table:


| Model             | Train Error (MSE) | Test Error (MSE) |
|-------------------|:-----------------:|:----------------:|
| Overfitted Model  |       4.63        |     1500.33      |
| Underfitted Model |       52.79       |      943.07      |
| Well-fitted Model |        ≈ 0        |       2.14       |


The overfitted model has a comparatively low train error but a high test error. The underfitted model has high error on both the training and the test set, while the well-fitted model has low error on both.

### Notes
- The definition of high and low error is relative, so it always depends on the problem. Therefore, we compare models of different complexity to understand what counts as high and what counts as low.
- In this exercise, we knew the ground truth model. The true function was: $y = 5 + 2x_1 - 4x_2$.
- Please do not confuse the bias term $w_0$ (or intercept) with high bias. They are different concepts.
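
To make the note about the ground truth concrete, the sketch below compares the weights learned by the well-fitted model with the true weights $(5, 2, -4)$, and evaluates the true function on both data sets. It assumes the data defined earlier and uses `np.linalg.solve` in place of the closed form, which is equivalent here because `X_train` is square and invertible. The names `w_true` and `w_hat` are illustrative.

```python
import numpy as np

X_train = np.array([[1, -0.8, 2.8], [1, 0.3, -2.2], [1, 1.5, 1.1]])
y_train = np.array([[-8.5], [12.8], [3.8]])
X_test = np.array([[1, -2, 2], [1, -4, 15]])
y_test = np.array([[-7], [-63]])

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

w_true = np.array([[5.0], [2.0], [-4.0]])  # from y = 5 + 2*x1 - 4*x2
w_hat = np.linalg.solve(X_train, y_train)  # 3 equations, 3 unknowns

mse_true_train = mse(X_train @ w_true, y_train)
mse_true_test = mse(X_test @ w_true, y_test)

print("learned weights:", w_hat.ravel())
print("true model, train MSE:", mse_true_train)
print("true model, test MSE:", mse_true_test)
```

The true function reproduces the test targets but not the training targets, which suggests that the training targets contain some noise. The well-fitted model has three parameters and three training points, so it passes through the training data exactly and its weights are close to, but not equal to, the true ones.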
