## Polynomials

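A **polynomial** $p(x)$ is a sum of weighted powers of $x$, $p(x) = a_0 + a_1 x + \dots + a_n x^n$, where the numbers $a_0, \dots, a_n$ are the weights.

The **degree** of $p$ is the highest power of $x$ that has a nonzero weight.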

Example:
$$
1+x
$$
is a polynomial of degree 1, and
$$ 
-2.1x + 3.4x^3
$$
is a polynomial of degree 3.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(-1, 1, 50)

polynomial_1 = lambda x: 1 + x
polynomial_2 = lambda x: - 2.1 * x  + 3.4 * np.power(x, 3)

fig, axs = plt.subplots(nrows=2, ncols=1)
axs[0].plot(X, polynomial_1(X))
axs[0].text(0, 1.25, "1+x")
axs[1].plot(X, polynomial_2(X))
axs[1].text(0.25, -0.15, "-2.1x + 3.4x^3")
plt.show()


**Remark**: Notice how polynomials of higher degree can "wiggle" more.
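
One way to make "wiggle" concrete is to count the points where the slope is zero: a polynomial of degree $n$ has at most $n-1$ of them. The sketch below checks this for the two example polynomials using NumPy's `Polynomial` class. The helper name `stationary_points` is an assumption made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Coefficients are listed from the lowest power to the highest.
polynomial_1 = Polynomial([1, 1])             # 1 + x
polynomial_2 = Polynomial([0, -2.1, 0, 3.4])  # -2.1x + 3.4x^3


def stationary_points(p):
    """Return the real x values where the derivative of p is zero (hypothetical helper)."""
    roots = p.deriv().roots()
    return np.sort(roots[np.isreal(roots)].real)


print(stationary_points(polynomial_1))  # the straight line has none
print(stationary_points(polynomial_2))  # the cubic has two
```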


## Polynomials as Machine Learning Models

We can attempt to find the weights that provide the best fit for a given training dataset.

Example: For a polynomial $a_0 + a_1 x$ of degree 1, this means finding the weights $a_0$ and $a_1$ that give the best fit.

**Remark**: We will not go into details about how weights of models are calculated. 

**The main takeaway**:
- The number of weights in a polynomial model is equal to its degree + 1.
- Therefore, higher degrees result in more "complex" models.
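
To make the weight count concrete, the sketch below fits polynomials to a handful of points with `numpy.polynomial.polynomial.polyfit`, which performs a least-squares fit. This is only an illustration and not necessarily how the models in this demo are trained; the toy data is made up.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Made-up toy data that lies exactly on the line 1 + 2x.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = 1 + 2 * x

# polyfit returns the weights a_0, a_1, ..., ordered from lowest to highest power.
weights_degree_1 = P.polyfit(x, y, deg=1)
weights_degree_3 = P.polyfit(x, y, deg=3)

print(weights_degree_1)       # approximately a_0 = 1, a_1 = 2
print(len(weights_degree_1))  # 2 weights for degree 1
print(len(weights_degree_3))  # 4 weights for degree 3
```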

## Training Data

In [None]:
from src.polynomial_data import get_polynomial_training_data, true_function
from src.polynomial_visualization import visualize_model

# We import the training data for this demo 
X_train, y_train = get_polynomial_training_data(30)

# plot the data together with the true function from which it is sampled:
visualize_model(X_train, y_train, None, true_function=true_function)

**Goal**: Train a polynomial model to learn the underlying trend/pattern/rule in the data. The true function reveals the correct trend/pattern/rule.

## Training a Polynomial Model

In [None]:
import numpy as np
from src.polynomial_visualization import visualize_model
from src.polynomial_model import get_polynomial_model
from src.polynomial_data import get_polynomial_training_data, true_function


# We import the training data for this demo 
X_train, y_train = get_polynomial_training_data(30)

# We initialize an untrained polynomial model of degree 1
model = get_polynomial_model(degree=1)

# We train the model
model.fit(X_train[:, np.newaxis], y_train)

# We plot the trained model together with the raw data and the true function/trend
visualize_model(X_train, y_train, model=model, true_function=true_function)


**Observation**: We can intuitively see that this model is too simple for our data.

## Exercises

In [None]:
"""
Exercise:
You can press play to interactively train and plot polynomial models for different degrees. 

a) How does the model with degree=50 perform for X values in between 0.8 and 1?

b) How does the model with degree=7 perform for X values in between 0.8 and 1?

c) What degree (number of weights) do you think gives the "best" fit?
"""

import numpy as np
from ipywidgets import interactive, fixed
from src.polynomial_model import get_polynomial_model, get_polynomial_model_with_regularization
from src.polynomial_data import get_polynomial_training_data, true_function
from src.polynomial_visualization import visualize_model

def fit_and_plot_polynomial_from_degree(degree: int, X: np.ndarray, y: np.ndarray, regularization=False):
    # We initialize the polynomial model
    if regularization:
        model = get_polynomial_model_with_regularization(degree=degree)
    else:    
        model = get_polynomial_model(degree=degree)

    # We train the model
    model.fit(X[:, np.newaxis], y)

    # We plot the trained model together with the raw data and the true function/trend
    visualize_model(X, y, model=model, true_function=true_function)


# We import the training data for this demo 
X_train, y_train = get_polynomial_training_data(30)


w = interactive(
    fit_and_plot_polynomial_from_degree,
    degree=(1, 50),
    X=fixed(X_train),
    y=fixed(y_train),
    regularization=fixed(False),
)

w

## Underfitting and Overfitting

Underfitting occurs when the model is too simplistic to learn the underlying pattern or rule in the data. Its typical symptom is poor performance on both the training data and new, unseen data.

Overfitting, on the other hand, occurs when the model is too complex and learns too much from the training data, such that it fits to the noise of the training data. Its typical symptom is good performance on the training data but poor performance on new and unseen data.
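
These two symptoms can be reproduced with a small synthetic experiment. The sketch below does not use the demo's dataset: the true function `sin(pi * x)`, the noise level, the random seed, and the two degrees are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def sample_data(number_of_points):
    """Sample noisy points from an assumed true function (not the demo's data)."""
    x = rng.uniform(-1, 1, number_of_points)
    y = np.sin(np.pi * x) + rng.normal(0, 0.2, number_of_points)
    return x, y


def mean_absolute_error(y_true, y_predicted):
    return float(np.mean(np.abs(y_true - y_predicted)))


X_train, y_train = sample_data(30)
X_test, y_test = sample_data(30)

errors = {}
for degree in (1, 20):
    # Least-squares polynomial fit of the given degree
    model = Polynomial.fit(X_train, y_train, deg=degree)
    errors[degree] = (
        mean_absolute_error(y_train, model(X_train)),
        mean_absolute_error(y_test, model(X_test)),
    )
    train_error, test_error = errors[degree]
    print(f"degree={degree}: train error={train_error:.3f}, test error={test_error:.3f}")
```

An underfitting model tends to show a high error on both sets, while an overfitting model tends to show a low training error and a clearly higher test error.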

In [None]:
"""
Exercise continued:

d) For which degrees do you think the model is 
    i) underfitting?
    ii) overfitting?
"""

## Measuring Underfitting and Overfitting

In simple models, it's possible to identify underfitting and overfitting by visually inspecting the training and test set performance. However, for more complex models such as visual classification models, it's not easy to visually discern under- and overfitting.

**The goal** is to measure these phenomena quantitatively using numerical metrics.

## Unseen Test Data

The following code includes new and unseen data to evaluate the trained models.

In [None]:
from src.polynomial_data import get_polynomial_test_data, true_function
from src.polynomial_visualization import visualize_model

# We import some new, unseen test data
X_test, y_test = get_polynomial_test_data(number_of_points=30)

# We plot the test data together with the true function from which it is sampled:
visualize_model(X_test, y_test, None, true_function=true_function)


## Measuring Model Performance

There are several methods to evaluate the performance of a trained machine learning model on test data.

In this exercise, we will use the Mean Absolute Error (MAE) metric from the scikit-learn metrics library, which is a quantitative measure of how far the predictions are from the actual values. An MAE score of 0 indicates a perfect fit for the model, while a large error score indicates a bad model fit or poor performance.

**Disclaimer**: Other metrics may also be useful for measuring a model's performance on test data.
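
For reference, MAE is the average of the absolute differences between the actual and predicted values, $\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$. The sketch below computes it by hand with NumPy on made-up numbers; the function name `mae_by_hand` is hypothetical.

```python
import numpy as np


def mae_by_hand(y_true, y_predicted):
    """Mean absolute error: the average of |actual - predicted| (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return float(np.mean(np.abs(y_true - y_predicted)))


y_true = [1.0, 2.0, 3.0]

print(mae_by_hand(y_true, [1.0, 2.0, 3.0]))  # perfect predictions give 0.0
print(mae_by_hand(y_true, [1.5, 2.0, 2.0]))  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```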

In [None]:
import numpy as np
from sklearn.metrics import mean_absolute_error
from src.polynomial_model import get_polynomial_model
from src.polynomial_data import get_polynomial_training_data, get_polynomial_test_data

# We import the training data for this demo 
X_train, y_train = get_polynomial_training_data(30)

# We import some new, unseen test data
X_test, y_test = get_polynomial_test_data(number_of_points=30)

# We initialize a polynomial model of degree 7
model = get_polynomial_model(degree=7)

# We train the model on the original training data
model.fit(X_train[:, np.newaxis], y_train)

# We calculate predicted y values from the original training data
y_predicted_train = model.predict(X_train[:, np.newaxis])

# We calculate predicted y values from the new test data
y_predicted_test = model.predict(X_test[:, np.newaxis])

# We measure the error when predicting on original training data
train_error = mean_absolute_error(y_train, y_predicted_train)

# We measure the error when predicting on new test data
test_error = mean_absolute_error(y_test, y_predicted_test)

print(f"Train error: {train_error}")
print(f"Test error: {test_error}")
print(f"Difference: {abs(test_error - train_error)}")

## Exercise

In [None]:
"""
Exercise:
You can press play to plot test and training errors against the model degree, 
and also visualize trained models interactively, as in the previous exercise.

How can you use the relationship between test and train errors to determine 

a) Underfitting?
b) Overfitting?
"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error
from src.polynomial_model import get_polynomial_model, fit_and_plot_polynomial_from_degree
from src.polynomial_data import get_polynomial_training_data, get_polynomial_test_data


# We import the training data for this demo 
X_train, y_train = get_polynomial_training_data(30)

# We import some new, unseen test data
X_test, y_test = get_polynomial_test_data(number_of_points=30)

# Every integer degree from 1 to 22, each evaluated once
degrees = np.arange(1, 23)
train_errors = []
test_errors = []
for degree in degrees:
    # We initialize the polynomial model
    model = get_polynomial_model(degree=degree)

    # We train the model on the original training data
    model.fit(X_train[:, np.newaxis], y_train)

    # We calculate predicted y values from the original training data
    y_predicted_train = model.predict(X_train[:, np.newaxis])

    # We calculate predicted y values from the new test data
    y_predicted_test = model.predict(X_test[:, np.newaxis])

    # We append the train error
    train_errors.append(mean_absolute_error(y_train, y_predicted_train))

    # We append the test error
    test_errors.append(mean_absolute_error(y_test, y_predicted_test))

plt.plot(degrees, train_errors, color="blue", label="Train error")
plt.plot(degrees, test_errors, color="red", label="Test error")
plt.xlabel("Degree")
plt.ylabel("Error")
plt.legend(loc="best")
plt.show()

from ipywidgets import interactive, fixed

w = interactive(
    fit_and_plot_polynomial_from_degree,
    degree=(1, 25),
    X=fixed(X_train),
    y=fixed(y_train),
    regularization=fixed(False),
)

w

# Some Strategies to Handle Overfitting

## Strategy 1: Exercise

In [None]:
"""
Exercise:
You can press play to visualize trained models interactively, as in the previous exercises.

How can we avoid overfitting with the training data and polynomial models we have used so far?
"""

from ipywidgets import interactive, fixed
from src.polynomial_data import get_polynomial_training_data
from src.polynomial_model import fit_and_plot_polynomial_from_degree

# We import the training data for this demo 
X_train, y_train = get_polynomial_training_data(30)

w = interactive(
    fit_and_plot_polynomial_from_degree,
    degree=(1, 50),
    X=fixed(X_train),
    y=fixed(y_train),
    regularization=fixed(False),
)

w

## Strategy 2: More Training Data

**Intuition**: More training data makes it easier to determine more weights!

**Disclaimer**: The training data still has to be representative of the trend/pattern/rule we would like to learn!
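
The intuition can be checked on synthetic data. The sketch below keeps the degree fixed and compares a small and a large training set. The true function `sin(pi * x)`, the noise level, the seed, the degree, and the two sizes are assumptions for illustration and are not taken from the demo's data module.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def assumed_true_function(x):
    return np.sin(np.pi * x)


def error_for_training_size(number_of_points, degree=10):
    """Fit on noisy samples, then measure MAE against the noise-free trend."""
    x = rng.uniform(-1, 1, number_of_points)
    y = assumed_true_function(x) + rng.normal(0, 0.2, number_of_points)
    model = Polynomial.fit(x, y, deg=degree)

    x_grid = np.linspace(-1, 1, 200)
    return float(np.mean(np.abs(model(x_grid) - assumed_true_function(x_grid))))


error_small = error_for_training_size(15)
error_large = error_for_training_size(500)
print(f"15 training points:  error = {error_small:.3f}")
print(f"500 training points: error = {error_large:.3f}")
```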

## Exercise

In [None]:
"""
Exercise:
You can press play to interactively train and plot polynomial models for
different degrees and training dataset sizes.

a) Fix degree=50 while changing the training dataset size. What do you see?
b) How does increasing training dataset size affect overfitting?
c) Does the added training data capture the trend/pattern/rule we would like to learn?
"""

from ipywidgets import interactive
from src.polynomial_data import get_polynomial_training_data
from src.polynomial_model import fit_and_plot_polynomial_from_degree

def plot_model_from_degree_and_training_size(degree: int, training_data_size: int):
    X_train, y_train = get_polynomial_training_data(number_of_points=training_data_size)
    fit_and_plot_polynomial_from_degree(
        degree=degree,
        X=X_train,
        y=y_train
    )


w = interactive(
    plot_model_from_degree_and_training_size,
    degree=(1, 100),
    training_data_size=(3, 250),
)

w


## Handling Data in Real-World ML 

While adding more training data is easy in our exercise because the data is generated by code, this is generally **not** the case for real-world ML applications.

**Data Augmentation**:
Create more training data from existing training data. This is done by transforming existing examples, e.g. rotating images for visual classification.

**Key Takeaway**:

There are methods for adding more training data from existing training data.
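
As a minimal sketch of the idea, the code below creates four training images from one by rotating it in steps of 90 degrees with `np.rot90`. The tiny 2x2 "image" is made up, and real augmentation pipelines typically offer many more transformations.

```python
import numpy as np

# A made-up 2x2 grayscale "image"
image = np.array([[1, 2],
                  [3, 4]])

# Rotate by 0, 90, 180 and 270 degrees (counterclockwise)
augmented_images = [np.rot90(image, k) for k in range(4)]

for rotated in augmented_images:
    print(rotated)
```

The label of each rotated image is assumed to stay the same, which does not hold for every task (a rotated 6 can look like a 9).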

## Strategy 3: Regularization

Add rules/constraints to the ML model's weights.

We will have a look at how restricting the size of a model's weights affects under- and overfitting.

**Disclaimer**:

There are many methods for regularization.
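
One widely used method is ridge (L2) regularization, which adds a penalty on the squared size of the weights to the least-squares objective. The sketch below implements it directly in NumPy on synthetic data. It is not necessarily what `get_polynomial_model_with_regularization` does; the data, the degree, and the penalty strength `alpha` are assumptions, and for simplicity the constant weight $a_0$ is penalized too.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (assumed, not the demo's dataset)
x = rng.uniform(-1, 1, 30)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, 30)

degree = 12
# Feature matrix with columns x^0, x^1, ..., x^degree
features = np.vander(x, degree + 1, increasing=True)


def fit_weights(features, y, alpha):
    """Minimize ||features @ w - y||^2 + alpha * ||w||^2 via an augmented least-squares problem."""
    number_of_weights = features.shape[1]
    augmented_features = np.vstack([features, np.sqrt(alpha) * np.eye(number_of_weights)])
    augmented_targets = np.concatenate([y, np.zeros(number_of_weights)])
    weights, *_ = np.linalg.lstsq(augmented_features, augmented_targets, rcond=None)
    return weights


weights_plain = fit_weights(features, y, alpha=0.0)
weights_ridge = fit_weights(features, y, alpha=0.1)

print(f"Size of weights without regularization: {np.linalg.norm(weights_plain):.2f}")
print(f"Size of weights with regularization:    {np.linalg.norm(weights_ridge):.2f}")
```

With `alpha=0.0` the function reduces to an ordinary least-squares fit, so the two results can be compared directly.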

## Exercise

In [None]:
"""
Exercise:
You can press play to interactively train and plot polynomial models, 
with regularization, for different degrees.

How do the resulting models differ from the ones trained without regularization?
"""

from ipywidgets import interactive
from src.polynomial_data import get_polynomial_training_data
from src.polynomial_model import fit_and_plot_polynomial_from_degree

def plot_model_from_degree_with_regularization(degree: int):
    X_train, y_train = get_polynomial_training_data(30)
    fit_and_plot_polynomial_from_degree(
        degree,
        X_train,
        y_train,
        regularization=True
    )


w = interactive(
    plot_model_from_degree_with_regularization,
    degree=(1, 100),
)

w


## Summary

Here is a summary of key takeaways from this exercise:

- A model that performs well on training data but poorly on new data is overfit.
- We can determine overfitting by comparing error/accuracy between training and new data.
    - Relatively worse performance on new data implies overfitting.
    - There are many methods for measuring accuracy/error!
- Three common strategies for addressing overfitting include:
    1. Starting with a simple model and then gradually increasing complexity.
    2. Adding more training data, provided it captures what we want to learn.
    3. Applying regularization to constrain the model's weights.
        - There are many methods for regularization!

## Exercise

In [None]:
"""
Exercise:

For the working example used in this interactive demo,
which strategy do you think gives the best result?
"""