# AI512: Introduction to Machine Learning
## University of Southern Denmark - IMADA
### Fall 2024 - Melih Kandemir

---
# Exercise 04
---
- Vapnik-Chervonenkis (VC) Dimension
- Bias-Variance Decomposition



## VC Dimension
1. The goal of this question is to calculate the VC dimension of the hypothesis set of axis-aligned rectangles in a two-dimensional feature space. Follow the steps below:
    - Find an arrangement of input points that maximizes the number of label combinations (dichotomies) the hypothesis set can realize.
    - Consider this dataset: `[(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]`. For each subset of size 2 from this dataset, generate a list of all possible label combinations for binary classification using `itertools`.

    - For each label combination, assign the labels to the data points and divide the points into two groups (label `0` and label `1`).

    - Create a rectangle that contains all points of one group and none of the other group.
    - Calculate the growth function for this subset.
    - Check if the hypothesis set can shatter this subset.
    - Calculate the VC dimension for this subset.
    - Repeat for subset sizes 3, 4, 5.

    - For a subset of size 4, plot all 16 label combinations, showing the points and the rectangle for each, and check visually whether the hypothesis set can shatter this subset.
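
The enumeration and shattering steps above can be sketched as follows. This is a minimal sketch, not a reference solution: the helper names `rectangle_realizes`, `count_dichotomies`, and `is_shattered` are hypothetical, and it assumes the convention that points inside a closed (possibly degenerate) rectangle are labeled `1` and points outside are labeled `0`. Under that convention a labeling is realizable exactly when the bounding box of the `1`-labeled points contains no `0`-labeled point.

```python
from itertools import combinations, product

points = [(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]


def rectangle_realizes(subset, labels):
    """Return True if some axis-aligned rectangle contains exactly the points labeled 1."""
    positives = [p for p, y in zip(subset, labels) if y == 1]
    if not positives:
        return True  # an empty rectangle labels every point 0
    xs = [p[0] for p in positives]
    ys = [p[1] for p in positives]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    # The bounding box is the smallest candidate rectangle; if a 0-labeled
    # point falls inside it, every larger rectangle contains that point too.
    for p, y in zip(subset, labels):
        if y == 0 and x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi:
            return False
    return True


def count_dichotomies(subset):
    """Number of labelings of `subset` that axis-aligned rectangles can realize."""
    return sum(
        rectangle_realizes(subset, labels)
        for labels in product([0, 1], repeat=len(subset))
    )


def is_shattered(subset):
    """A subset is shattered when all 2^k labelings are realizable."""
    return count_dichotomies(subset) == 2 ** len(subset)


for k in range(2, 6):
    subsets = list(combinations(points, k))
    best = max(count_dichotomies(s) for s in subsets)
    print(f"size {k}: max dichotomies = {best}, "
          f"some subset shattered = {any(is_shattered(s) for s in subsets)}")
```

Note that a check like this only covers subsets of the given dataset. It can certify that some set of a given size is shattered, but showing that no set of a larger size can be shattered still needs a separate argument.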
    
  



In [None]:
# Solution here

## Bias-Variance Decomposition

We will do the bias-variance decomposition for polynomial curve fitting. Please follow the steps below:

- Generate a dataset using the function $f(x)=\sin(2\pi x)$ and add Gaussian noise with standard deviation $\sigma=0.3$ to the data. Fill in the `true_function` and `generate_data` functions below.
    - `L` = 100 (number of training datasets)
    - `n_train` = 25 (number of data points for training)
    - `n_test` = 100 (number of data points for testing)
    - Generate `x_train`s from uniform distribution between 0 and 1.
    - Generate `x_test` with linspace between 0 and 1.
- Import `PolynomialModel` from lecture notes `01_Basic_Concepts`. 
    - `num_polynomial_degrees` = 10
    - As the regularization parameter, use $\lambda\in \{0, 0.001, 0.01, 0.1, 1, 10, 100, 1000\}$.
- For each $\lambda$:
    - For each dataset:
        - Fit the model to the data.
        - Predict the test data.
    - Calculate the mean of the predictions.
    - Calculate the $\text{bias}^2$ and variance of the predictions. (Keep them in a list for later use in plotting.)
        - $\text{bias}^2$ = $\frac{1}{n_{test}}\sum_{i=1}^{n_{test}}(\bar{y}_i - f(x_i))^2$ where $\bar{y}_i$ is the mean of the predictions for $x_i$ and $f(x_i)$ is the true function value for $x_i$.
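        - variance = $\frac{1}{n_{test}}\sum_{i=1}^{n_{test}}\frac{1}{L}\sum_{l=1}^{L}(y_i^{(l)} - \bar{y}_i)^2$ where $y_i^{(l)}$ is the prediction for $x_i$ of the model fitted to the $l$-th training dataset and $\bar{y}_i$ is the mean of these predictions over the $L$ datasets.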
    - Calculate the mean squared error of the predictions. (Keep it in a list for later use in plotting.)
    - Plot a figure:
        - x axis: x_test
        - Left panel: Predictions from fitted polynomial models. 
            - Use first 20 dataset predictions from training datasets.
        - Right panel: 
            - Mean of the predictions as a blue line.
            - True function as a green line.
- Plot a figure:
    - x axis: $\lambda$
    - Plot the $\text{bias}^2$ as a blue line.
    - Plot the variance as a red line.
    - Plot the mean squared error as a black line.
    - Plot the $\text{bias}^2$ + variance as a magenta line.
    - Use log scale for x axis.
- **Discussion**:
    - As $\lambda$ increases, to which value do the model predictions tend and why? Relate this to the bias-variance decomposition.
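
The $\text{bias}^2$ and variance computation in the steps above can be illustrated on a stack of predictions. This is a minimal sketch under stated assumptions: it uses an unregularized cubic fit via `np.polyfit` as a stand-in for the `PolynomialModel` from the lecture notes (whose interface is not reproduced here), it does not sweep over $\lambda$, and the variable names `preds`, `bias_sq`, `variance`, and `mse_noiseless` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
L, n_train, n_test, sigma = 100, 25, 100, 0.3


def f(x):
    # True function from the exercise text
    return np.sin(2 * np.pi * x)


x_test = np.linspace(0, 1, n_test)

# preds[l, i] is the prediction at x_test[i] of the model fitted to dataset l
preds = np.empty((L, n_test))
for l in range(L):
    x_train = rng.uniform(0, 1, n_train)
    y_train = f(x_train) + rng.normal(0, sigma, n_train)
    # Assumption: plain least-squares cubic as a stand-in model
    coeffs = np.polyfit(x_train, y_train, deg=3)
    preds[l] = np.polyval(coeffs, x_test)

mean_pred = preds.mean(axis=0)                    # mean prediction per test point
bias_sq = np.mean((mean_pred - f(x_test)) ** 2)   # squared bias, averaged over x_test
variance = np.mean((preds - mean_pred) ** 2)      # spread around the mean prediction
mse_noiseless = np.mean((preds - f(x_test)) ** 2) # error against the noise-free target

print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, "
      f"bias^2 + variance = {bias_sq + variance:.4f}, MSE = {mse_noiseless:.4f}")
```

With these definitions, the error measured against the noise-free $f(x_i)$ equals $\text{bias}^2$ + variance up to floating-point rounding, which makes a useful sanity check. If the error is instead measured against noisy test targets, an additional noise term of roughly $\sigma^2$ should be expected on top.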

**Note**: This exercise is adapted from the book "Pattern Recognition and Machine Learning" by Christopher M. Bishop.

In [None]:
# Solution here
import numpy as np
import matplotlib.pyplot as plt

def true_function(x):
    """
        Input: x
        Output: sin(2*pi*x)
    """
    # FILL THIS IN
    
    return 

def generate_data(f, L=100, n_train=25, n_test=100, noise=0.3):
    """
        Input: 
            f: true function
            L: number of training sets
            n_train: number of training data points
            n_test: number of test data points
            noise: standard deviation of noise
        Output:
            train_data: list of training data [(x_train_1, y_train_1), ..., (x_train_L, y_train_L)]
            test_data: test data (x_test, y_test)
    """
    # FILL THIS IN
    return 

train_data, test_data = generate_data(true_function, L=100, n_train=25, n_test=100, noise=0.3)
regularization_factors = [0, 1e-3, 1e-2, 1e-1, 1, 10, 100, 1000]
    