# Module 4: Hyperparameters

In [None]:
# Set up the matplotlib styling
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import pandas as pd
import numpy as np

try:
    # Try to use the BI style sheet for plots
    plt.style.use('matplotlibrc')
    plt.rcParams['axes.prop_cycle'] = plt.cycler(color=[(136/256, 76/256, 255/256), (60/256, 170/256, 207/256), (12/256, 229/256, 177/256)]) 
    
    colors = [(0.53125, 0.296875, 0.99609375), (0.453125, 0.3984375, 0.9453125), (0.375, 0.4921875, 0.89453125), (0.3046875, 0.578125, 0.8515625), (0.234375, 0.6640625, 0.80859375), (0.16015625, 0.75390625, 0.76171875), (0.09375, 0.8359375, 0.72265625), (0.046875, 0.89453125, 0.69140625), (0.0, 0.875, 0.6640625)]
    bicmap = LinearSegmentedColormap.from_list(name='BIcmp', 
                                                colors=colors,
                                                N=len(colors))
    cm_bright = ListedColormap([(0.53125, 0.296875, 0.99609375), (12/256, 229/256, 177/256)])
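except OSError:
    # Fall back to built-in colors if the 'matplotlibrc' style sheet is not available
    bicmap = plt.cm.BuGn
    colors = ['r', 'g', 'b']
    cm_bright = ListedColormap(['#FF0000', '#0000FF'])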

## **Exercise 4.1:**
Fill in the blanks in the following sentences:

In the case of underfitting, the ___________ error is high.  
In the case of overfitting, the ___________ error is low but the ___________ error is high.  

## **Exercise 4.2:**

You are a Data Scientist at an advertising company. Given a single continuous feature x, you have to predict a continuous target variable y. You decide to use a polynomial regression model (remember that, if you want, you can build one with `make_pipeline` from scikit-learn by combining `PolynomialFeatures` and `LinearRegression`).

### **Exercise 4.2.1:** 
Write two functions in Python:

- `fit_model(x, y, degree)`: Given a single feature `x`, a target variable `y`, and the degree of the polynomial regression, this function should fit the model and return it.
- `evaluate_model(model, x, y)`: Given a model and a dataset with a single feature `x` and a target variable `y`, this function should compute the MAE (mean absolute error) of the model on that dataset and return it.

In [None]:
# Import necessary modules
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_error

def fit_model(x, y, degree):
    # TODO
    pass

def evaluate_model(model, x, y):
    # TODO
    pass
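For reference, a polynomial regression model in scikit-learn can be built as a pipeline that first expands `x` into the powers x, x², …, x^d and then fits ordinary least squares on them. The sketch below shows this on hypothetical synthetic data (`x_demo` and `y_demo` are made up for illustration and are not from train.csv); it is one possible approach, not the only valid way to fill in the stubs above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data: y = x^2 plus a little noise, x shaped as a 2D column
rng = np.random.default_rng(0)
x_demo = rng.uniform(-1, 1, size=(50, 1))
y_demo = x_demo[:, 0] ** 2 + rng.normal(scale=0.05, size=50)

# Expand x into [x, x^2] (LinearRegression adds its own intercept), then fit least squares
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
model.fit(x_demo, y_demo)

# MAE compares the model's predictions with the true targets
mae = mean_absolute_error(y_demo, model.predict(x_demo))
print(f"Training MAE: {mae:.3f}")
```

Because `make_pipeline` returns a single estimator, the fitted pipeline can be passed around and evaluated like any other scikit-learn model.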

### **Exercise 4.2.2:** 

For each value of the hyperparameter `d` in 1, 2, 3, …, 20:
1. Train a model on the data in train.csv.
2. Evaluate the model on test.csv.

Which value of the hyperparameter `d` is best suited for putting the model into production?
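The sweep can be sketched as a loop that fits one model per degree and records its test MAE. To keep the sketch self-contained, it uses hypothetical stand-in data (a noisy sine curve) and NumPy's `Polynomial.fit` in place of the scikit-learn pipeline; in the exercise you would use train.csv, test.csv and your own `fit_model`/`evaluate_model`.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical stand-in data; in the exercise, use train.csv and test.csv instead
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.1, size=n)
    return x, y

train_x_demo, train_y_demo = make_data(30)
test_x_demo, test_y_demo = make_data(30)

test_mae = {}
for d in range(1, 21):
    # Polynomial.fit rescales x internally, which keeps high degrees numerically stable
    p = Polynomial.fit(train_x_demo, train_y_demo, deg=d)
    test_mae[d] = np.mean(np.abs(test_y_demo - p(test_x_demo)))

# The degree with the smallest test error
best_d = min(test_mae, key=test_mae.get)
print("Degree with the lowest test MAE:", best_d)
```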


In [None]:
train = pd.read_csv('data/train.csv')
train_x = train[['x']].values
train_y = train['y'].values
train.head()

In [None]:
test = pd.read_csv('data/test.csv')
test_x = test[['x']].values
test_y = test['y'].values
test.head()

In [None]:
# TODO

### **Exercise 4.2.3:**

In [None]:
val = pd.read_csv('data/val.csv')
val_x = val[['x']].values
val_y = val['y'].values
val.head()


Since 1000 ad requests come in each hour, the hourly company profit of a real-world application is computed as `1000 * (1 - MAE)`, where MAE is the mean absolute error.

1. Retrain the model on the whole dataset (train.csv and test.csv combined) using the best-suited degree `d` from Exercise 4.2.2. To simulate putting the model into production, val.csv provides a large amount of real-world data that arrives after deployment.
What is the company profit (hourly and yearly) of this model on the real-world data?
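The profit formula above translates directly into code. The sketch below assumes the model serves traffic around the clock for a 365-day year (8760 hours); that assumption is ours, not part of the exercise, and the MAE of 0.25 is a made-up value for illustration.

```python
# Assumption: continuous traffic over a 365-day year
HOURS_PER_YEAR = 24 * 365

def hourly_profit(mae, requests_per_hour=1000):
    """Hourly profit as defined in the exercise: requests_per_hour * (1 - MAE)."""
    return requests_per_hour * (1 - mae)

def yearly_profit(mae, requests_per_hour=1000):
    """Hourly profit scaled to a full year under the 8760-hour assumption."""
    return hourly_profit(mae, requests_per_hour) * HOURS_PER_YEAR

# Example with a made-up MAE of 0.25
print(hourly_profit(0.25))  # 750.0
print(yearly_profit(0.25))  # 6570000.0
```

Because the yearly figure multiplies the hourly one by 8760, even a small difference in MAE between two models adds up to a large difference in yearly profit.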

In [None]:
# Create the whole dataset
x = np.append(train_x, test_x, axis=0)
y = np.append(train_y, test_y)

In [None]:
# Retrain the model on the chosen degree
# TODO

In [None]:
# Evaluate the model on the validation dataset
# TODO

In [None]:
# Calculate the profit of the model
# TODO

2. Retrain the model with degree `d = 3` on the whole dataset and evaluate it on val.csv. What is the company profit (hourly and yearly) of this model on the real-world data?

In [None]:
# Retrain the model with degree 3
# TODO

In [None]:
# Evaluate the model
# TODO

In [None]:
# Calculate the profits of the model
# TODO

3. Explain your findings and decide which model should be put into production.

TODO

### **Exercise 4.2.4:**

Propose a method to evaluate the performance of a hyperparameter value more reliably.

TODO

In [None]:
# TODO
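One common option is k-fold cross-validation. The data is split into k folds, each fold serves once as the evaluation set while the model is trained on the rest, and the k scores are averaged. This makes the estimate less dependent on one particular train/test split. The sketch below applies 5-fold cross-validation to the polynomial degree. It uses hypothetical synthetic data in place of the combined train/test set, and it tries degrees 1 to 10 only to keep it fast.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data standing in for the combined train/test set
rng = np.random.default_rng(2)
x_demo = rng.uniform(-1, 1, size=(60, 1))
y_demo = np.sin(3 * x_demo[:, 0]) + rng.normal(scale=0.1, size=60)

# Shuffle before splitting so every fold covers the whole x range
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mae = {}
for d in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree=d, include_bias=False), LinearRegression())
    # scikit-learn maximizes scores, so MAE is reported as its negative
    scores = cross_val_score(model, x_demo, y_demo, cv=cv, scoring="neg_mean_absolute_error")
    cv_mae[d] = -scores.mean()

best_d = min(cv_mae, key=cv_mae.get)
print("Degree with the lowest 5-fold CV MAE:", best_d)
```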

## **Bonus 1**

Let's try some automated hyperparameter optimization (HPO).

- Run the different HPO methods
- What is the best test accuracy you can achieve with an SVM model?
- Modify the methods to speed up the search
- Which method is the most efficient for this problem?

### The problem dataset

In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

x, y = make_classification(
    n_samples=200,
    n_features=20,
    n_informative=10,
    n_redundant=2,
    n_repeated=2,
    n_classes=2,
    n_clusters_per_class=4,
    flip_y=0.01,
    class_sep=0.5)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

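We will optimize a support vector machine with an RBF kernel, which has two important hyperparameters: `C`, which controls how strongly misclassified training points are penalized (larger values mean weaker regularization), and `gamma`, which controls how quickly the kernel's similarity decays with distance.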
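The role of `gamma` becomes clearer from the RBF kernel itself, K(a, b) = exp(-gamma · ||a - b||²). The sketch below is a small NumPy re-implementation of that formula, written for illustration rather than taken from scikit-learn, applied to three made-up points at distances 0, 1 and 3 from the origin.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
for gamma in (0.1, 1.0):
    # Similarity of the first point to all three points: larger gamma decays faster
    print(gamma, rbf_kernel(points, points, gamma)[0])
```

With a small `gamma`, even distant points count as similar and the decision boundary is smooth. With a large `gamma`, only very close points interact, which makes the model more flexible and more prone to overfitting.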

In [None]:
from sklearn.svm import SVC

svc = SVC()

### Grid Search

In [None]:
# Import necessary requirements
from sklearn.model_selection import GridSearchCV
# loguniform lives in scipy.stats (sklearn.utils.fixes no longer provides it in recent versions)
from scipy.stats import loguniform

In [None]:
# Define the search space for the Grid Search
params = [
    {'C': [0.1, 1, 10, 100, 1000, 2000], 'gamma': [0.005, 0.001, 0.0001], 'kernel': ['rbf']},
]
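A grid search evaluates every combination in the Cartesian product of the listed values. scikit-learn's `ParameterGrid` enumerates these combinations. The sketch below uses a separate name, `demo_params`, so that it does not overwrite `params`, and shows that the grid above has 6 × 3 × 1 = 18 configurations.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as above, under a separate name to avoid side effects
demo_params = [
    {'C': [0.1, 1, 10, 100, 1000, 2000], 'gamma': [0.005, 0.001, 0.0001], 'kernel': ['rbf']},
]
grid = ParameterGrid(demo_params)

print(len(grid))  # 18 = 6 values of C x 3 values of gamma x 1 kernel
for config in list(grid)[:3]:
    print(config)
```

With the default 5-fold cross-validation, `GridSearchCV` therefore trains 18 × 5 = 90 models, plus one final refit of the best configuration on the whole training set.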

In [None]:
# Initialize GridSearch
gcv = GridSearchCV(
    estimator=svc,
    param_grid=params,
    n_jobs=-1,
)

In [None]:
# Fit the grid search (5-fold cross-validation on the training data by default)
gcv.fit(x_train, y_train)

In [None]:
# Get the parameters that achieved the best performance
gcv.best_params_

In [None]:
# Evaluate the model on unseen data
g_model = SVC(**gcv.best_params_)
g_model.fit(x_train, y_train)
g_model.score(x_test, y_test)

### Random Search

To keep the comparison fair, the random search gets the same budget as the grid search: as many parameter configurations as the grid contained.

In [None]:
num_runs = len(gcv.cv_results_['params'])

In [None]:
# Import the necessary requirements
from sklearn.model_selection import RandomizedSearchCV

In [None]:
# Define the search space
params = {
    'C': loguniform(1e-1, 2e3),
    'gamma': loguniform(1e-4, 1e-3),
    'kernel': ['rbf']
}
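The search space above draws `C` and `gamma` from log-uniform distributions. These spread samples evenly across orders of magnitude rather than across the raw value range. The NumPy sketch below is an illustration of that idea; `scipy.stats.loguniform` implements the same distribution. It compares how often each scheme samples `C` below 10 in the range 0.1 to 2000.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high, n = 1e-1, 2e3, 100_000

# Uniform sampling on the original scale
c_uniform = rng.uniform(low, high, size=n)
# Log-uniform sampling: draw uniformly in log space, then exponentiate
c_loguniform = np.exp(rng.uniform(np.log(low), np.log(high), size=n))

for name, samples in [("uniform", c_uniform), ("log-uniform", c_loguniform)]:
    print(f"{name:>11}: fraction of C below 10 = {np.mean(samples < 10):.3f}")
```

Uniform sampling almost never explores the small values of `C` (below 1 % of samples), while log-uniform sampling puts roughly log(100) / log(20000), about 47 %, of its samples there.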

In [None]:
# Initialize the Random Search with the same budget as the Grid Search
rcv = RandomizedSearchCV(
    estimator=svc,
    param_distributions=params,
    n_iter=num_runs,
    n_jobs=-1,
)

In [None]:
# Perform the fit
rcv.fit(x_train, y_train)

In [None]:
# Get the hyperparameters that created the best performance
rcv.best_params_

In [None]:
# Evaluate model
r_model = SVC(**rcv.best_params_)
r_model.fit(x_train, y_train)
r_model.score(x_test, y_test)

### Bayesian Optimization

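Let's use a more intelligent method to find the best hyperparameters. Bayesian optimization fits a surrogate model of the objective (in `gp_minimize`, a Gaussian process) to the configurations evaluated so far. It then uses that model to decide which configuration to try next, instead of sampling blindly.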
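The next configuration is chosen with an acquisition function that trades off a low predicted error (exploitation) against high uncertainty (exploration). A classic choice is expected improvement (EI), one of the acquisition functions `gp_minimize` can use. The stdlib-only sketch below evaluates EI for minimization, assuming the surrogate predicts a normal distribution N(mu, sigma²) for a candidate. The two candidates and their numbers are made up for illustration.

```python
import math

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected improvement (minimization) of a candidate predicted as N(mu, sigma^2),
    relative to the best objective value f_best observed so far."""
    if sigma <= 0:
        # No uncertainty: improvement is deterministic
        return max(f_best - mu - xi, 0.0)
    z = (f_best - mu - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal density
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # standard normal CDF
    return (f_best - mu - xi) * cdf + sigma * pdf

# Best error rate so far: 0.20
ei_a = expected_improvement(mu=0.19, sigma=0.01, f_best=0.20)  # slightly better, very certain
ei_b = expected_improvement(mu=0.25, sigma=0.10, f_best=0.20)  # worse on average, very uncertain
print(f"A: {ei_a:.4f}, B: {ei_b:.4f}")
```

Candidate B gets the higher score even though its predicted error is worse, because its large uncertainty leaves room for a substantial improvement. This exploration is what lets Bayesian optimization escape regions that merely look good early on.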

In [None]:
# First install scikit-optimize
import sys
!{sys.executable} -m pip install scikit-optimize

In [None]:
# Import the necessary libraries
from skopt import gp_minimize
from skopt.space import Real, Categorical, Integer
from skopt.utils import use_named_args

In [None]:
# Redefine the search space from Random Search as dimensions
space = [
    Real(low=1e-1, high=2e3, prior='log-uniform', name='C'),
    Real(low=1e-4, high=1e-3, prior='log-uniform', name='gamma'),
    Categorical(categories=['rbf'], transform='identity', prior=None, name='kernel')
]

In [None]:
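# Since BayesSearchCV from scikit-optimize is currently broken, we use the gp_minimize() function
from sklearn.model_selection import cross_val_score

@use_named_args(dimensions=space)
def func(**params):
    # Score each configuration with 5-fold cross-validation on the training set only,
    # like GridSearchCV and RandomizedSearchCV, so the test set stays unseen
    model = SVC(**params)
    score = cross_val_score(model, x_train, y_train, cv=5, n_jobs=-1).mean()
    # gp_minimize minimizes its objective, so return the error rate (1 - accuracy)
    return 1 - score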

In [None]:
# Start the search process with the same number of runs
res = gp_minimize(func, dimensions=space, n_calls=num_runs, n_jobs=-1)

In [None]:
# Get the best parameters
res.x

In [None]:
# Evaluate model
bo_model = SVC(**dict(zip(['C', 'gamma', 'kernel'], res.x)))
bo_model.fit(x_train, y_train)
bo_model.score(x_test, y_test)

### **Bonus 2:**
Perform the same optimization on the dataset, but this time use the MLPClassifier.

In [None]:
from sklearn.neural_network import MLPClassifier

- What problems do you run into?
- What can you do to solve them?
- What is the best accuracy you can achieve with an MLP model?
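Typical issues with `MLPClassifier` are sensitivity to feature scaling, convergence warnings when `max_iter` is too small, and hyperparameters such as `hidden_layer_sizes` that are tuples rather than continuous ranges. The sketch below addresses these with a `StandardScaler` inside a pipeline, a larger `max_iter`, and an explicit list of layer configurations in a random search. The dataset is regenerated with a fixed seed so that the block runs on its own. The chosen layer sizes, ranges and budget are illustrative assumptions, not tuned values.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same kind of dataset as above, with a fixed seed for repeatability
x_mlp, y_mlp = make_classification(n_samples=200, n_features=20, n_informative=10,
                                   n_redundant=2, n_repeated=2, n_classes=2,
                                   n_clusters_per_class=4, flip_y=0.01,
                                   class_sep=0.5, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(x_mlp, y_mlp, test_size=0.2, random_state=0)

# The scaler is refit on each CV training fold, so no information leaks from validation folds
pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))

# Layer widths are tuples, so they are listed explicitly instead of sampled from a range
search_space = {
    "mlpclassifier__hidden_layer_sizes": [(16,), (64,), (32, 32)],
    "mlpclassifier__alpha": loguniform(1e-5, 1e-1),
    "mlpclassifier__learning_rate_init": loguniform(1e-4, 1e-2),
}
search = RandomizedSearchCV(pipe, search_space, n_iter=8, cv=3, random_state=0)
search.fit(x_tr, y_tr)

print(search.best_params_)
print("Test accuracy:", search.score(x_te, y_te))
```

Parameter names follow scikit-learn's `<step name>__<parameter>` convention, where `make_pipeline` names each step after its lowercased class name.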

In [None]:
# TODO