# Exercise 5: Hyperparameter Tuning

This exercise is about hyperparameter tuning. To get familiar with hyperparameter tuning in scikit-learn, refer to the corresponding [section of the documentation](https://scikit-learn.org/stable/modules/grid_search.html).

We again use the data set of the Data Mining Cup 2006. Remember: the task is to predict the attribute `gms_greater_avg` as precisely as possible. This time, we use the F1-measure of class `1` as the main performance metric.
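The F1-measure of class `1` treats `1` as the positive class and is the harmonic mean of precision and recall, F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). The sketch below checks this on made-up labels (not DMC 2006 data): with TP = 4, FP = 2 and FN = 0, F1 = 8 / 10 = 0.8.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration only (not DMC 2006 data)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]

# Count outcomes with class 1 as the positive class
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_manual = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)  # 4 2 0
print(precision, recall, f1_manual)

# scikit-learn's metrics with pos_label=1 compute the same quantities
print(precision_score(y_true, y_pred, pos_label=1),
      recall_score(y_true, y_pred, pos_label=1),
      f1_score(y_true, y_pred, pos_label=1))
```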

## Task 1: Warm-up

In [None]:
import numpy as np
import pandas as pd

RANDOM_STATE = 42  # use this random state to make your experiments consistent
np.random.seed(RANDOM_STATE)

In [None]:
# Use pandas to import the training data, as in Exercise 2.

# --- TODO ---

In [None]:
# Create a 50:50 train-test split and assign the results to the variables X_train, X_test, y_train, and y_test.

# --- TODO ---
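A minimal sketch of the 50:50 split, assuming synthetic data from `make_classification` in place of the DMC 2006 file (its path and loading options come from Exercise 2 and are not repeated here). Passing `stratify=y` is an extra choice, not part of the task: it keeps the share of class `1` equal in both halves, so test-set F1 scores are not skewed by an unlucky split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42

# Synthetic stand-in for the DMC 2006 features and the binary target gms_greater_avg
X, y = make_classification(n_samples=1000, n_features=10, random_state=RANDOM_STATE)

# test_size=0.5 gives the 50:50 split; random_state makes it reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=RANDOM_STATE)

print(X_train.shape, X_test.shape)
print('share of class 1 (train / test):', np.mean(y_train), np.mean(y_test))
```

With the DMC data, `X` would be the feature columns of the DataFrame and `y` the `gms_greater_avg` column; `train_test_split` accepts pandas objects directly.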

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

estimators = {
    'Naive Bayes': GaussianNB().fit(X_train, y_train),
    'K-NN': KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train),
    'SVC': SVC(random_state=RANDOM_STATE).fit(X_train, y_train)
}

# Implement the `evaluate_estimators` function so that it returns precision, recall, and F1-measure
# of class 1 on the test set for the classifiers given in `estimators`.

def evaluate_estimators(estimators, X, y_true):
    # TODO
    pass


evaluate_estimators(estimators, X_test, y_test)
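One possible shape for `evaluate_estimators` is sketched below: it predicts with each already fitted model and collects the three class-1 metrics into a pandas DataFrame with one row per classifier. Returning a DataFrame and using `zero_division=0` are design choices of this sketch, not requirements of the task; the synthetic data again stands in for the DMC 2006 set.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

RANDOM_STATE = 42


def evaluate_estimators(estimators, X, y_true):
    """Return precision, recall and F1 of class 1 for each fitted estimator."""
    names, rows = [], []
    for name, estimator in estimators.items():
        y_pred = estimator.predict(X)
        names.append(name)
        rows.append({
            # zero_division=0 scores 0 instead of warning if class 1 is never predicted
            'precision': precision_score(y_true, y_pred, pos_label=1, zero_division=0),
            'recall': recall_score(y_true, y_pred, pos_label=1, zero_division=0),
            'f1': f1_score(y_true, y_pred, pos_label=1, zero_division=0),
        })
    # Explicit column order: precision, recall, f1
    return pd.DataFrame(rows, index=names, columns=['precision', 'recall', 'f1'])


# Synthetic stand-in for the DMC 2006 data
X, y = make_classification(n_samples=1000, n_features=10, random_state=RANDOM_STATE)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=RANDOM_STATE)

estimators = {
    'Naive Bayes': GaussianNB().fit(X_train, y_train),
    'K-NN': KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train),
    'SVC': SVC(random_state=RANDOM_STATE).fit(X_train, y_train),
}

results = evaluate_estimators(estimators, X_test, y_test)
print(results.round(3))
```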

## Task 2: Grid Search

In [None]:
%%time

tune_params = {
    'K-NN': {
        'n_neighbors': [1, 3, 5, 10]
    },
    'SVC': {
        'C': [.001, .01, .1, 1, 10, 100],
        'gamma': ['scale', 'auto'],
        'tol': [1e-2, 1e-3, 1e-4],
        'class_weight': ['balanced', None],
    }
}

# Run a grid search with the parameters given in `tune_params` with F1-measure as optimization objective.
# For the best estimator, print the parameters and evaluate it with the `evaluate_estimators` function.
# HINT: Take a look at https://scikit-learn.org/stable/modules/grid_search.html for information on grid search.

# --- TODO ---
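A minimal sketch of the grid search, assuming synthetic data and unfitted base models (the `estimators` dictionary above holds fitted ones, which `GridSearchCV` would clone anyway). `scoring='f1'` is scikit-learn's binary F1 with `pos_label=1`, which matches the F1 of class `1`; `cv=5` and the names `base_estimators` and `best_estimators` are choices of this sketch. In the notebook, you would pass `best_estimators` to `evaluate_estimators(best_estimators, X_test, y_test)` instead of calling `f1_score` directly.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

RANDOM_STATE = 42

# Synthetic stand-in for the DMC 2006 data (assumption: binary target with classes 0 and 1)
X, y = make_classification(n_samples=600, n_features=10, random_state=RANDOM_STATE)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=RANDOM_STATE)

tune_params = {
    'K-NN': {'n_neighbors': [1, 3, 5, 10]},
    'SVC': {
        'C': [.001, .01, .1, 1, 10, 100],
        'gamma': ['scale', 'auto'],
        'tol': [1e-2, 1e-3, 1e-4],
        'class_weight': ['balanced', None],
    },
}

# Unfitted base models; GridSearchCV clones them for every parameter combination
base_estimators = {
    'K-NN': KNeighborsClassifier(),
    'SVC': SVC(random_state=RANDOM_STATE),
}

best_estimators = {}
for name, param_grid in tune_params.items():
    # scoring='f1' is the binary F1 with pos_label=1, i.e. the F1 of class 1
    search = GridSearchCV(base_estimators[name], param_grid, scoring='f1', cv=5)
    # With refit=True (default), the best combination is retrained on all of X_train
    search.fit(X_train, y_train)
    best_estimators[name] = search.best_estimator_
    test_f1 = f1_score(y_test, search.predict(X_test), pos_label=1)
    print(f'{name}: {search.best_params_}, CV F1 = {search.best_score_:.3f}, test F1 = {test_f1:.3f}')
```

The SVC grid has 6 × 2 × 3 × 2 = 72 combinations, so with 5-fold cross-validation the search fits 360 models before the final refit; this is the cost that successive halving tries to reduce.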

## Task 3: Successive Halving

In [None]:
%%time

# Now run a successive halving grid search with the parameters given in `tune_params` with F1-measure as objective.
# Use `min_resources=200` and `factor=2`.
# Again, print parameters of the best estimator and evaluate it with the `evaluate_estimators` function.
# HINT: Examples for halving grid search: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.HalvingGridSearchCV.html#sklearn.model_selection.HalvingGridSearchCV
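# HINT: To use successive halving, you need scikit-learn 0.24.1 or higher
#       -> run a cell with `!pip install scikit-learn==0.24.1` and restart the notebook.
# HINT: HalvingGridSearchCV is experimental: run `from sklearn.experimental import enable_halving_search_cv`
#       before importing it from `sklearn.model_selection`.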

# --- TODO ---
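A minimal sketch of the successive halving search, under the same assumptions as the grid search sketch (synthetic data, unfitted base models, `tune_params` repeated so the cell is self-contained). With `factor=2`, each iteration keeps the better half of the candidates and doubles the number of training samples, starting from `min_resources=200`. On a small training set only a few iterations fit before the samples run out, so the search may stop with more than one candidate left; the `n_candidates_` and `n_resources_` attributes show what actually happened.

```python
from sklearn.datasets import make_classification
# The halving searches are experimental: this import must come before the next one
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.metrics import f1_score
from sklearn.model_selection import HalvingGridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

RANDOM_STATE = 42

# Synthetic stand-in for the DMC 2006 data; 500 training samples >= min_resources
X, y = make_classification(n_samples=1000, n_features=10, random_state=RANDOM_STATE)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=RANDOM_STATE)

tune_params = {
    'K-NN': {'n_neighbors': [1, 3, 5, 10]},
    'SVC': {
        'C': [.001, .01, .1, 1, 10, 100],
        'gamma': ['scale', 'auto'],
        'tol': [1e-2, 1e-3, 1e-4],
        'class_weight': ['balanced', None],
    },
}
base_estimators = {
    'K-NN': KNeighborsClassifier(),
    'SVC': SVC(random_state=RANDOM_STATE),
}

searches = {}
for name, param_grid in tune_params.items():
    search = HalvingGridSearchCV(
        base_estimators[name], param_grid,
        scoring='f1',          # F1 of class 1 (pos_label=1)
        factor=2,              # halve the candidates and double the samples per iteration
        min_resources=200,     # training samples per candidate in the first iteration
        resource='n_samples',  # default: the resource is the number of training samples
        cv=5, random_state=RANDOM_STATE)
    search.fit(X_train, y_train)
    searches[name] = search
    test_f1 = f1_score(y_test, search.predict(X_test), pos_label=1)
    print(f'{name}: {search.best_params_}, test F1 = {test_f1:.3f}')
    print(f'  candidates per iteration: {search.n_candidates_}, '
          f'samples per iteration: {search.n_resources_}')
```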

## Task 4: Bayesian Optimization

In [None]:
%%time

bayes_tune_params = {
    'K-NN': {
        'n_neighbors': (1, 10)
    },
    'SVC': {
        'C': (1e-3, 1e+3, 'log-uniform'),
        'gamma': ['scale', 'auto'],
        'tol': (1e-4, 1e-2, 'log-uniform'),
        'class_weight': ['balanced', None],
    }
}

# Now run a Bayesian search with the parameters given in `bayes_tune_params` with F1-measure as objective.
# Use `n_iter=15`.
# Again, print parameters of the best estimator and evaluate it with the `evaluate_estimators` function.
# HINT: Use scikit-optimize for Bayesian search (https://scikit-optimize.github.io/stable/auto_examples/bayesian-optimization.html)
# HINT: At the time of writing, BayesSearchCV does not work with scikit-learn 0.24.1. Use version 0.23.2 instead
#       (this version does not include the successive halving search from Task 3).
#       -> run a cell with `!pip install scikit-learn==0.23.2` and restart the notebook.

# --- TODO ---
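In `bayes_tune_params`, the scikit-optimize search-space shorthand (as its documentation describes it) reads a tuple of two ints such as `(1, 10)` as an integer range, a tuple `(low, high, 'log-uniform')` as a real range with a log-uniform prior, and a list as a categorical choice. The NumPy sketch below only illustrates what "log-uniform" means for `C` in `[1e-3, 1e+3]`; it is not how scikit-optimize samples internally. Drawing the exponent uniformly gives every decade the same probability, so about half of the draws fall below `C = 1`, whereas a plain uniform draw over the same interval almost never does.

```python
import numpy as np

RANDOM_STATE = 42
rng = np.random.default_rng(RANDOM_STATE)

low, high = 1e-3, 1e+3
n_draws = 10_000

# Log-uniform: draw the base-10 exponent uniformly, then map back with 10**x
samples_log = 10 ** rng.uniform(np.log10(low), np.log10(high), size=n_draws)
# Plain uniform draw over the same interval, for comparison
samples_lin = rng.uniform(low, high, size=n_draws)

for name, samples in [('log-uniform', samples_log), ('uniform', samples_lin)]:
    share_below_one = np.mean(samples < 1.0)
    print(f'{name:12s}: share of draws with C < 1 = {share_below_one:.3f}')
```

This is why `C` and `tol`, whose useful values span several orders of magnitude, are given log-uniform priors rather than uniform ones.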