--- 
## L09 - Hyperparameters and GridSearch 

In this assignment we explain GridSearch and tune hyperparameters. We try both grid search and random search with an SGD classifier. Finally, we join the search-quest competition and attempt to find the best model and hyperparameters for the MNIST dataset.

### Qa) Explain GridSearch

<u>__Review of code cell 1 (Function setup)__</u>

We will describe most of the functions in the code cell.

`SearchReport`

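Prints a detailed report for a fitted search object. This includes the best parameter combination, the best score and the index of the best candidate, followed by the mean and standard deviation of the score for every candidate.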

`GetBestModelCTOR`

Takes a model object and its best parameters and constructs a string representation of the corresponding constructor call, which `SearchReport` includes in the summary string it returns.
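
As a rough illustration of the idea behind `GetBestModelCTOR`, the sketch below builds a constructor-like string from a class name and a parameter dictionary. The helper name `ctor_string` is hypothetical, and this is a simplified stand-in rather than the function from the code cell.

```python
def ctor_string(class_name, best_params):
    """Build a constructor-like string from a name and a parameter dict."""
    # Sort the keys so the output is deterministic; repr() puts quotes around strings.
    args = ",".join(f"{key}={best_params[key]!r}" for key in sorted(best_params))
    return f"{class_name}({args})"


# Hypothetical input, mirroring the kind of dictionary found in best_params_.
summary = ctor_string("SVC", {"kernel": "linear", "C": 1})
print(summary)
```

A string like this makes it easy to copy the winning configuration into a new cell and construct the model directly.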

`ClassificationReport`

Prints a detailed classification report using the `classification_report` function from `sklearn.metrics`.

`FullReport`

Prints the search time together with the output of both `SearchReport` and `ClassificationReport`.

`LoadAndSetupData`

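Loads the dataset for a given mode (`'moon'`, `'mnist'` or `'iris'`), splits it into a training set and a test set, and returns the four resulting arrays.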
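
The split performed by `LoadAndSetupData` can be sketched in isolation. The example below assumes that `sklearn.datasets.load_iris` is an acceptable stand-in for the course loader `itmaldataloaders.IRIS_GetDataSet`; the `train_test_split` arguments are the ones used in the code cell.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: load_iris replaces the course-specific data loader.
X, y = load_iris(return_X_y=True)

# Same arguments as in LoadAndSetupData: 30% test data, fixed seed, shuffled.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, shuffle=True
)

print(f"train: {X_train.shape}, test: {X_test.shape}")
```

With 150 samples and `test_size=0.3`, this gives 105 training samples and 45 test samples.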

`TryKerasImport`

Checks whether `keras` or `tensorflow.keras` can be imported, and returns `True` if at least one of them is available.

<u>__Review of code cell 2 (The actual grid-search)__</u>

In code cell 2, the data is loaded with one of the functions defined in the previous code cell (`LoadAndSetupData`).

We set the mode to `'iris'`, create an `SVC` model and define the tuning parameters (`kernel` and `C`). We also set the number of cross-validation folds (`CV = 5`) and `VERBOSE = 0`.

__`GridSearchCV`__

We use all of these variables to create a `GridSearchCV` object. `n_jobs` is set to -1 so that it uses all available cores.

It uses the scoring method `f1_micro`, which calculates metrics globally by counting the total true positives, false negatives and false positives (Source: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score).
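
A small sketch of what global counting means in practice: for single-label multiclass problems such as iris, the micro-averaged F1 score coincides with plain accuracy. The label vectors below are made up for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels for a three-class problem: 5 of the 7 predictions are correct.
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1]

micro_f1 = f1_score(y_true, y_pred, average="micro")
accuracy = accuracy_score(y_true, y_pred)

print(f"micro F1 = {micro_f1:.4f}, accuracy = {accuracy:.4f}")
```

Every misclassified sample counts as one false positive for the predicted class and one false negative for the true class, so micro precision, micro recall and accuracy all equal 5/7 here.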

We run the grid search by calling the `fit` function on the `GridSearchCV` object and record the execution time (which is later printed alongside the `FullReport` output).
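
To see which candidates the grid search evaluates, the parameter grid can be expanded with `ParameterGrid` from `sklearn.model_selection`. This is only a sketch of the enumeration step; it does not fit any model.

```python
from sklearn.model_selection import ParameterGrid

# The tuning parameters from code cell 2.
tuning_parameters = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
cv_folds = 5

# Expand the grid into one dictionary per candidate.
candidates = list(ParameterGrid(tuning_parameters))
for index, params in enumerate(candidates):
    print(f"[{index}] {params}")

# Each candidate is fitted once per cross-validation fold.
n_fits = len(candidates) * cv_folds
print(f"{len(candidates)} candidates x {cv_folds} folds = {n_fits} fits")
```

The position of a candidate in this list corresponds to the index printed in the search report, and the best estimator is refitted once more on the whole training set because `refit=True` is the default.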

In [2]:
from time import time
import numpy as np
import sys

from sklearn import svm
from sklearn.linear_model import SGDClassifier

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.metrics import classification_report, f1_score
from sklearn import datasets

from libitmal import dataloaders as itmaldataloaders # Needed for load of iris, moon and mnist

currmode="N/A" # GLOBAL var!

def SearchReport(model): 
    
    def GetBestModelCTOR(model, best_params):
        def GetParams(best_params):
            ret_str=""          
            for key in sorted(best_params):
                value = best_params[key]
                temp_str = "'" if str(type(value))=="<class 'str'>" else ""
                if len(ret_str)>0:
                    ret_str += ','
                ret_str += f'{key}={temp_str}{value}{temp_str}'  
            return ret_str          
        try:
            param_str = GetParams(best_params)
            return type(model).__name__ + '(' + param_str + ')' 
        except:
            return "N/A(1)"
        
    print("\nBest model set found on train set:")
    print()
    print(f"\tbest parameters={model.best_params_}")
    print(f"\tbest '{model.scoring}' score={model.best_score_}")
    print(f"\tbest index={model.best_index_}")
    print()
    print(f"Best estimator CTOR:")
    print(f"\t{model.best_estimator_}")
    print()
    try:
        print(f"Grid scores ('{model.scoring}') on development set:")
        means = model.cv_results_['mean_test_score']
        stds  = model.cv_results_['std_test_score']
        i=0
        for mean, std, params in zip(means, stds, model.cv_results_['params']):
            print("\t[%2d]: %0.3f (+/-%0.03f) for %r" % (i, mean, std * 2, params))
            i += 1
    except:
        print("WARNING: the random search does not provide means/stds")
    
    global currmode                
    assert "f1_micro"==str(model.scoring), f"come on, we need to fix the scoring to be able to compare model-fits! Your scoring={str(model.scoring)}...remember to add scoring='f1_micro' to the search"   
    return f"best: dat={currmode}, score={model.best_score_:0.5f}, model={GetBestModelCTOR(model.estimator,model.best_params_)}", model.best_estimator_ 

def ClassificationReport(model, X_test, y_test, target_names=None):
    assert X_test.shape[0]==y_test.shape[0]
    print("\nDetailed classification report:")
    print("\tThe model is trained on the full development set.")
    print("\tThe scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = y_test, model.predict(X_test)                 
    print(classification_report(y_true, y_pred, target_names=target_names))
    print()
    
def FullReport(model, X_test, y_test, t):
    print(f"SEARCH TIME: {t:0.2f} sec")
    beststr, bestmodel = SearchReport(model)
    ClassificationReport(model, X_test, y_test)    
    print(f"CTOR for best model: {bestmodel}\n")
    print(f"{beststr}\n")
    return beststr, bestmodel
    
def LoadAndSetupData(mode, test_size=0.3):
    assert test_size>=0.0 and test_size<=1.0
    
    def ShapeToString(Z):
        n = Z.ndim
        s = "("
        for i in range(n):
            s += f"{Z.shape[i]:5d}"
            if i+1!=n:
                s += ";"
        return s+")"

    global currmode
    currmode=mode
    print(f"DATA: {currmode}..")
    
    if mode=='moon':
        X, y = itmaldataloaders.MOON_GetDataSet(n_samples=5000, noise=0.2)
        itmaldataloaders.MOON_Plot(X, y)
    elif mode=='mnist':
        X, y = itmaldataloaders.MNIST_GetDataSet(load_mode=0)
        if X.ndim==3:
            X=np.reshape(X, (X.shape[0], -1))
    elif mode=='iris':
        X, y = itmaldataloaders.IRIS_GetDataSet()
    else:
        raise ValueError(f"could not load data for that particular mode='{mode}', only 'moon'/'mnist'/'iris' supported")
        
    print(f'  org. data:  X.shape      ={ShapeToString(X)}, y.shape      ={ShapeToString(y)}')

    assert X.ndim==2
    assert X.shape[0]==y.shape[0]
    assert y.ndim==1 or (y.ndim==2 and y.shape[1]==0)    
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0, shuffle=True
    )
    
    print(f'  train data: X_train.shape={ShapeToString(X_train)}, y_train.shape={ShapeToString(y_train)}')
    print(f'  test data:  X_test.shape ={ShapeToString(X_test)}, y_test.shape ={ShapeToString(y_test)}')
    print()
    
    return X_train, X_test, y_train, y_test

def TryKerasImport(verbose=True):
    
    kerasok = True
    try:
        import keras as keras_try
    except:
        kerasok = False

    tensorflowkerasok = True
    try:
        import tensorflow.keras as tensorflowkeras_try
    except:
        tensorflowkerasok = False
        
    ok = kerasok or tensorflowkerasok
    
    if not ok and verbose:
        if not kerasok:
            print("WARNING: importing 'keras' failed", file=sys.stderr)
        if not tensorflowkerasok:
            print("WARNING: importing 'tensorflow.keras' failed", file=sys.stderr)

    return ok
    
print(f"OK(function setup" + ("" if TryKerasImport() else ", hope MNIST loads works because it seems you miss the installation of Keras or Tensorflow!") + ")")

OK(function setup)


In [3]:
# Setup data
X_train, X_test, y_train, y_test = LoadAndSetupData(
    'iris')  # 'iris', 'moon', or 'mnist'

# Setup search parameters
model = svm.SVC(
    gamma=0.001
)  # NOTE: gamma="scale" does not work in older Scikit-learn frameworks,
# FIX:  replace with model = svm.SVC(gamma=0.001)

tuning_parameters = {
    'kernel': ('linear', 'rbf'), 
    'C': [0.1, 1, 10]
}

CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1)

start = time()
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)
print('OK(grid-search)')

DATA: iris..
  org. data:  X.shape      =(  150;    4), y.shape      =(  150)
  train data: X_train.shape=(  105;    4), y_train.shape=(  105)
  test data:  X_test.shape =(   45;    4), y_test.shape =(   45)

SEARCH TIME: 4.64 sec

Best model set found on train set:

	best parameters={'C': 1, 'kernel': 'linear'}
	best 'f1_micro' score=0.9714285714285715
	best index=2

Best estimator CTOR:
	SVC(C=1, gamma=0.001, kernel='linear')

Grid scores ('f1_micro') on development set:
	[ 0]: 0.962 (+/-0.093) for {'C': 0.1, 'kernel': 'linear'}
	[ 1]: 0.371 (+/-0.038) for {'C': 0.1, 'kernel': 'rbf'}
	[ 2]: 0.971 (+/-0.047) for {'C': 1, 'kernel': 'linear'}
	[ 3]: 0.695 (+/-0.047) for {'C': 1, 'kernel': 'rbf'}
	[ 4]: 0.952 (+/-0.085) for {'C': 10, 'kernel': 'linear'}
	[ 5]: 0.924 (+/-0.097) for {'C': 10, 'kernel': 'rbf'}

Detailed classification report:
	The model is trained on the full development set.
	The scores are computed on the full evaluation set.

              precision    recall  f1-score  

### Qb) Hyperparameter Grid Search using an SGD classifier

In this exercise we replace the `svm.SVC` model with an `SGDClassifier` and a suitable set of hyperparameters for that model.
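
One possible refinement, sketched here as an assumption rather than as the approach taken in this assignment: according to the `SGDClassifier` documentation, `eta0` is only used by the `'constant'`, `'invscaling'` and `'adaptive'` schedules, and `power_t` only by `'invscaling'`. A list of dictionaries lets the grid skip combinations that differ only in an unused parameter. The names `shared`, `conditional_grid` and `full_grid` are hypothetical.

```python
from sklearn.model_selection import ParameterGrid

# Parameters that matter for every learning-rate schedule.
shared = {
    "alpha": [10**x for x in range(-7, 2)],
    "penalty": ["l1", "l2", "elasticnet"],
    "max_iter": [1000, 2000, 5000],
    "tol": [1e-4, 1e-5],
}
eta0_values = [0.001, 0.005, 0.01, 0.05, 0.1]
power_t_values = [0.25, 0.5, 0.75]

# One sub-grid per group of schedules, each with only the parameters it uses.
conditional_grid = [
    {**shared, "learning_rate": ["optimal"]},
    {**shared, "learning_rate": ["constant", "adaptive"], "eta0": eta0_values},
    {**shared, "learning_rate": ["invscaling"], "eta0": eta0_values,
     "power_t": power_t_values},
]

# The full Cartesian grid, for comparison.
full_grid = {
    **shared,
    "learning_rate": ["constant", "optimal", "invscaling", "adaptive"],
    "eta0": eta0_values,
    "power_t": power_t_values,
}

n_full = len(ParameterGrid(full_grid))
n_conditional = len(ParameterGrid(conditional_grid))
print(f"full grid: {n_full} candidates, conditional grid: {n_conditional} candidates")
```

`GridSearchCV` accepts such a list of dictionaries as its parameter grid, so the conditional version could be passed in place of a single dictionary.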

In [4]:
from sklearn.linear_model import SGDClassifier

# Load the same dataset as before
X_train, X_test, y_train, y_test = LoadAndSetupData('iris')

# Create SGDClassifier instance
model = SGDClassifier(loss='hinge')


# Tuning parameters taken from the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier);
# candidate values suggested by the Gemini chatbot:
tuning_parameters = {
    'alpha': [10**x for x in range(-7, 2)],
    'penalty': ('l1', 'l2', 'elasticnet'), 
    'learning_rate': ('constant', 'optimal', 'invscaling', 'adaptive'),
    'eta0': [0.001, 0.005, 0.01, 0.05, 0.1],   
    'max_iter': [1000, 2000, 5000],       
    'power_t': [0.25, 0.5, 0.75],
    'tol': [1e-4, 1e-5]  # Making the stopping criterion very small
}

CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1)

start = time()
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(grid_tuned, X_test, y_test, t)
print('OK(grid-search)') 

DATA: iris..
  org. data:  X.shape      =(  150;    4), y.shape      =(  150)
  train data: X_train.shape=(  105;    4), y_train.shape=(  105)
  test data:  X_test.shape =(   45;    4), y_test.shape =(   45)

SEARCH TIME: 66.14 sec

Best model set found on train set:

	best parameters={'alpha': 1e-07, 'eta0': 0.05, 'learning_rate': 'adaptive', 'max_iter': 1000, 'penalty': 'l1', 'power_t': 0.25, 'tol': 1e-05}
	best 'f1_micro' score=0.9904761904761905
	best index=811

Best estimator CTOR:
	SGDClassifier(alpha=1e-07, eta0=0.05, learning_rate='adaptive', penalty='l1',
              power_t=0.25, tol=1e-05)

Grid scores ('f1_micro') on development set:
	[ 0]: 0.724 (+/-0.140) for {'alpha': 1e-07, 'eta0': 0.001, 'learning_rate': 'constant', 'max_iter': 1000, 'penalty': 'l1', 'power_t': 0.25, 'tol': 0.0001}
	[ 1]: 0.743 (+/-0.155) for {'alpha': 1e-07, 'eta0': 0.001, 'learning_rate': 'constant', 'max_iter': 1000, 'penalty': 'l1', 'power_t': 0.25, 'tol': 1e-05}
	[ 2]: 0.695 (+/-0.047) for {'alp

### Qc) Hyperparameter Random Search using an SGD classifier

In this exercise we add code to run a `RandomizedSearchCV` instead.
We use the given default parameters and add two new ones, `n_iter` and `random_state`.

__`n_iter`__

`n_iter` controls how many parameter settings are sampled from the search space and evaluated. It trades off runtime against the quality of the solution: the fewer settings are tried, the quicker the search runs, but the more likely it is to miss the best combination.
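
The effect of `n_iter` can be sketched with `ParameterSampler` from `sklearn.model_selection`, which draws parameter settings from a search space without fitting anything. The search space and the values `n_iter=20` and `random_state=42` are the ones used in this exercise; that the sampled list matches the candidates of the actual search is not claimed here.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

search_space = {
    "alpha": [10**x for x in range(-7, 2)],
    "penalty": ["l1", "l2", "elasticnet"],
    "learning_rate": ["constant", "optimal", "invscaling", "adaptive"],
    "eta0": [0.001, 0.005, 0.01, 0.05, 0.1],
    "max_iter": [1000, 2000, 5000],
    "power_t": [0.25, 0.5, 0.75],
    "tol": [1e-4, 1e-5],
}

# Size of the exhaustive grid over the same search space.
n_total = len(ParameterGrid(search_space))

# Draw n_iter settings; with list-valued parameters, sampling is without replacement.
sampled = list(ParameterSampler(search_space, n_iter=20, random_state=42))

print(f"sampled {len(sampled)} of {n_total} possible settings "
      f"({100 * len(sampled) / n_total:.2f}%)")
```

With only a small fraction of the grid visited, the randomized search is much cheaper, at the cost of possibly never trying the best combination.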

__Comparison of the best-tuned parameter set and best scoring for the two methods__

Based on our tests, we cannot make a fair comparison of the two methods on time, because the dataset is too small and the `RandomizedSearchCV` only ran 20 iterations.

What we can compare is the score itself. The `GridSearchCV` run reaches a score of 0.9905 and the `RandomizedSearchCV` run reaches 0.9810. Based on the scores alone, the grid search finds the better model, but the randomized search gets close.

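As for the best-tuned parameters, both searches select `learning_rate='adaptive'`, but the other values differ: the grid search picks `alpha=1e-07`, `eta0=0.05`, `penalty='l1'`, `max_iter=1000`, `power_t=0.25` and `tol=1e-05`, while the randomized search picks `alpha=0.001`, `eta0=0.1`, `penalty='l2'`, `max_iter=5000`, `power_t=0.5` and `tol=0.0001`.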

__Implementation of a random search for the SGD classifier__

In [5]:
from sklearn.linear_model import SGDClassifier

# Load the same dataset as before
X_train, X_test, y_train, y_test = LoadAndSetupData('iris')

# Create SGDClassifier instance
model = SGDClassifier(loss='hinge')


# Tuning parameters taken from the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier);
# candidate values suggested by the Gemini chatbot:
tuning_parameters = {
    'alpha': [10**x for x in range(-7, 2)],
    'penalty': ('l1', 'l2', 'elasticnet'), 
    'learning_rate': ('constant', 'optimal', 'invscaling', 'adaptive'),
    'eta0': [0.001, 0.005, 0.01, 0.05, 0.1],   
    'max_iter': [1000, 2000, 5000],       
    'power_t': [0.25, 0.5, 0.75],
    'tol': [1e-4, 1e-5]  # Making the stopping criterion very small
}

CV = 5
VERBOSE = 0

random_tuned = RandomizedSearchCV(
    model, 
    tuning_parameters, 
    n_iter=20, 
    random_state=42, 
    cv=CV, 
    scoring='f1_micro', 
    verbose=VERBOSE, 
    n_jobs=-1
)

start = time()
random_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b0, m0 = FullReport(random_tuned, X_test, y_test, t)
print('OK(random-search)')

DATA: iris..
  org. data:  X.shape      =(  150;    4), y.shape      =(  150)
  train data: X_train.shape=(  105;    4), y_train.shape=(  105)
  test data:  X_test.shape =(   45;    4), y_test.shape =(   45)



SEARCH TIME: 0.31 sec

Best model set found on train set:

	best parameters={'tol': 0.0001, 'power_t': 0.5, 'penalty': 'l2', 'max_iter': 5000, 'learning_rate': 'adaptive', 'eta0': 0.1, 'alpha': 0.001}
	best 'f1_micro' score=0.980952380952381
	best index=2

Best estimator CTOR:
	SGDClassifier(alpha=0.001, eta0=0.1, learning_rate='adaptive', max_iter=5000,
              tol=0.0001)

Grid scores ('f1_micro') on development set:
	[ 0]: 0.695 (+/-0.047) for {'tol': 0.0001, 'power_t': 0.75, 'penalty': 'elasticnet', 'max_iter': 2000, 'learning_rate': 'invscaling', 'eta0': 0.05, 'alpha': 0.1}
	[ 1]: 0.971 (+/-0.047) for {'tol': 0.0001, 'power_t': 0.5, 'penalty': 'elasticnet', 'max_iter': 5000, 'learning_rate': 'adaptive', 'eta0': 0.05, 'alpha': 1e-07}
	[ 2]: 0.981 (+/-0.047) for {'tol': 0.0001, 'power_t': 0.5, 'penalty': 'l2', 'max_iter': 5000, 'learning_rate': 'adaptive', 'eta0': 0.1, 'alpha': 0.001}
	[ 3]: 0.762 (+/-0.200) for {'tol': 1e-05, 'power_t': 0.25, 'penalty': 'l2', 'max_iter': 1000

### Qd) MNIST Search Quest II

In the final exercise, we will see if we can find the best model+hyperparameters for the MNIST dataset.

We chose `GridSearchCV` for the MNIST dataset; the tuning parameters are listed in the code cell below. We widened the ranges for the number of iterations (`max_iter`) and the initial learning rate (`eta0`), since we believe this is the best way to get a good result.
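
Before starting a search of this size, it can help to estimate how many model fits it implies. The sketch below only counts candidates from the number of values per parameter in the code cell; it is not a timing measurement, and the dictionary name `values_per_parameter` is hypothetical.

```python
from math import prod

# Number of candidate values per hyperparameter in the MNIST grid.
values_per_parameter = {
    "alpha": 9,
    "penalty": 3,
    "learning_rate": 4,
    "eta0": 9,
    "max_iter": 7,
    "power_t": 6,
    "tol": 2,
}
cv_folds = 5

# An exhaustive grid evaluates the full Cartesian product of all values.
n_candidates = prod(values_per_parameter.values())
n_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates x {cv_folds} folds = {n_fits} fits")
```

Each of these fits trains on roughly four fifths of the 49000 training samples with 784 features, so the total runtime grows quickly with every value added to the grid.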


In [8]:
from sklearn.linear_model import SGDClassifier

# Load the MNIST dataset
X_train, X_test, y_train, y_test = LoadAndSetupData('mnist')

# Create SGDClassifier instance
model = SGDClassifier(loss='hinge')

# Tuning parameters taken from the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier);
# candidate values suggested by the Gemini chatbot and extended with our own ideas:
tuning_parameters = {
    'alpha': [10**x for x in range(-7, 2)],
    'penalty': ('l1', 'l2', 'elasticnet'), 
    'learning_rate': ('constant', 'optimal', 'invscaling', 'adaptive'),
    'eta0': [0.000001, 0.00001, 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2],   
    'max_iter': [1000, 2000, 5000, 10000, 20000, 30000, 40000],       
    'power_t': [0.0075, 0.125, 0.25, 0.5, 0.75, 1.00],
    'tol': [1e-4, 1e-5]  # Making the stopping criterion very small
}

CV = 5
VERBOSE = 0

# Run GridSearchCV for the model
grid_tuned = GridSearchCV(model,
                          tuning_parameters,
                          cv=CV,
                          scoring='f1_micro',
                          verbose=VERBOSE,
                          n_jobs=-1)

start = time()
grid_tuned.fit(X_train, y_train)
t = time() - start

# Report result
b1, m1 = FullReport(grid_tuned, X_test, y_test, t)
print(b1)
print('OK(grid-search)') 

DATA: mnist..
  org. data:  X.shape      =(70000;  784), y.shape      =(70000)
  train data: X_train.shape=(49000;  784), y_train.shape=(49000)
  test data:  X_test.shape =(21000;  784), y_test.shape =(21000)

