# Machine Learning with Python

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## 3.2 Hyperparameter search

To improve the performance of a particular model, we need to tune its *hyperparameter values*, i.e. the parameters that are not learned from the data but are specified in advance. Cross-validation lets us sweep the space of possible hyperparameter values and find combinations that work well across the training data as a whole, rather than on a single split.

Once we have decided on the best values for hyperparameters, we can train a final model on the *entire* training dataset and evaluate on the testing data for an independent assessment of performance.
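
As a rough sketch of this workflow (search, final fit, independent test), the example below uses a small synthetic dataset from `make_classification` as a stand-in; the data and the names `X_demo`, `search` and `final_model` are illustrative assumptions, not part of the analysis that follows.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the dataset used in this notebook)
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# 1. Sweep candidate hyperparameter values with 5-fold cross-validation
search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 5, 15]}, cv=5)
search.fit(X_tr, y_tr)

# 2. Train a final model on the entire training set with the chosen values
final_model = KNeighborsClassifier(**search.best_params_).fit(X_tr, y_tr)

# 3. Evaluate once on the held-out test data
test_accuracy = final_model.score(X_te, y_te)
print(search.best_params_, test_accuracy)
```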

Let's look at another classification dataset. Here we are attempting to distinguish between nasal and oral vowel sounds, using the amplitudes of the first five harmonics.

In [None]:
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler

phoneme = fetch_openml(name='phoneme', version=1, parser='auto')
X, y = phoneme.data.to_numpy(), phoneme.target.to_numpy()

In [None]:
X.shape

We could try a KNN classifier:

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, random_state=10)

# We will use a smaller training set to make the problem harder
X_train_ = X_train[:100]
y_train_ = y_train[:100]
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train_, y_train_)

Let's assess using the [F1 score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score).

In [None]:
y_pred = knn.predict(X_test)

from sklearn.metrics import f1_score
f1_score(y_test, y_pred, pos_label='1')
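
For reference, the F1 score is the harmonic mean of precision and recall for the positive class. The small sketch below recomputes it by hand on made-up labels (the label lists are purely illustrative) and compares the result with `f1_score`.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration only; '1' is treated as the positive class
y_true_demo = ['1', '1', '1', '2', '2', '2', '2', '1']
y_pred_demo = ['1', '2', '1', '2', '1', '2', '2', '1']

precision = precision_score(y_true_demo, y_pred_demo, pos_label='1')
recall = recall_score(y_true_demo, y_pred_demo, pos_label='1')

# Harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)
f1_library = f1_score(y_true_demo, y_pred_demo, pos_label='1')
print(f1_manual, f1_library)
```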

Looks good, but could we do better with a different value of *k*?

We can do an exhaustive search of the hyperparameter space using [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

In [None]:
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.metrics import make_scorer

parameters = {'n_neighbors':[1, 2, 5, 10, 20, 50]}
predictor = KNeighborsClassifier()
gs = GridSearchCV(predictor, 
                  parameters, 
                  cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=40),
                  scoring=make_scorer(f1_score, pos_label='1')
                  )
gs.fit(X_train_,y_train_)

The `cv_results_` attribute holds detailed results for every candidate value, including the score on each of the 5 splits:

In [None]:
gs.cv_results_
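
The raw `cv_results_` dictionary can be hard to read. One option is to load it into a pandas `DataFrame` and keep only a few columns. The sketch below does this on synthetic stand-in data; the names `search`, `results` and `summary` are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the phoneme dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 5, 10]}, cv=5)
search.fit(X_demo, y_demo)

# One row per candidate; per-split scores live in split0_test_score ... split4_test_score
results = pd.DataFrame(search.cv_results_)
summary = results[['param_n_neighbors', 'mean_test_score',
                   'std_test_score', 'rank_test_score']]
print(summary.sort_values('rank_test_score'))
```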

It will also report the best parameter values found:

In [None]:
gs.best_params_

In [None]:
gs.best_score_

### Multi-parameter searches

Let's try a more complex example on the same dataset. Support Vector Machines have several hyperparameters that can be varied. For example, in addition to the kernel function itself, there is a regularisation parameter `C` (a positive real value) to tune.

`GridSearchCV` makes it easy for us to explore the space of possible hyperparameter values and choose the best combination.
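
A grid search evaluates every combination of the listed values, so its cost grows multiplicatively with each added hyperparameter. As a quick check, `ParameterGrid` can enumerate the combinations without fitting anything. The sketch below uses the same kernel and `C` values as this section and assumes 5-fold cross-validation.

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({'kernel': ('linear', 'rbf', 'poly'),
                      'C': [0.01, 0.1, 1, 10, 100]})

n_combinations = len(grid)   # 3 kernels x 5 values of C
n_fits = n_combinations * 5  # one fit per combination per fold (excluding the final refit)
print(n_combinations, n_fits)
```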

In [None]:
from sklearn.svm import SVC

parameters = {'kernel':('linear', 'rbf', 'poly'), 'C':[0.01, 0.1, 1, 10, 100]}
predictor = SVC()
gs = GridSearchCV(predictor, 
                  parameters, 
                  cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=40),
                  scoring=make_scorer(f1_score, pos_label='1')
                  )
gs.fit(X_train_,y_train_)


In [None]:
gs.cv_results_['params']

In [None]:
gs.cv_results_['mean_test_score']

In [None]:
gs.best_params_

In [None]:
gs.best_score_

To train the final model, we could then pass the best parameter values to a new `SVC`:

In [None]:
final = SVC(**gs.best_params_)
final.fit(X_train_,y_train_)
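
With the default `refit=True`, `GridSearchCV` already refits the best configuration on all of the data passed to `fit` and exposes it as `best_estimator_`. The sketch below, on synthetic stand-in data, suggests that this gives the same predictions as rebuilding the model by hand.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the phoneme dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
search = GridSearchCV(SVC(), {'kernel': ['linear', 'rbf'], 'C': [0.1, 1, 10]}, cv=5)
search.fit(X_demo, y_demo)

# Already refitted on all of X_demo, y_demo because refit=True by default
best_model = search.best_estimator_

# Equivalent manual route: build a fresh model from the best parameters
manual_model = SVC(**search.best_params_).fit(X_demo, y_demo)

same_predictions = (best_model.predict(X_demo) == manual_model.predict(X_demo)).all()
print(same_predictions)
```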


In [None]:
y_pred = final.predict(X_test)
f1_score(y_test, y_pred, pos_label='1')

Slightly disappointing? Notice that cross-validation can still overestimate performance on unseen data. This is why it is important to keep a final test dataset available for a convincing assessment.

### Using GridSearchCV with a pipeline

When we have preprocessing steps to consider, the process becomes a little more complex. Remember that the transformations must be learned separately from the training portion of *each split*, so that no information leaks from the held-out fold. A pipeline handles this for us.
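
To make the per-split learning concrete, the sketch below refits a cloned pipeline on each training fold and records the means learned by the scaler. The random data and the names `pipe_demo` and `fold_means` are illustrative assumptions. `GridSearchCV` does the equivalent cloning and refitting internally.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in regression data (an assumption; not the autoMpg dataset)
rng = np.random.RandomState(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

pipe_demo = Pipeline([('scaling', StandardScaler()),
                      ('predict', Lasso(alpha=0.1))])

fold_means = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_demo):
    fold_pipe = clone(pipe_demo)  # fresh, unfitted copy for this split
    fold_pipe.fit(X_demo[train_idx], y_demo[train_idx])
    # The scaler has only seen the training portion of this split
    fold_means.append(fold_pipe.named_steps['scaling'].mean_)

print(np.round(fold_means, 3))
```

Each row of the printed array is slightly different, because each split learns its scaling from a different subset of rows.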

We will go back to the *autoMpg* regression dataset.

In [None]:
from sklearn.datasets import fetch_openml
mpg = fetch_openml(name='autoMpg', version=1, parser='auto')

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(mpg.data, mpg.target, random_state=0)


This time we will add a LASSO predictor to the workflow.

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

# Defines preprocessing transformations for specified columns
ct = ColumnTransformer([ ('encode', OneHotEncoder(), ['origin']),
                         ('impute', IterativeImputer(), ['horsepower'])
                       ],
                       remainder='passthrough') 

# Defines individual steps in a workflow
pipe = Pipeline([('preprocessing', ct),
                 ('scaling', StandardScaler()),
                 ('predict', Lasso())])



In [None]:
# Link each hyperparameter to its pipeline step using <step name>__<parameter name>
parameters = {'predict__alpha': [0.001, 0.01, 0.1, 1, 10, 100]}

gs = GridSearchCV(pipe, 
                  parameters, 
                  cv=5,
                  scoring='r2'
                  )
gs.fit(X_train,y_train)
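
If you are unsure which names a grid can use, `get_params()` on the pipeline lists them. The sketch below uses a cut-down two-step pipeline (`pipe_demo` is an illustrative name) rather than the full workflow above.

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_demo = Pipeline([('scaling', StandardScaler()),
                      ('predict', Lasso())])

# Names of the form <step name>__<parameter name> are valid grid keys
tunable = sorted(name for name in pipe_demo.get_params() if '__' in name)
print(tunable)

# The same names work with set_params
pipe_demo.set_params(predict__alpha=0.1)
```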

In [None]:
gs.best_params_

In [None]:
gs.best_score_

### Preprocessing steps can also have hyperparameters

The problem is currently fairly easy as there are only seven features to consider:

In [None]:
X_train

Let's add a large number of purely random features (300 for every original feature) to make things more difficult:

In [None]:
n_samples, n_features = X_train.shape
random_state = np.random.RandomState(12)
random_data = random_state.randn(n_samples, 300 * n_features)
X = pd.concat([X_train.reset_index(drop=True), 
               pd.DataFrame(random_data)], 
               axis=1)
X.columns = X.columns.astype(str)
X.head()

In [None]:
gs.fit(X,y_train)

In [None]:
gs.best_score_

In this situation, a dimensionality reduction step would help to reduce the noise. 
Let's include a PCA step in the pipeline:

In [None]:
from sklearn.decomposition import PCA

pipe2 = Pipeline([('preprocessing', ct),
                  ('scaling', StandardScaler()),
                  ('reduce', PCA()),
                  ('predict', Lasso())])


The number of PCA components is now a hyperparameter, so let's include it in the grid search:

In [None]:
# Note how we link the hyperparameter to the specific pipeline step
parameters = {'predict__alpha':[0.001,0.01,0.1,1,10,100],
              'reduce__n_components':[2,3,4,5,6,7,8,9]}

gs = GridSearchCV(pipe2, 
                  parameters, 
                  cv=5,
                  scoring='r2'
                  )
# Fit on the noisy feature set X built above, so PCA has noise to remove
gs.fit(X, y_train)
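
The `step__parameter` naming used in the grid above also nests one level deeper, so hyperparameters of transformers inside the `ColumnTransformer` can be searched as well. The sketch below rebuilds the pipeline under illustrative names (`ct_demo`, `pipe_demo`) and only inspects parameter names; nothing is fitted, and whether tuning these particular parameters helps is left open.

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

ct_demo = ColumnTransformer([('encode', OneHotEncoder(), ['origin']),
                             ('impute', IterativeImputer(), ['horsepower'])],
                            remainder='passthrough')

pipe_demo = Pipeline([('preprocessing', ct_demo),
                      ('scaling', StandardScaler()),
                      ('reduce', PCA()),
                      ('predict', Lasso())])

params = pipe_demo.get_params()

# <pipeline step>__<transformer name>__<parameter name>
nested = sorted(name for name in params if name.startswith('preprocessing__impute__'))
print(nested)
```

A grid key such as `'preprocessing__impute__max_iter'` could then sit alongside `'reduce__n_components'` and `'predict__alpha'`.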

In [None]:
gs.cv_results_

In [None]:
gs.best_score_

In [None]:
gs.best_params_

### Exercise


Multi-layer perceptrons are sensitive to feature scaling, so scaling your data is highly recommended. Using a pipeline, investigate whether scaling affects the performance of an [MLPRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#mlpregressor) on the `wine-quality-white` dataset.


In [None]:
from sklearn.datasets import fetch_openml
w = fetch_openml(name='wine-quality-white',version=1,parser='auto')

The `MLPRegressor` has many tunable hyperparameters, including:

* `hidden_layer_sizes`
* `activation`
* `solver`
* `alpha`
* `learning_rate`
* ...

Use GridSearchCV to try to optimise its performance on this dataset (choose a few of the parameters to explore).

Note that when different solvers have different parameter options, we can provide `GridSearchCV` with a list of dictionaries instead of a single dictionary. See [this example](https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_training_curves.html#sphx-glr-auto-examples-neural-networks-plot-mlp-training-curves-py) for details.
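
As a starting point, the sketch below shows the shape of such a list of dictionaries and uses `ParameterGrid` to enumerate the candidates without fitting anything. The particular values are arbitrary placeholders, not recommendations. Inside a pipeline, each key would also need its step-name prefix.

```python
from sklearn.model_selection import ParameterGrid

# Each dictionary is expanded separately, so solver-specific options never mix
param_grid = [
    {'solver': ['sgd'],
     'learning_rate': ['constant', 'adaptive'],  # only used by the sgd solver
     'momentum': [0.5, 0.9]},                    # only used by the sgd solver
    {'solver': ['adam'],
     'beta_1': [0.8, 0.9]},                      # only used by the adam solver
]

candidates = list(ParameterGrid(param_grid))
for candidate in candidates:
    print(candidate)
```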