## Hyperparameter Tuning of Each Classifier at Every Prediction

Just like in [classification.ipynb](./classification.ipynb), we will use the UCI ML Breast Cancer Wisconsin dataset again.

As seen in the previous example, ``LTFMSelector`` can use any prediction model as long as it provides ``fit`` and ``predict`` methods. This also makes it possible to use tools such as scikit-learn's ``GridSearchCV`` for hyperparameter tuning, which grid-searches a user-defined range of hyperparameters for each prediction model.
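A quick duck-typing check shows why a ``GridSearchCV`` object qualifies as a prediction model: it exposes ``fit`` and ``predict`` itself and delegates to the estimator it wraps. The helper ``has_fit_and_predict`` below is hypothetical and not part of ``ltfmselector``. It assumes that these two methods are the only requirement, as stated above.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def has_fit_and_predict(model) -> bool:
    """Hypothetical check: does `model` expose callable fit() and predict() methods?"""
    return callable(getattr(model, "fit", None)) and callable(getattr(model, "predict", None))


# A plain classifier and a grid-search wrapper around it both pass the check
print(has_fit_and_predict(KNeighborsClassifier()))  # True
print(has_fit_and_predict(GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 4, 5]})))  # True

# A transformer has fit() but no predict(), so it would not qualify
print(has_fit_and_predict(StandardScaler()))  # False
```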

```python
from ltfmselector import LTFMSelector

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
```

Pre-processing the dataset

```python
# Load breast cancer dataset
cancer_dataset = load_breast_cancer()

# Get data
X = cancer_dataset['data']

# Get target
y = cancer_dataset['target']
# - 0: malignant tumor
# - 1: benign tumor

# Get feature names
feature_names = cancer_dataset['feature_names']

# Get description
dataset_description = cancer_dataset['DESCR']

# Convert data into pandas DataFrame
cancer_df = pd.DataFrame(
    np.c_[X, y], columns=np.append(feature_names, ['target'])
)

# Split the dataset for training and test
X_df = cancer_df.drop(['target'], axis=1)
y_df = cancer_df['target']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_df, y_df, test_size=0.2, random_state=5)

y_train = y_train.reset_index(drop=True)
y_test  = y_test.reset_index(drop=True)

# Carry out feature scaling
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = pd.DataFrame(sc.fit_transform(X_train), columns=feature_names)
X_test  = pd.DataFrame(sc.transform(X_test), columns=feature_names)
```
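The scaler is fitted on the training split only and then reused on the test split, so no test-set statistics leak into training. The sketch below reproduces the split above as a sanity check. After scaling, every training column should have a mean close to 0 and a population standard deviation (``ddof=0``, which is what ``StandardScaler`` uses) close to 1. The test columns will only approximately meet these targets.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_df = pd.DataFrame(data["data"], columns=data["feature_names"])
y_df = pd.Series(data["target"], name="target")

X_train, X_test, y_train, y_test = train_test_split(X_df, y_df, test_size=0.2, random_state=5)

sc = StandardScaler()
X_train_sc = pd.DataFrame(sc.fit_transform(X_train), columns=data["feature_names"])  # statistics from training data only
X_test_sc = pd.DataFrame(sc.transform(X_test), columns=data["feature_names"])        # reuse the training statistics

# Per-column mean and population standard deviation of the scaled training data
train_means = X_train_sc.mean().to_numpy()
train_stds = X_train_sc.std(ddof=0).to_numpy()
print(np.abs(train_means).max(), np.abs(train_stds - 1).max())  # both tiny (floating-point noise)
print(len(X_train_sc), len(X_test_sc))  # 455 114
```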

Users can pass their own prediction models as a list.

The only requirement is that each classifier/regressor provides ``fit`` and ``predict`` methods.
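To illustrate that requirement, here is a minimal hand-written classifier that provides only ``fit`` and ``predict``. ``CentroidClassifier`` is a hypothetical example and is not part of ``ltfmselector`` or scikit-learn. It is an assumption, not verified here, that ``LTFMSelector`` would accept such a class unchanged. For example, it might also need ``get_params`` if the library clones its models.

```python
import numpy as np


class CentroidClassifier:
    """Hypothetical nearest-centroid classifier exposing only fit() and predict()."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)  # accepts NumPy arrays or pandas DataFrames
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector (centroid) per class
        self.centroids_ = np.vstack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every class centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        # Predict the class whose centroid is nearest
        return self.classes_[d.argmin(axis=1)]


X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
clf = CentroidClassifier().fit(X, y)
print(clf.predict([[0.1, 0.0], [4.8, 5.2]]))  # [0 1]
```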

By default, scikit-learn's ``GridSearchCV`` evaluates each hyperparameter configuration with 5-fold cross-validation (stratified, for classifiers).
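To see what one of these searches does on its own, the sketch below fits the k-nearest-neighbours grid search on the breast cancer data. The data is left unscaled for brevity. With the default ``cv=None``, each of the 3 candidate values of ``n_neighbors`` is scored over 5 folds, and the best one is then refitted on all the data. The attributes ``n_splits_``, ``cv_results_``, ``best_params_`` and ``best_score_`` are standard scikit-learn. How ``LTFMSelector`` uses the fitted search internally is not shown here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# cv=None -> 5-fold cross-validation (StratifiedKFold, since this is a classifier)
search = GridSearchCV(KNeighborsClassifier(), param_grid={"n_neighbors": [3, 4, 5]})
search.fit(X, y)

print(search.n_splits_)                   # 5 folds
print(len(search.cv_results_["params"]))  # 3 candidates
print(search.best_params_, round(search.best_score_, 3))

# After refitting on all of X, the search object predicts like any classifier
print(search.predict(X[:5]))
```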

```python
from sklearn.model_selection import GridSearchCV

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Each model is wrapped in GridSearchCV, so its hyperparameters are re-tuned whenever it is fitted
myown_prediction_models = [
    GridSearchCV(
        KNeighborsClassifier(), param_grid={'n_neighbors': [3, 4, 5]},
        n_jobs=-1 # use all processors
    ),
    GridSearchCV(
        SVC(kernel="linear", random_state=42), param_grid={'C': [0.025, 1, 10]},
        n_jobs=-1 # use all processors
    ),
    GridSearchCV(
        DecisionTreeClassifier(random_state=42), param_grid={'max_depth': [3, 4, 5]},
        n_jobs=-1 # use all processors
    )
]

# Initialize agent with 50 training episodes (if you have time, try 1300)
AgentSelector = LTFMSelector(50, pType='classification', pModels=myown_prediction_models)
```
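Grid search multiplies the training cost. With scikit-learn's default ``refit=True``, a single ``fit`` call on a ``GridSearchCV`` trains every candidate on each of the 5 folds and then refits the best candidate once. The snippet below counts these fits for the three searches defined above. The section title suggests that ``LTFMSelector`` may fit these models repeatedly, which is presumably why so few episodes are used here. That assumption is not verified in this notebook.

```python
from sklearn.model_selection import ParameterGrid

param_grids = {
    "KNeighborsClassifier": {"n_neighbors": [3, 4, 5]},
    "SVC": {"C": [0.025, 1, 10]},
    "DecisionTreeClassifier": {"max_depth": [3, 4, 5]},
}
n_folds = 5  # GridSearchCV default when cv=None

fits_per_search = {}
for name, grid in param_grids.items():
    n_candidates = len(ParameterGrid(grid))
    # Every candidate is trained on every fold, plus one refit of the best candidate
    fits_per_search[name] = n_candidates * n_folds + 1

total_fits = sum(fits_per_search.values())
print(fits_per_search)  # 16 fits for each search
print(total_fits)       # 48 model fits for one fit() call on all three searches
```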

```python
# Train the agent
doc = AgentSelector.fit(X_train, y_train, agent_neuralnetwork=None, lr=1e-5)
```
```
=== Episode 1 === === ===

Correct prediction
True Output: 1.0 | Prediction: 1.0

Episode terminated:
- Iterations                 : 43
- Features selected          : 21.0
- Accumulated reward         : -14.28
- Prediction model           : 2
- Prediction model #(change) : 7


=== Episode 2 === === ===

Correct prediction
True Output: 1.0 | Prediction: 1.0

Episode terminated:
- Iterations                 : 66
- Features selected          : 27.0
- Accumulated reward         : -33.32
- Prediction model           : 1
- Prediction model #(change) : 5


=== Episode 3 === === ===

Correct prediction
True Output: 0.0 | Prediction: 0.0

Episode terminated:
- Iterations                 : 33
- Features selected          : 18.0
- Accumulated reward         : -13.19
- Prediction model           : 1
- Prediction model #(change) : 1


=== Episode 4 === === ===

Correct prediction
True Output: 1.0 | Prediction: 1.0

Episode terminated:
- Iterations                 : 35
- Features selected
...
```

```python
# To test the trained agent on the test dataset
y_pred, doc_test = AgentSelector.predict(X_test)
```
```
=== Test sample 0 === === ===
Prediction: 0.0

Episode terminated:
- Iterations                 : 66
- Features selected          : 13.0
- Accumulated reward         : -5.6
- Prediction model           : 0
- Prediction model #(change) : 47


=== Test sample 1 === === ===
Prediction: 1.0

Episode terminated:
- Iterations                 : 92
- Features selected          : 12.0
- Accumulated reward         : -6.85
- Prediction model           : 2
- Prediction model #(change) : 73


=== Test sample 2 === === ===
Prediction: 1.0

Episode terminated:
- Iterations                 : 92
- Features selected          : 14.0
- Accumulated reward         : -2.89
- Prediction model           : 0
- Prediction model #(change) : 75


=== Test sample 3 === === ===
Prediction: 1.0

Episode terminated:
- Iterations                 : 92
- Features selected          : 12.0
- Accumulated reward         : -4.87
- Prediction model           : 2
- Prediction model #(change) : 75


=== Test sample 4 === === ===
...
```

```python
# Let's check out the classification accuracy
from sklearn.metrics import accuracy_score

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc}")
```

```
Accuracy: 0.9385964912280702
```
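Accuracy alone does not show which class the errors fall on. With ``test_size=0.2`` on the 569 samples, ``train_test_split`` rounds the test-set size up to 114 samples. The reported accuracy therefore corresponds to 107 correct predictions and 7 errors. The sketch below uses a hypothetical helper, ``error_summary``, which is not part of ``ltfmselector``. It recovers these counts and the confusion matrix and is demonstrated on toy labels. In this notebook, you would call it as ``error_summary(y_test, y_pred)``.

```python
import math

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def error_summary(y_true, y_pred, labels=(0.0, 1.0)):
    """Hypothetical helper: error count, accuracy and confusion matrix (rows = true, cols = predicted)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        "n_samples": len(y_true),
        "n_errors": int((y_true != y_pred).sum()),
        "accuracy": accuracy_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=list(labels)),
    }


# Back-of-the-envelope check of the reported accuracy
n_test = math.ceil(0.2 * 569)                   # 114 test samples
n_correct = round(0.9385964912280702 * n_test)  # 107 correct predictions
print(n_test, n_correct, n_test - n_correct)    # 114 107 7

# Toy usage (0 = malignant, 1 = benign)
summary = error_summary([0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0])
print(summary["n_errors"], summary["accuracy"])  # 1 0.75
print(summary["confusion_matrix"])               # [[2 0]
                                                 #  [1 1]]
```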
