## Using Your Own Prediction Models for Classification

As in [classification.ipynb](./classification.ipynb), we use the UCI ML Breast Cancer Wisconsin dataset.

In [1]:
from ltfmselector import LTFMSelector

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

Pre-processing the dataset

In [2]:
# Load breast cancer dataset
cancer_dataset = load_breast_cancer()

# Get data
X = cancer_dataset['data']

# Get target
y = cancer_dataset['target']
# - 0: malignant tumor
# - 1: benign tumor

# Get feature names
feature_names = cancer_dataset['feature_names']

# Get description
dataset_description = cancer_dataset['DESCR']

# Convert data into pandas DataFrame
cancer_df = pd.DataFrame(
    np.c_[X, y], columns=np.append(feature_names, ['target'])
)

# Split the dataset for training and test
X_df = cancer_df.drop(['target'], axis=1)
y_df = cancer_df['target']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_df, y_df, test_size=0.2, random_state=5)

y_train = y_train.reset_index(drop=True)
y_test  = y_test.reset_index(drop=True)

# Carry out feature scaling
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = pd.DataFrame(sc.fit_transform(X_train), columns=feature_names)
X_test  = pd.DataFrame(sc.transform(X_test), columns=feature_names)
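
`StandardScaler` rescales each feature to zero mean and unit variance, z = (x - mean) / std. The mean and standard deviation come from the training split only and are then reused on the test split, so no test-set information leaks into training. The following is a minimal NumPy sketch of the same computation, using small made-up arrays instead of the breast cancer data; the helper name `standardize` is hypothetical.

```python
import numpy as np

def standardize(train, test):
    """Z-score both splits using statistics from the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0      # guard against constant features
    return (train - mean) / std, (test - mean) / std

# Hypothetical toy data: 4 training rows, 2 test rows, 2 features
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 25.0], [5.0, 50.0]])

train_z, test_z = standardize(train, test)
print(train_z.mean(axis=0))  # approximately [0, 0]
print(train_z.std(axis=0))   # approximately [1, 1]
print(test_z)
```

In scikit-learn terms, `sc.fit_transform(X_train)` corresponds to computing `mean` and `std` and applying them, while `sc.transform(X_test)` only applies the statistics already learned.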

Users can pass their own prediction models to ``LTFMSelector`` as a list via the ``pModels`` argument.

The only requirement is that each classifier or regressor implements ``fit`` and ``predict`` methods.
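
Because only ``fit`` and ``predict`` are required, a model does not have to come from scikit-learn. Below is a minimal sketch of a hypothetical `NearestCentroidClassifier`, written with NumPy alone, that assigns each sample to the class whose training mean is closest. It assumes `LTFMSelector` calls nothing beyond `fit(X, y)` and `predict(X)` on the model; if the library also clones or inspects models (for example through scikit-learn's `get_params`), subclassing `sklearn.base.BaseEstimator` would be the safer route.

```python
import numpy as np

class NearestCentroidClassifier:
    """Hypothetical custom classifier exposing only fit() and predict()."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One centroid (mean feature vector) per class
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Accept lists, arrays or DataFrames; treat a single row as a batch of one
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Euclidean distance from every sample to every class centroid
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]

# Toy usage with made-up data: two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y_toy = np.array([0, 0, 1, 1])
clf = NearestCentroidClassifier().fit(X_toy, y_toy)
print(clf.predict([[0.1, 0.0], [4.8, 5.2]]))  # [0 1]
```

Under that assumption, an instance could be added to the `myown_prediction_models` list in the next cell alongside the scikit-learn models. The class works for any number of columns, which matters if the agent fits models on different feature subsets.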

In [3]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

myown_prediction_models = [
    AdaBoostClassifier(random_state=42),
    KNeighborsClassifier(3),
    LinearDiscriminantAnalysis(),
    SVC(kernel="linear", C=0.025, random_state=42),
    DecisionTreeClassifier(max_depth=5, random_state=42)
]

# Initialize agent
AgentSelector = LTFMSelector(50, pType='classification', pModels=myown_prediction_models)  # 50 training episodes keep this demo short; go for 1300 if you have time

In [4]:
# Fit
doc = AgentSelector.fit(X_train, y_train, agent_neuralnetwork=None, lr=1e-5)



=== Episode 1 === === ===

Correct prediction
True Output: 1.0 | Prediction: 1.0

Episode terminated:
- Iterations                 : 43
- Features selected          : 17.0
- Accumulated reward         : -17.25
- Prediction model           : 0
- Prediction model #(change) : 8


=== Episode 2 === === ===

Correct prediction
True Output: 0.0 | Prediction: 0.0

Episode terminated:
- Iterations                 : 9
- Features selected          : 6.0
- Accumulated reward         : -1.07
- Prediction model           : 2
- Prediction model #(change) : 1


=== Episode 3 === === ===

Correct prediction
True Output: 0.0 | Prediction: 0.0

Episode terminated:
- Iterations                 : 92
- Features selected          : 25.0
- Accumulated reward         : -39.52
- Prediction model           : 4
- Prediction model #(change) : 27


=== Episode 4 === === ===

Incorrect prediction
True Output: 1.0 | Prediction: 0.0

Episode terminated:
- Iterations                 : 5
- Features selected          
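
The log reports labels as `1.0` and `0.0` rather than `1` and `0` because `np.c_` in the pre-processing cell stacks the integer target next to the float feature matrix, and NumPy upcasts the combined array to `float64`; `y_df` therefore holds floats. A small sketch with toy arrays reproduces the effect and shows how integer labels could be recovered:

```python
import numpy as np

X_toy = np.array([[0.5, 1.5], [2.5, 3.5]])  # float features (toy values)
y_toy = np.array([1, 0])                     # integer labels

stacked = np.c_[X_toy, y_toy]
print(stacked.dtype)   # float64: the labels were upcast
labels = stacked[:, -1]
print(labels)          # [1. 0.]

# Casting back recovers integer class labels
labels_int = labels.astype(int)
print(labels_int)      # [1 0]
```

In the notebook the equivalent would be `cancer_df['target'].astype(int)`. This is optional; the run above works with float labels.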

In [5]:
# To test the trained agent on the test dataset
y_pred, doc_test = AgentSelector.predict(X_test)



=== Test sample 0 === === ===
Prediction: 0.0

Episode terminated:
- Iterations                 : 92
- Features selected          : 12.0
- Accumulated reward         : -1.9
- Prediction model           : 4
- Prediction model #(change) : 78


=== Test sample 1 === === ===
Prediction: 1.0

Episode terminated:
- Iterations                 : 92
- Features selected          : 9.0
- Accumulated reward         : -4.87
- Prediction model           : 2
- Prediction model #(change) : 78


=== Test sample 2 === === ===
Prediction: 1.0

Episode terminated:
- Iterations                 : 92
- Features selected          : 15.0
- Accumulated reward         : -5.86
- Prediction model           : 1
- Prediction model #(change) : 71


=== Test sample 3 === === ===
Prediction: 1.0

Episode terminated:
- Iterations                 : 18
- Features selected          : 12.0
- Accumulated reward         : -0.17
- Prediction model           : 1
- Prediction model #(change) : 5


=== Test sample 4 === === ===

In [7]:
# Let's check out the classification accuracy
from sklearn.metrics import accuracy_score

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc}")

Accuracy: 0.9473684210526315
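
Accuracy is the fraction of test samples whose predicted label equals the true label. `load_breast_cancer` has 569 samples, and `test_size=0.2` gives a test set of ceil(0.2 * 569) = 114 samples, so the reported 0.947... corresponds to 108 correct predictions (108/114). A plain-Python sketch of the metric, with a hypothetical `accuracy` helper standing in for `accuracy_score`:

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75

# Test-set size produced by train_test_split(..., test_size=0.2) on 569 samples
n_test = math.ceil(0.2 * 569)
print(n_test)  # 114

# Number of correct predictions implied by the reported accuracy
print(round(0.9473684210526315 * n_test))  # 108
```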
