# Example of using a custom learner.

This document shows how to define a custom learner for classification. In this example, the meta-learner will be the custom learner.

### Load the dataset.

First, let's load the dataset into a pandas data frame and display the first rows.
The feature names have a prefix of `v1_` or `v2_`. The features prefixed with `v1_` are mel-frequency cepstral coefficients extracted from audio signals. Features prefixed with `v2_` are summary statistics extracted from accelerometer signals. Column names can be anything, but in this case a prefix was added to make it easy to obtain the column indices of each view.


In [None]:
import pandas as pd
import numpy as np
from multiviewstacking import load_example_data

(X_train,y_train,X_test,y_test,ind_v1,ind_v2,le) = load_example_data()

X_train.head()
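
For your own data you would need to build the views' column indices yourself, and the prefix convention makes this straightforward. The sketch below uses made-up column names (they are not taken from the example dataset) and a hypothetical helper, `indices_with_prefix`. It is not needed for this example, because `load_example_data()` already returns `ind_v1` and `ind_v2`.

```python
def indices_with_prefix(columns, prefix):
    """Return the positions of the columns whose name starts with `prefix`."""
    return [i for i, name in enumerate(columns) if name.startswith(prefix)]

# Hypothetical column names, for illustration only.
columns = ["v1_mfcc1", "v1_mfcc2", "v2_meanX", "v2_sdX", "v2_meanY"]

my_ind_v1 = indices_with_prefix(columns, "v1_")  # audio view
my_ind_v2 = indices_with_prefix(columns, "v2_")  # accelerometer view
print(my_ind_v1, my_ind_v2)
```

With a pandas data frame, the same helper can be called with `X_train.columns` as the first argument.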

### Defining the custom learner

A custom learner needs to implement three methods: `fit()`, `predict()`, and `predict_proba()`. For demonstration purposes we will not implement a learner from scratch but will wrap a `RandomForestClassifier` instead. The following class creates the random forest in its constructor. Its `fit()` method passes the training data to the `fit()` method of the random forest and returns the instance itself (`self`). The `predict()` and `predict_proba()` methods call the corresponding methods of the random forest.

In [None]:
from sklearn.ensemble import RandomForestClassifier

class MyLearner:
    """Custom learner that wraps a RandomForestClassifier."""

    def __init__(self):
        self.learner = RandomForestClassifier(random_state=123)
    
    def fit(self, X, y):
        self.learner.fit(X, y)
        return self
    
    def predict(self, X):
        return self.learner.predict(X)
    
    def predict_proba(self, X):
        return self.learner.predict_proba(X)    
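
A learner written from scratch follows the same three-method pattern. The following is a minimal sketch of a toy nearest-centroid classifier built with NumPy only. The class name, the `classes_` and `centroids_` attributes, and the softmax-based probabilities are choices made for this illustration, and the class has not been verified as a meta-learner for `multiviewstacking`.

```python
import numpy as np

class NearestCentroidLearner:
    """Toy learner: assigns each sample to the class with the closest centroid."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One centroid (mean feature vector) per class.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def _distances(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from every sample to every centroid.
        return np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)

    def predict(self, X):
        return self.classes_[np.argmin(self._distances(X), axis=1)]

    def predict_proba(self, X):
        # Softmax over negative distances: closer centroids get higher scores.
        scores = -self._distances(X)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        exp_scores = np.exp(scores)
        return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Tiny made-up dataset with two well-separated classes.
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y_toy = np.array([0, 0, 1, 1])

toy = NearestCentroidLearner().fit(X_toy, y_toy)
toy_pred = toy.predict(np.array([[0.1, 0.0], [4.8, 5.2]]))
toy_proba = toy.predict_proba(np.array([[0.1, 0.0], [4.8, 5.2]]))
print(toy_pred)
```

Returning `self` from `fit()` allows construction and training to be chained, as in the last lines of the sketch.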

### Defining the first-level-learners

Let's define a first-level learner for each of the views. The `multiviewstacking` library supports most `scikit-learn` classifiers. A `MultiViewStacking` model is not limited to a single type of model but supports heterogeneous types of models. For example, if you know that a KNN classifier is more suitable for audio classification and Gaussian Naive Bayes is better for the accelerometer view, you can specify a different model for each view.

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Define the first-level-learner for the audio view.
# In this case, a KNN classifier with k=3. 
m_v1 = KNeighborsClassifier(n_neighbors=3)

# Define the first-level-learner for the accelerometer view.
# In this case, a Naive Bayes classifier.
m_v2 = GaussianNB()

### Defining the custom meta-learner

Now we will instantiate our custom learner.

In [None]:
m_meta = MyLearner()

### Create the MultiViewStacking classifier

Now we are ready to create our `MultiViewStacking` classifier. We first pass the `views_indices` parameter as a list of lists. The first list contains the column indices of the first view (audio), and the second list contains the column indices of the second view (accelerometer). Then, we pass a list of `first_level_learners`. **Note that the order of the views for all parameters must be the same.** That is, if in `views_indices` you pass the indices of some view $A$ followed by those of view $B$, then in `first_level_learners` you must pass the corresponding models for view $A$ and view $B$ in the same order.

Then, we specify the `meta_learner` and `k`. The parameter `k` specifies the number of folds in the internal cross-validation of the Multi-View Stacking algorithm. See [here](https://enriquegit.github.io/behavior-free/ensemble.html#stacked-generalization) for details of the algorithm.

Finally, we set the `random_state` parameter for reproducibility. The `random_state` value is passed to the internal cross-validation procedure that splits the data into folds. This parameter is optional and has a default value of `123`.

In [None]:
from multiviewstacking import MultiViewStacking

model = MultiViewStacking(views_indices = [ind_v1, ind_v2],
                      first_level_learners = [m_v1, m_v2],
                      meta_learner = m_meta,
                      k = 10,
                      random_state = 100)
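
To give an intuition for what the internal cross-validation does, here is a rough sketch of stacked generalization over two views written directly with `scikit-learn`. It is not the library's implementation: the synthetic data, the column split into views, the use of `StratifiedKFold`, and the choice of class probabilities as meta-features are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for two views (columns 0-3 and 4-7).
X, y = make_classification(n_samples=200, n_features=8, n_informative=6,
                           n_classes=3, random_state=0)
views = [[0, 1, 2, 3], [4, 5, 6, 7]]
learners = [KNeighborsClassifier(n_neighbors=3), GaussianNB()]

# Folds used to produce out-of-fold predictions (k = 5 in this sketch).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=100)

# Out-of-fold class probabilities from each view's learner.
oof = [cross_val_predict(m, X[:, idx], y, cv=cv, method="predict_proba")
       for m, idx in zip(learners, views)]

# The meta-learner is trained on the concatenated out-of-fold predictions.
meta_features = np.hstack(oof)
meta = RandomForestClassifier(random_state=123).fit(meta_features, y)

# Refit each first-level learner on all the training data for later use.
for m, idx in zip(learners, views):
    m.fit(X[:, idx], y)

print(meta_features.shape)
```

Because every row of `meta_features` comes from a model that did not see that row during training, the meta-learner is trained on predictions that resemble those it will receive for unseen data.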

### Train the model

Once the model has been created, we can proceed to train it.

In [None]:
# Now it's time to fit the model with the training data.
model.fit(X_train, y_train)

### Test the model

Now you can test your model by making predictions on the test set and computing the accuracy.

In [None]:
predictions = model.predict(X_test)

# Print accuracy.
print(np.sum(y_test == predictions) / len(y_test))

### Convert predictions to original strings

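You can use the `LabelEncoder` (`le`) returned by `load_example_data()` to convert the integer predictions back to strings with its `inverse_transform()` method. The code below also plots a confusion matrix using the original string labels.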

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

str_predictions = le.inverse_transform(predictions)
str_groundtruth = le.inverse_transform(y_test)

cm = confusion_matrix(str_groundtruth, str_predictions)

disp = ConfusionMatrixDisplay(confusion_matrix=cm, 
                              display_labels=np.unique(str_groundtruth))

disp.plot(xticks_rotation = 'vertical', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.show()
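
The round trip between string labels and integer codes can be seen in isolation with a small `LabelEncoder` example. The activity names below are made up for illustration and are not necessarily the classes in the example dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels, for illustration only.
labels = ["walking", "running", "walking", "sitting"]

enc = LabelEncoder()
codes = enc.fit_transform(labels)        # strings -> integer codes
restored = enc.inverse_transform(codes)  # integer codes -> strings

print(list(enc.classes_))  # classes are stored in sorted order
print(list(codes))
print(list(restored))
```

The integer assigned to each class is its position in `enc.classes_`, which is why the same fitted encoder must be used for both directions.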