# Use-case: Classification of digits with 6 views.

This document shows how to use the `multiviewstacking` library on a dataset with $6$ views.

### Load the dataset.

First, let's load the dataset into a pandas data frame and display the first rows.
The first column is the **class** and the remaining ones are the features. The feature names have a view prefix: `v1_*`, `v2_*`, ..., `v6_*`. Each view is described below.

1. 76 Fourier coefficients of the character shapes; 
2. 216 profile correlations; 
3. 64 Karhunen-Loève coefficients;
4. 240 pixel averages in 2 x 3 windows; 
5. 47 Zernike moments; 
6. 6 morphological features.
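
As a quick sanity check on the list above, the six view sizes add up to 649 features. The following is a minimal sketch; the dictionary `view_sizes` is only illustrative and is not part of the library.

```python
# Number of features per view, copied from the list above.
view_sizes = {"v1": 76, "v2": 216, "v3": 64, "v4": 240, "v5": 47, "v6": 6}

total_features = sum(view_sizes.values())
print(total_features)  # 649

# With the extra "class" column, the data frame is expected to have 650 columns.
expected_columns = total_features + 1
```

If the file matches this description, the second entry of `df.shape` should equal `expected_columns`.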


In [None]:
import pandas as pd
import numpy as np

# Read file.
df = pd.read_csv('multiview-digits.csv')

# Display the first rows of the data.
df.head()

In [None]:
df.shape

Now let's store the class in the variable `y` and the features in the variable `X`. We also store the column names so later we can get the column indices.

In [None]:
y = df["class"]
X = df.drop(["class"], axis = 1)

# Store column names.
colnames = list(X.columns)

### Split the dataset

Now we split the dataset into train and test sets.

In [None]:
from sklearn.model_selection import train_test_split

# Split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    train_size = 0.5,
                                                    stratify = y,
                                                    random_state = 123)
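
The `stratify = y` argument keeps the class proportions the same in both halves. A small sketch on toy labels (the `*_demo` names are made up for illustration and are not the digits data) shows the effect:

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 classes with 20 samples each (illustrative, not the digits data).
y_demo = np.repeat(np.arange(10), 20)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

_, _, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, train_size=0.5, stratify=y_demo, random_state=123
)

# Count how many samples of each class ended up in each half.
train_counts = Counter(y_demo_train.tolist())
test_counts = Counter(y_demo_test.tolist())
print(train_counts)
print(test_counts)
```

Without stratification, a random 50/50 split could leave some digits under-represented in the training set.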

### Get column indices

The `MultiViewStacking` object expects the features of each view to be passed as column indices, either as a tuple describing a range or as the complete list of indices. In this case we will pass the complete list of indices. Since we stored the column names, we can extract the indices by searching for the names that start with "v1_", "v2_", etc.

In [None]:
# Get column indices for each view by matching the column-name prefix.
ind_v1 = [i for i, x in enumerate(colnames) if x.startswith("v1_")]
ind_v2 = [i for i, x in enumerate(colnames) if x.startswith("v2_")]
ind_v3 = [i for i, x in enumerate(colnames) if x.startswith("v3_")]
ind_v4 = [i for i, x in enumerate(colnames) if x.startswith("v4_")]
ind_v5 = [i for i, x in enumerate(colnames) if x.startswith("v5_")]
ind_v6 = [i for i, x in enumerate(colnames) if x.startswith("v6_")]

print(ind_v1)
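
The prefix search can be wrapped in a small helper. The sketch below uses made-up column names and two hypothetical helpers, `view_indices` and `is_contiguous`. The second one checks whether a view occupies a contiguous block of columns, which is when a range would describe it as well as a full list. Whether the library's range tuple includes the end index is not covered here and should be checked in its documentation.

```python
# Hypothetical column names laid out like the data set: view prefix + feature number.
demo_colnames = ["v1_1", "v1_2", "v1_3", "v2_1", "v2_2", "v3_1"]

def view_indices(names, prefix):
    """Return the positions of the columns whose name starts with `prefix`."""
    return [i for i, name in enumerate(names) if name.startswith(prefix)]

def is_contiguous(indices):
    """Return True if the indices form an unbroken increasing block."""
    return indices == list(range(indices[0], indices[-1] + 1))

demo_v1 = view_indices(demo_colnames, "v1_")
demo_v2 = view_indices(demo_colnames, "v2_")
print(demo_v1, demo_v2)  # [0, 1, 2] [3, 4]
print(is_contiguous(demo_v1))  # True
```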

### Visualize the data

The dimension of the data will be reduced to 2 so it can be plotted. Multidimensional Scaling (MDS) is used here to reduce the dimension.

In [None]:
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# Select the first n data points so the procedure runs faster.
n = 500
X_sample = X_train[:n]
y_sample = y_train[:n]

views = [ind_v1, ind_v2, ind_v3, ind_v4, ind_v5, ind_v6]

# Reduce each view to 2 dimensions with MDS and plot it.
for i, selected_view in enumerate(views, start=1):
    mds = MDS(n_components = 2, random_state = 10)
    X_reduced = mds.fit_transform(X_sample.iloc[:, selected_view])
    plt.figure(figsize=(7, 5))
    # plt.get_cmap replaces plt.cm.get_cmap, which newer matplotlib releases removed.
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y_sample, cmap=plt.get_cmap("jet", 10))
    plt.colorbar(label='Class', ticks=range(10))
    plt.title(f"MDS View {i}")
    plt.xlabel("Dimension 1")
    plt.ylabel("Dimension 2")
    plt.savefig(f"mds_{i}.pdf", format="pdf", bbox_inches="tight")
    plt.show()

### Defining the first-level-learners and meta-learner

Let's define the first-level learners for each of the views and the meta-learner. The `multiviewstacking` library supports most `scikit-learn` classifiers. A `MultiViewStacking` model is not limited to a single type of model but supports heterogeneous types of models.

In [None]:
from sklearn.ensemble import RandomForestClassifier

m_v1 = RandomForestClassifier(n_estimators=50, random_state=123)
m_v2 = RandomForestClassifier(n_estimators=50, random_state=123)
m_v3 = RandomForestClassifier(n_estimators=50, random_state=123)
m_v4 = RandomForestClassifier(n_estimators=50, random_state=123)
m_v5 = RandomForestClassifier(n_estimators=50, random_state=123)
m_v6 = RandomForestClassifier(n_estimators=50, random_state=123)

m_meta = RandomForestClassifier(n_estimators=50, random_state=123)
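
To illustrate what heterogeneous learners could look like, the sketch below builds a list with one classifier family per view. The particular choice of classifiers is only an assumption for illustration, and whether a given estimator works with `MultiViewStacking` should be checked against the library documentation.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# One (illustrative) choice of a different classifier family per view.
hetero_learners = [
    RandomForestClassifier(n_estimators=50, random_state=123),
    KNeighborsClassifier(n_neighbors=5),
    GaussianNB(),
]

# All of them expose the usual scikit-learn fit/predict interface.
for learner in hetero_learners:
    print(type(learner).__name__, hasattr(learner, "fit"), hasattr(learner, "predict"))
```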

### Create the MultiViewStacking classifier

Now we are ready to create our `MultiViewStacking` classifier. We pass the `views_indices` parameter as a list of lists, one list of column indices per view.

In [None]:
from multiviewstacking import MultiViewStacking

model = MultiViewStacking(views_indices = [ind_v1, ind_v2, ind_v3, ind_v4, ind_v5, ind_v6],
                      first_level_learners = [m_v1, m_v2, m_v3, m_v4, m_v5, m_v6],
                      meta_learner = m_meta,
                      k = 10,
                      random_state = 100)
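
To give an intuition of what a stacking model of this kind does, here is a conceptual sketch written with plain `scikit-learn`. It is not the library's implementation. It assumes that `k` is the number of cross-validation folds and that the meta-learner is trained on out-of-fold class probabilities, which is one common way to build stacked models. All `*_toy` names and the two synthetic views are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split

# Synthetic data with two hypothetical views: columns 0-9 and columns 10-19.
X_toy, y_toy = make_classification(n_samples=300, n_features=20, n_informative=10,
                                   n_classes=3, random_state=0)
toy_views = [list(range(0, 10)), list(range(10, 20))]
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, train_size=0.5,
                                          stratify=y_toy, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof_blocks, test_blocks = [], []
for idx in toy_views:
    learner = RandomForestClassifier(n_estimators=25, random_state=0)
    # Out-of-fold probabilities: each training row is scored by a model that
    # never saw it, so the meta-learner is not trained on memorized labels.
    oof_blocks.append(cross_val_predict(learner, X_tr[:, idx], y_tr, cv=cv,
                                        method="predict_proba"))
    # Refit on the whole training view to score unseen data.
    learner.fit(X_tr[:, idx], y_tr)
    test_blocks.append(learner.predict_proba(X_te[:, idx]))

# The meta-learner sees one block of class probabilities per view.
meta_train = np.hstack(oof_blocks)
meta_test = np.hstack(test_blocks)
meta_toy = RandomForestClassifier(n_estimators=25, random_state=0).fit(meta_train, y_tr)
stacked_accuracy = np.mean(meta_toy.predict(meta_test) == y_te)
print(meta_train.shape, stacked_accuracy)
```

With 3 classes and 2 views, the meta-learner receives 6 features per sample instead of the original 20.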

### Train the model

Once the model has been created, we can proceed to train it.

In [None]:
# Now it's time to fit the model with the training data.
model.fit(X_train, y_train)

### Test the model

Now you can test your model by making predictions on the test set and computing the accuracy.

In [None]:
predictions = model.predict(X_test)

# Print accuracy.
print(np.sum(y_test == predictions) / len(y_test))
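
The accuracy above is the fraction of test samples whose predicted label equals the true label. The sketch below, on made-up toy labels, shows that the manual computation agrees with `accuracy_score` from `scikit-learn`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels (illustrative only).
demo_truth = np.array([0, 1, 2, 2, 1, 0, 1, 2])
demo_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])

# 6 of the 8 predictions match the ground truth.
manual_accuracy = np.sum(demo_truth == demo_pred) / len(demo_truth)
sklearn_accuracy = accuracy_score(demo_truth, demo_pred)
print(manual_accuracy, sklearn_accuracy)  # 0.75 0.75
```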

In [None]:
# Confusion matrix for multi-view stacking.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
str_predictions = predictions
str_groundtruth = y_test
cm = confusion_matrix(str_groundtruth, str_predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, 
                              display_labels=np.unique(str_groundtruth))
disp.plot(xticks_rotation = 'vertical', cmap=plt.cm.Blues)
plt.title('Confusion Matrix Multi-view Stacking')
plt.savefig("cm_0.pdf", format="pdf", bbox_inches="tight")
plt.show()
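
Besides plotting, the confusion matrix can be used to compute the recall of each class, that is, the fraction of samples of a class that were predicted correctly. A minimal sketch on made-up labels (the `cm_*` names are illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels (illustrative only).
cm_truth = [0, 0, 1, 1, 2, 2]
cm_pred = [0, 1, 1, 1, 2, 0]

demo_cm = confusion_matrix(cm_truth, cm_pred)

# Rows are true classes and columns are predicted classes, so the diagonal
# divided by the row sums gives the per-class recall.
per_class_recall = demo_cm.diagonal() / demo_cm.sum(axis=1)
print(demo_cm)
print(per_class_recall)
```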

### Testing the first-level-learners individually

The fitted `MultiViewStacking` model has several attributes including `fitted_first_level_learners_`. You can use this to test the performance of the individual models.

In [None]:
fitted_view1 = model.fitted_first_level_learners_[0]
fitted_view2 = model.fitted_first_level_learners_[1]
fitted_view3 = model.fitted_first_level_learners_[2]
fitted_view4 = model.fitted_first_level_learners_[3]
fitted_view5 = model.fitted_first_level_learners_[4]
fitted_view6 = model.fitted_first_level_learners_[5]


In [None]:
predictions_v1 = fitted_view1.predict(X_test.values[:,ind_v1])
print(np.sum(y_test == predictions_v1) / len(y_test))

predictions_v2 = fitted_view2.predict(X_test.values[:,ind_v2])
print(np.sum(y_test == predictions_v2) / len(y_test))

predictions_v3 = fitted_view3.predict(X_test.values[:,ind_v3])
print(np.sum(y_test == predictions_v3) / len(y_test))

predictions_v4 = fitted_view4.predict(X_test.values[:,ind_v4])
print(np.sum(y_test == predictions_v4) / len(y_test))

predictions_v5 = fitted_view5.predict(X_test.values[:,ind_v5])
print(np.sum(y_test == predictions_v5) / len(y_test))

predictions_v6 = fitted_view6.predict(X_test.values[:,ind_v6])
print(np.sum(y_test == predictions_v6) / len(y_test))

In [None]:
# Confusion matrices for the individual views.
view_predictions = [predictions_v1, predictions_v2, predictions_v3,
                    predictions_v4, predictions_v5, predictions_v6]

for i, view_pred in enumerate(view_predictions, start=1):
    cm = confusion_matrix(y_test, view_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                                  display_labels=np.unique(y_test))
    disp.plot(xticks_rotation = 'vertical', cmap=plt.cm.Blues)
    plt.title(f'Confusion Matrix View {i}')
    plt.savefig(f"cm_{i}.pdf", format="pdf", bbox_inches="tight")
    plt.show()

### Model with concatenated features

The views' features can also be combined by concatenating them, that is, by training a single model on the features of all views.

In [None]:
m_concat = RandomForestClassifier(n_estimators=50, random_state=123)
m_concat.fit(X_train, y_train)

In [None]:
predictions = m_concat.predict(X_test)
print(np.sum(y_test == predictions) / len(y_test))

Here, we can see that multi-view stacking performed better (0.988) than just concatenating all views.