# Grid search with scikit-learn

This example shows how to use the `multiviewstacking` library with scikit-learn's grid search functionality to find the best hyperparameters. We will use the `GridSearchCV` class, which tries every combination defined in the hyperparameter grid and returns the best one according to the specified score (in this example, *accuracy*). Here the hyperparameters are the meta-learner and the first-level learners. They are specified as a dictionary in which each key is the name of a hyperparameter and each value is the list of values to try. In this example we will try a KNN and a Random Forest as the meta-learner. The first-level learners will be either two SVMs or two KNN classifiers.

The `cv` parameter of `GridSearchCV` specifies the number of folds for the cross-validation procedure used to find the best hyperparameters. The `scoring` parameter specifies the metric to be optimized. Since this procedure can be computationally expensive, the number of estimators for the random forest was set to 20, as opposed to 50 in the previous examples.
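To make the cost of the search concrete, the sketch below enumerates a grid shaped like the one used in this example. It uses only the standard library, and the string labels are placeholders standing in for the actual estimator objects; `expand_grid` is a hypothetical helper written for illustration, not part of scikit-learn or `multiviewstacking`.

```python
from itertools import product

# Placeholder labels standing in for the estimator objects in the grid.
grid = {
    "meta_learner": ["KNN", "RandomForest"],
    "first_level_learners": [("SVM", "SVM"), ("KNN", "KNN")],
}


def expand_grid(grid):
    """Return every combination of grid values as a list of dicts."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(grid)
cv = 5
# One fit per combination and fold; the final refit is not counted here.
n_fits = len(combinations) * cv

for combo in combinations:
    print(combo)
print(f"{len(combinations)} combinations x {cv} folds = {n_fits} fits")
```

With two candidate meta-learners and two candidate sets of first-level learners there are four combinations, so 5-fold cross-validation needs 20 fits, plus one final refit on the whole training set when `refit` is left at its default.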

### Load the dataset.

First, let's load the dataset into a pandas data frame and display the first rows.
The feature names have a prefix of **v1_** or **v2_**. The features prefixed with v1_ are Mel-frequency cepstral coefficients extracted from audio signals. Features prefixed with v2_ are summary statistics extracted from accelerometer signals. Note that column names can be anything; in this case a prefix was added to make it easier to get the column indices of each view.
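As a rough illustration of how a prefix makes the view indices easy to recover, the sketch below collects the positions of the columns that start with a given prefix. The column names and the `view_indices` helper are hypothetical; `load_example_data()` already returns `ind_v1` and `ind_v2`, so this only shows one way such indices could be derived for your own data.

```python
# Hypothetical column names; the real dataset has many more columns.
columns = ["v1_mfcc1", "v1_mfcc2", "v2_meanX", "v2_sdX", "v2_meanY"]


def view_indices(columns, prefix):
    """Return the positions of the columns whose name starts with prefix."""
    return [i for i, name in enumerate(columns) if name.startswith(prefix)]


idx_v1 = view_indices(columns, "v1_")
idx_v2 = view_indices(columns, "v2_")

print(idx_v1)
print(idx_v2)
```

With a pandas data frame, `list(df.columns)` could be passed as `columns`.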


In [None]:
import pandas as pd
import numpy as np
from multiviewstacking import MultiViewStacking
from multiviewstacking import load_example_data

(X_train,y_train,X_test,y_test,ind_v1,ind_v2,le) = load_example_data()

X_train.head()

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn import metrics

model = MultiViewStacking(views_indices = [ind_v1, ind_v2], k = 10, 
                          random_state = 100)

grid_params = [{'meta_learner': [KNeighborsClassifier(n_neighbors = 3), 
                                 RandomForestClassifier(n_estimators = 20,
                                                        random_state = 1)],
                'first_level_learners': [[SVC(probability = True,
                                              random_state = 1),
                                          SVC(probability = True, random_state = 1)],
                                         [KNeighborsClassifier(n_neighbors = 3), 
                                          KNeighborsClassifier(n_neighbors = 3)]
                                        ]}]

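# Evaluate every combination in the grid with 5-fold cross-validation,
# selecting the best one by accuracy.
grid_search = GridSearchCV(model, grid_params, cv = 5, scoring = 'accuracy')

# Run the search on the training set.
grid_search.fit(X_train, y_train)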

In [None]:
# Print the best hyperparameters and their mean cross-validated accuracy.

print(grid_search.best_params_)
print(grid_search.best_score_)