# COMP47590: Advanced Machine Learning
# Assignment 1: The Super Learner

## Import Packages Etc

In [63]:
from sklearn.base import BaseEstimator, ClassifierMixin, clone
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from IPython.display import display, HTML, Image
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import tree, metrics, ensemble
from sklearn.model_selection import train_test_split, cross_val_score,GridSearchCV, KFold
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from scipy.stats import mode,pearsonr

%matplotlib inline
#%qtconsole

## Define Super Learner Classifier

The *Super Learner* is a heterogeneous stacked ensemble classifier. This is a classification model that uses a set of base classifiers of different types, the outputs of which are then combined in another classifier at the stacked layer. The Super Learner was described in [(van der Laan et al, 2007)](https://pdfs.semanticscholar.org/19e9/c732082706f39d2ba12845851309714db135.pdf) but the stacked ensemble idea has been around for a long time. 

Figure 1 shows a flow diagram of the Super Learner process (this is from (van der Laan et al, 2007) and the process is also described in the COMP47590 lecture "[COMP47590 2017-2018 L04 Supervised Learning Ensembles 3](https://www.dropbox.com/s/1ksx94nxtuyn4l8/COMP47590%202017-2018%20L04%20Supervised%20Learning%20Ensembles%203.pdf?raw=1)"). The base classifiers are trained and their outputs are combined along with the training dataset labels into a training set for the stack layer classifier. To avoid overfitting the generation of the stacked layer training set uses a k-fold cross validation process (described as V-fold in Figure 1). To further add variety to the base estimators a bootstrapping selection (as is used in the bagging ensemble approach).
 
![Super Learner Process Flow](SuperLearnerProcessFlow.png "Logo Title Text 1")
Figure 1: A flow diagram for the Super Learner


### Define the SuperLearnerClassifier Class

In [None]:
class SuperLearnerClassifier(BaseEstimator, ClassifierMixin):
    """An ensemble classifier that uses heterogeneous models at the base layer and
        a aggregatnio model at the aggregation layer. A k-fold cross validation is
        used to gnerate training data for the stack layer model.
    """
    option1 = ["Decision Tree", "Random Forest", "KNN", "Logistic Regression", "Navie Bayes", "SVM"]
    option2 = ["Decision Tree", "Random Forest", "KNN", "Navie Bayes", "Logistic Regression", "GDBT"]
    option3 = ["Decision Tree", "Random Forest", "AdaBoost", "Navie Bayes", "Logistic Regression", "GDBT"]
    # Constructor for the classifier object
    def __init__(self, training_type = True, layer_type = "Decision Tree", estimators = option1):
        """Setup a SuperLearner classifier
        Parameters
        training_type - True- Label-based  False- Probability-based
        layer_type - choose a models for stack layer(Decision Tree, Random Forest, SVM, Navie Bayes or KNN)
        estimators - choose a combination of the base learners
        """
        
        self.training_type = training_type
        self.layer_type = layer_type
        self.estimators = estimators
        self.superlearner = None
        self.baselearners = {"Decision Tree" : DecisionTreeClassifier(), "Random Forest": RandomForestClassifier(),
                 "SVM" :SVC(probability=True), "KNN": KNeighborsClassifier(), "Navie Bayes": GaussianNB(),
                 "GDBT": GradientBoostingClassifier(), "Logistic Regression": LogisticRegression(), 
                 'AdaBoost':ensemble.AdaBoostClassifier()}
        self.models = {} # To the store the fitted classifiers
    

    def fit_baselearner(self, X, y):
        """In K-fold, fit the base learners using the training set (X, y)
        and use the base learnes to do some prediction(prediction data is for
        traning stack layer model.
        Parameters
        ----------
        X : array-like, shape = [n_samples, n_features]
            The training input samples. 
        y : array-like, shape = [n_samples] 
            The target values (class labels) as integers or strings.
        Returns
        -------
        final_training_data : used for training stack layer model
        """     
        # use K-fold method to train the base classifier
        training_data = []
        true_label = []
        transformed_data = {}
        for key in self.estimators:
            self.models[key] = []
            transformed_data[key] = []
        print("Begin to fit the base learners")
        print("**************************************")
        for k, (train_index, test_index) in enumerate(kfold.split(X, y)):
            true_label.append(y[test_index])  # append to label of test data to list
            for key in self.estimators:
                if key in self.baselearners:
                    model = clone(self.baselearners[key]) #clone
                    # print(model.__hash__())
                    model = model.fit(X[train_index], y[train_index])
                    print("Fold-{0}: Fit {1}".format(k, key))
                    self.models[key].append(model)
        
                if self.training_type == True:
                    pred = model.predict(X[test_index])
                    transformed_data[key].append(pred)

                if self.training_type == False:
                    proba = model.predict_proba(X[test_index])
                    transformed_data[key].append(proba)
        
        # Construct data
        label = np.concatenate(true_label)
        for key1 in transformed_data:
            transformed_data[key1] = np.concatenate(transformed_data[key1])
            training_data.append(transformed_data[key1])
            
        if self.training_type == True:
            training_data_handling = np.array(training_data).T
            final_training_data = np.c_[training_data_handling, X]
            final_training_data = np.c_[final_training_data, label]
            
        if self.training_type == False:
            training_data_handling = training_data[0]
            for i in range(1,len(training_data)):
                training_data_handling = np.concatenate([training_data_handling,training_data[i]], axis = 1)
                final_training_data = np.c_[training_data_handling, X]
                final_training_data = np.c_[final_training_data, label]
                
        print("**************************************")
        print("Base Learners are fitted")
        print("the training data for stack layer learner is generated ")
        print("Begin to fit stack layer learner")
        return final_training_data

    def fit(self, X, y):
        """Build a SuperLearner classifier from the training set (X, y).
        Parameters
        ----------
        X : array-like, shape = [n_samples, n_features]
            The training input samples. 
        y : array-like, shape = [n_samples] 
            The target values (class labels) as integers or strings.
        Returns
        -------
        self : object
        """     
        training_data = self.fit_baselearner(X, y)
        if self.layer_type == "Decision Tree":
            self.layer_type = self.baselearners["Decision Tree"]
            
        if self.layer_type == "Logistic Regression":
            self.layer_type = self.baselearners["Logistic Regression"]
            
        if self.layer_type == "Navie Bayes":
            self.layer_type = self.baselearners["Navie Bayes"]
            
        if self.layer_type == "KNN":
            self.layer_type = self.baselearners["KNN"]
        
        if self.layer_type == "Random Forest":
            self.layer_type = self.baselearners["Random Forest"]
            
        self.superlearner = self.layer_type.fit(training_data[:,:(training_data.shape[1]-1)], 
                                                training_data[:,training_data.shape[1]-1])
        print("The stack layer model has been fitted")
        print("*****************DONE*****************")
        return self

    def generate_test_data(self, X):
        """Construct the testing data for superleaner.
        Parameters
        ----------
        X : array-like, shape = [n_samples, n_features]
            The training input samples. 
        Returns
        -------
        final_testing_data : used for testing super learner
        """ 
        # Label-based data
        if self.training_type == True:
            testing_data = {}
            alist = []
            for key in self.models:
                testing_data[key] = []
                for model in self.models[key]:
                    testing_data[key].append(model.predict(X))
                    data_handling = np.array(testing_data[key]).T
                majority_voting = mode(data_handling, axis=-1)[0] # using majority voting method
                alist.append(majority_voting)
                final_testing_data = np.concatenate(alist, axis = 1)
                final_testing_data = np.c_[final_testing_data, X]
        
        # Proba-based data
        if self.training_type  == False:
            testing_data = {}
            alist = []
            for key in self.models:
                testing_data[key] = []
                for model in self.models[key]:
                    testing_data[key].append(model.predict_proba(X))
            for key in testing_data:
                data_handling = testing_data[key]
                data_ = data_handling[0]
                for i in range(1,len(data_handling)):
                    data_ = data_ +data_handling[i]
                final_ = data_/ k   # compute the average of each row
                alist.append(final_)
            final_testing_data = alist[0]
            for a in range(1, len(alist)):
                final_testing_data = np.concatenate([final_testing_data, alist[a]], axis=1)
            final_testing_data = np.c_[final_testing_data, X]
        return final_testing_data

    def predict(self, X):
        """Predict class labels of the input samples X.
        Parameters
        ----------
        X : array-like matrix of shape = [n_samples, n_features]
            The input samples. 
        Returns
        -------
        pred : predicted label
        """
        pred = self.superlearner.predict(self.generate_test_data(X))
        return pred
    
    def predict_proba(self, X):
        """Predict class labels of the input samples X.
        Parameters
        ----------
        X : array-like matrix of shape = [n_samples, n_features]
            The input samples. 
        Returns
        -------
        proba : predicted label 
        """ 
        proba = self.superlearner.predict(self.generate_test_data(X))
        return proba
    
    def compute_correlation(self, X, y):
        """Compute th correlation between base learner and superlearner.
        Parameters
        ----------
        X : array-like matrix of shape = [n_samples, n_features]
            The input samples. 
        y : array-like, shape = [n_samples] 
            The target values (class labels) as integers or strings.
        Returns
        -------
        proba : predicted label 
        """ 
        base_learner_prediction = self.fit_baselearner(X, y)
        super_learner_prediction = self.predict(X)
        shape = base_learner_prediction.shape
        print(shape[1])
        print("The correlation between base learners and superleaner is as below ")
        print("Order:" + str(self.estimators))
        for i in range(0, shape[1]-1- X.shape[1]):
            correlation = pearsonr(base_learner_prediction[:,i], super_learner_prediction)
            print(correlation)


### Test the SuperLearnerClassifier

Perform a simple test using the SuperLearnClassifier on the Iris dataset

In [3]:
from sklearn.datasets import load_iris
k = 5
kfold = KFold(k)
clf = SuperLearnerClassifier()
iris = load_iris()
clf.fit(iris.data, iris.target)
cross_val_score(clf, iris.data, iris.target, cv=5)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************


array([ 0.93333333,  1.        ,  0.93333333,  0.93333333,  1.        ])

## Load & Partition Data

### Setup - IMPORTANT

Take only a sample of the dataset for fast testing

In [4]:
data_sampling_rate = 0.05

Setup the number of folds for all grid searches (should be 5 - 10)

In [5]:
models = {}
k = 5
kfold = KFold(k)

### Load Dataset

Load the dataset and explore it.

In [6]:
dataset = pd.read_csv('fashion-mnist_train.csv')
dataset = dataset.sample(frac=data_sampling_rate) #take a sample from the dataset so everyhting runs smoothly
num_classes = 10
classes = {0: "T-shirt/top", 1:"Trouser", 2: "Pullover", 3:"Dress", 4:"Coat", 5:"Sandal", 6:"Shirt", 7:"Sneaker", 8:"Bag", 9:"Ankle boot"}

In [7]:
dataset.head()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
27795,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
35792,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
45348,3,0,0,0,0,0,0,0,0,0,...,101,0,0,0,0,0,0,0,0,0
53366,7,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
25871,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [8]:
dataset.head().values

array([[8, 0, 0, ..., 0, 0, 0],
       [5, 0, 0, ..., 0, 0, 0],
       [3, 0, 0, ..., 0, 0, 0],
       [7, 0, 0, ..., 0, 0, 0],
       [8, 0, 0, ..., 0, 0, 0]], dtype=int64)

### Pre-process & Partition Data

Perform data pre-processing and manipulation as required

In [9]:
X = dataset[dataset.columns[1:]].values
y = np.array(dataset["label"])

In [10]:
type(X)

numpy.ndarray

In [11]:
# Normalise the data
X = X/255

In [12]:
#Split the data into a training set, a vaidation set, and a test set
X_train_plus_valid, X_test, y_train_plus_valid, y_test \
    = train_test_split(X, y, random_state=0, \
                                    train_size = 0.7)
X_train, X_valid, y_train, y_valid \
    = train_test_split(X_train_plus_valid, \
                                        y_train_plus_valid, \
                                        random_state=0, \
                                        train_size = 0.5/0.7)



## Train and Evaluate a Simple Model

Train a Super Learner Classifier using the prepared dataset

Case -1 label based training data, Stack layer classifier is Decision Tree

In [13]:
clf1 = SuperLearnerClassifier()
superlearner1 = clf1.fit(X_train_plus_valid, y_train_plus_valid)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************


In [14]:
pred1 = superlearner1.predict(X_test)

Evaluate the trained classifier

In [15]:
accuracy1 = metrics.accuracy_score(pred1, y_test)
accuracy1

0.77333333333333332

## Cross Validation Experiment (Task 2)

Perfrom a 10-fold cross validation experiment to evaluate the performance of the SuperLearnerClassifier

In [16]:
score = cross_val_score(clf1, X_train_plus_valid, y_train_plus_valid, cv=10)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************


Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes


In [17]:
score

array([ 0.76851852,  0.75813953,  0.78604651,  0.77934272,  0.77990431,
        0.75480769,  0.7815534 ,  0.77184466,  0.7815534 ,  0.83980583])

## Comparing the Performance of Different Stack Layer Approaches (Task 5)

Compare the performance of the ensemble when a label based stack layer training set and a probability based stack layer training set is used.

Case -2 probability based training data, Stack layer classifier is Decision Tree

In [18]:
clf2 = SuperLearnerClassifier(training_type = False)
superlearner2 = clf2.fit(X_train_plus_valid, y_train_plus_valid)
pred2 = superlearner2.predict_proba(X_test)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************


In [19]:
accuracy2 = metrics.accuracy_score(pred2, y_test)
accuracy2

0.76777777777777778

Case 3 - label based training data, Stack layer classifier is Logistic Regression 

In [20]:
clf3 = SuperLearnerClassifier(layer_type = "Logistic Regression")
superlearner3 = clf3.fit(X_train_plus_valid, y_train_plus_valid)
pred3 = superlearner3.predict(X_test)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************


In [21]:
accuracy3 = metrics.accuracy_score(pred3, y_test)
accuracy3

0.54666666666666663

Case 4 - probability based training data, Stack layer classifier is Logistic Regression 

In [22]:
clf4 = SuperLearnerClassifier(training_type = False, layer_type = "Logistic Regression")
superlearner4 = clf4.fit(X_train_plus_valid, y_train_plus_valid)
pred4 = superlearner4.predict_proba(X_test)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************


In [23]:
accuracy4 = metrics.accuracy_score(pred4, y_test)
accuracy4

0.80555555555555561

## Grid Search Through SuperLearnerClassifier Architectures & Parameters (Task 7)

Perfrom a grid search experiment to detemrine the optimal architecture and hyper-parameter values for the SuperLearnClasssifier for the MNIST Fashion classification problem.

In [24]:
option1 = ["Decision Tree", "Random Forest", "KNN", "Logistic Regression", "Navie Bayes", "SVM"]
option2 = ["Decision Tree", "Random Forest", "KNN", "Navie Bayes", "Logistic Regression", "GDBT"]
option3 = ["Decision Tree", "Random Forest", "AdaBoost", "Navie Bayes", "Logistic Regression", "GDBT"]
param_grid = {'training_type': [True, False], 
             'layer_type':('Decision Tree','Logistic Regression', 'KNN', 'Navie Bayes'),
            'estimators':[option1, option2, option3]}

In [25]:
my_tuned_model = GridSearchCV(SuperLearnerClassifier(param_grid['training_type'], param_grid['layer_type'], 
                                                     param_grid['estimators']), param_grid, cv=5, verbose = 2)

In [26]:
my_tuned_model.fit(X_train_plus_valid, y_train_plus_valid)

Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Decision Tree, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
******************

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.1min remaining:    0.0s


Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic R

Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Decision Tree, training_type=False, total=  44.0s
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Decision Tree, training_type=False 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-

Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Logistic Regression, training_type=True, total=  42.4s
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Logistic Regression, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision 

[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Logistic Regression, training_type=False, total=  42.7s
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=KNN, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4

Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=KNN, training_type=Fal

Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Logistic Regression
Fold-4: Fit Navie Bayes
Fold-4: Fit SVM
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Navie Bayes, training_type=True, total=  42.9s
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Navie Bayes, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit

[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Navie Bayes, training_type=False, total=  43.6s
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], layer_type=Navie Bayes, training_type=False 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Logistic Regression
Fold-0: Fit Navie Bayes
Fold-0: Fit SVM
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Logistic Regression
Fold-1: Fit Navie Bayes
Fold-1: Fit SVM
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Logistic Regression
Fold-2: Fit Navie Bayes
Fold-2: Fit SVM
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Logistic Regression
Fold-3: Fit Navie Bayes
Fold-3: Fit SVM
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-

Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie

Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Logistic Regression, training_type=True, total= 3.9min
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Logistic Regression, training_type=True 
B

Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Logistic Regression, training_type=False, total= 3.7min
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Logistic Regression, training_type=False 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit De

[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=KNN, training_type=True, total= 3.7min
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=KNN, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
F

Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie

Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Navie Bayes, training_type=False, total= 4.2min
[CV] estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Navie Bayes, training_type=False 
Begin to fit the base learners
*

Fold-3: Fit AdaBoost
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Decision Tree, training_type=True, total= 4.3min
[CV] estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Decision Tree, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold

Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Decision Tree, training_type=False, total= 3.9min
[CV] estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Decision Tree, training_type=False 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold

Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Logistic Regression, training_type=True, total= 3.8min
[CV] estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Logistic Regression, training_type=False 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decisio

Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=KNN, training_type=True, total= 3.8min
[CV] estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=KNN, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest


Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=KNN, training_type=False, total= 3.8min
[CV] estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=KNN, training_type=False 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit AdaBoost
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit AdaBoost
Fold-2: F

[CV]  estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Navie Bayes, training_type=True, total= 3.9min
[CV] estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'], layer_type=Navie Bayes, training_type=True 
Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit AdaBoost
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit AdaBoost
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit AdaBoost
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tr

Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit AdaBoost
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit AdaBoost
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit AdaBoost
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************
[CV]  estimators=['Decision Tree', 'Rand

[Parallel(n_jobs=1)]: Done 120 out of 120 | elapsed: 365.1min finished


Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit AdaBoost
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit AdaBoost
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit AdaBoost
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*****************


GridSearchCV(cv=5, error_score='raise',
       estimator=SuperLearnerClassifier(estimators=[['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], ['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], ['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT']],
            layer_type=('Decision Tree', 'Logistic Regression', 'KNN', 'Navie Bayes'),
            training_type=[True, False]),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'training_type': [True, False], 'layer_type': ('Decision Tree', 'Logistic Regression', 'KNN', 'Navie Bayes'), 'estimators': [['Decision Tree', 'Random Forest', 'KNN', 'Logistic Regression', 'Navie Bayes', 'SVM'], ['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'], ['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT']]},
       pre_dispatch='2*n_jobs', refit=True, re

In [33]:
my_tuned_model.best_params_

{'estimators': ['Decision Tree',
  'Random Forest',
  'AdaBoost',
  'Navie Bayes',
  'Logistic Regression',
  'GDBT'],
 'layer_type': 'Logistic Regression',
 'training_type': False}

Evaluate the performance of the model selected by the grid search on a hold-out dataset

In [28]:
pred5 = my_tuned_model.predict(X_test)

In [31]:
accuracy5 = metrics.accuracy_score(pred5, y_test)

In [32]:
accuracy5

0.82777777777777772

## Evaluating the Impact of Adding Original Descriptive Features at the Stack Layer (Task 8)

Evaluate the impact of adding original descriptive features at the stack layer.

In [42]:
my_tuned_model.best_params_

{'estimators': ['Decision Tree',
  'Random Forest',
  'AdaBoost',
  'Navie Bayes',
  'Logistic Regression',
  'GDBT'],
 'layer_type': 'Logistic Regression',
 'training_type': False}

In [53]:
clf_final = SuperLearnerClassifier(training_type = False, 
                                   layer_type = "Logistic Regression", 
                                   estimators = ["Decision Tree", "Random Forest", "AdaBoost", "Navie Bayes", "Logistic Regression", "GDBT"])

In [59]:
clf_final.fit(X_train_plus_valid, y_train_plus_valid)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit AdaBoost
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit AdaBoost
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit AdaBoost
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*********

SuperLearnerClassifier(estimators=['Decision Tree', 'Random Forest', 'AdaBoost', 'Navie Bayes', 'Logistic Regression', 'GDBT'],
            layer_type=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False),
            training_type=False)

In [60]:
pred6 = clf_final.predict(X_test)

In [61]:
accuracy6 = metrics.accuracy_score(pred6, y_test)
accuracy6

0.84333333333333336

## Explore the Ensemble Model (Task 9)

Perform an analysis to investigate the strength of the base estimators and the strengths of the correlations between them.

case1 - estimators = ["Decision Tree", "Random Forest", "AdaBoost", "Navie Bayes", "Logistic Regression", "GDBT"], layer_type = "Logistic Regression", traning_type = False - poor weakly correlated with each other

In [64]:
clf_final.compute_correlation(X_train_plus_valid, y_train_plus_valid)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit AdaBoost
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit AdaBoost
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit AdaBoost
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit AdaBoost
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit AdaBoost
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
845
The correlation between base learners and s

In [66]:
accuracy6

0.82333333333333336

case2 - estimators = ["Decision Tree", "Random Forest", "KNN", "Navie Bayes", "Logistic Regression", "GDBT"], layer_type = "Logistic Regression"
traning_type = False

In [72]:
clf_case2 = SuperLearnerClassifier(training_type = False, 
                                   layer_type = "Logistic Regression", 
                                   estimators = ["Decision Tree", "Random Forest", "KNN", "Navie Bayes", "Logistic Regression", "GDBT"])

In [73]:
clf_case2.fit(X_train_plus_valid, y_train_plus_valid)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*************

SuperLearnerClassifier(estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'],
            layer_type=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False),
            training_type=False)

In [74]:
clf_case2.compute_correlation(X_train_plus_valid, y_train_plus_valid)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
845
The correlation between base learners and superleaner is as below 
O

In [75]:
pred7 = clf_case2.predict(X_test)

In [76]:
accuracy7 = metrics.accuracy_score(pred7, y_test)
accuracy7

0.83333333333333337

case3 - estimators = option3 = ["Decision Tree", "Random Forest", "AdaBoost", "Navie Bayes", "Logistic Regression", "GDBT"], layer_type = "Logistic Regression"
traning_type = False

In [77]:
clf_case3 = SuperLearnerClassifier(training_type = False, 
                                   layer_type = "Logistic Regression",
                                   estimators = ["Decision Tree", "Random Forest", "KNN", "Navie Bayes", "Logistic Regression", "GDBT"])

In [78]:
clf_case3.fit(X_train_plus_valid, y_train_plus_valid)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
The stack layer model has been fitted
*****************DONE*************

SuperLearnerClassifier(estimators=['Decision Tree', 'Random Forest', 'KNN', 'Navie Bayes', 'Logistic Regression', 'GDBT'],
            layer_type=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False),
            training_type=False)

In [79]:
clf_case3.compute_correlation(X_train_plus_valid, y_train_plus_valid)

Begin to fit the base learners
**************************************
Fold-0: Fit Decision Tree
Fold-0: Fit Random Forest
Fold-0: Fit KNN
Fold-0: Fit Navie Bayes
Fold-0: Fit Logistic Regression
Fold-0: Fit GDBT
Fold-1: Fit Decision Tree
Fold-1: Fit Random Forest
Fold-1: Fit KNN
Fold-1: Fit Navie Bayes
Fold-1: Fit Logistic Regression
Fold-1: Fit GDBT
Fold-2: Fit Decision Tree
Fold-2: Fit Random Forest
Fold-2: Fit KNN
Fold-2: Fit Navie Bayes
Fold-2: Fit Logistic Regression
Fold-2: Fit GDBT
Fold-3: Fit Decision Tree
Fold-3: Fit Random Forest
Fold-3: Fit KNN
Fold-3: Fit Navie Bayes
Fold-3: Fit Logistic Regression
Fold-3: Fit GDBT
Fold-4: Fit Decision Tree
Fold-4: Fit Random Forest
Fold-4: Fit KNN
Fold-4: Fit Navie Bayes
Fold-4: Fit Logistic Regression
Fold-4: Fit GDBT
**************************************
Base Learners are fitted
the training data for stack layer learner is generated 
Begin to fit stack layer learner
845
The correlation between base learners and superleaner is as below 
O

In [80]:
pred8 = clf_case3.predict(X_test)

In [81]:
accuracy8 = metrics.accuracy_score(pred8, y_test)
accuracy8

0.83111111111111113