# Ensemble Models Introduction
## 1. Simple ensemble methods: Voting and averaging
Ensemble models combine the decisions of multiple models to improve overall performance.

Ensemble learning methods use a group of models whose combined prediction is often more accurate than that of any single model in the group, especially when the individual models make different kinds of errors.
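
As a rough illustration of why combining models can help, the sketch below computes the accuracy of a majority vote under an idealized assumption that is not part of the original notebook: every model is correct with the same probability and the models' errors are independent. Real classifiers trained on the same data are correlated, so the gain in practice is usually smaller. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes every model is right with probability p_correct and that
    their errors are independent (an idealization).
    """
    needed = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


single = 0.7
ensemble_3 = majority_vote_accuracy(3, single)
ensemble_11 = majority_vote_accuracy(11, single)

print(f"single model: {single:.3f}")
print(f"3 models:     {ensemble_3:.3f}")  # 0.784
print(f"11 models:    {ensemble_11:.3f}")
```

With three such models the vote is right whenever at least two agree on the correct label: 3 · 0.7² · 0.3 + 0.7³ = 0.784.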

> Recommended reading:
> * https://towardsdatascience.com/simple-guide-for-ensemble-learning-methods-d87cc68705a2
> * https://towardsdatascience.com/holy-grail-for-bias-variance-tradeoff-overfitting-underfitting-7fad64ab5d76

In [241]:
from pandas import read_csv
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier, BaggingClassifier, \
    AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import accuracy_score
from sklearn.utils import shuffle
import warnings
warnings.filterwarnings('ignore')

In [242]:
def classification_results(y, y_pred, name='', classes=('no', 'yes'), add_rep=False):
    """Print accuracy and a labelled confusion matrix (rows: true, columns: predicted)."""
    acc = accuracy_score(y, y_pred)

    cm = pd.DataFrame(confusion_matrix(y, y_pred),
                      index=list(classes),
                      columns=list(classes))

    print(name + ' accuracy: ', round(acc, 4), '\n')
    print(cm, '\n')
    if add_rep:
        # Optionally add per-class precision, recall and F1
        print(classification_report(y, y_pred))


## Simple Ensemble techniques
To demonstrate simple ensemble methods, we will train a few models on the same data and combine their predictions, hoping to obtain a better prediction than any single model gives.
### 1. Voting ensemble
The idea is to choose the label that receives the most votes from the classifiers. In statistical terms, this is called taking the **mode** of the predictions.
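
Taking the mode of the predictions can be sketched with NumPy alone. The vote matrix below is made-up illustration data, not output from the models in this notebook, and the helper `majority_label` is a hypothetical name.

```python
import numpy as np

# Hypothetical 0/1 predictions: one row per classifier, one column per sample.
votes = np.array([
    [1, 0, 1, 0],  # classifier A
    [1, 1, 0, 0],  # classifier B
    [0, 1, 1, 0],  # classifier C
])


def majority_label(column: np.ndarray) -> int:
    """Return the most frequent label; ties go to the smallest label."""
    return int(np.bincount(column).argmax())


# Apply the mode to every column, i.e. to the votes cast for each sample.
hard_vote = np.apply_along_axis(majority_label, 0, votes)
print(hard_vote)  # [1 1 1 0]
```

With an odd number of binary classifiers there are no ties, which is one reason three models are a convenient choice here.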

In [243]:
df = read_csv("spambase_csv.csv")

X = df[df.columns[:-1]]
y = df[df.columns[-1]]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [244]:
X_train.head()

Unnamed: 0,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,word_freq_conference,char_freq_%3B,char_freq_%28,char_freq_%5B,char_freq_%21,char_freq_%24,char_freq_%23,capital_run_length_average,capital_run_length_longest,capital_run_length_total
1458,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.653,0.0,0.0,8.0,38,80
2641,0.0,0.39,0.19,0.0,0.19,0.09,0.0,0.0,0.0,0.0,...,0.0,1.353,0.08,0.0,0.016,0.0,0.0,1.679,17,178
67,0.0,0.0,0.0,0.0,0.0,0.0,1.47,0.0,0.0,1.47,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,1.214,3,17
406,0.0,0.0,1.16,0.0,3.48,0.0,0.0,0.58,0.58,0.0,...,0.0,0.0,0.082,0.0,0.165,0.082,0.0,2.17,12,102
440,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.07,0.0,0.0,...,0.0,0.0,0.19,0.0,0.19,0.38,0.0,3.6,16,72


Let's train 3 different classifiers.

In [245]:
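clf1 = LogisticRegression(solver='liblinear')  # 'warn' was only a placeholder default in older scikit-learn (it fell back to liblinear); newer versions reject it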
clf2 = DecisionTreeClassifier(max_depth=5)
clf3 = SVC(gamma='auto', probability=True)

classifiers = [('LR', clf1), ('DT', clf2), ('SVM', clf3)]

In [246]:
predictions = y_train.to_frame()

for clf_name, clf in classifiers:
    clf.fit(X_train, y_train)
    # Keep each model's training-set predictions for a side-by-side comparison
    predictions[clf_name] = clf.predict(X_train)

    classification_results(y_train, predictions[clf_name], name=clf_name + ' on Train:')
    classification_results(y_test, clf.predict(X_test), name=clf_name + ' on Test:')

LR on Train: accuracy:  0.9248 

       no   yes
no   1863   101
yes   141  1115 

LR on Test: accuracy:  0.9327 

      no  yes
no   782   42
yes   51  506 

DT on Train: accuracy:  0.9286 

       no   yes
no   1874    90
yes   140  1116 

DT on Test: accuracy:  0.9095 

      no  yes
no   781   43
yes   82  475 

SVM on Train: accuracy:  0.9463 

       no   yes
no   1908    56
yes   117  1139 

SVM on Test: accuracy:  0.8233 

      no  yes
no   702  122
yes  122  435 



In [247]:
predictions[::5].head()

Unnamed: 0,class,LR,DT,SVM
1458,1,1,1,1
3900,0,0,0,0
3202,0,0,0,0
1652,1,1,1,1
3060,0,0,0,0


### VotingClassifier 
Scikit-learn implements this simple ensemble in the [_VotingClassifier_][1] class, which offers two voting modes:
* **Hard**: choose the label that receives the most votes.
* **Soft**: choose the label with the highest summed predicted probability across the classifiers, that is, the argmax of the sums of the predicted probabilities. This requires every classifier to provide `predict_proba`, which is why the SVM above was created with `probability=True`.

[1]: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html "VotingClassifier class"
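
Hard and soft voting can disagree when a minority of classifiers is very confident. The sketch below uses made-up probabilities for a single sample (they do not come from the models in this notebook) and a 0.5 decision threshold for a binary problem.

```python
import numpy as np

# Hypothetical P(class = 1) from three classifiers for one sample.
probs = np.array([0.45, 0.45, 0.90])

# Hard voting: each classifier first commits to a label, then labels are counted.
labels = (probs > 0.5).astype(int)            # two votes for 0, one vote for 1
hard = int(labels.sum() > len(labels) / 2)    # majority says class 0

# Soft voting: average the probabilities first, then threshold.
mean_prob = probs.mean()                      # about 0.6
soft = int(mean_prob > 0.5)                   # average says class 1

print("hard vote:", hard)
print("soft vote:", soft)
```

Two lukewarm votes for class 0 win the hard vote, while the single confident classifier pulls the average above 0.5 and wins the soft vote.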

In [248]:
classifiers = [('LR', clf1), ('DT', clf2), ('SVM', clf3)]

In [249]:
clf_voting = VotingClassifier(estimators=classifiers,
                              voting='hard')
clf_voting.fit(X_train, y_train)

VotingClassifier(estimators=[('LR', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)), ('DT', Decision...',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False))],
         flatten_transform=None, n_jobs=None, voting='hard', weights=None)

In [250]:
classification_results(y_train, clf_voting.predict(X_train), name='Voting on Train:')
classification_results(y_test, clf_voting.predict(X_test), name='Voting on Test:')


Voting on Train: accuracy:  0.9575 

       no   yes
no   1928    36
yes   101  1155 

Voting on Test: accuracy:  0.9363 

      no  yes
no   798   26
yes   62  495 



In [251]:
predictions['Voting'] = clf_voting.predict(X_train)
predictions[::5].head()

Unnamed: 0,class,LR,DT,SVM,Voting
1458,1,1,1,1,1
3900,0,0,0,0,0
3202,0,0,0,0,0
1652,1,1,1,1,1
3060,0,0,0,0,0


### 'Soft' Voting

In [252]:
for clf_name, clf in classifiers:
    clf.fit(X_train, y_train)
    
    predictions[clf_name+' prob'] = clf.predict_proba(X_train)[:,1]

In [254]:
clf_voting = VotingClassifier(estimators=classifiers, voting='soft')
clf_voting.fit(X_train, y_train)
classification_results(y_train, clf_voting.predict(X_train), name='Voting Prob. on Train:')
classification_results(y_test, clf_voting.predict(X_test), name='Voting Prob. on Test:')

Voting Prob. on Train: accuracy:  0.9581 

       no   yes
no   1924    40
yes    95  1161 

Voting Prob. on Test: accuracy:  0.9327 

      no  yes
no   796   28
yes   65  492 



In [255]:
predictions['Voting Prob'] = clf_voting.predict(X_train)
predictions[::5].head()

Unnamed: 0,class,LR,DT,SVM,Voting,LR prob,DT prob,SVM prob,Voting Prob
1458,1,1,1,1,1,0.761384,0.943396,0.946395,1
3900,0,0,0,0,0,0.21669,0.049026,0.067627,0
3202,0,0,0,0,0,0.020728,0.049026,0.067632,0
1652,1,1,1,1,1,0.999523,0.753846,0.865944,1
3060,0,0,0,0,0,0.202393,0.049026,0.032665,0


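### Let's try the mean probability
Averaging the classifiers' predicted probabilities by hand and thresholding at 0.5 is what equally weighted soft voting does for a binary problem, so the results should closely match the soft-voting results above. Small differences can appear because the classifiers are refit and `SVC(probability=True)` calibrates its probabilities with an internal cross-validation that is not seeded here.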

In [261]:
predictions['mean_prob'] = predictions[['LR prob', 'DT prob', 'SVM prob']].mean(axis=1)
predictions[::15].head()


Unnamed: 0,class,LR,DT,SVM,Voting,LR prob,DT prob,SVM prob,Voting Prob,mean_prob
1458,1,1,1,1,1,0.761384,0.943396,0.946395,1,0.883725
1652,1,1,1,1,1,0.999523,0.753846,0.865944,1,0.873104
1301,1,1,1,1,1,0.933224,0.943396,0.932931,1,0.936517
2901,0,0,0,0,0,0.00076,0.049026,0.067593,0,0.039126
655,1,1,1,1,1,0.99771,0.943396,0.713683,1,0.88493


In [264]:
predictions['mean_pro_pred'] = np.where(predictions['mean_prob'] > 0.5, 1, 0)
predictions[::15].head()

Unnamed: 0,class,LR,DT,SVM,Voting,LR prob,DT prob,SVM prob,Voting Prob,mean_prob,mean_pro_pred
1458,1,1,1,1,1,0.761384,0.943396,0.946395,1,0.883725,1
1652,1,1,1,1,1,0.999523,0.753846,0.865944,1,0.873104,1
1301,1,1,1,1,1,0.933224,0.943396,0.932931,1,0.936517,1
2901,0,0,0,0,0,0.00076,0.049026,0.067593,0,0.039126,0
655,1,1,1,1,1,0.99771,0.943396,0.713683,1,0.88493,1


In [268]:
test_predictions = y_test.to_frame()
for clf_name, clf in classifiers:
    clf.fit(X_train, y_train)
    
    test_predictions[clf_name+' prob'] = clf.predict_proba(X_test)[:,1]

In [270]:
test_predictions['mean_prob'] = test_predictions[['LR prob', 'DT prob', 'SVM prob']].mean(axis=1)
test_predictions['mean_pro_pred'] = np.where(test_predictions['mean_prob'] > 0.5, 1, 0)

test_predictions[::15].head()

Unnamed: 0,class,LR prob,DT prob,SVM prob,mean_prob,mean_pro_pred
1108,1,0.997546,0.943396,0.447621,0.796188,1
435,1,0.986773,0.943396,0.888919,0.939696,1
1354,1,0.99061,0.943396,0.686403,0.87347,1
2933,0,0.002047,0.049026,0.061354,0.037476,0
511,1,0.727505,0.916667,0.898651,0.847608,1


In [271]:
classification_results(y_train, predictions['mean_pro_pred'], name='Mean Prob. on Train:')
classification_results(y_test, test_predictions['mean_pro_pred'], name='Mean Prob. on Test:')


Mean Prob. on Train: accuracy:  0.9584 

       no   yes
no   1924    40
yes    94  1162 

Mean Prob. on Test: accuracy:  0.9334 

      no  yes
no   796   28
yes   64  493 
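
The hand-computed mean probability can be checked against `VotingClassifier` directly. The sketch below is a minimal, self-contained check on synthetic data from `make_classification` rather than on the spam data, and it uses only two estimators to stay fast; those choices are assumptions made for illustration. It averages `predict_proba` over the fitted clones stored in `estimators_` and compares the result with the soft-voting output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem (stand-in data, not the spam dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

voter = VotingClassifier(
    estimators=[
        ('LR', LogisticRegression(max_iter=1000)),
        ('DT', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    voting='soft',
)
voter.fit(X, y)

# estimators_ holds the fitted clones that the ensemble actually uses.
manual_mean = np.mean([est.predict_proba(X) for est in voter.estimators_], axis=0)
manual_pred = voter.classes_[np.argmax(manual_mean, axis=1)]

same = np.array_equal(manual_pred, voter.predict(X))
print("manual mean matches soft voting:", same)
```

Because the comparison reuses the ensemble's own fitted estimators instead of refitting them, no refit randomness is involved and the two sets of predictions should agree.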

