# Ensembles

**Ensemble models** frequently win Machine Learning competitions. An ensemble combines multiple classification models: the models are trained separately, and their predictions are then averaged or combined by vote to produce the final prediction:



In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

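# X and y are assumed to be a feature matrix and a label vector defined beforehand
models = [
    ('logreg', LogisticRegression()),
    ('tree', DecisionTreeClassifier()),
    ('svm', SVC(kernel='rbf')),
]
m = VotingClassifier(models)

m.fit(X, y)
m.score(X, y)  # accuracy on the training data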
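
To make "averaged or vote" concrete, here is a minimal sketch of the two voting schemes in plain NumPy. The prediction and probability values are made up for illustration, and `majority_vote` is a hypothetical helper, not part of scikit-learn. In `VotingClassifier`, the two schemes correspond to `voting='hard'` (the default) and `voting='soft'`.

```python
import numpy as np

# Hypothetical class predictions of three classifiers for five samples
# (rows = classifiers, columns = samples); the values are illustrative only
predictions = np.array([
    [0, 1, 1, 0, 1],   # e.g. logistic regression
    [0, 1, 0, 0, 1],   # e.g. decision tree
    [1, 1, 1, 0, 0],   # e.g. SVM
])


def majority_vote(preds):
    """Hard voting: return the most frequent class label per sample."""
    return np.array([np.bincount(column).argmax() for column in preds.T])


hard_vote = majority_vote(predictions)

# Hypothetical predicted probabilities of class 1 for the same five samples
probabilities = np.array([
    [0.4, 0.9, 0.6, 0.2, 0.7],
    [0.1, 0.8, 0.3, 0.4, 0.6],
    [0.9, 0.7, 0.7, 0.3, 0.1],
])

# Soft voting: average the probabilities, then apply a 0.5 threshold
soft_vote = (probabilities.mean(axis=0) > 0.5).astype(int)

print(hard_vote)
print(soft_vote)
```

The two schemes can disagree: for the last sample two of three classifiers vote for class 1, but the averaged probability stays below 0.5. Soft voting needs every model to provide probabilities, which for `SVC` means setting `probability=True`.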

### Exercise: Train an Ensemble

Build a `VotingClassifier` to predict malignant vs. benign tumors:

- try a single classifier first
- combine multiple classifiers with a `sklearn.ensemble.VotingClassifier` to build a consensus prediction
- check whether the ensemble outperforms a single classifier

#### 1. Read Data into Pandas

#### 2. Split data into X and y

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

X, y = load_breast_cancer(return_X_y=True)
print(X.shape, y.shape)

(569, 30) (569,)


#### 3. Split X data into training and testing sets

In [2]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, 
                                                random_state=42)

#### 4. Exploratory Data Analysis 
#### 5. Feature Engineering
#### 6. Build a model


In [3]:
m = LogisticRegression()

#### 7. Fit/train model i.e. calculate parameters

`m.fit(X, y)`

In [4]:
# Fit and score on the full dataset: an optimistic estimate, since no data is held out
m.fit(X, y)
m.score(X, y)



0.9595782073813708

In [5]:
m.fit(Xtrain, ytrain)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

In [6]:
m.score(Xtrain, ytrain)

0.9577464788732394

In [7]:
m.score(Xtest, ytest)

0.965034965034965

#### 8. Cross-validation 

#### 9. Calculate model quality, e.g. `recall_score`, `precision_score`, `accuracy_score`

In [8]:
from sklearn.model_selection import cross_val_score

In [9]:
scores = cross_val_score(m, Xtrain, ytrain, cv=5)



In [10]:
scores.mean()

0.9482281284606866

In [11]:
scores.std()

0.02652264430551088
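
The metrics named in step 9 can be computed from the test-set predictions. The following is a minimal sketch that repeats the data loading so it runs on its own. `max_iter=10000` is an assumption of this sketch (not part of the notebook) to help the default solver converge on the unscaled features. Note that in this dataset class 0 is malignant and class 1 is benign, so the default `pos_label=1` measures precision and recall for benign tumors.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=42)

# max_iter is raised as an assumption of this sketch, see the note above
m = LogisticRegression(max_iter=10000)
m.fit(Xtrain, ytrain)
ypred = m.predict(Xtest)

accuracy = accuracy_score(ytest, ypred)    # share of correct predictions
precision = precision_score(ytest, ypred)  # of the predicted benign cases, how many are benign
recall = recall_score(ytest, ypred)        # of the benign cases, how many were found

# Recall for the malignant class (label 0), often the more relevant number here
recall_malignant = recall_score(ytest, ypred, pos_label=0)

print(accuracy, precision, recall, recall_malignant)
```

`accuracy_score(ytest, ypred)` gives the same number as `m.score(Xtest, ytest)`; precision and recall add information about which kind of error the model makes.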

- combine multiple classifiers with a `sklearn.ensemble.VotingClassifier` to build a consensus prediction
- check whether the ensemble outperforms a single classifier

In [23]:
models = [
    ('logreg', LogisticRegression()),
    ('tree', DecisionTreeClassifier()),
    ('random_forest', RandomForestClassifier()),
    ('linear_svc', SVC(kernel='linear')),
    ('svm', SVC(kernel='rbf'))
]
m = VotingClassifier(models)

m.fit(Xtrain, ytrain)
m.score(Xtrain, ytrain)



1.0

In [13]:
m.score(Xtest, ytest)

0.965034965034965

In [14]:
m

VotingClassifier(estimators=[('logreg', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)), ('tree', De...f', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False))],
         flatten_transform=None, n_jobs=None, voting='hard', weights=None)
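
A single train/test split makes it hard to tell whether the ensemble really outperforms a single classifier. One possible way to check is to cross-validate each model and the ensemble on the training data, as in the sketch below. The `StandardScaler` pipelines and the reduced set of three models are assumptions of this sketch, chosen to keep it quick to run; they are not part of the notebook above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=42)

# Scaling is an addition of this sketch; it helps logistic regression and the SVM
models = [
    ('logreg', make_pipeline(StandardScaler(), LogisticRegression())),
    ('tree', DecisionTreeClassifier(random_state=42)),
    ('svm', make_pipeline(StandardScaler(), SVC(kernel='rbf'))),
]
ensemble = VotingClassifier(models)

# Cross-validate every single model and the ensemble on the training data only
results = {}
for name, model in models + [('ensemble', ensemble)]:
    scores = cross_val_score(model, Xtrain, ytrain, cv=5)
    results[name] = (scores.mean(), scores.std())
    print(f'{name:10s} {scores.mean():.3f} +/- {scores.std():.3f}')
```

If the mean scores differ by less than about one standard deviation, the data do not show a clear advantage for either model.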

#### 10. Optimise hyperparameters 

In [25]:
from sklearn.model_selection import GridSearchCV
import numpy as np

In [26]:
# Create regularization penalty space
penalty = ['l1', 'l2']

# Create regularization hyperparameter space
C = np.logspace(0, 4, 10)

# Create hyperparameter options
hyperparameters = dict(C=C, penalty=penalty)

In [31]:
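# Note: 'l1' needs a solver that supports it (e.g. solver='liblinear');
# since scikit-learn 0.22 the default solver is 'lbfgs', which does not support 'l1'
grid = GridSearchCV(LogisticRegression(), hyperparameters, cv=5, verbose=0)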

In [32]:
grid.fit(Xtrain, ytrain)









GridSearchCV(cv=5, error_score='raise-deprecating',
       estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid={'C': array([1.00000e+00, 2.78256e+00, 7.74264e+00, 2.15443e+01, 5.99484e+01,
       1.66810e+02, 4.64159e+02, 1.29155e+03, 3.59381e+03, 1.00000e+04]), 'penalty': ['l1', 'l2']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

In [33]:
final_model = grid.best_estimator_

In [34]:
# best_estimator_ was already refit on Xtrain (refit=True), so this call is redundant but harmless
final_model.fit(Xtrain, ytrain)
final_model.score(Xtrain, ytrain)



0.9882629107981221

In [35]:
final_model.score(Xtest, ytest)

0.965034965034965

In [36]:
final_model

LogisticRegression(C=21.544346900318832, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='warn', n_jobs=None, penalty='l1', random_state=None,
          solver='warn', tol=0.0001, verbose=0, warm_start=False)
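
The result of the grid search can also be inspected directly on the `GridSearchCV` object. The sketch below repeats the search so that it runs on its own. Setting `solver='liblinear'` explicitly is an assumption of this sketch; it is one solver that supports both the `'l1'` and `'l2'` penalties in recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=42)

hyperparameters = dict(C=np.logspace(0, 4, 10), penalty=['l1', 'l2'])

# solver='liblinear' is set explicitly because it supports both penalties
grid = GridSearchCV(LogisticRegression(solver='liblinear'), hyperparameters, cv=5)
grid.fit(Xtrain, ytrain)

print(grid.best_params_)          # best combination of C and penalty
print(grid.best_score_)           # mean cross-validated accuracy of that combination
print(grid.score(Xtest, ytest))   # test accuracy of the refitted best estimator
```

Because `refit=True` is the default, `grid.best_estimator_` is already trained on all of `Xtrain`, and `grid.score` and `grid.predict` use it directly.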

### Gradient Boosting
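Gradient Boosting is an ensemble learning strategy that trains many weak models one after another. Each new model is fitted to the errors of the ensemble built so far (more precisely, to the negative gradient of the loss function), so it concentrates on what the previous models got wrong.

Gradient Boosting often reaches very high accuracy, especially on tabular data. A closely related, older boosting algorithm is AdaBoost, which instead increases the weight of the data points that previous models misclassified. scikit-learn provides both as `GradientBoostingClassifier` and `AdaBoostClassifier`.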
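
As an illustration, here is a minimal sketch of boosting on the breast cancer data with scikit-learn's `GradientBoostingClassifier`. The hyperparameter values are the library defaults written out explicitly, not tuned settings, and `random_state=42` is only there for reproducibility.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=42)

# 100 shallow trees, each one correcting the errors of the trees before it
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=42)
gb.fit(Xtrain, ytrain)

train_score = gb.score(Xtrain, ytrain)
test_score = gb.score(Xtest, ytest)
print(train_score, test_score)

# staged_predict yields the ensemble's predictions after each boosting round,
# which shows how test accuracy develops as more weak models are added
staged_accuracy = [(stage_pred == ytest).mean()
                   for stage_pred in gb.staged_predict(Xtest)]
print(staged_accuracy[0], staged_accuracy[-1])
```

Comparing `train_score` with `test_score` gives a first indication of overfitting; `n_estimators`, `learning_rate` and `max_depth` are the usual parameters to tune.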