#### ENSEMBLE LEARNING
A group of predictors is called an __ensemble__.

Aggregating the predictions of a group of predictors is __Ensemble Learning__.

An Ensemble Learning algorithm is called an __Ensemble method__.



## Voting Classifiers

<img src="../notes_images/ensemble1.png" width=400 height=350>

A simple way to build a better classifier is to aggregate the predictions of several classifiers and predict the class that gets the most votes. This majority-vote classifier is called a hard voting classifier.

A voting classifier often achieves a higher accuracy than the best classifier in the ensemble. In fact, even if each classifier is a weak learner (meaning it does only slightly better than random guessing), the ensemble can still be a strong learner (achieving high accuracy), provided there are enough weak learners and they are sufficiently diverse, that is, they make different kinds of errors.
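The effect can be checked with a small simulation. The sketch below assumes an idealized ensemble of 1,000 fully independent binary classifiers, each correct 51% of the time (the numbers and variable names are illustrative, not taken from these notes). Real classifiers trained on the same data make correlated errors, so the gain is smaller in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classifiers = 1000   # assumed ensemble size
p_correct = 0.51       # each weak learner is barely better than a coin flip
n_instances = 10_000   # number of simulated test instances

# For each instance, count how many of the independent classifiers are correct
n_correct_votes = rng.binomial(n_classifiers, p_correct, size=n_instances)

# The majority vote is correct when more than half of the classifiers are correct
ensemble_accuracy = np.mean(n_correct_votes > n_classifiers / 2)

print(f"single classifier accuracy: {p_correct:.2f}")
print(f"majority-vote accuracy:     {ensemble_accuracy:.2f}")
```

Under these assumptions the majority vote is right far more often than any single classifier, which is the law of large numbers at work.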


#### Hard Voting Classifier
Each classifier casts one vote for its predicted class, and the ensemble predicts the class that receives the most votes (the statistical mode of the individual predictions).

#### Soft Voting Classifier
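If all classifiers can estimate class probabilities (i.e., they have a predict_proba() method), you can tell Scikit-Learn to predict the class with the highest class probability, averaged over all the individual classifiers. This is called soft voting. It often achieves higher performance than hard voting because it gives more weight to highly confident votes. Note that SVC does not estimate class probabilities by default; setting probability=True adds a predict_proba() method at the cost of slower training.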
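Hard and soft voting can disagree on the same instance. The sketch below uses made-up probabilities from three hypothetical classifiers for a single instance with two classes; it only illustrates the arithmetic and is not how VotingClassifier is implemented internally.

```python
import numpy as np

# Hypothetical predict_proba() outputs: one row per classifier, one column per class
probas = np.array([
    [0.90, 0.10],   # classifier A is very confident in class 0
    [0.40, 0.60],   # classifier B leans towards class 1
    [0.45, 0.55],   # classifier C leans towards class 1
])

# Hard voting: each classifier votes for its most likely class, the majority wins
hard_votes = probas.argmax(axis=1)
hard_prediction = np.bincount(hard_votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class
mean_probas = probas.mean(axis=0)
soft_prediction = mean_probas.argmax()

print("hard voting predicts class", hard_prediction)
print("soft voting predicts class", soft_prediction, "with mean probabilities", mean_probas)
```

Two of the three classifiers vote for class 1, so hard voting picks class 1, but the single confident vote pulls the averaged probabilities towards class 0.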

In [26]:
# example of voting classifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
data = load_wine()

In [62]:
df = pd.DataFrame(data['data'])
df.columns = data['feature_names']
df['target'] = data['target']
# Features are every column except 'target'; hold out 33% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns='target'), df['target'], test_size=0.33)


index,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline,target
0,14.23,1.71,2.43,15.6,127.0,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,0
1,13.20,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050.0,0
2,13.16,2.36,2.67,18.6,101.0,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185.0,0
3,14.37,1.95,2.50,16.8,113.0,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480.0,0
4,13.24,2.59,2.87,21.0,118.0,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,13.71,5.65,2.45,20.5,95.0,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740.0,2
174,13.40,3.91,2.48,23.0,102.0,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750.0,2
175,13.27,4.28,2.26,20.0,120.0,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835.0,2
176,13.17,2.59,2.37,20.0,120.0,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840.0,2


In [89]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


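log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
# probability=True enables predict_proba(), which soft voting requires
svm_clf = SVC(kernel='rbf', C=100, probability=True)

voting_clf = VotingClassifier(
    estimators=[('lc', log_clf), ('rc', rnd_clf), ('svc', svm_clf)],
    voting='soft'  # use 'hard' for majority voting
)

# Standardize the features before they reach the voting classifier
pipe_clf = Pipeline(
    [('scalar', StandardScaler()), ('vot_clf', voting_clf)])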

pipe_clf.fit(X_train, y_train)

Pipeline(steps=[('scalar', StandardScaler()),
                ('vot_clf',
                 VotingClassifier(estimators=[('lc', LogisticRegression()),
                                              ('rc', RandomForestClassifier()),
                                              ('svc',
                                               SVC(C=100, probability=True))],
                                  voting='soft'))])

In [90]:
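from sklearn.metrics import accuracy_score

# Fit the scaler once on the training set and reuse it for every individual classifier
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

for clf in (log_clf, rnd_clf, svm_clf):
    clf.fit(X_train_scaled, y_train)
    y_pred = clf.predict(X_test_scaled)
    print(clf.__class__, accuracy_score(y_test, y_pred))

# pipe_clf scales internally and is already fitted, so it is evaluated on the raw features
print(pipe_clf.__class__, accuracy_score(y_test, pipe_clf.predict(X_test)))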

<class 'sklearn.linear_model._logistic.LogisticRegression'> 0.9830508474576272
<class 'sklearn.ensemble._forest.RandomForestClassifier'> 0.9830508474576272
<class 'sklearn.svm._classes.SVC'> 0.9830508474576272
<class 'sklearn.pipeline.Pipeline'> 1.0


In [91]:
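# Fit a separate SVC on the raw (unscaled) features and inspect its class probabilities
svm_classifier = SVC(kernel='rbf', C=100, probability=True)
svm_classifier.fit(X_train,y_train)

# Despite its name, y_pred holds one row of class probabilities per test instance
y_pred = svm_classifier.predict_proba(X_test)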


In [92]:
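# Only the last expression of a cell is displayed, so this DataFrame is built but not shown
pd.DataFrame({'y':pipe_clf.predict(X_test), 'y_true':y_test})
y_pred  # class probabilities from the SVC fitted on the raw features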

array([[0.01196076, 0.83379765, 0.15424159],
       [0.00662472, 0.92475823, 0.06861706],
       [0.01787197, 0.71416273, 0.2679653 ],
       [0.84252593, 0.12780008, 0.02967399],
       [0.40475602, 0.21024496, 0.38499902],
       [0.02523919, 0.68669689, 0.28806392],
       [0.08594689, 0.33031732, 0.58373579],
       [0.00998081, 0.84421051, 0.14580868],
       [0.87523394, 0.06930813, 0.05545793],
       [0.00524639, 0.93589352, 0.05886009],
       [0.74942495, 0.24910006, 0.00147499],
       [0.76831637, 0.15446479, 0.07721884],
       [0.00695286, 0.87447781, 0.11856933],
       [0.82257576, 0.1730821 , 0.00434214],
       [0.05830503, 0.38513649, 0.55655848],
       [0.00698239, 0.90894507, 0.08407254],
       [0.90217358, 0.048742  , 0.04908442],
       [0.00504048, 0.92265245, 0.07230707],
       [0.12264159, 0.32539103, 0.55196738],
       [0.17153853, 0.22585115, 0.60261032],
       [0.88757684, 0.09518146, 0.0172417 ],
       [0.83675168, 0.12057407, 0.04267425],
       [0.

In [93]:
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC(kernel='rbf', C=100, probability=True)

std_scalar = StandardScaler()
X_train_scaled = std_scalar.fit_transform(X_train)
log_clf.fit(X_train_scaled,y_train)
rnd_clf.fit(X_train_scaled,y_train)
svm_clf.fit(X_train_scaled,y_train)

X_test_scaled = std_scalar.transform(X_test)

print(log_clf.predict_proba(X_test_scaled))
print(rnd_clf.predict_proba(X_test_scaled))
print(svm_clf.predict_proba(X_test_scaled))

[[4.50002901e-04 9.98906359e-01 6.43638144e-04]
 [5.55700469e-04 9.99260716e-01 1.83583303e-04]
 [1.95797421e-03 5.41542916e-04 9.97500483e-01]
 [9.21570906e-01 6.55610942e-02 1.28679997e-02]
 [8.99291499e-01 9.93767934e-02 1.33170810e-03]
 [3.63437539e-04 9.98499363e-01 1.13719918e-03]
 [2.38825452e-03 1.38173738e-04 9.97473572e-01]
 [2.08627583e-04 9.99351280e-01 4.40092573e-04]
 [4.66097296e-01 5.33198557e-01 7.04146920e-04]
 [2.90825716e-03 9.96917407e-01 1.74335788e-04]
 [9.99961516e-01 4.73818703e-06 3.37456057e-05]
 [9.14234330e-01 7.41631603e-02 1.16025098e-02]
 [2.89766066e-04 2.97111083e-03 9.96739123e-01]
 [9.97794175e-01 1.47804298e-03 7.27781766e-04]
 [3.07784620e-03 4.52739233e-04 9.96469415e-01]
 [6.32474347e-03 9.85906907e-01 7.76834969e-03]
 [5.35898389e-01 4.63634986e-01 4.66625595e-04]
 [1.77274719e-02 9.81981476e-01 2.91052625e-04]
 [1.27284167e-01 8.67725178e-01 4.99065517e-03]
 [3.48211211e-03 2.31106289e-04 9.96286782e-01]
 [9.94439486e-01 2.99971613e-03 2.560798

## BAGGING AND PASTING
Bagging and pasting use the same training algorithm for every predictor (e.g., a regressor) but train each predictor on a different random subset of the training set.
When sampling is performed
1. with replacement, the method is called bagging (short for bootstrap aggregating);
2. without replacement, it is called pasting.
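The difference between the two sampling schemes can be sketched with NumPy index sampling. The training-set size and subset size below are arbitrary illustrative values; in Scikit-Learn the same choice is made through the bootstrap parameter of BaggingClassifier (bootstrap=True for bagging, bootstrap=False for pasting).

```python
import numpy as np

rng = np.random.default_rng(42)

m = 10            # assumed number of training instances
subset_size = 6   # assumed number of instances drawn for one predictor

# Bagging: sample indices with replacement, so an instance may appear several times
bagging_idx = rng.choice(m, size=subset_size, replace=True)

# Pasting: sample indices without replacement, so every index is unique
pasting_idx = rng.choice(m, size=subset_size, replace=False)

print("bagging indices:", np.sort(bagging_idx))
print("pasting indices:", np.sort(pasting_idx))
```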

__Bagging__ takes a model with high variance and low bias and reduces the variance while leaving the bias largely unchanged.

__Boosting__ takes a model with low variance and high bias and reduces the bias, ideally without increasing the variance much.

Once all predictors are trained, the ensemble can make a prediction for a new instance by simply aggregating the predictions of all predictors. 

The aggregation function is typically:
1. The statistical mode (i.e., the most frequent prediction, just like a hard voting classifier) for classification
2. The average for regression.
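Both aggregation functions can be sketched in a few lines of NumPy. The prediction arrays below are invented for illustration, with one row per predictor and one column per instance.

```python
import numpy as np

# Hypothetical class predictions from 3 predictors for 4 instances (3 classes)
class_predictions = np.array([
    [0, 1, 2, 1],
    [0, 2, 2, 1],
    [1, 1, 2, 0],
])
n_classes = 3

# Classification: count the votes per class for every instance, keep the most frequent class
votes = np.apply_along_axis(np.bincount, 0, class_predictions, minlength=n_classes)
ensemble_classes = votes.argmax(axis=0)

# Hypothetical regression predictions from 3 predictors for 2 instances
value_predictions = np.array([
    [2.0, 3.5],
    [2.4, 3.1],
    [1.9, 3.3],
])

# Regression: average the predictions for every instance
ensemble_values = value_predictions.mean(axis=0)

print("aggregated classes:", ensemble_classes)
print("aggregated values: ", ensemble_values)
```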

Each individual predictor has a higher bias than if it were trained on the original training set, but aggregation reduces both bias and variance.

Generally, the net result is that the ensemble has a similar bias but a lower variance than a single predictor trained on the original training set.
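The variance reduction can be illustrated with a toy simulation. The sketch below assumes 50 predictors whose errors are independent, unbiased Gaussian noise around a made-up true value; bagged predictors are correlated because they share training data, so the reduction in a real ensemble is smaller than shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

true_value = 5.0     # assumed target value
n_predictors = 50    # assumed ensemble size
n_trials = 2000      # number of simulated ensembles

# Each row is one ensemble; each entry is one predictor's noisy prediction
single_predictions = true_value + rng.normal(0.0, 1.0, size=(n_trials, n_predictors))

# Aggregating by averaging gives one ensemble prediction per trial
ensemble_predictions = single_predictions.mean(axis=1)

single_std = single_predictions[:, 0].std()
ensemble_std = ensemble_predictions.std()

print(f"mean of single predictor:   {single_predictions[:, 0].mean():.3f}")
print(f"mean of ensemble:           {ensemble_predictions.mean():.3f}")
print(f"spread of single predictor: {single_std:.3f}")
print(f"spread of ensemble:         {ensemble_std:.3f}")
```

Both means stay close to the true value (similar bias), while the spread of the averaged prediction is much smaller (lower variance).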

Predictors can all be trained in parallel on different CPU cores or servers, and predictions can be made in parallel as well, so bagging and pasting scale well.

#### Single Decision Tree vs a bagging ensemble of 500 trees
<img src="../notes_images/bagging_res.png">

__It can be clearly seen that bias remains the same but the variance is reduced.__


### Out Of Bag Evaluation
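In bagging, some instances may be sampled several times for any given predictor, while others may not be sampled at all. By default, a BaggingClassifier samples m training instances with replacement (bootstrap=True), where m is the size of the training set. This means that only about 63% of the training instances are sampled on average for each predictor. The remaining 37% of the training instances that are not sampled are called out-of-bag (oob) instances. Note that they are not the same 37% for all predictors.

Since a predictor never sees its oob instances during training, it can be evaluated on them without a separate validation set. Setting oob_score=True requests this evaluation automatically after training, and the result is available in the oob_score_ attribute.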
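The 63% figure follows from the sampling procedure: the probability that a given instance is never picked in m draws with replacement is (1 - 1/m)^m, which approaches 1/e ≈ 0.37 as m grows, so about 1 - 1/e ≈ 0.63 of the instances are sampled at least once. The sketch below checks this numerically for an arbitrarily chosen m.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 1000  # assumed training-set size

# Draw m indices with replacement, as bootstrap sampling does for one predictor
sample = rng.integers(0, m, size=m)

# Fraction of distinct training instances that ended up in the sample
sampled_fraction = np.unique(sample).size / m

# Expected fraction for this m, and its limit as m grows
theoretical = 1 - (1 - 1 / m) ** m
limit = 1 - np.exp(-1)

print(f"simulated:   {sampled_fraction:.3f}")
print(f"theoretical: {theoretical:.3f}")
print(f"limit:       {limit:.3f}")
```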

### Random Patches and Random Subspaces

The BaggingClassifier class supports sampling the features as well. This is controlled by two hyperparameters: max_features and bootstrap_features. They work the same way as max_samples and bootstrap, but for feature sampling instead of instance sampling. Thus, each predictor will be trained on a random subset of the input features.

This is particularly useful when you are dealing with high-dimensional inputs (such as images). Sampling both training instances and features is called the __Random Patches__ method. Keeping all training instances (i.e., bootstrap=False and max_samples=1.0) but sampling features (i.e., bootstrap_features=True and/or max_features smaller than 1.0) is called the __Random Subspaces__ method.

Sampling features results in even more predictor diversity, trading a bit more bias for a lower variance.
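The two methods differ only in how BaggingClassifier is configured. The sketch below is one possible setup on the breast cancer dataset; the sampling fractions, ensemble size, and random_state are arbitrary choices for illustration, not recommended values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Random Patches: sample both training instances and features
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=20,
    max_samples=0.7, bootstrap=True,
    max_features=0.5, bootstrap_features=True,
    random_state=0,
)

# Random Subspaces: keep all training instances, sample only the features
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=20,
    max_samples=1.0, bootstrap=False,
    max_features=0.5, bootstrap_features=True,
    random_state=0,
)

for name, clf in (("patches", patches_clf), ("subspaces", subspaces_clf)):
    clf.fit(X, y)
    # estimators_features_ lists the feature indices drawn for each predictor
    first_tree_features = sorted(clf.estimators_features_[0].tolist())
    print(name, "features used by the first tree:", first_tree_features)
```

Because bootstrap_features=True draws features with replacement, the same feature index can appear more than once in a predictor's list.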


In [102]:
from sklearn.datasets import load_breast_cancer
import pandas as pd
data = load_breast_cancer()

In [121]:
df = pd.DataFrame(data['data'], columns=data['feature_names'].tolist())
data['target']

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [125]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df,data['target'], test_size=0.20)


In [147]:
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# 500 trees, each trained on 150 instances sampled with replacement, using all CPU cores
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=150, bootstrap=True, n_jobs=-1, oob_score=True
)

pipe_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('baggedDtreeClf', bag_clf)
])

pipe_clf.fit(X_train, y_train)

Pipeline(steps=[('scaler', StandardScaler()),
                ('baggedDtreeClf',
                 BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                                   max_samples=150, n_estimators=500, n_jobs=-1,
                                   oob_score=True))])

In [152]:
# Out-of-bag score: accuracy estimated on the training instances each predictor did not sample
pipe_clf['baggedDtreeClf'].oob_score_
# Out-of-bag class probabilities for each training instance (only this last expression is displayed)
pipe_clf['baggedDtreeClf'].oob_decision_function_

array([[0.05633803, 0.94366197],
       [0.19774011, 0.80225989],
       [0.92021277, 0.07978723],
       [0.        , 1.        ],
       [0.        , 1.        ],
       [0.00534759, 0.99465241],
       [0.08791209, 0.91208791],
       [0.02739726, 0.97260274],
       [0.        , 1.        ],
       [0.27019499, 0.72980501],
       [0.47540984, 0.52459016],
       [1.        , 0.        ],
       [0.48209366, 0.51790634],
       [1.        , 0.        ],
       [0.03116147, 0.96883853],
       [0.99430199, 0.00569801],
       [0.95786517, 0.04213483],
       [0.47252747, 0.52747253],
       [0.01101928, 0.98898072],
       [0.02808989, 0.97191011],
       [0.00533333, 0.99466667],
       [0.06779661, 0.93220339],
       [1.        , 0.        ],
       [1.        , 0.        ],
       [0.        , 1.        ],
       [0.85555556, 0.14444444],
       [0.99714286, 0.00285714],
       [0.29096045, 0.70903955],
       [0.94198895, 0.05801105],
       [0.        , 1.        ],
       [0.

In [136]:
y_cap = pipe_clf.predict(X_test)
pd.DataFrame({'y_cap': y_cap, 'y':y_test})

index,y_cap,y
0,1,1
1,1,1
2,1,1
3,0,0
4,0,0
...,...,...
109,0,0
110,1,1
111,0,0
112,1,1


In [144]:
from sklearn.metrics import f1_score, accuracy_score, precision_score, recall_score

print('accuracy score: %s, f1_score: %s, precision_score: %s, recall_score: %s' %(accuracy_score(y_test, y_cap),
f1_score(y_test, y_cap), precision_score(y_test, y_cap), recall_score(y_test, y_cap)))

accuracy score: 0.9736842105263158, f1_score: 0.9793103448275862, precision_score: 0.9594594594594594, recall_score: 1.0


In [145]:
# same model without bagging 
pipe_clf_unbag = Pipeline([
    ('scaler', StandardScaler()),
    ('decision_clf', DecisionTreeClassifier())
])
pipe_clf_unbag.fit(X_train, y_train)
y_cap = pipe_clf_unbag.predict(X_test)

In [146]:
print('accuracy score: %s, f1_score: %s, precision_score: %s, recall_score: %s' %(accuracy_score(y_test, y_cap),
f1_score(y_test, y_cap), precision_score(y_test, y_cap), recall_score(y_test, y_cap)))

accuracy score: 0.9298245614035088, f1_score: 0.9428571428571428, precision_score: 0.9565217391304348, recall_score: 0.9295774647887324
