Contents:
- Bagging
- Boosting
- Stacking

# Ensemble Intro

If you wanted to buy a new phone, would you walk into a store alone and make your choice on the spot? I doubt it. Most people ask for others' opinions and read expert reviews (and you should too). The point is that you can make better decisions by gathering more opinions. This is the idea behind ensembles.
In most cases, there are three types of ensemble modelling:
- Bagging: decreases the model's variance
- Boosting: decreases the model's bias
- Stacking: increases the predictive power of the classifier


## Bagging
Bagging stands for Bootstrap Aggregating. Bootstrapping means resampling the dataset with replacement, so that each sample has the same cardinality as the original dataset. How can this reduce the variance? By drawing several bootstrap samples, feeding each one to a classifier of the same type (homogeneous learners), and aggregating their predictions, the quirks that any single model overfits to tend to average out. Less overfitting means lower variance.

![](res/bagging.jpeg)
Source: [Medium1](https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de)

This method is effective if you're working with limited data, since resampling lets you estimate the score by aggregating the classifiers' results.
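
As a minimal sketch of the idea, the snippet below compares a single decision tree with a bagged ensemble of trees. It assumes scikit-learn's `BaggingClassifier` and a synthetic dataset from `make_classification`; neither appears in the original text, and the sizes and seeds are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (an assumption, not the pulsar dataset)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# A single, fully grown tree: flexible, but prone to overfitting
single_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# 50 trees, each fit on a bootstrap sample (rows drawn with replacement,
# same size as the training set); their predictions are aggregated
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),
    n_estimators=50,
    bootstrap=True,
    random_state=0,
).fit(X_tr, y_tr)

single_acc = single_tree.score(X_te, y_te)
bagged_acc = bagged_trees.score(X_te, y_te)
print(f"single tree: {single_acc:.3f}, bagged trees: {bagged_acc:.3f}")
```

On most datasets the bagged ensemble scores at least as well as the single tree on held-out data, but this is a tendency rather than a guarantee.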

## Boosting

The main idea of boosting is to create "teamwork" between models. Unlike Bagging, where each model is trained independently and the results are aggregated at the end, Boosting adds models sequentially, one after another. Each new base learner is trained on the errors of the previous model, so the knowledge of the whole ensemble is boosted gradually.

![](res/boosting.png)

Notice that boosting makes the learner better and better (at least, that's what we hope for). The model fits our data more closely, which leads to smaller bias. Here's a visualization of how Boosting affects the model's fit as the number of learners grows: the more learners there are, the better the model fits the data.

![](res/bosting.gif)
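
The same effect can be checked numerically. The sketch below is one possible illustration, assuming scikit-learn's `GradientBoostingClassifier` (one of several boosting implementations) and synthetic data from `make_classification`. It uses very weak learners (depth-1 trees) and tracks the training accuracy as learners are added one by one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic data, used only for illustration (an assumption)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 100 decision stumps added sequentially; each one is fit to the
# errors (loss gradients) left by the learners before it
booster = GradientBoostingClassifier(n_estimators=100, max_depth=1, random_state=0)
booster.fit(X, y)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 100 learners
train_acc = [accuracy_score(y, y_stage) for y_stage in booster.staged_predict(X)]

for n in (1, 10, 100):
    print(f"{n:3d} learners: training accuracy = {train_acc[n - 1]:.3f}")
```

A better fit on the training data does not by itself mean better predictions on new data, so in practice the number of learners is usually chosen with a validation set.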

## Stacking

The previous two ensemble methods, Bagging and Boosting, use **the same** type of learner. If it's a decision tree, then all the learners are decision trees. Stacking, by contrast, usually trains **different** types of learners. The other difference is how the outputs are combined. In Bagging, each model generates an output and the outputs are then aggregated (usually by averaging or voting). Stacking instead trains a **meta-model** as its final step (often a simple model such as logistic regression, although a neural network can also be used), which takes the outputs of the learners (some people call them base learners) as its inputs and predicts the final output.


![](res/stacking.png)
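
A minimal sketch of this structure, assuming scikit-learn's `StackingClassifier` (available from version 0.22) and synthetic data. The three base learners and the logistic-regression meta-model are arbitrary choices made for illustration, not a recommendation from the original text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (an assumption)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Heterogeneous base learners: three different model types
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
]

# The meta-model is trained on the base learners' out-of-fold predictions
# (cv=5), so it does not simply learn from outputs the base learners memorized
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)

stack_acc = stack.score(X_te, y_te)
print(f"stacked accuracy: {stack_acc:.3f}")
```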

I hope you understand this method, or at least get familiar with it, since it's very similar to neural networks, which we will learn later.

# Random Forest

Random forest is an example of the Bagging method. "Forest" means a bunch of decision trees (the learners are homogeneous), and "random" refers to the sources of randomness that help the model perform well. The two sources of randomness in a random forest are:
- Each tree is built from a random sample of the data
- At each tree node, a random subset of features is considered when searching for the best split.
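
To make the two sources of randomness concrete, here is a hand-rolled sketch of a tiny forest built from plain decision trees. It is a simplification for illustration only, on synthetic data: scikit-learn's `RandomForestClassifier` averages class probabilities rather than counting votes, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (labels 0/1), used only for illustration
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rng = np.random.default_rng(0)

trees = []
for i in range(10):
    # Randomness 1: a bootstrap sample (row indices drawn with replacement)
    rows = rng.integers(0, len(X), size=len(X))
    # Randomness 2: max_features="sqrt" makes every split consider
    # only a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[rows], y[rows]))

# Aggregate by majority vote (ties go to class 1)
votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
forest_pred = (votes.mean(axis=0) >= 0.5).astype(int)
forest_acc = (forest_pred == y).mean()
print(f"training accuracy of the hand-built forest: {forest_acc:.3f}")
```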



In [19]:
import pandas as pd
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


We will now try to classify pulsar stars using the pulsar star classification dataset, hosted on [Kaggle](https://www.kaggle.com/pavanraj159/predicting-a-pulsar-star).

In [85]:
pstar = pd.read_csv('data/pulsar-star/pulsar_stars.csv')
pstar.head()

Unnamed: 0,Mean of the integrated profile,Standard deviation of the integrated profile,Excess kurtosis of the integrated profile,Skewness of the integrated profile,Mean of the DM-SNR curve,Standard deviation of the DM-SNR curve,Excess kurtosis of the DM-SNR curve,Skewness of the DM-SNR curve,target_class
0,140.5625,55.683782,-0.234571,-0.699648,3.199833,19.110426,7.975532,74.242225,0
1,102.507812,58.88243,0.465318,-0.515088,1.677258,14.860146,10.576487,127.39358,0
2,103.015625,39.341649,0.323328,1.051164,3.121237,21.744669,7.735822,63.171909,0
3,136.75,57.178449,-0.068415,-0.636238,3.642977,20.95928,6.896499,53.593661,0
4,88.726562,40.672225,0.600866,1.123492,1.17893,11.46872,14.269573,252.567306,0


In [21]:
pstar.describe()

Unnamed: 0,Mean of the integrated profile,Standard deviation of the integrated profile,Excess kurtosis of the integrated profile,Skewness of the integrated profile,Mean of the DM-SNR curve,Standard deviation of the DM-SNR curve,Excess kurtosis of the DM-SNR curve,Skewness of the DM-SNR curve,target_class
count,17898.0,17898.0,17898.0,17898.0,17898.0,17898.0,17898.0,17898.0,17898.0
mean,111.079968,46.549532,0.477857,1.770279,12.6144,26.326515,8.303556,104.857709,0.091574
std,25.652935,6.843189,1.06404,6.167913,29.472897,19.470572,4.506092,106.51454,0.288432
min,5.8125,24.772042,-1.876011,-1.791886,0.213211,7.370432,-3.13927,-1.976976,0.0
25%,100.929688,42.376018,0.027098,-0.188572,1.923077,14.437332,5.781506,34.960504,0.0
50%,115.078125,46.947479,0.22324,0.19871,2.801839,18.461316,8.433515,83.064556,0.0
75%,127.085938,51.023202,0.473325,0.927783,5.464256,28.428104,10.702959,139.309331,0.0
max,192.617188,98.778911,8.069522,68.101622,223.39214,110.642211,34.539844,1191.000837,1.0


In [22]:
pstar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17898 entries, 0 to 17897
Data columns (total 9 columns):
 Mean of the integrated profile                  17898 non-null float64
 Standard deviation of the integrated profile    17898 non-null float64
 Excess kurtosis of the integrated profile       17898 non-null float64
 Skewness of the integrated profile              17898 non-null float64
 Mean of the DM-SNR curve                        17898 non-null float64
 Standard deviation of the DM-SNR curve          17898 non-null float64
 Excess kurtosis of the DM-SNR curve             17898 non-null float64
 Skewness of the DM-SNR curve                    17898 non-null float64
target_class                                     17898 non-null int64
dtypes: float64(8), int64(1)
memory usage: 1.2 MB


Assuming we don't need data scaling (which is true, since tree-based models are insensitive to feature scale), let's split the data into training and validation sets to evaluate our model.

In [86]:
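# Hold out 20% of the rows for validation.
# Note: random_state is seeded from the clock, so the split (and the scores below) differ on every run.
X_train, X_valid, y_train, y_valid = train_test_split(pstar.loc[:, pstar.columns != 'target_class'],
                                                      pstar['target_class'],
                                                      test_size = 0.2,
                                                      random_state = int(time.time())
                                                     )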
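
The split above is a single hold-out split. For comparison, k-fold cross-validation could be sketched as follows, assuming scikit-learn's `cross_val_score` and an imbalanced synthetic dataset standing in for the pulsar data (with the real data, the feature columns and `target_class` of `pstar` would take the place of `X` and `y`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Imbalanced synthetic data (about 90% / 10%), used only for illustration
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

model = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)

# 5-fold cross-validation; for classifiers an integer cv uses stratified
# folds, so each fold keeps roughly the same class ratio
scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print(f"F1 per fold: {scores.round(3)}, mean = {scores.mean():.3f}")
```

Averaging over folds gives a steadier estimate than one split, at the cost of training the model five times.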

In [87]:
pstar.loc[:, pstar.columns != 'target_class']

Unnamed: 0,Mean of the integrated profile,Standard deviation of the integrated profile,Excess kurtosis of the integrated profile,Skewness of the integrated profile,Mean of the DM-SNR curve,Standard deviation of the DM-SNR curve,Excess kurtosis of the DM-SNR curve,Skewness of the DM-SNR curve
0,140.562500,55.683782,-0.234571,-0.699648,3.199833,19.110426,7.975532,74.242225
1,102.507812,58.882430,0.465318,-0.515088,1.677258,14.860146,10.576487,127.393580
2,103.015625,39.341649,0.323328,1.051164,3.121237,21.744669,7.735822,63.171909
3,136.750000,57.178449,-0.068415,-0.636238,3.642977,20.959280,6.896499,53.593661
4,88.726562,40.672225,0.600866,1.123492,1.178930,11.468720,14.269573,252.567306
...,...,...,...,...,...,...,...,...
17893,136.429688,59.847421,-0.187846,-0.738123,1.296823,12.166062,15.450260,285.931022
17894,122.554688,49.485605,0.127978,0.323061,16.409699,44.626893,2.945244,8.297092
17895,119.335938,59.935939,0.159363,-0.743025,21.430602,58.872000,2.499517,4.595173
17896,114.507812,53.902400,0.201161,-0.024789,1.946488,13.381731,10.007967,134.238910


In [88]:
pstar['target_class'].value_counts()

0    16259
1     1639
Name: target_class, dtype: int64

Now let's make a random forest consisting of 10 decision trees!

In [89]:
clf = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)

In [90]:
clf.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=4, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=0, verbose=0,
                       warm_start=False)

In [91]:
y_pred = clf.predict(X_valid)

In [92]:
from sklearn.metrics import classification_report
print(classification_report(y_valid, y_pred))

              precision    recall  f1-score   support

           0       0.98      0.99      0.99      3229
           1       0.93      0.80      0.86       351

    accuracy                           0.97      3580
   macro avg       0.95      0.90      0.92      3580
weighted avg       0.97      0.97      0.97      3580



To see how important each feature is for separating the classes, we can use `feature_importances_` from our model. We can later remove the least useful inputs to try to make a better model.

In [95]:
(pd.DataFrame({'feature': list(X_train.columns),
               'importance': clf.feature_importances_})
   .sort_values('importance', ascending=False)
   .reset_index(drop=True))

Unnamed: 0,feature,importance
0,Mean of the integrated profile,0.375648
1,Skewness of the integrated profile,0.282933
2,Excess kurtosis of the integrated profile,0.269109
3,Standard deviation of the DM-SNR curve,0.04473
4,Mean of the DM-SNR curve,0.012235
5,Excess kurtosis of the DM-SNR curve,0.008124
6,Skewness of the DM-SNR curve,0.00638
7,Standard deviation of the integrated profile,0.000841
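
Removing the least useful inputs could look like the sketch below. It runs on synthetic data, and the 0.01 cutoff is a hypothetical threshold chosen only for illustration; whether dropping features actually improves the model has to be checked on validation data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with only a few informative features (illustration only)
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
forest.fit(X, y)

# Keep only the features whose importance reaches the (hypothetical) cutoff
threshold = 0.01
keep = forest.feature_importances_ >= threshold
X_reduced = X[:, keep]

print(f"kept {keep.sum()} of {X.shape[1]} features")

# Retrain on the reduced feature set
forest_small = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
forest_small.fit(X_reduced, y)
```

With a pandas DataFrame such as `X_train`, the same boolean mask can be applied with `X_train.loc[:, keep]`.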


Source:
[Statsbot](https://blog.statsbot.co/ensemble-learning-d1dcd548e936),
[Medium1](https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de),
[Medium2](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205),
[Analytics India](https://analyticsindiamag.com/primer-ensemble-learning-bagging-boosting/)


