# Ensemble Learning

* **Random Forest**
* **AdaBoost**
* **Gradient Boosting**

Ensemble learning combines the predictions of several individual models into one final prediction. Compared with a single model, an ensemble tends to be more flexible (lower bias) and less sensitive to the particular training data (lower variance).

The two most popular ensemble methods are **bagging** and **boosting**.

* **Bagging**: Train many individual models independently (in parallel). Each model is trained on a random subset of the data, typically a bootstrap sample drawn with replacement.
* **Boosting**: Train individual models sequentially. Each new model focuses on the mistakes made by the models before it.

## Random Forest

Random forest is an ensemble model that uses bagging as the ensemble method and decision trees as the individual models.

**Step 1:** Draw n random subsets (bootstrap samples) from the training set

**Step 2:** Train n decision trees
* one random subset is used to train one decision tree
* at each split, a tree searches for the best split among only a random subset of the features
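A minimal from-scratch sketch of Steps 1 and 2, assuming NumPy and scikit-learn's `DecisionTreeClassifier`. The helper `fit_forest` is hypothetical, and `max_features="sqrt"` is an assumed choice for the size of the per-split feature subset; this is not how `RandomForestClassifier` is implemented internally.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=10, seed=0):
    """Hypothetical helper: train n_trees trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    m = len(X)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample, i.e. m row indices drawn with replacement
        idx = rng.integers(0, m, size=m)
        # Step 2: max_features="sqrt" makes every split consider only a
        # random subset of the features (assumed subset size)
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
trees = fit_forest(X, y, n_trees=5)
print(len(trees))  # 5
```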

**Step 3:** Each tree independently predicts every record in the test set.

**Step 4:** Make the final prediction

For each record in the test set, the random forest uses the class that receives the majority of the trees' votes as the final prediction.
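Step 4 as a hard majority vote, sketched with NumPy under the assumption that class labels are non-negative integers (required by `np.bincount`); `majority_vote` is a hypothetical helper. Note that scikit-learn's `RandomForestClassifier` instead averages the trees' predicted class probabilities, which usually agrees with the hard vote but can differ on close calls.

```python
import numpy as np


def majority_vote(tree_preds):
    """tree_preds: shape (n_trees, n_samples), non-negative integer labels."""
    tree_preds = np.asarray(tree_preds)
    # For each test record (one column), count votes per class and take the
    # most common one; ties go to the smallest label
    return np.array([np.bincount(col).argmax() for col in tree_preds.T])


# Three trees voting on four test records
preds = [[0, 1, 1, 0],
         [1, 1, 0, 0],
         [0, 1, 1, 1]]
print(majority_vote(preds))  # [0 1 1 0]
```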

## AdaBoost (Adaptive Boosting)

AdaBoost is a boosting ensemble model that is most commonly used with decision trees, often very shallow ones (scikit-learn's default base estimator is a depth-1 tree, a "stump"). The key idea of a boosting model is learning from previous mistakes, e.g. misclassified data points.

AdaBoost learns from the mistakes by increasing the weight of misclassified data points.

**Step 0:** Initialize the data point weights. Each instance weight is initially set to 1/m, where m is the number of training instances.

**Step 1:** Train a decision tree using the current data point weights

**Step 2:** Calculate the weighted error rate of the decision tree.
* The weighted error rate is the fraction of wrong predictions, except that each wrong prediction counts in proportion to its data point's weight: it is the sum of the weights of the misclassified points divided by the sum of all weights. The higher a point's weight, the more its error contributes to the weighted error rate.
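Steps 0 and 2 in NumPy: uniform initial weights of 1/m, then the weighted error rate as the total weight of the misclassified points divided by the total weight. `weighted_error_rate` is a hypothetical helper name.

```python
import numpy as np


def weighted_error_rate(y_true, y_pred, weights):
    """Sum of the weights of misclassified points divided by the total weight."""
    y_true, y_pred, weights = map(np.asarray, (y_true, y_pred, weights))
    misclassified = y_true != y_pred
    return weights[misclassified].sum() / weights.sum()


m = 5
weights = np.full(m, 1 / m)         # Step 0: every instance starts at 1/m
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0])  # two mistakes out of five
print(weighted_error_rate(y_true, y_pred, weights))  # about 0.4

# With unequal weights, the same two mistakes can cost more or less
print(weighted_error_rate(y_true, y_pred, [0.1, 0.4, 0.1, 0.1, 0.3]))  # about 0.5
```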

**Step 3:** Calculate this decision tree's weight in the ensemble
* the higher a tree's weighted error rate, the less voting power the tree is given in the final vote
* the lower a tree's weighted error rate, the more voting power the tree is given in the final vote
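The text does not fix a formula for the tree weight. One common choice, assumed here, is the SAMME form used by scikit-learn's `AdaBoostClassifier`: learning_rate × log((1 − r) / r) for a weighted error rate r in the two-class case (the multi-class version adds log(K − 1) for K classes). The weight shrinks to 0 as r approaches 0.5, i.e. random guessing. `tree_weight` is a hypothetical helper.

```python
import numpy as np


def tree_weight(error_rate, learning_rate=1.0):
    """Two-class SAMME-style tree weight; assumes 0 < error_rate < 1."""
    return learning_rate * np.log((1 - error_rate) / error_rate)


for r in (0.1, 0.3, 0.5):
    # prints "0.1 2.197", then "0.3 0.847", then "0.5 0.0"
    print(r, round(float(tree_weight(r)), 3))
```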

**Step 4:** Update weights of wrongly classified points

The weight of each data point is updated as follows:
* if the tree classified this data point correctly, its weight stays the same
* if the tree misclassified this data point, its new weight = old weight * exp(weight of this tree)

**Note:** The higher the tree's weight (i.e. the more accurate the tree), the bigger the boost in importance that the points it misclassified receive. After all misclassified points are updated, the data point weights are normalized so that they sum to 1.
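Step 4 and the normalization from the note, sketched in NumPy. `update_weights` is a hypothetical helper, and `alpha` stands for the tree weight from Step 3; the value used below is an arbitrary example.

```python
import numpy as np


def update_weights(weights, misclassified, alpha):
    """Multiply misclassified points' weights by exp(alpha), then normalize."""
    new_w = np.where(misclassified, weights * np.exp(alpha), weights)
    return new_w / new_w.sum()  # normalize so the weights sum to 1


weights = np.full(4, 0.25)
misclassified = np.array([False, True, False, False])
alpha = np.log(3.0)  # example tree weight, chosen so that exp(alpha) == 3
print(update_weights(weights, misclassified, alpha))  # about [1/6, 1/2, 1/6, 1/6]
```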

**Step 5:** Repeat from Step 1 until the chosen number of trees has been trained

**Step 6:** Make the final prediction

AdaBoost makes a prediction by summing each tree's prediction multiplied by that tree's weight, i.e. a weighted vote. Trees with higher weights therefore have more influence on the final decision.
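Assuming binary labels encoded as −1 and +1, the weighted vote in Step 6 is the sign of the weighted sum of the trees' predictions. `adaboost_predict` is a hypothetical helper, and breaking ties toward +1 is an arbitrary choice. In the example, two low-weight trees vote −1 on the first record, but the single high-weight tree outvotes them.

```python
import numpy as np


def adaboost_predict(tree_preds, tree_weights):
    """tree_preds: shape (n_trees, n_samples) in {-1, +1}."""
    # Weighted sum over trees for every record, then take the sign
    scores = np.asarray(tree_weights) @ np.asarray(tree_preds)
    return np.where(scores >= 0, 1, -1)


tree_preds = np.array([[ 1, -1,  1],
                       [-1, -1,  1],
                       [-1,  1, -1]])
tree_weights = np.array([2.0, 0.5, 0.5])
print(adaboost_predict(tree_preds, tree_weights))  # [ 1 -1  1]
```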

## Gradient Boosting

Gradient boosting is another boosting model. Recall that the key idea of boosting is learning from previous mistakes.

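Gradient boosting learns from the mistakes, the residual errors, directly rather than by updating the weights of data points. More precisely, each new tree is fit to the negative gradient of the loss function; for squared-error loss this is exactly the residual.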
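Why "gradient": assuming squared-error loss with current prediction F, the negative gradient of the loss with respect to F is exactly the residual. Other losses (e.g. log loss for classification) give different pseudo-residuals.

$$
L(y, F) = \tfrac{1}{2}\,(y - F)^2
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial F} = y - F
$$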

**Step 1:** Train a decision tree

**Step 2:** Use the decision tree just trained to make predictions on the training set

**Step 3:** Calculate the residuals (actual values minus predicted values) and use them as the new y for the next tree

**Step 4:** Repeat from Step 1 until the chosen number of trees has been trained

**Step 5:** Make the final prediction

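Gradient boosting makes a prediction by adding up the predictions of all trees. In practice, implementations such as scikit-learn's also start from an initial constant prediction and scale each tree's contribution by a learning rate.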
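A minimal from-scratch sketch of Steps 1 to 5 for regression with squared-error loss, assuming scikit-learn's `DecisionTreeRegressor`. The helpers `fit_gradient_boosting` and `predict_gradient_boosting` are hypothetical, and the initial constant prediction (the mean of y) and the learning rate are assumptions borrowed from common practice; with `learning_rate=1.0` the prediction is the plain sum of the trees' outputs plus that constant.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Squared-error gradient boosting for regression (illustrative sketch)."""
    init = y.mean()      # assumed initial constant prediction
    residual = y - init  # the first tree's target
    trees = []
    for _ in range(n_trees):
        # Step 1: fit a small tree to the current residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        # Steps 2-3: predict, and keep what is still unexplained as the new y
        residual = residual - learning_rate * tree.predict(X)
        trees.append(tree)
    return init, trees


def predict_gradient_boosting(X, init, trees, learning_rate=0.1):
    # Step 5: start from the constant and add every tree's (scaled) prediction
    pred = np.full(len(X), init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

init, trees = fit_gradient_boosting(X, y, learning_rate=0.1)
pred = predict_gradient_boosting(X, init, trees, learning_rate=0.1)
print(np.mean((y - pred) ** 2))  # training MSE, expected to be well below np.var(y)
```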

# Implementation in Python (scikit-learn)

```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
```

```python
X, y = make_moons(n_samples=10000, noise=0.5, random_state=0)
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
# Decision Tree model
dt_clf = DecisionTreeClassifier()
dt_clf.fit(X_train, y_train)
y_pred = dt_clf.predict(X_test)
accuracy_score(y_test, y_pred)
```

Output: `0.7555`

```python
# Random Forest model
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)
accuracy_score(y_test, y_pred)
```

Output: `0.796`

```python
# AdaBoost model
ab_clf = AdaBoostClassifier()
ab_clf.fit(X_train, y_train)
y_pred = ab_clf.predict(X_test)
accuracy_score(y_test, y_pred)
```

Output: `0.832`

```python
# Gradient Boosting model
gd_clf = GradientBoostingClassifier()
gd_clf.fit(X_train, y_train)
y_pred = gd_clf.predict(X_test)
accuracy_score(y_test, y_pred)
```

Output: `0.8335`