## **$\color{red}{\text{Ensemble Learning}}$**
### In general, an ensemble model makes predictions by combining a number of different individual models. Because it combines them, the ensemble tends to be more flexible (less bias) and less sensitive to the particular training data (less variance).
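The general idea of combining several different models can be sketched with scikit-learn's `VotingClassifier`, using the same `make_moons` data as the cells below. The three member models and their settings (`n_neighbors=15`, `max_depth=5`) are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Same toy data and split as the cells below
X, y = make_moons(n_samples=10000, noise=0.5, random_state=58)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=85)

# Three different individual models (choices are illustrative)
models = [
    ("logistic", LogisticRegression()),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=58)),
]

# Hard voting: each fitted model casts one vote per sample; the majority class wins
ensemble = VotingClassifier(estimators=models, voting="hard")
ensemble.fit(X_train, y_train)  # fits clones of the three models

for name, model in models:
    model.fit(X_train, y_train)
    print(f"{name:>8}: {accuracy_score(y_test, model.predict(X_test)):.4f}")
ensemble_acc = accuracy_score(y_test, ensemble.predict(X_test))
print(f"ensemble: {ensemble_acc:.4f}")
```

Whether the vote beats the best single member depends on the data and on how different the members' errors are; it is not guaranteed.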

## **$\color{red}{\text{Two Most Popular Ensemble Methods}}$**

### 1. Bagging - Training a bunch of individual models in parallel. Each model is trained on a random subset of the data (a bootstrap sample: rows drawn at random, with replacement).
### 2. Boosting - Training a bunch of individual models sequentially. Each individual model learns from the mistakes made by the models before it.
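To make "random subset of the data" concrete: bagging (bootstrap aggregating) classically draws, for each model, a sample of the same size as the training set with replacement. A minimal NumPy sketch, assuming n = 10000 rows like the dataset below, shows that such a sample contains only about 63% of the distinct rows (the expected fraction is close to 1 - 1/e); the rows left out are what scikit-learn calls "out-of-bag".

```python
import numpy as np

rng = np.random.default_rng(58)
n = 10000  # same size as the dataset used in this notebook

# One bootstrap sample: n row indices drawn uniformly WITH replacement
idx = rng.integers(0, n, size=n)

unique_fraction = len(np.unique(idx)) / n
print(f"distinct rows in the bootstrap sample: {unique_fraction:.3f}")
print(f"theoretical value 1 - 1/e:            {1 - np.exp(-1):.3f}")
```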

In [2]:
# Load libraries
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

In [3]:
# Create a noisy two-feature, two-class toy dataset (two interleaving half-moons)
X, y = make_moons(n_samples = 10000, noise = 0.5, random_state = 58)

In [4]:
#split train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 85)

In [5]:
# Fit a single decision tree for comparison (no ensemble)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.738

## **$\color{red}{\text{Random Forest}}$**
### Random Forest is an ensemble model that uses bagging as the ensemble method and a decision tree as the individual model. At each split, every decision tree considers only a random subset of the features.

### Step 1. Draw n bootstrap samples from the training dataset (random samples of the same size, drawn with replacement).
### Step 2. Train n individual decision tree models in parallel, one per bootstrap sample. At each node, the best split is searched over a random subset of the features.
### Step 3. Each decision tree generates a prediction independently.
### Step 4. Voting - For each candidate in the test set, Random Forest uses the class with the majority vote as this candidate's final prediction. (scikit-learn's implementation uses a soft version of this vote: it averages the trees' predicted class probabilities.)
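The four steps above can be sketched by hand on top of a plain `DecisionTreeClassifier`. This is an illustrative re-implementation, not scikit-learn's internal code: `fit_forest` and `predict_forest` are hypothetical helper names, `max_features="sqrt"` stands in for "a random subset of features at each split", and Step 4 uses the hard majority vote described above rather than scikit-learn's probability averaging.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=10000, noise=0.5, random_state=58)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=85)

def fit_forest(X, y, n_trees=100, seed=58):
    """Hypothetical helper: Steps 1-2, one bootstrap sample and one tree per member."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample (rows drawn with replacement)
        idx = rng.integers(0, len(X), size=len(X))
        # Step 2: max_features="sqrt" makes each split consider a random feature subset
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    # Step 3: every tree predicts independently -> shape (n_trees, n_samples)
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)
    # Step 4: majority vote per sample (labels here are the integers 0 and 1)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

trees = fit_forest(X_train, y_train)
forest_acc = accuracy_score(y_test, predict_forest(trees, X_test))
print("hand-rolled forest accuracy:", forest_acc)
```

The accuracy will be close to, but not necessarily identical to, `RandomForestClassifier`'s, since the random draws and the voting rule differ.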

In [7]:
rf = RandomForestClassifier(n_estimators = 100, 
                            max_features = "sqrt",  # what "auto" meant for classifiers; "auto" was removed in scikit-learn 1.3
                            random_state = 58)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
accuracy_score(y_test, y_pred)

0.79

In [15]:
import time
start = time.time()


rf = RandomForestClassifier(n_estimators = 100, 
                            max_features = "sqrt",  # what "auto" meant for classifiers; "auto" was removed in scikit-learn 1.3
                            random_state = 58)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy = ", accuracy)


end = time.time()
print("Run Time = ", end - start, "seconds")

Accuracy =  0.79
Run Time =  0.9505100250244141 seconds
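Step 2 says the trees are trained in parallel, but by default (`n_jobs=None`) `RandomForestClassifier` uses a single core. A minimal sketch, assuming a multi-core machine, compares the default with `n_jobs=-1` (all available cores). On a dataset this small the parallel overhead can eat most of the gain, so the timings are machine-dependent; because the per-tree seeds come from `random_state`, the accuracy should not change.

```python
import time
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=10000, noise=0.5, random_state=58)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=85)

results = {}
for n_jobs in (None, -1):  # None = one core (default), -1 = all available cores
    start = time.perf_counter()  # perf_counter is meant for measuring intervals
    rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=58, n_jobs=n_jobs)
    rf.fit(X_train, y_train)
    acc = accuracy_score(y_test, rf.predict(X_test))
    results[n_jobs] = acc
    print(f"n_jobs={n_jobs}: accuracy={acc}, run time={time.perf_counter() - start:.3f} s")
```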


## **$\color{red}{\text{AdaBoost (Adaptive Boosting)}}$**

### AdaBoost is a boosting ensemble model that works especially well with decision trees (scikit-learn's default individual model is a depth-1 tree, or "stump"). The key idea of boosting is learning from previous mistakes.

### Step 1. Initialize the weights of all data points to the same value (1/N for N data points).
### Step 2. Train a decision tree on the entire training set, taking the data-point weights into account.
### Step 3. Calculate the weighted error rate e: the total weight of the misclassified data points divided by the total weight of all data points.
### Step 4. Calculate the tree's weight (its decision power) in the ensemble:
$$\text{Weight} = \text{LearningRate} \times \ln\frac{1-e}{e}$$
- If the weighted error rate is high, the tree is given lower decision power during voting.
- If the weighted error rate is low, the tree is given higher decision power during voting.
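### Step 5. Update the data-point weights:
- If the tree predicts a data point correctly, its weight stays the same.
- If the tree predicts a data point wrongly, its new weight = old_weight * np.exp(Weight), so the next tree pays more attention to it.
- Finally, normalize the weights so they sum to 1.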
### Step 6. Repeat from Step 2 until all trees (n_estimators of them) are trained.
### Step 7. Final prediction - a weighted vote: each tree's prediction is multiplied by that tree's weight, the results are added up, and the class with the largest total wins.
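Steps 4 and 5 are easiest to see with numbers. This small worked example uses the formulas above with a learning rate of 1.0 (scikit-learn's default for `AdaBoostClassifier`) and five made-up data points, one of which the tree misclassifies.

```python
import numpy as np

learning_rate = 1.0  # scikit-learn's default for AdaBoostClassifier

# Step 4: tree weight as a function of the weighted error rate e
for e in (0.1, 0.3, 0.5):
    weight = learning_rate * np.log((1 - e) / e)
    print(f"e = {e}: tree weight = {weight:.4f}")

# Step 5: five data points with equal initial weights; the tree gets point 3 wrong
sample_w = np.full(5, 1 / 5)
wrong = np.array([False, False, True, False, False])
e = sample_w[wrong].sum() / sample_w.sum()          # weighted error rate = 0.2
tree_weight = learning_rate * np.log((1 - e) / e)   # ln(4)
sample_w[wrong] *= np.exp(tree_weight)              # misclassified point: 0.2 * 4
sample_w /= sample_w.sum()                          # normalize so weights sum to 1
print(sample_w)
```

A tree with e = 0.5 (no better than a coin flip) gets weight 0, and after the update the single misclassified point carries half of the total weight, so the next tree focuses on it.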

In [8]:
ada = AdaBoostClassifier(n_estimators = 100)
ada.fit(X_train, y_train)
y_pred = ada.predict(X_test)
accuracy_score(y_test, y_pred)

0.8235

In [13]:
import time
start = time.time()


ada = AdaBoostClassifier(n_estimators = 100)
ada.fit(X_train, y_train)
y_pred = ada.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy = ", accuracy)


end = time.time()
print("Run Time = ", end - start, "seconds")

Accuracy =  0.8235
Run Time =  0.5093522071838379 seconds
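For comparison with the library call above, the seven AdaBoost steps can be sketched from scratch with depth-1 trees. This is a minimal two-class version (the helper names `fit_adaboost` and `predict_adaboost` are hypothetical); it follows the formulas in Steps 4-5 and is not scikit-learn's exact implementation, so its accuracy will be similar rather than identical.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=10000, noise=0.5, random_state=58)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=85)

def fit_adaboost(X, y, n_estimators=100, learning_rate=1.0):
    w = np.full(len(X), 1 / len(X))                   # Step 1: equal data-point weights
    trees, alphas = [], []
    for _ in range(n_estimators):                     # Step 6: repeat
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)              # Step 2: fit on the weighted train set
        wrong = stump.predict(X) != y
        e = w[wrong].sum() / w.sum()                  # Step 3: weighted error rate
        e = np.clip(e, 1e-10, 1 - 1e-10)              # guard against log(0) and division by 0
        alpha = learning_rate * np.log((1 - e) / e)   # Step 4: tree weight
        w = w * np.exp(alpha * wrong)                 # Step 5: boost misclassified points only
        w /= w.sum()                                  #         and renormalize
        trees.append(stump)
        alphas.append(alpha)
    return trees, alphas

def predict_adaboost(trees, alphas, X):
    # Step 7: weighted vote; map labels {0, 1} -> {-1, +1} so votes can be summed
    score = sum(a * (2 * t.predict(X) - 1) for t, a in zip(trees, alphas))
    return (score > 0).astype(int)

trees, alphas = fit_adaboost(X_train, y_train)
ada_acc = accuracy_score(y_test, predict_adaboost(trees, alphas, X_test))
print("hand-rolled AdaBoost accuracy:", ada_acc)
```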


## **$\color{red}{\text{Gradient Boosting}}$**
### Gradient Boosting learns directly from the previous models' mistakes (the residual errors), rather than by updating the weights of the data points.
$$\text{residual error} = y_{\text{actual}} - y_{\text{predicted}}$$
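### Step 1. Train a decision tree on the entire training set.
### Step 2. Apply the ensemble built so far to get predictions.
### Step 3. Calculate the residual errors and save them as the new y.
### Step 4. Repeat from Step 1, fitting the next tree to the new y.
### Step 5. Final prediction - add up the predictions of all trees (in practice each tree's contribution is scaled by a learning rate; scikit-learn's default is 0.1).
### For classification, scikit-learn's GradientBoostingClassifier fits regression trees to pseudo-residuals (the negative gradient of the log-loss, i.e. the label minus the predicted probability) on the log-odds scale, then converts the summed score into a class probability.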
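The residual-fitting loop above can be sketched for the plain regression case (squared error), where residuals are literally y minus the prediction. Assumptions: a made-up noisy sine dataset, hypothetical helper names, and a start from the mean of y (a common choice) instead of fitting the very first tree on y directly; `learning_rate=0.1` and `max_depth=2` are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(58)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)  # toy regression target

def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=2):
    init = y.mean()                        # starting prediction before any tree
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):          # Step 4: repeat
        residual = y - pred                # Step 3: residual errors become the new y
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)              # Step 1: fit a tree to the current target
        pred += learning_rate * tree.predict(X)  # Step 2: update the ensemble's prediction
        trees.append(tree)
    return init, trees

def predict_gradient_boosting(init, trees, X, learning_rate=0.1):
    # Step 5: initial prediction + sum of all (scaled) tree predictions
    return init + learning_rate * sum(tree.predict(X) for tree in trees)

init, trees = fit_gradient_boosting(X, y)
mse_baseline = np.mean((y - init) ** 2)
mse_boosted = np.mean((y - predict_gradient_boosting(init, trees, X)) ** 2)
print(f"training MSE: constant model {mse_baseline:.3f}, boosted {mse_boosted:.3f}")
```

The same `learning_rate` must be used for fitting and predicting; passing it explicitly keeps the sketch short, though a real implementation would store it with the model.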

In [9]:
gb = GradientBoostingClassifier(n_estimators = 100)
gb.fit(X_train, y_train)
y_pred = gb.predict(X_test)
accuracy_score(y_test, y_pred)

0.82

In [14]:
import time
start = time.time()


gb = GradientBoostingClassifier(n_estimators = 100)
gb.fit(X_train, y_train)
y_pred = gb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy = ", accuracy)


end = time.time()
print("Run Time = ", end - start, "seconds")

Accuracy =  0.82
Run Time =  0.3449513912200928 seconds
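Step 5 says the final prediction is built by adding trees one at a time; `GradientBoostingClassifier.staged_predict` exposes the ensemble's prediction after each added tree, which makes that additive build-up visible. The `random_state=58` and the chosen checkpoints are illustrative additions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=10000, noise=0.5, random_state=58)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=85)

gb = GradientBoostingClassifier(n_estimators=100, random_state=58)
gb.fit(X_train, y_train)

# staged_predict yields the ensemble's prediction after 1, 2, ..., n_estimators trees
staged_acc = [accuracy_score(y_test, y_stage) for y_stage in gb.staged_predict(X_test)]
for n_trees in (1, 10, 50, 100):
    print(f"{n_trees:>3} trees: accuracy = {staged_acc[n_trees - 1]:.4f}")
```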


link: https://towardsdatascience.com/basic-ensemble-learning-random-forest-adaboost-gradient-boosting-step-by-step-explained-95d49d1e2725 