# Ensemble Learning

An ensemble is a group of predictors (classifiers or regressors) whose individual predictions are combined into a single prediction. An ensemble of decision trees is called a random forest.

## Voting Classifiers

If you have a few classifiers that each perform decently, one way to get a better classifier is to combine them into a <b><i>Hard Voting Classifier</i></b>: each classifier predicts a class, and the ensemble's final prediction is the class that gets the most votes.
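
To make the voting rule concrete, here is a minimal NumPy sketch of hard voting over made-up predictions. The `hard_vote` helper and the prediction matrix are hypothetical illustrations, not part of Scikit-Learn; `VotingClassifier` does this aggregation for you.

```python
import numpy as np

# Hypothetical class predictions: one row per classifier, one column per instance
predictions = np.array([
    [1, 0, 1, 1],   # classifier A
    [1, 1, 0, 1],   # classifier B
    [0, 1, 0, 1],   # classifier C
])

def hard_vote(preds):
    """Return the most frequent class in each column (ties go to the lowest class id)."""
    return np.array([np.bincount(column).argmax() for column in preds.T])

print(hard_vote(predictions))  # [1 1 0 1]
```

In the third column, classifiers B and C outvote A, so the ensemble predicts class 0 even though A predicted class 1.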

<b>Note</b><br>
A voting classifier works best when the classifiers it combines are as independent from one another as possible. One way to get there is to use very different algorithms: they are likely to make different kinds of errors, which improves the ensemble's accuracy.
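
Why independence matters can be seen with a quick binomial calculation. Assuming (unrealistically) that each classifier is right with probability 0.51 and that their errors are completely independent, the chance that a strict majority is right grows with the number of classifiers. `majority_accuracy` is a hypothetical helper; real classifiers trained on the same data make correlated errors, so the actual gain is smaller.

```python
from math import comb

def majority_accuracy(n_classifiers, p):
    """P(more than half of n independent classifiers are right), each right with prob p."""
    return sum(comb(n_classifiers, k) * p**k * (1 - p)**(n_classifiers - k)
               for k in range(n_classifiers // 2 + 1, n_classifiers + 1))

# Odd ensemble sizes avoid ties
for n in (1, 101, 1001):
    print(n, round(majority_accuracy(n, 0.51), 3))
```

With 1,001 such classifiers the majority is right more than 70% of the time, versus 51% for a single one.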

In [1]:
from sklearn.datasets import make_moons

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(random_state=42)),
        ('rf', RandomForestClassifier(random_state=42)),
        ('svc', SVC(random_state=42))
    ]
)

voting_clf.fit(X_train, y_train)

In [2]:
# Find accuracy of each clf
for name, clf in voting_clf.named_estimators_.items():
    print(name, ' = ', clf.score(X_test, y_test))

lr  =  0.864
rf  =  0.896
svc  =  0.896


In [3]:
voting_clf.predict(X_test[:1])

array([1])

In [4]:
[clf.predict(X_test[:1]) for clf in voting_clf.estimators_]

[array([1]), array([1]), array([0])]

In [5]:
voting_clf.score(X_test, y_test)

0.912

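<i><b>Soft Voting</b></i> averages the class probabilities predicted by each classifier instead of counting their votes, and predicts the class with the highest average probability. It requires every classifier to have a predict_proba() method, and it often performs better than hard voting because it gives more weight to highly confident votes.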
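
The difference between the two rules shows up when one classifier is very confident. In this sketch (made-up probabilities for a single instance), two classifiers lean slightly toward class 0 while one is almost sure of class 1.

```python
import numpy as np

# Hypothetical class probabilities for ONE instance: rows = classifiers,
# columns = [P(class 0), P(class 1)]
probas = np.array([
    [0.55, 0.45],
    [0.55, 0.45],
    [0.01, 0.99],
])

# Hard voting: each classifier votes for its most likely class, majority wins
hard = np.bincount(probas.argmax(axis=1)).argmax()
# Soft voting: average the probabilities first, then pick the most likely class
soft = probas.mean(axis=0).argmax()
print(hard, soft)  # 0 1
```

Hard voting picks class 0 (two votes to one), but the averaged probabilities are 0.37 vs 0.63, so soft voting picks class 1.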

In [6]:
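voting_clf.voting = "soft"
# SVC has no predict_proba() unless probability=True; set it on the unfitted
# estimator (named_estimators) so the clone made by fit() picks it up
voting_clf.named_estimators['svc'].probability = True
voting_clf.fit(X_train, y_train)
# Note: this scores the TRAINING set, so it is not directly comparable to the
# 0.912 test accuracy of hard voting above; use X_test, y_test for that
voting_clf.score(X_train, y_train)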

0.9466666666666667

## Bagging and Pasting

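<b>Bagging</b> (short for bootstrap aggregating) trains many predictors with the same algorithm, each on a different random subset of the training data sampled <i>with <b>replacement</b></i>. Sampling without replacement is called <b>pasting</b>. The predictions are then aggregated, typically by taking the most frequent prediction for classification (like hard voting) or the average for regression.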

One reason bagging and pasting are popular is that they scale well: the predictors can all be trained in parallel on different CPU cores (or even different servers), and predictions can be made in parallel too.
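
In Scikit-Learn, pasting is simply a `BaggingClassifier` with `bootstrap=False`. Here is a minimal sketch on the same moons data (`paste_clf` is a hypothetical name). Setting `n_jobs=-1`, as the cells below do, would train the trees on all CPU cores; it is left at its default here.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=False: each tree gets 100 distinct training instances (pasting).
# bootstrap=True (the default) would sample with replacement (bagging).
paste_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                              max_samples=100, bootstrap=False, random_state=42)
paste_clf.fit(X_train, y_train)
print(paste_clf.score(X_test, y_test))

# Under pasting, each predictor's in-bag indices contain no duplicates
print(all(len(set(idx)) == len(idx) for idx in paste_clf.estimators_samples_))  # True
```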

In [7]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                           max_samples=100, n_jobs=-1, random_state=42)
bag_clf.fit(X_train,y_train)

In [8]:
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                           oob_score=True, n_jobs=-1, random_state=42)
bag_clf.fit(X_train,y_train)
bag_clf.oob_score_

0.896

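The out-of-bag (OOB) score is an estimate of test accuracy: each tree is evaluated on the training instances that were not sampled for it, so the bagging classifier should reach a similar accuracy on the test set.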
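
The OOB evaluation works because of a property of bootstrap sampling: when you draw m instances with replacement from a training set of size m, only about 1 − 1/e ≈ 63% of the distinct instances end up in the sample, leaving roughly 37% out-of-bag for each predictor. A quick NumPy check, with an arbitrary training-set size:

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100_000                                   # training-set size (arbitrary)
idx = rng.integers(0, m, size=m)              # one bootstrap sample: m draws with replacement
in_bag = np.unique(idx).size / m              # fraction of distinct instances sampled
print(round(in_bag, 3), round(1 - np.exp(-1), 3))   # both roughly 0.632
```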

In [9]:
from sklearn.metrics import accuracy_score
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.92

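The BaggingClassifier can also sample features instead of (or in addition to) instances. This is controlled by the max_features and bootstrap_features hyperparameters, which work like max_samples and bootstrap but for features. <br>
Each predictor is then trained on a random subset of the input features. This is particularly useful for high-dimensional inputs (like images), because it can considerably speed up training.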
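
Sampling both instances and features is called the Random Patches method; keeping all instances (`bootstrap=False`, `max_samples=1.0`) but sampling features is called the Random Subspaces method. Below is a sketch of the latter on a synthetic 50-feature dataset. The dataset and `subspace_clf` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=42)

# Random Subspaces: keep every instance, give each tree half of the features
subspace_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 bootstrap=False, max_samples=1.0,
                                 bootstrap_features=False, max_features=0.5,
                                 random_state=42)
subspace_clf.fit(X, y)
print(len(subspace_clf.estimators_features_[0]))  # 25 of the 50 features
```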

## Random Forests

A <b>random forest</b> is an ensemble of decision trees, generally trained via the bagging method (or sometimes pasting), typically with max_samples set to the size of the training set.

In [10]:
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, 
                                n_jobs=-1, random_state=42)

rnd_clf.fit(X_train,y_train)

y_pred_rf = rnd_clf.predict(X_test)

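RandomForestClassifier has all the hyperparameters of a DecisionTreeClassifier (to control how the trees are grown), plus most of the hyperparameters of a BaggingClassifier (to control the ensemble itself).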
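
A random forest is roughly equivalent to bagging decision trees that each consider only a random subset of the features at every split (`max_features="sqrt"`, which is RandomForestClassifier's default for classification). The sketch below compares the two on the moons data, with 100 trees each to keep it quick. `bag_rf` is a hypothetical name, and the two scores should be close but need not be identical, since the random draws differ.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagged trees that only look at sqrt(n_features) random features per split
bag_rf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=100, random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=100, max_leaf_nodes=16,
                                 random_state=42)

for clf in (bag_rf, rnd_clf):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```

The RandomForestClassifier class is still the more convenient and optimized choice for decision trees.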

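<b>Extra-Trees</b> (Extremely Randomized Trees) add even more randomness: as in a random forest, each node considers only a random subset of the features, but instead of searching for the best possible threshold for each feature, it draws random thresholds and keeps the best of those random splits. Since finding the best threshold is the most time-consuming part of growing a tree, this makes training much faster.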
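
Scikit-Learn provides ExtraTreesClassifier (and ExtraTreesRegressor) with essentially the same API as RandomForestClassifier. One difference worth knowing: its `bootstrap` hyperparameter defaults to False, so each tree sees the whole training set and the randomness comes from the splits. A minimal side-by-side sketch on the moons data; which one generalizes better depends on the dataset, so in practice compare them with cross-validation.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same hyperparameters for both; only the split-selection strategy differs
extra_clf = ExtraTreesClassifier(n_estimators=100, max_leaf_nodes=16, random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=100, max_leaf_nodes=16, random_state=42)

for clf in (extra_clf, rnd_clf):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```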

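A major benefit of random forests is that they make it easy to measure the relative importance of each feature. Scikit-Learn scores a feature by how much the tree nodes that use it reduce impurity on average (weighted by the number of training samples reaching each node, and averaged across all trees), then scales the scores so they sum to 1.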

In [11]:
from sklearn.datasets import load_iris
iris = load_iris(as_frame=False)
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris.data, iris.target)
for score, name in zip(rnd_clf.feature_importances_, iris.feature_names):
    print(round(score, 2), name)

0.11 sepal length (cm)
0.02 sepal width (cm)
0.44 petal length (cm)
0.42 petal width (cm)


## Boosting

<b>Boosting</b> refers to any ensemble method that can combine several weak learners into a strong learner. The general idea of most boosting methods is to train predictors sequentially, each trying to correct its predecessor.

### AdaBoost (Adaptive Boosting)

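AdaBoost works by paying more attention to the training instances that the previous predictor underfit: after each predictor is trained, the relative weights of the instances it misclassified are increased, so the next predictor focuses more on the hard cases.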
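
One reweighting step can be sketched directly. The code is modeled on the SAMME update used by scikit-learn's AdaBoostClassifier: with weighted error rate r, learning rate η and K classes, the predictor weight is α = η·(log((1 − r)/r) + log(K − 1)), and misclassified instances have their weights multiplied by exp(α) before normalizing. The function name and toy labels are hypothetical, and the sketch assumes 0 < r < 1.

```python
import numpy as np

def adaboost_weight_update(w, y_true, y_pred, learning_rate=1.0, n_classes=2):
    """One SAMME-style step: return (predictor weight alpha, new instance weights)."""
    miss = y_pred != y_true
    r = w[miss].sum() / w.sum()                       # weighted error rate of this predictor
    alpha = learning_rate * (np.log((1 - r) / r) + np.log(n_classes - 1))
    w = w.copy()
    w[miss] *= np.exp(alpha)                          # boost the misclassified instances
    return alpha, w / w.sum()                         # normalize so weights sum to 1

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])   # hypothetical stump predictions: one mistake
w = np.full(5, 1 / 5)                # start with equal instance weights
alpha, w_new = adaboost_weight_update(w, y_true, y_pred)
print(round(alpha, 3), w_new.round(3))
```

Here r = 0.2, so α = log 4 ≈ 1.386, and after normalizing, the single misclassified instance carries half of the total weight.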

In [12]:
from sklearn.ensemble import AdaBoostClassifier

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=30,
    learning_rate=0.5, random_state=42
)
ada_clf.fit(X_train,y_train)



In [13]:
y_pred = ada_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.904

### Gradient Boosting

Gradient Boosting works like AdaBoost in that it adds predictors sequentially, each correcting its predecessor. But instead of tweaking the instance weights at every iteration, it fits each new predictor to the residual errors made by the previous predictor.
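
The three manual cells below can be generalized into a loop. This is a minimal sketch of gradient boosting for squared error that starts from a zero prediction like the manual cells (GradientBoostingRegressor starts from the mean of y instead). The helper names are hypothetical, and `learning_rate` scales each tree's contribution (shrinkage).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_estimators=3, learning_rate=1.0, max_depth=2):
    """Squared-error gradient boosting from a zero start: each tree fits the current residuals."""
    trees = []
    residuals = y.astype(float).copy()
    for _ in range(n_estimators):
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=42)
        tree.fit(X, residuals)
        # Shrink this tree's contribution, then keep what is still unexplained
        residuals = residuals - learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

def predict_gradient_boosting(trees, X, learning_rate=1.0):
    """Sum the (shrunk) predictions of all trees."""
    return learning_rate * sum(tree.predict(X) for tree in trees)

# Same noisy quadratic data as the manual cells
np.random.seed(42)
X = np.random.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(100)

trees = fit_gradient_boosting(X, y)
print(predict_gradient_boosting(trees, np.array([[-0.4], [0.], [0.5]])))
```

With the defaults (three depth-2 trees, `learning_rate=1.0`) it rebuilds the same trees as the manual cells, so it should reproduce their predictions. A smaller learning rate needs more trees but usually generalizes better.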

In [17]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
X = np.random.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(100)

tree_reg1 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg1.fit(X, y)

In [18]:
y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg2.fit(X, y2)

In [19]:
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg3.fit(X,y3)

In [20]:
X_new = np.array([[-0.4], [0.], [0.5]])
sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))

array([0.49484029, 0.04021166, 0.75026781])

In [21]:
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3,
                                learning_rate=1.0, random_state=42)
gbrt.fit(X,y)

In [22]:
# Early Stopping with GBRT
gbrt_best = GradientBoostingRegressor(
    max_depth=2, learning_rate=0.05, n_estimators=500,
    n_iter_no_change=10, random_state=42
)
gbrt_best.fit(X,y)

In [24]:
gbrt_best.n_estimators_

92
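
`n_iter_no_change` holds out `validation_fraction` of the training data (10% by default) and stops adding trees once the validation score has not improved for that many iterations. An alternative, sketched here under the assumption that you keep your own validation split, is to train a large ensemble once, measure the validation error after each stage with `staged_predict()`, and refit with the best number of trees:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = np.random.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(100)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

gbrt = GradientBoostingRegressor(max_depth=2, learning_rate=0.05,
                                 n_estimators=500, random_state=42)
gbrt.fit(X_train, y_train)

# Validation MSE after 1, 2, ..., 500 trees
val_errors = [mean_squared_error(y_val, y_pred) for y_pred in gbrt.staged_predict(X_val)]
best_n = int(np.argmin(val_errors)) + 1

gbrt_best = GradientBoostingRegressor(max_depth=2, learning_rate=0.05,
                                      n_estimators=best_n, random_state=42)
gbrt_best.fit(X_train, y_train)
print(best_n)
```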

### Histogram-Based Gradient Boosting

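Histogram-based Gradient Boosting (HGB) is a version of gradient boosting optimized for large datasets. Scikit-Learn's HistGradientBoostingRegressor and HistGradientBoostingClassifier bin the input features into integers (at most max_bins bins, 255 by default), which greatly speeds up training. They also handle missing values natively and support categorical features through the categorical_features hyperparameter.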

In [None]:
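from sklearn.compose import make_column_transformer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Assumes `housing` (a DataFrame of housing features with a categorical
# 'ocean_proximity' column) and `housing_labels` (the target values) were
# loaded earlier; they are not defined in the cells above.
hgb_reg = make_pipeline(
    # Encode the category as integers (placed in column 0); pass other columns through
    make_column_transformer((OrdinalEncoder(), ['ocean_proximity']),
                            remainder='passthrough'),
    # Tell HGB that column 0 (the encoded ocean_proximity) is categorical
    HistGradientBoostingRegressor(categorical_features=[0], random_state=42)
)
hgb_reg.fit(housing, housing_labels)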

## Stacking

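Instead of using a trivial function like hard voting to aggregate the predictions of several models, <b>stacking</b> (short for stacked generalization) trains a model to perform this aggregation. The final model, which takes the other models' predictions as its inputs, is called a blender (or meta learner). Scikit-Learn's StackingClassifier trains the blender on out-of-fold predictions obtained by cross-validation (set with the cv argument).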
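
To see roughly what StackingClassifier does under the hood, here is a manual sketch of a blender: each base model's out-of-fold probability for class 1 (from `cross_val_predict`) becomes an input feature, and the blender is trained on those features. It swaps in LogisticRegression as the blender (the next cell uses a RandomForestClassifier) and is an approximation, not Scikit-Learn's exact implementation.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_models = [LogisticRegression(random_state=42),
               RandomForestClassifier(random_state=42),
               SVC(probability=True, random_state=42)]

# Out-of-fold P(class 1) per base model: the blender never sees a prediction
# made on an instance that the base model was trained on
train_features = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set for prediction time
for m in base_models:
    m.fit(X_train, y_train)
test_features = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

blender = LogisticRegression(random_state=42)
blender.fit(train_features, y_train)
print(blender.score(test_features, y_test))
```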

In [26]:
from sklearn.ensemble import StackingClassifier

stacking_clf = StackingClassifier(
    estimators=[
        ('lr', LogisticRegression(random_state=42)),
        ('rf', RandomForestClassifier(random_state=42)),
        ('svc', SVC(probability=True, random_state=42)),
    ],
    final_estimator=RandomForestClassifier(random_state=43),
    cv=5
)
stacking_clf.fit(X_train,y_train)