# Ensemble Learning

"Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone."

"Ensemble learning trains two or more Machine Learning algorithms to a specific classification or regression task. The algorithms within the ensemble learning model are generally referred to as "base models", "base learners" or "weak learners" in the literature. The base models can be constructed using a single modelling algorithm or several different algorithms. The idea is to train a diverse collection of weakly performing models on the same modelling task. As a result, the predicted or classified outcomes of each weak learner have poor predictive ability (high bias, i.e. high model errors) and among the collection of all weak learners the outcome and error values exhibit high variance. **Fundamentally, an ensemble learning model trains many (at least 2) high-bias (weak) and high-variance (diverse) models to be combined into a stronger and better performing model.**"

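Random forest is an example of ensemble learning: it is implemented with *bagging*, where each tree is trained on a random bootstrap sample of the training data and the trees' predictions are combined by voting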

- *Boosting* is an alternative technique where each subsequent model in the ensemble gives more weight to the data misclassified by the previous model

- A *bucket of models* trains several models on the training data and picks the one that works best on the test data

- *Stacking* runs multiple models at once on the data and combines their results (note: this is how the Netflix Prize was won)
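
The techniques above can be sketched with scikit-learn. This is a minimal illustration, not a tuned setup: the choice of base models, `n_estimators=25`, and the stratified split are assumptions made for the example. The bucket-of-models step here selects by cross-validation on the training split rather than by test-set score, so that the test split still gives an honest final estimate.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Bagging: many trees, each fit on a bootstrap sample, predictions voted.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0)

# Boosting: models are fit in sequence, each focusing on earlier mistakes.
boosting = AdaBoostClassifier(n_estimators=25, random_state=0)

# Stacking: a final model learns how to combine the base models' predictions.
stacking = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)

candidates = {"bagging": bagging, "boosting": boosting, "stacking": stacking}

# Bucket of models: score every candidate and keep the single best one.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)

print(cv_scores)
print(best_name, test_accuracy)
```

On a dataset as small and easy as iris the three ensembles usually score within a few points of each other, so the winner mostly reflects noise; the point of the sketch is the shape of each technique.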


## Optimal ways to do it: Advanced Ensemble Learning

Bayes Optimal Classifier:

- Theoretically the best, but almost always impractical
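
For reference, the Bayes Optimal Classifier is usually written as a vote over every hypothesis, weighted by how well each one explains the training data. The symbols are not from these notes and are introduced here as assumptions: $C$ is the set of classes, $H$ the hypothesis space, and $T$ the training data.

$$
y = \underset{c_j \in C}{\operatorname{argmax}} \sum_{h_i \in H} P(c_j \mid h_i)\, P(T \mid h_i)\, P(h_i)
$$

The sum runs over the whole hypothesis space, which is why the classifier can rarely be computed directly.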

Bayesian Parameter Averaging:

- Attempts to make the Bayes Optimal Classifier practical, but it's still misunderstood, susceptible to overfitting, and often outperformed by the simpler bagging approach

Bayesian Model Combination:

- Tries to address all of those problems

- But in the end, it's about the same as using cross-validation to find the best combination of models

# XGBoost

XGBoost - Extreme Gradient Boosted Trees

## Features of XGBoost

- Regularized boosting to prevent overfitting

- Handles missing values automatically

- Parallel processing

- Can cross-validate at each iteration

- Incremental training: stop training the model, save it and pick it up later

- Can plug in your own optimization objectives

- Tree pruning (grows the tree further and then comes back to prune it): generally results in deeper but optimized trees

## Using XGBoost

Uses the DMatrix structure to hold features and labels (can be created from a NumPy array)

All parameters are passed via a dictionary

## XGBoost Hyperparameters

- Booster: gbtree or gblinear

- Objective function: e.g. multi:softmax or multi:softprob for multiclass problems

- ETA: learning rate - adjusts weights on each step

- Max_depth: maximum depth of each tree

- Min_child_weight: can control overfitting, but a value that is too high will underfit

In [1]:
from sklearn.datasets import load_iris

iris = load_iris()

iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
        ...

In [2]:
numSamples, numFeatures = iris.data.shape

In [4]:
print(numSamples)
print(numFeatures)
print(list(iris.target_names))

150
4
['setosa', 'versicolor', 'virginica']

In [9]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [11]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)

In [12]:
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((120, 4), (30, 4), (120,), (30,))

In [13]:
import xgboost as xgb

In [14]:
train = xgb.DMatrix(x_train, label=y_train)
test = xgb.DMatrix(x_test, label=y_test)

In [15]:
param = {'max_depth': 4,
         'eta': 0.3,
         'objective': 'multi:softmax',
         'num_class': 3}
epochs = 10  # number of boosting rounds

In [16]:
model = xgb.train(param, train, num_boost_round=epochs)

In [17]:
model.predict(test)

array([2., 1., 0., 2., 0., 2., 0., 1., 1., 1., 2., 1., 1., 1., 1., 0., 1.,
       1., 0., 0., 2., 1., 0., 0., 2., 0., 0., 1., 1., 0.], dtype=float32)

In [18]:
from sklearn.metrics import accuracy_score

In [20]:
accuracy_score(y_test, model.predict(test))

1.0