Here, we will learn about the boosting methods in Sklearn, which enable building an ensemble model.
Boosting methods build an ensemble model incrementally: the base estimators are trained sequentially, one after another. To build a powerful ensemble, these methods combine several weak learners, each trained over a new iteration of the training data with the goal of correcting the errors of the learners before it. The sklearn.ensemble module provides the following two boosting methods.
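To make the sequential principle concrete, the following minimal sketch uses AdaBoostClassifier (covered below) and its staged_score() method, which reports the ensemble's accuracy after each boosting iteration. The data set mirrors the one in the AdaBoost classification example; the exact numbers printed may vary between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# staged_score() yields the ensemble's accuracy after each boosting
# iteration, i.e. using only the first 1, 2, 3, ... fitted learners.
scores = list(clf.staged_score(X, y))
print(f"after 1 learner: {scores[0]:.3f}")
print(f"after {len(scores)} learners: {scores[-1]:.3f}")
```

Comparing the first and last entries shows how the ensemble improves as learners are added one at a time.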

### AdaBoost

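It is one of the most successful boosting ensemble methods. Its key idea lies in the way it assigns weights to the instances in the dataset: after each round, the weights of misclassified instances are increased and those of correctly classified instances are decreased. As a result, each subsequent model pays more attention to the instances that the previous models found hard, and less attention to those they already handle well.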
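The reweighting step can be illustrated with a small NumPy sketch. It implements one round of the textbook discrete AdaBoost update for labels in {-1, +1}; scikit-learn's own implementation (SAMME) scales the learner weight differently, so this is an illustration of the idea rather than its internal code. The function name adaboost_reweight is hypothetical.

```python
import numpy as np


def adaboost_reweight(w, y_true, y_pred):
    """One round of the textbook discrete AdaBoost weight update (hypothetical helper).

    Labels are assumed to be in {-1, +1}, and the weighted error is
    assumed to lie strictly between 0 and 1.
    Returns the updated, normalized weights and the learner's vote alpha.
    """
    miss = y_true != y_pred
    err = np.sum(w[miss]) / np.sum(w)          # weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / err)      # learner's say in the final vote
    w = w * np.exp(-alpha * y_true * y_pred)   # misses grow, hits shrink
    return w / w.sum(), alpha


y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])          # one mistake, at index 1
w = np.full(5, 0.2)                            # start from uniform weights
w_new, alpha = adaboost_reweight(w, y_true, y_pred)
print(w_new, round(alpha, 3))
```

Here the weighted error is 0.2, so alpha is ln 2; the misclassified sample's weight rises from 0.2 to 0.5, while every correctly classified sample drops to 0.125.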

#### Classification with AdaBoost

To create an AdaBoost classifier, Scikit-learn provides sklearn.ensemble.AdaBoostClassifier. The main parameter of this class is estimator (named base_estimator before scikit-learn 1.2), the base estimator from which the boosted ensemble is built. If this parameter is left as None, the base estimator is DecisionTreeClassifier(max_depth=1).
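As a sketch of setting the base estimator explicitly, the code below passes a decision stump, the same learner used by default. It is given as the first positional argument so that it works whether your scikit-learn version names that parameter estimator or base_estimator; the data set mirrors the implementation example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)

# A decision stump (depth-1 tree), passed as the first positional argument
# (`estimator` in scikit-learn >= 1.2, `base_estimator` in older releases).
stump = DecisionTreeClassifier(max_depth=1)
clf = AdaBoostClassifier(stump, n_estimators=100, random_state=0).fit(X, y)

# Each fitted ensemble member is a clone of the stump.
print(len(clf.estimators_), clf.estimators_[0].get_depth())
```

Swapping in a deeper tree (for example max_depth=3) makes each learner stronger, usually at the cost of a higher risk of overfitting.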

Implementation example

In the following example, we build an AdaBoost classifier with sklearn.ensemble.AdaBoostClassifier, then use it to predict a new sample and check its score.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
ADBclf = AdaBoostClassifier(n_estimators=100, random_state=0)
ADBclf.fit(X, y)
```

Once fitted, we can predict the class of a new sample as follows:

```python
print(ADBclf.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
```

Output:

```
[1]
```

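We can also check the model's accuracy on the training data (an optimistic estimate, since the same samples were used for fitting):

```python
ADBclf.score(X, y)
```

Output:

```
0.995
```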
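A less optimistic estimate comes from scoring on data the model has not seen. The sketch below holds out 25% of the same synthetic data with train_test_split; the split ratio and random_state are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)

# Hold out 25% of the samples so the score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A large gap between the two numbers would suggest the ensemble is overfitting.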

#### Regression with AdaBoost

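To create a regressor with the AdaBoost method, Scikit-learn provides sklearn.ensemble.AdaBoostRegressor. It accepts largely the same parameters as sklearn.ensemble.AdaBoostClassifier (estimator, n_estimators, learning_rate, random_state), with two differences: its default base estimator is DecisionTreeRegressor(max_depth=3), and it adds a loss parameter ('linear', 'square' or 'exponential', default 'linear') that controls how sample weights are updated after each boosting iteration.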
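As a sketch of those two regressor-specific settings, the code below passes a deeper tree than the default and switches the weight-update loss to 'square'. The depth of 4 is an arbitrary choice for illustration, not a recommendation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_features=10, n_informative=2, random_state=0, shuffle=False)

# Base trees of depth 4 (default is 3), passed as the first positional
# argument, and the 'square' loss (default 'linear') for reweighting samples.
reg = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), loss="square",
                        n_estimators=100, random_state=0).fit(X, y)
print(reg.score(X, y))  # R^2 on the training data
```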

Implementation example

In the following example, we build an AdaBoost regressor with sklearn.ensemble.AdaBoostRegressor and then predict new values with its predict() method.

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features=10, n_informative=2, random_state=0, shuffle=False)
ADBoostReg = AdaBoostRegressor(random_state=0, n_estimators=100)
ADBoostReg.fit(X, y)
```

Once fitted, we can predict with the regression model as follows:

```python
print(ADBoostReg.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
```

Output:

```
[75.8528769]
```
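A fitted AdaBoostRegressor also exposes feature_importances_, averaged over the boosted trees. Because make_regression is called with shuffle=False, only the first two of the ten columns (n_informative=2) influence y, so we would expect those columns to rank highest; treat the printed values as illustrative, since they depend on the fitted trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

# Same data as above: with shuffle=False the informative columns come first.
X, y = make_regression(n_features=10, n_informative=2, random_state=0, shuffle=False)
reg = AdaBoostRegressor(random_state=0, n_estimators=100).fit(X, y)

importances = reg.feature_importances_      # weighted average over the trees
ranking = np.argsort(importances)[::-1]     # column indices, most important first
print(ranking[:2], np.round(importances[ranking[:2]], 3))
```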


### Gradient Tree Boosting

It is also called Gradient Boosted Regression Trees (GBRT). It is a generalization of boosting to arbitrary differentiable loss functions, and it produces a prediction model in the form of an ensemble of weak prediction models, typically shallow decision trees. It can be used for both regression and classification problems. Its main advantage is that it naturally handles data of mixed type (heterogeneous features).
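The "generalization to differentiable losses" idea can be sketched from scratch for the squared-error loss, whose negative gradient is simply the residual. The code below is a simplified illustration in the spirit of what GradientBoostingRegressor does with that loss, not its actual implementation; the helper names fit_ls_boost and predict_ls_boost and the toy sine data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_ls_boost(X, y, n_estimators=100, learning_rate=0.1, max_depth=1):
    """Minimal gradient boosting for the squared-error loss L = (y - F)^2 / 2.

    The negative gradient of this loss with respect to F is the residual
    y - F, so every stage fits a small regression tree to the residuals.
    """
    f0 = float(np.mean(y))                 # stage 0: best constant model
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        residual = y - F                   # negative gradient at the current F
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        F += learning_rate * tree.predict(X)   # shrunken step (shrinkage)
        trees.append(tree)
    return f0, learning_rate, trees


def predict_ls_boost(model, X):
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)


# Toy 1-D regression problem (synthetic data, for illustration only)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_ls_boost(X, y)
train_mse = np.mean((y - predict_ls_boost(model, X)) ** 2)
print(f"baseline MSE (constant model): {np.var(y):.4f}")
print(f"boosted MSE (100 stumps):      {train_mse:.4f}")
```

Changing the loss only changes the line that computes the negative gradient (and, in real implementations, how leaf values are set), which is what makes the framework general.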

#### Classification with Gradient Tree Boosting

To create a Gradient Tree Boosting classifier, Scikit-learn provides sklearn.ensemble.GradientBoostingClassifier. One of the main parameters of this class is loss, the loss function to be optimized. The default, loss='log_loss' (called 'deviance' before scikit-learn 1.1), is the deviance (logistic regression loss) used for classification with probabilistic outputs.

If we instead set loss='exponential', gradient boosting recovers the AdaBoost algorithm; this option only supports binary classification. The parameter n_estimators controls the number of weak learners, and the hyper-parameter learning_rate, in the range (0.0, 1.0], controls overfitting via shrinkage.
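The two loss options can be compared side by side. In this sketch the default loss is left unset (so it works across versions that call it 'deviance' or 'log_loss'), and the sample size, learning_rate and n_estimators are arbitrary choices for illustration.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=4000, random_state=0)   # labels are -1 / +1
X_train, X_test = X[:3000], X[3000:]
y_train, y_test = y[:3000], y[3000:]

common = dict(n_estimators=100, learning_rate=0.5, max_depth=1, random_state=0)
models = {
    # default loss: log-loss ('deviance' in older releases)
    "log-loss (default)": GradientBoostingClassifier(**common),
    # exponential loss: AdaBoost-style, binary problems only
    "exponential": GradientBoostingClassifier(loss="exponential", **common),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name:>18}: test accuracy {clf.score(X_test, y_test):.3f}")
```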

Implementation example

In the following example, we build a Gradient Boosting classifier with sklearn.ensemble.GradientBoostingClassifier and fit it with 50 weak learners (decision stumps, since max_depth=1).

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:5000], X[5000:]
y_train, y_test = y[:5000], y[5000:]
GDBclf = GradientBoostingClassifier(n_estimators=50, learning_rate=1.0,
                                    max_depth=1, random_state=0).fit(X_train, y_train)
GDBclf.score(X_test, y_test)
```

Output:

```
0.8724285714285714
```
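Because the ensemble is built stage by stage, staged_predict() can show how test accuracy evolves as each of the 50 learners is added, which helps when choosing n_estimators. This sketch refits the same model as above.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:5000], X[5000:]
y_train, y_test = y[:5000], y[5000:]
clf = GradientBoostingClassifier(n_estimators=50, learning_rate=1.0,
                                 max_depth=1, random_state=0).fit(X_train, y_train)

# staged_predict() yields test-set predictions after 1, 2, ..., 50 stages.
acc = [np.mean(pred == y_test) for pred in clf.staged_predict(X_test)]
best = int(np.argmax(acc)) + 1
print(f"best number of stages: {best}, accuracy {max(acc):.3f}")
```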

#### Regression with Gradient Tree Boosting

To create a regressor with the Gradient Tree Boosting method, Scikit-learn provides sklearn.ensemble.GradientBoostingRegressor. The loss function for regression is specified via the parameter loss. Its default value is 'squared_error' (called 'ls' before scikit-learn 1.0); other options include 'absolute_error', 'huber' and 'quantile'.
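As one example of a non-default loss, loss='quantile' together with its alpha parameter fits a conditional quantile instead of the mean. The sketch below fits the 10th and 90th percentiles on the same Friedman #1 data used in the next example and reports how many test targets fall inside the resulting band; it assumes scikit-learn 1.0 or later, and n_estimators=100 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=2000, random_state=0, noise=1.0)
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = y[:1000], y[1000:]

# One model per quantile: alpha is the target quantile of the quantile loss.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.1, n_estimators=100,
                                  random_state=0).fit(X_train, y_train)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.9, n_estimators=100,
                                  random_state=0).fit(X_train, y_train)

lo, hi = lower.predict(X_test), upper.predict(X_test)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"fraction of test targets inside the 10-90% band: {coverage:.2f}")
```

Ideally the coverage lands near 0.8; a noticeably lower value would suggest the quantile models need more stages or tuning.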

Implementation example

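In the following example, we build a Gradient Boosting regressor with sklearn.ensemble.GradientBoostingRegressor, using the 'absolute_error' loss (less sensitive to outliers than the default squared error), and then compute its mean squared error on the test set with the mean_squared_error() function.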

```python
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=2000, random_state=0, noise=1.0)
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = y[:1000], y[1000:]

GDBreg = GradientBoostingRegressor(n_estimators=80, learning_rate=0.1, max_depth=1,
                                   random_state=0, loss='absolute_error')
GDBreg.fit(X_train, y_train)
```

```python
mean_squared_error(y_test, GDBreg.predict(X_test))
```

Output:

```
5.52881888786075
```