# Gradient Boosting

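Gradient boosted trees are another ensemble method built from decision trees.

Each new tree is built to correct the errors of the trees before it. This is the main difference from Random Forest, where the trees are built independently of one another.

In its basic form, gradient boosting involves no randomness.

It defines a **Loss Function** and minimizes it by **Gradient Descent**, with each new tree acting as one descent step.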
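To make the loss function and gradient descent idea concrete, here is a minimal sketch of boosting for regression with squared-error loss. It is not how scikit-learn implements it internally; the helper names `fit_boosted_trees` and `predict_boosted_trees` and the synthetic sine data are made up for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting for squared-error loss (illustration only)."""
    # Start from a constant prediction: the mean minimizes squared error.
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_estimators):
        # For L = 0.5 * (y - pred)^2 the negative gradient is the residual.
        residual = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        # Take a small step in the direction that reduces the loss.
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return base, learning_rate, trees


def predict_boosted_trees(model, X):
    """Sum the constant start value and the scaled output of every tree."""
    base, learning_rate, trees = model
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Synthetic data, made up for this example.
rng = np.random.RandomState(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_boosted_trees(X_demo, y_demo)
mse_start = np.mean((y_demo - np.mean(y_demo)) ** 2)
mse_end = np.mean((y_demo - predict_boosted_trees(model, X_demo)) ** 2)
print("Training MSE of constant model: {:.3f}".format(mse_start))
print("Training MSE after boosting:    {:.3f}".format(mse_end))
```

Each tree is fit to the residuals of the current prediction, so the ensemble improves step by step, and `learning_rate` scales how far each step goes.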


### Parameters

- learning_rate: the step size of gradient descent, i.e. how strongly each new tree corrects the errors of the previous trees

In [1]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=0)

In [2]:
gbrt = GradientBoostingClassifier(random_state=0)
gbrt.fit(X_train, y_train)

print("Train set accuracy: {:.3f}".format(gbrt.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(gbrt.score(X_test, y_test)))

Train set accuracy: 1.000
Test set accuracy: 0.958


The training accuracy is 100%, so the model is probably overfitting.

To reduce overfitting, limit the maximum depth (stronger pre-pruning) or lower the learning rate.

In [3]:
gbrt = GradientBoostingClassifier(random_state=0, max_depth=1)
gbrt.fit(X_train, y_train)

print("Train set accuracy: {:.3f}".format(gbrt.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(gbrt.score(X_test, y_test)))

Train set accuracy: 0.991
Test set accuracy: 0.972


In [4]:
gbrt = GradientBoostingClassifier(random_state=0, learning_rate=0.01)
gbrt.fit(X_train, y_train)

print("Train set accuracy: {:.3f}".format(gbrt.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(gbrt.score(X_test, y_test)))

Train set accuracy: 0.988
Test set accuracy: 0.965
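One way to see what the learning rate does is to track test accuracy as trees are added. The sketch below assumes the same breast cancer split as the cells above and uses `staged_predict` and `accuracy_score`, which do not appear in the original cells; the checkpoints 1, 10, 50, and 100 are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

for lr in (0.01, 0.1):
    gbrt = GradientBoostingClassifier(random_state=0, learning_rate=lr)
    gbrt.fit(X_train, y_train)
    # staged_predict yields predictions after 1, 2, ..., n_estimators trees.
    staged_acc = [accuracy_score(y_test, y_pred)
                  for y_pred in gbrt.staged_predict(X_test)]
    for n_trees in (1, 10, 50, 100):
        print("learning_rate={}, trees={:3d}, test accuracy={:.3f}".format(
            lr, n_trees, staged_acc[n_trees - 1]))
```

With a smaller learning rate each tree contributes less, so more trees are typically needed to reach the same accuracy.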


Usually, Random Forest is the first model to try, since it is so widely used.

When you need to squeeze the last bit of performance out of a model, Gradient Boosting is worth trying.

For real-world machine learning problems, **xgboost** is often preferred.

## Tips for parameter tuning

n_estimators and learning_rate have the largest influence on the model.

In Random Forest, a larger n_estimators is always better.

In gradient boosting, however, a larger n_estimators makes the model more complex and can cause overfitting. So first set n_estimators to a reasonable value, then search for a suitable learning_rate.
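The tuning order described above can be sketched with a grid search: n_estimators stays fixed and only learning_rate is searched. The use of `GridSearchCV`, the candidate values, and the 5-fold setting are assumptions chosen for illustration, not recommendations from the original notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# n_estimators is held fixed; the candidate learning rates are arbitrary picks.
param_grid = {"learning_rate": [0.01, 0.05, 0.1, 0.2]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0, n_estimators=100),
    param_grid, cv=5)
search.fit(X_train, y_train)

print("Best learning_rate:", search.best_params_["learning_rate"])
print("Cross-validation accuracy: {:.3f}".format(search.best_score_))
print("Test set accuracy: {:.3f}".format(search.score(X_test, y_test)))
```

The search uses only the training set for cross-validation, so the test set accuracy printed at the end remains an independent estimate.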


Commonly, max_depth (or max_leaf_nodes) is set to a very small value in gradient boosting, often no more than 5.