### 1.11.5 Histogram-Based Gradient Boosting

Scikit-learn has two implementations of histogram-based gradient boosters:
* `HistGradientBoostingClassifier`
* `HistGradientBoostingRegressor`

These histogram-based estimators can be **orders of magnitude** faster than `GradientBoostingClassifier` and `GradientBoostingRegressor` when the number of samples is larger than tens of thousands.

###### Example - Partial Dependence and Individual Conditional Expectation Plots
https://scikit-learn.org/stable/auto_examples/inspection/plot_partial_dependence.html#sphx-glr-auto-examples-inspection-plot-partial-dependence-py

#### 1.11.5.1 Usage

Most parameters are the same as in `GradientBoostingClassifier` and `GradientBoostingRegressor`. One exception is `max_iter`, which replaces `n_estimators` and controls the number of boosting iterations.

In [1]:
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_hastie_10_2

In [2]:
X, y = make_hastie_10_2(random_state=0)

# Train on the first 2,000 samples; hold out the remaining 10,000 for testing
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

In [3]:
clf = HistGradientBoostingClassifier(max_iter=100).fit(X_train, y_train)

clf.score(X_test, y_test)

0.8965

For histogram-based gradient boosting, the number of bins used to bin the data is controlled by the `max_bins` parameter. Using fewer bins acts as a form of regularization. It is generally recommended to use as many bins as possible (255), which is also the default.
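
As a rough illustration of what binning means here, the sketch below maps a feature's values to integer bin indices using quantile cut points, with one extra bin reserved for missing values. It uses only the Python standard library. `bin_thresholds` and `bin_feature` are hypothetical helper names, and the quantile rule is an assumption made for illustration, not scikit-learn's internal code.

```python
import bisect
import math
import statistics


def bin_thresholds(values, max_bins):
    """Return up to max_bins - 1 quantile cut points for the non-missing values."""
    finite = sorted(v for v in values if not math.isnan(v))
    cuts = statistics.quantiles(finite, n=max_bins, method="inclusive")
    return sorted(set(cuts))  # drop duplicate cut points


def bin_feature(values, thresholds):
    """Map each value to an integer bin; NaN gets its own extra bin at the end."""
    missing_bin = len(thresholds) + 1
    return [
        missing_bin if math.isnan(v) else bisect.bisect_left(thresholds, v)
        for v in values
    ]


values = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 3.5, float("nan")]
thresholds = bin_thresholds(values, max_bins=4)
binned = bin_feature(values, thresholds)
print(thresholds)
print(binned)
```

With `max_bins=4`, the eight non-missing values fall into bins 0 to 3 and the NaN lands in bin 4. Fewer bins mean coarser candidate split points.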

Note: **early stopping is enabled by default if the number of samples is greater than 10,000.**

#### 1.11.5.2 Missing values support
The histogram-based gradient boosting estimators have built-in support for missing values (NaNs). During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. At prediction time, samples with missing values are sent to the left or right child accordingly.
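
The idea of learning a direction for missing values can be sketched in plain Python. The function below tries every threshold twice, once with the NaN samples sent left and once sent right, and keeps the option with the highest gain. This is a simplified illustration under stated assumptions (squared-error gradients, unit hessians, distinct feature values, no regularization). `best_split` is a hypothetical name, not part of scikit-learn.

```python
import math


def best_split(x, gradients):
    """Return (gain, threshold, missing_go_left) for the best split found.

    A node's score is (sum of gradients)^2 / (number of samples), which is
    the squared-error case with unit hessians. Assumes distinct values in x.
    """
    def score(g):
        return sum(g) ** 2 / len(g) if g else 0.0

    missing = [g for v, g in zip(x, gradients) if math.isnan(v)]
    present = sorted((v, g) for v, g in zip(x, gradients) if not math.isnan(v))
    parent = score(gradients)
    best = None
    # The last threshold keeps every observed sample on the left, which
    # becomes a split on "is missing" when the NaNs are sent right.
    for i in range(1, len(present) + 1):
        left = [g for _, g in present[:i]]
        right = [g for _, g in present[i:]]
        for missing_go_left in (True, False):
            l = left + missing if missing_go_left else left
            r = right if missing_go_left else right + missing
            if not l or not r:
                continue  # a split needs samples on both sides
            gain = score(l) + score(r) - parent
            if best is None or gain > best[0]:
                best = (gain, present[i - 1][0], missing_go_left)
    return best


# Made-up data: the target is 1 exactly when the feature is missing.
x = [0.0, float("nan"), 1.0, 2.0, float("nan")]
y = [0, 1, 0, 0, 1]
baseline = sum(y) / len(y)
gradients = [baseline - t for t in y]  # gradient of 0.5 * (pred - y)^2

gain, threshold, missing_go_left = best_split(x, gradients)
print(gain, threshold, missing_go_left)
```

Here the largest gain comes from keeping all observed values on the left and sending the NaNs right, that is, a split on missingness itself.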

Example:

In [4]:
from sklearn.ensemble import HistGradientBoostingClassifier
import numpy as np

In [6]:
X = np.array([0,1,2,np.nan]).reshape(-1,1)
y = [0,0,1,1]

gbdt = HistGradientBoostingClassifier(min_samples_leaf=1).fit(X,y)
gbdt.predict(X)

array([0, 0, 1, 1])

When the missingness pattern is predictive, a split can be made on whether the feature value is missing or not:

In [7]:
X = np.array([0,1,2,np.nan]).reshape(-1,1)
y = [0,0,1,1]

gbdt = HistGradientBoostingClassifier(min_samples_leaf=1, max_depth=2,
                                     learning_rate=1, max_iter=1)

gbdt.fit(X,y)
gbdt.predict(X)

array([0, 0, 1, 1])