DOC Improve description of l2_regularization for hgbt models (#28652)
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
4 people committed Apr 8, 2024
1 parent 2f5361a commit 016670e
Showing 3 changed files with 55 additions and 7 deletions.
53 changes: 49 additions & 4 deletions doc/modules/ensemble.rst
@@ -115,11 +115,54 @@
The size of the trees can be controlled through the ``max_leaf_nodes``,
``max_depth``, and ``min_samples_leaf`` parameters.

The number of bins used to bin the data is controlled with the ``max_bins``
-parameter. Using less bins acts as a form of regularization. It is
-generally recommended to use as many bins as possible (256), which is the default.
+parameter. Using less bins acts as a form of regularization. It is generally
+recommended to use as many bins as possible (255), which is the default.
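
As a rough illustration (an editor's sketch, not part of this commit; the
values below are arbitrary), the effect can be seen by fitting two models that
differ only in ``max_bins``::

  from sklearn.datasets import make_regression
  from sklearn.ensemble import HistGradientBoostingRegressor

  X, y = make_regression(n_samples=1_000, noise=20.0, random_state=0)
  # Fewer bins coarsen the histograms and act as extra regularization.
  coarse = HistGradientBoostingRegressor(max_bins=32, random_state=0).fit(X, y)
  # 255 bins is the maximum (and the default).
  fine = HistGradientBoostingRegressor(max_bins=255, random_state=0).fit(X, y)
  print(coarse.score(X, y), fine.score(X, y))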

-The ``l2_regularization`` parameter is a regularizer on the loss function and
-corresponds to :math:`\lambda` in equation (2) of [XGBoost]_.
+The ``l2_regularization`` parameter acts as a regularizer for the loss function,
+and corresponds to :math:`\lambda` in the following expression (see equation (2)
+in [XGBoost]_):

.. math::

  \mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \frac{1}{2} \sum_k \lambda ||w_k||^2

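
A minimal usage sketch (added by the editor, not part of the original
documentation; the value ``10.0`` is arbitrary)::

  from sklearn.datasets import make_regression
  from sklearn.ensemble import HistGradientBoostingRegressor

  X, y = make_regression(n_samples=1_000, noise=10.0, random_state=0)
  # Larger values shrink the leaf values more aggressively (see details below).
  model = HistGradientBoostingRegressor(l2_regularization=10.0, random_state=0)
  model.fit(X, y)
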
|details-start|
**Details on l2 regularization**:
|details-split|

It is important to notice that the loss term :math:`l(\hat{y}_i, y_i)` describes
only half of the actual loss function, except for the pinball loss and absolute
error.

The index :math:`k` refers to the k-th tree in the ensemble of trees. In the
case of regression and binary classification, gradient boosting models grow one
tree per iteration, so :math:`k` runs up to `max_iter`. In the case of
multiclass classification problems, the maximal value of the index :math:`k` is
`n_classes` :math:`\times` `max_iter`.
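
A small sketch (added by the editor; the dataset is hypothetical) illustrating
this: with three classes, the raw predictions expose one column per class
because each boosting iteration grows ``n_classes`` trees::

  from sklearn.datasets import make_classification
  from sklearn.ensemble import HistGradientBoostingClassifier

  X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                             random_state=0)
  clf = HistGradientBoostingClassifier(max_iter=10, random_state=0).fit(X, y)
  print(clf.n_iter_)                     # number of boosting iterations performed
  print(clf.decision_function(X).shape)  # (300, 3): one raw score per class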

If :math:`T_k` denotes the number of leaves in the k-th tree, then :math:`w_k`
is a vector of length :math:`T_k`, which contains the leaf values of the form `w
= -sum_gradient / (sum_hessian + l2_regularization)` (see equation (5) in
[XGBoost]_).
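
The following toy computation (an editor's sketch, not scikit-learn internals)
shows how the regularization shrinks a leaf value, most strongly when the sum
of hessians is small::

  def leaf_value(sum_gradient, sum_hessian, l2_regularization):
      # Same form as the expression above.
      return -sum_gradient / (sum_hessian + l2_regularization)

  print(leaf_value(-4.0, 2.0, 0.0))   # 2.0   (no regularization)
  print(leaf_value(-4.0, 2.0, 2.0))   # 1.0   (small hessian: strong shrinkage)
  print(leaf_value(-4.0, 20.0, 2.0))  # ~0.18 (large hessian: mild shrinkage vs 0.2)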

The leaf values :math:`w_k` are derived by dividing the sum of the gradients of
the loss function by the combined sum of hessians. Adding the regularization to
the denominator penalizes the leaves with small hessians (flat regions),
resulting in smaller updates. Those :math:`w_k` values then contribute to the
model's prediction for a given input that ends up in the corresponding leaf. The
final prediction is the sum of the base prediction and the contributions from
each tree. The result of that sum is then transformed by the inverse link
function depending on the choice of the loss function (see
:ref:`gradient_boosting_formulation`).
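
As an editor-added illustration (assuming the classifier's default log loss),
the predicted probabilities are the inverse link, here the sigmoid, applied to
the raw sums returned by ``decision_function``::

  import numpy as np
  from scipy.special import expit
  from sklearn.datasets import make_classification
  from sklearn.ensemble import HistGradientBoostingClassifier

  X, y = make_classification(n_samples=200, random_state=0)
  clf = HistGradientBoostingClassifier(max_iter=20, random_state=0).fit(X, y)
  raw = clf.decision_function(X)         # base prediction + sum of tree contributions
  proba = clf.predict_proba(X)[:, 1]     # probability of the positive class
  assert np.allclose(proba, expit(raw))  # sigmoid is the inverse of the logit link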

Notice that the original paper [XGBoost]_ introduces a term :math:`\gamma\sum_k
T_k` that penalizes the number of leaves (making it a smooth version of
`max_leaf_nodes`), which is not presented here as it is not implemented in scikit-learn;
whereas :math:`\lambda` penalizes the magnitude of the individual tree
predictions before being rescaled by the learning rate, see
:ref:`gradient_boosting_shrinkage`.

|details-end|

Note that **early-stopping is enabled by default if the number of samples is
larger than 10,000**. The early-stopping behaviour is controlled via the
@@ -594,6 +637,8 @@
The parameter ``max_leaf_nodes`` corresponds to the variable ``J`` in the
chapter on gradient boosting in [Friedman2001]_ and is related to the parameter
``interaction.depth`` in R's gbm package where ``max_leaf_nodes == interaction.depth + 1`` .

.. _gradient_boosting_formulation:

Mathematical formulation
^^^^^^^^^^^^^^^^^^^^^^^^

6 changes: 4 additions & 2 deletions sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py
@@ -1483,7 +1483,8 @@ class HistGradientBoostingRegressor(RegressorMixin, BaseHistGradientBoosting):
than a few hundred samples, it is recommended to lower this value
since only very shallow trees would be built.
l2_regularization : float, default=0
-The L2 regularization parameter. Use ``0`` for no regularization (default).
+The L2 regularization parameter penalizing leaves with small hessians.
+Use ``0`` for no regularization (default).
max_features : float, default=1.0
Proportion of randomly chosen features in each and every node split.
This is a form of regularization, smaller values make the trees weaker
@@ -1859,7 +1860,8 @@ class HistGradientBoostingClassifier(ClassifierMixin, BaseHistGradientBoosting):
than a few hundred samples, it is recommended to lower this value
since only very shallow trees would be built.
l2_regularization : float, default=0
-The L2 regularization parameter. Use ``0`` for no regularization (default).
+The L2 regularization parameter penalizing leaves with small hessians.
+Use ``0`` for no regularization (default).
max_features : float, default=1.0
Proportion of randomly chosen features in each and every node split.
This is a form of regularization, smaller values make the trees weaker
3 changes: 2 additions & 1 deletion sklearn/ensemble/_hist_gradient_boosting/grower.py
@@ -201,7 +201,8 @@ class TreeGrower:
interaction_cst : list of sets of integers, default=None
List of interaction constraints.
l2_regularization : float, default=0.
-The L2 regularization parameter.
+The L2 regularization parameter penalizing leaves with small hessians.
+Use ``0`` for no regularization (default).
feature_fraction_per_split : float, default=1
Proportion of randomly chosen features in each and every node split.
This is a form of regularization, smaller values make the trees weaker
