The calculations of the binomial deviance and its gradient in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py are bothering me. I could be missing something, but:
The deviance is calculated as log(1 + exp(-2 * y * pred)) here. This matches equation 10.18 on p. 346 of Elements of Statistical Learning. However, that derivation assumes that y is {-1, 1} valued, whereas in sklearn y is {0, 1} valued. Effectively, the calculation is insensitive to pred whenever y = 0. The fix is to change the return line to np.sum(np.logaddexp(0.0, -2 * (2 * y - 1) * pred)) / y.shape[0].
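A quick numerical sketch of the problem (my own illustration, not the scikit-learn source; the label and score values are arbitrary):

```python
import numpy as np

# {0, 1}-valued labels and raw scores, as in sklearn's gradient boosting.
y = np.array([0, 0, 1, 1])
pred = np.array([-3.0, 3.0, -3.0, 3.0])

# Current formula: log(1 + exp(-2 * y * pred)).
# When y == 0 the exponent is always 0, so every such term is log(2),
# regardless of how wrong pred is.
current = np.logaddexp(0.0, -2.0 * y * pred)
print(current)        # first two entries identical despite opposite preds

# Proposed fix: map y from {0, 1} to {-1, 1} before applying the ESL formula.
fixed = np.logaddexp(0.0, -2.0 * (2 * y - 1) * pred)
print(fixed.mean())   # same as np.sum(...) / y.shape[0]
```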
The calculation of the gradient makes sense to me if the pred values map to class probabilities via P(y=1) = 1 / (1 + exp(-pred)). However, the loss function calculation above seems to follow the convention that P(y=1) = 1 / (1 + exp(-2 * pred)) (again, see the link above). One way to make the two equations consistent with each other is to remove the first 2 in the expression above: np.sum(np.logaddexp(0.0, -(2 * y - 1) * pred)) / y.shape[0]
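To check the consistency claim numerically, here is a small sketch (my own code, with hypothetical names; it assumes the gradient in use is y - 1 / (1 + exp(-pred))): the finite-difference derivative of the unscaled loss matches that gradient, whereas the 2-scaled loss would not.

```python
import numpy as np
from scipy.special import expit  # expit(x) = 1 / (1 + exp(-x))

def proposed_loss(y, pred):
    # Unscaled binomial deviance with y mapped from {0, 1} to {-1, 1}.
    return np.logaddexp(0.0, -(2 * y - 1) * pred)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=5).astype(float)
pred = rng.normal(size=5)

# Gradient under the convention P(y=1) = 1 / (1 + exp(-pred)).
analytic = y - expit(pred)

# Central finite-difference negative gradient of the proposed loss.
eps = 1e-6
numeric = -(proposed_loss(y, pred + eps) - proposed_loss(y, pred - eps)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True: conventions agree
```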