[MRG] Common Private Loss Module #19088
Closed
Changes from all commits (26 commits)
3c3c059 ENH add common link function submodule (lorentzenchr)
700f5bb ENH add common loss function submodule (lorentzenchr)
b26a967 CLN replace deprecated np.int by int (lorentzenchr)
062ff68 DOC document default=1 for n_threads (lorentzenchr)
d11fd5c CLN comments and line wrapping (lorentzenchr)
5f55187 CLN comments and doc (lorentzenchr)
da00b6f BUG remove useless line of code (lorentzenchr)
9488d37 CLN remove line that was commented out (lorentzenchr)
18ff604 CLN nitpicks in comments and docstrings (lorentzenchr)
8cdee57 ENH set NPY_NO_DEPRECATED_API (lorentzenchr)
f573a4f MNT change NPY_1_13_API_VERSION to NPY_1_7_API_VERSION (lorentzenchr)
bf0bbe2 MNT comment out NPY_NO_DEPRECATED_API (lorentzenchr)
b6fbd76 TST restructure domain test cases (lorentzenchr)
c2397dc DOC add losses to API reference (lorentzenchr)
234cd0a MNT add classes to __init__ (lorentzenchr)
a0e3c7a CLN fix import (lorentzenchr)
d42a524 DOC minor docstring changes (lorentzenchr)
c066bbe TST prefer docstring over comment (lorentzenchr)
009eee0 ENH define loss.is_multiclass (lorentzenchr)
df4bbb8 DOC fix typos (lorentzenchr)
68c4468 CLN address review comments (lorentzenchr)
09c28fa DOC small docstring improvements (lorentzenchr)
d74418a TST test more losses in test_specific_fit_intercept_only (lorentzenchr)
a5018f4 FIX test_loss_boundary (lorentzenchr)
2871669 MNT apply black (lorentzenchr)
fc9e665 TST replace np.quantile by np.percentile (lorentzenchr)
@@ -0,0 +1,27 @@
"""
The :mod:`sklearn._loss` module includes loss function classes suitable for
fitting classification and regression tasks.
"""

from .loss import (
    HalfSquaredError,
    AbsoluteError,
    PinballLoss,
    HalfPoissonLoss,
    HalfGammaLoss,
    HalfTweedieLoss,
    BinaryCrossEntropy,
    CategoricalCrossEntropy,
)


__all__ = [
    "HalfSquaredError",
    "AbsoluteError",
    "PinballLoss",
    "HalfPoissonLoss",
    "HalfGammaLoss",
    "HalfTweedieLoss",
    "BinaryCrossEntropy",
    "CategoricalCrossEntropy",
]
@@ -0,0 +1,75 @@
# cython: language_level=3

import numpy as np
cimport numpy as np

np.import_array()


# Fused types for y_true, y_pred, raw_prediction
ctypedef fused Y_DTYPE_C:
    np.npy_float64
    np.npy_float32


# Fused types for gradient and hessian
ctypedef fused G_DTYPE_C:
    np.npy_float64
    np.npy_float32


# Struct to return 2 doubles
ctypedef struct double_pair:
    double val1
    double val2


# C base class for loss functions
cdef class cLossFunction:
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil


cdef class cHalfSquaredError(cLossFunction):
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil


cdef class cAbsoluteError(cLossFunction):
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil


cdef class cPinballLoss(cLossFunction):
    cdef readonly double quantile  # readonly makes it inherited by children
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil


cdef class cHalfPoissonLoss(cLossFunction):
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil


cdef class cHalfGammaLoss(cLossFunction):
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil


cdef class cHalfTweedieLoss(cLossFunction):
    cdef readonly double power  # readonly makes it inherited by children
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil


cdef class cBinaryCrossEntropy(cLossFunction):
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cgrad_hess(self, double y_true, double raw_prediction) nogil
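The declarations above follow a uniform per-sample interface: each loss exposes the point loss (closs), its first derivative (cgradient), and cgrad_hess, which returns the gradient and Hessian together as a double_pair. As a plain-Python sketch of that pattern for the half squared error (an illustration only, not the PR's Cython implementation):

```python
from dataclasses import dataclass


@dataclass
class DoublePair:
    # Python stand-in for the double_pair struct: (gradient, hessian).
    val1: float
    val2: float


class HalfSquaredErrorSketch:
    """Sketch of the closs/cgradient/cgrad_hess interface for 0.5*(p - y)**2."""

    def closs(self, y_true, raw_prediction):
        # Per-sample loss value.
        return 0.5 * (raw_prediction - y_true) ** 2

    def cgradient(self, y_true, raw_prediction):
        # First derivative w.r.t. raw_prediction.
        return raw_prediction - y_true

    def cgrad_hess(self, y_true, raw_prediction):
        # Gradient and Hessian in one call; the Hessian of the half
        # squared error is the constant 1.
        return DoublePair(raw_prediction - y_true, 1.0)
```

Computing gradient and Hessian in a single call matters for gradient-boosting-style solvers, which need both quantities per sample in a tight loop.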
Fused types have already been defined in the code base for KDTree, BallTree and DistMetrics (here are their definitions). To ease maintenance in the long run, we might want to define as few new fused types as possible (I would favor reusing existing ones or Cython's built-in fused types). Are there particular reasons for using numpy's types? Would using Cython's floating be possible here?
No particular reason for numpy types other than copying from hist_gradient_boosting/common.pxd. I do not use cython.floating because I want two different fused types:
- Y_DTYPE_C for y_true, raw_prediction, sample_weight
- G_DTYPE_C for loss, gradient, hessian
With only one fused type, all of these would be forced to the same dtype. The use case is again the HGBT, where gradients and hessians are float32, whereas y_true is float64.
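As a plain-NumPy illustration of that dtype split (hypothetical code, not part of the PR), a gradient routine can accept float64 y_true and raw_prediction while filling a float32 gradient buffer; the arithmetic runs in float64 and is cast down on assignment:

```python
import numpy as np


def half_squared_error_gradient(y_true, raw_prediction, gradient_out):
    """Gradient of 0.5 * (raw_prediction - y_true)**2 w.r.t. raw_prediction."""
    # y_true / raw_prediction play the role of Y_DTYPE_C (float64 here),
    # gradient_out the role of G_DTYPE_C (float32): the float64 result is
    # cast down when written into the output buffer.
    gradient_out[:] = raw_prediction - y_true
    return gradient_out


y_true = np.array([0.0, 1.0, 2.0], dtype=np.float64)
raw_prediction = np.array([0.5, 1.0, 1.5], dtype=np.float64)
grad = np.empty(3, dtype=np.float32)
half_squared_error_gradient(y_true, raw_prediction, grad)
```

With a single fused type for both roles, the compiler would force y_true and the gradient buffer to share a dtype, ruling out exactly this mixed-precision layout.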
This makes sense. Maybe it is worth stating the motivation here, explaining that defining two fused types allows more flexibility on input dtypes.
Reading a bit in https://numpy.org/devdocs/reference/c-api/dtype.html, np.npy_float64 is really always a 64-bit floating point number, even on 32-bit systems, in contrast to plain C double, which may be implementation specific. But I wonder whether any C compiler used for scikit-learn does not comply with IEEE 754-1989.
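That property is easy to confirm from Python. The following quick check (an illustration, not part of the PR) verifies that NumPy's float64 has the IEEE 754 binary64 layout on the running platform:

```python
import numpy as np

# IEEE 754 binary64: 64 bits total, 52 explicit mantissa bits,
# machine epsilon 2**-52.
info = np.finfo(np.float64)
assert info.bits == 64
assert info.nmant == 52
assert info.eps == 2.0 ** -52
```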