_weighted_percentile does not lead to the same result as np.median #17370
Comments
ping @lucyleeow Since you already looked at the code, you might have some intuition about why this is the case. We should actually have the above test as a regression test for our
Does |
Yes, numpy takes
pinging @NicolasHug @adrinjalali @ogrisel Making things consistent will impact the Would be interesting to make the change and see which tests break.
For We take the 'lower' percentile, as we use the default parameter We could amend In the end, I think the difference between the 'lower' weighted percentile and the interpolated weighted percentile would generally be small for large n. Though, I guess the gap between the two middle values would also affect how different they are.
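To make the 'lower' versus interpolated distinction concrete, here is a minimal sketch for the unweighted case. `lower_median` is a hypothetical helper written for illustration, not a scikit-learn or NumPy function; `np.percentile(x, 50)` uses NumPy's default linear interpolation.

```python
import numpy as np


def lower_median(x):
    """Hypothetical helper: the lower of the two middle values, no interpolation."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    return x_sorted[(len(x_sorted) - 1) // 2]


close_middle = [1.0, 2.0, 3.0, 4.0]    # middle values 2 and 3
far_middle = [1.0, 2.0, 10.0, 11.0]    # middle values 2 and 10
odd_length = [1.0, 2.0, 3.0]           # a single middle value

for x in (close_middle, far_middle, odd_length):
    # np.percentile defaults to linear interpolation between neighbours
    print(x, "lower:", lower_median(x), "linear:", np.percentile(x, 50))
```

With an even number of samples, the two results differ by half the gap between the two middle values; with an odd number of samples they coincide.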
For
default is 'linear', thus performs the same as |
I think that our |
Our np.median says
I do not know how to generalize this to account for sample weights. Therefore closing.
While reviewing a test in #16937, it appears that our implementation of _weighted_percentile with unit sample_weight will lead to a different result than np.median, which is a bit problematic for consistency. In gradient boosting, it breaks the loss equivalence because the initial predictions are different. We could bypass this issue by always computing the median using _weighted_percentile there.
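The discrepancy can be reproduced with a small stand-alone sketch. The function below is an assumption made for illustration: a 'lower' weighted percentile built from cumulative weights, not scikit-learn's actual `_weighted_percentile` code, whose behaviour may differ between versions.

```python
import numpy as np


def weighted_percentile_lower(values, sample_weight, percentile=50):
    """Illustrative 'lower' weighted percentile (hypothetical, not sklearn's code)."""
    values = np.asarray(values, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    order = np.argsort(values)
    cum_weights = np.cumsum(sample_weight[order])
    # First sorted sample whose cumulative weight reaches the target fraction
    threshold = percentile / 100.0 * cum_weights[-1]
    idx = min(np.searchsorted(cum_weights, threshold), len(values) - 1)
    return values[order][idx]


even = [3.0, 1.0, 4.0, 2.0]
odd = [3.0, 1.0, 2.0]

# Unit weights, even length: the sketch returns 2.0 while np.median returns 2.5
print(weighted_percentile_lower(even, np.ones(4)), np.median(even))
# Unit weights, odd length: both return 2.0
print(weighted_percentile_lower(odd, np.ones(3)), np.median(odd))
```

Under this sketch, unit weights only match `np.median` when the number of samples is odd, because `np.median` averages the two middle values for even lengths.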