ENH: add weighted quantile for inverted_cdf #24254

lorentzenchr · 2023-07-24T20:25:19Z

Partially solves #8935.

This PR adds support for weights in np.quantile similar to np.average(..., weights=w). As an uncontroversial start, only method="inverted_cdf" is implemented for weights support.

See #8935 (comment) for details.
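For readers unfamiliar with the method, here is a minimal sketch of what a weighted `inverted_cdf` quantile computes for 1-D data. It is not the code in this PR: `weighted_inverted_cdf` is a hypothetical helper written only for illustration, and it assumes non-negative weights with a positive sum and no NaNs.

```python
import numpy as np


def weighted_inverted_cdf(x, q, weights):
    """Hypothetical helper (not NumPy API): weighted quantile of 1-D data.

    Returns the smallest data value whose normalized cumulative weight
    is at least q, i.e. the inverted empirical CDF.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sort the data and carry the weights along.
    order = np.argsort(x, kind="stable")
    x_sorted = x[order]
    # The weighted empirical CDF is the normalized cumulative sum of weights.
    cdf = np.cumsum(weights[order])
    cdf = cdf / cdf[-1]
    # First index i with cdf[i] >= q.
    idx = np.searchsorted(cdf, q, side="left")
    return x_sorted[idx]


x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1, 3, 1, 1])

# Integer weights should behave like repeating each observation.
weighted = weighted_inverted_cdf(x, 0.4, w)
repeated = weighted_inverted_cdf(np.repeat(x, w), 0.4, np.ones(w.sum()))
print(weighted, repeated)
```

The comparison against `np.repeat` mirrors the idea behind the `test_quantile_with_integer_weights` commit listed in the timeline: an integer weight of k is expected to act like k copies of the observation.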

lorentzenchr · 2023-08-03T05:23:01Z

@seberg friendly ping. I know numpy 2.0 requires resources, but maybe you could have a quick look. The underlying issue is very old with a ton of discussions. (And I need help to fix the emscripten CI error.)

seberg

Even here, I am unsure whether this should be called weights or something clearer. For inverted_cdf I guess it doesn't matter (unless someone expects more than a single data point to be mixed in the final step if its weight < 1, but I am not sure that can even be argued to make sense).

seberg · 2023-08-03T11:26:56Z

numpy/lib/function_base.py

+        # setup wgt to broadcast along axis
+        wgt = np.broadcast_to(wgt, (a.ndim-1)*(1,) + wgt.shape)
+        wgt = wgt.swapaxes(-1, axis)
+    return wgt


Interesting logic, but prior art in either case.

lorentzenchr · 2023-08-04

Why interesting? I just shifted these lines of code out of average for reuse in quantile.

seberg · 2023-08-04

The logic tries to add convenience for an N-D input with axis and 1-D (or other input) and I find that very dubious guessing of user intent with relatively little gain (it isn't hard to make sure weights and data are properly broadcast/aligned) and potential for confusion.
(The potential for terrible bugs where you expect swapping of weights to happen but it doesn't because shapes happen to be identical seems possible but unlikely.)

lorentzenchr · 2023-09-04

Again, that code is not new or from me. I just took these lines out of np.average for reuse in np.quantile.
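To make the discussed lines concrete, the sketch below wraps them in a hypothetical function, `broadcast_weights_along_axis` (the name is an assumption, not the helper added in the PR), and shows how 1-D weights get aligned with one axis of an N-D array. It assumes 1-D weights whose length matches `a.shape[axis]`.

```python
import numpy as np


def broadcast_weights_along_axis(a, wgt, axis):
    # Hypothetical wrapper around the two quoted lines.
    # Prepend length-1 axes so wgt has as many dimensions as a ...
    wgt = np.broadcast_to(wgt, (a.ndim - 1) * (1,) + wgt.shape)
    # ... then move the weight axis from the last position to `axis`.
    wgt = wgt.swapaxes(-1, axis)
    return wgt


a = np.arange(6.0).reshape(2, 3)
w = np.array([1.0, 3.0])  # one weight per row of a

wgt = broadcast_weights_along_axis(a, w, axis=0)
print(wgt.shape)  # (2, 1): broadcasts against a's shape (2, 3)

# A weighted mean computed by hand agrees with np.average.
manual = (a * wgt).sum(axis=0) / wgt.sum(axis=0)
reference = np.average(a, axis=0, weights=w)
print(manual, reference)
```

The concern raised in the thread is visible here: the result depends on inferring which axis the 1-D weights belong to, rather than on the caller passing weights that already broadcast against the data.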

seberg · 2023-08-03T11:38:58Z

numpy/lib/function_base.py

+            weights = np.moveaxis(weights, axis, destination=0)
+        index_array = np.argsort(arr, axis=axis, kind="stable")
+        # TODO: Which value to set to slices_having_nans.
+        slices_having_nans = None


Just [-1] after sorting.
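The suggestion relies on NumPy sorting NaN to the end, so after sorting it is enough to inspect the last element along the axis. The following is a small sketch of that idea only; variable names follow the quoted diff, while the data and the use of `np.take_along_axis` are illustrative assumptions rather than the PR's final code.

```python
import numpy as np

arr = np.array([[3.0, np.nan, 1.0],
                [2.0, 0.5, 4.0]])
axis = 1

# Stable argsort, as in the quoted diff; NaN values are placed last.
index_array = np.argsort(arr, axis=axis, kind="stable")
sorted_arr = np.take_along_axis(arr, index_array, axis=axis)

# A slice contains a NaN exactly when its last sorted element is NaN.
slices_having_nans = np.isnan(np.take(sorted_arr, -1, axis=axis))
print(slices_having_nans)
```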

numpy/lib/function_base.py

numpy/lib/tests/test_function_base.py

lorentzenchr · 2023-08-04T09:50:35Z

@seberg Thank you very much for having a look!

lorentzenchr · 2023-08-07T08:24:38Z

Do I have to add tests for n-dim data and that axis works correctly or is this done somewhere automatically?

lorentzenchr · 2023-09-04T17:38:30Z

@seberg I put more effort into it to also test several ndim of input. What should I do with _function_base_impl.pyi?
Can you also help with some of the failing tests? Locally on my laptop, they all pass.

numpy/lib/tests/test_function_base.py

numpy/lib/_function_base_impl.pyi

lorentzenchr · 2024-01-12T16:58:26Z

@ogrisel uncovered some shortcomings that I'm trying to solve, in particular the out parameter.

I'm stuck because of the following error:

import numpy as np

np.take(np.arange(10), np.array([1, 4]), out=np.zeros(2, dtype=float))

TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

How can I write the content of some slice of an int64 array into a float64 array?
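One way to answer the question in isolation, offered as a sketch and not necessarily how the PR resolved it: let assignment or `np.copyto` do the cast instead of passing a differently typed array as `out` to `np.take`.

```python
import numpy as np

src = np.arange(10)               # integer array
idx = np.array([1, 4])
out = np.zeros(2, dtype=float)    # float64 destination

# Assigning into an existing array casts the integer values to float64.
out[...] = np.take(src, idx)

# np.copyto is equivalent here; integer -> float64 counts as a "safe" cast.
out2 = np.zeros(2, dtype=float)
np.copyto(out2, src[idx], casting="safe")

print(out, out2)
```

This goes through a temporary array, so it trades some efficiency for simplicity, which is in the spirit of the "make out argument trivial but inefficient" commit in the timeline.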

lorentzenchr

~~I'm currently stuck - and I already thought the PR is ready...~~

EDIT: I finally solved it in fc0b5b7 and some minor fixes in 2d96b56.
I think this warrants another proper review, as those changes are not trivial at all.

numpy/lib/_function_base_impl.py

- cast cdf to dtype of quantile to avoid surprises
- Convert scalar python objects to pure python objects as in unweighted case
- Extend weighted case to test_linear_interpolation
- Extend weighted case to test_percentile_out

lorentzenchr · 2024-01-15T12:06:12Z

This is now ready for a - hopefully - final review. Test coverage is much extended and corner cases like nan handling are now fixed (and tested), no todo left.

Edit: And I like the 100% 🟢 of the current CI.

lorentzenchr · 2024-01-16T20:28:04Z

@seberg Do you think this could get in numpy 2.0? That would be my hope.

seberg · 2024-01-22T09:25:08Z

Thanks, @lorentzenchr and @ogrisel for reviewing. The new changes look good, and the new tests were really needed. I did have a look at the shape logic, and it seems really quite clean to me; the only comments I added were so small that there would be no point in following up.

Since nobody voiced anything about the added API here or on the mailing list, putting this in.

lorentzenchr added 4 commits July 24, 2023 22:14

ENH add _weights_are_valid

28162b1

ENH use _weights_are_valid in np.average

4b079c8

TST add test_quantile_constant_weights

7f10b12

ENH add weighted quantile for inverted_cdf

0fce6b6

github-actions bot added the 01 - Enhancement label Jul 24, 2023

lorentzenchr changed the title ~~ENH add weighted quantile for inverted_cdf~~ ENH: add weighted quantile for inverted_cdf Jul 24, 2023

lorentzenchr added 6 commits July 24, 2023 22:33

Merge branch 'main' into weighted_quantile_inverted_cdf

aaac227

CLN satisfy linter

3c979bb

ENH add weights to nanquantile

bad4032

CLN make linter happy

d84654e

TST add test_quantile_with_integer_weights

24f758f

CLN remove blank line

cb48fd1

seberg reviewed Aug 3, 2023

lorentzenchr added 5 commits August 4, 2023 11:18

ENH replace take_along_axis with normal indexing

eee5455

ENH use cumsum directly to calc CDF

cff8df3

FIX try np.intp to fix emscripten CI

fa2e54a

FIX try np.int32 to fix CI error for np.repeat

93407f7

Merge branch 'main' into weighted_quantile_inverted_cdf

08f7066

TST add test for raising errors

7c3bec8

lorentzenchr added 4 commits September 4, 2023 13:46

Merge branch 'main' into weighted_quantile_inverted_cdf

f84a59c

TST add test_quantile_with_weights_and_axis

574ea98

CLN make linter happy again

e705958

ENH add weights to percentile

c4ca281

TYP add weights to _function_base_impl.pyi

bf9a767

seberg reviewed Sep 6, 2023

numpy/lib/tests/test_function_base.py (outdated)

BvB93 reviewed Sep 6, 2023

numpy/lib/_function_base_impl.pyi

lorentzenchr added 4 commits January 12, 2024 10:22

FIX axis before _weights_are_valid

b2aeeac

FIX axis in weighted quantile

af58619

FIX weights in _percentile_dispatcher

423b401

CLN replace np.s_[..., ] by (...,)

4f7841a

lorentzenchr added 2 commits January 12, 2024 22:42

TST add weighted case to test_percentile_out

7cc8771

CLN make out argument trivial but inefficient

5aa30c9

lorentzenchr commented Jan 12, 2024

numpy/lib/_function_base_impl.py (outdated)

seberg mentioned this pull request Jan 13, 2024

ENH: Add weights option to numpy.quantile #16862

Closed

lorentzenchr added 10 commits January 14, 2024 12:38

FIX quantile output shape

fc0b5b7

FIX & TST

2d96b56

- cast cdf to dtype of quantile to avoid surprises
- Convert scalar python objects to pure python objects as in unweighted case
- Extend weighted case to test_linear_interpolation
- Extend weighted case to test_percentile_out

CLN repacify linter

5f93d91

ENH make out efficient again

5357c77

FIX nanquantile and nanpercentile

5581292

TST add tests for nanquantile and nanpercentile

0bbe7d9

FIX nan handling with slices_having_nans

63c3536

CLN fix BE/AE inconsistency

798845f

CLN tame the linter

01b6ad4

CLN linter again

bbdd595

seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jan 22, 2024

seberg merged commit 89eac8b into numpy:main Jan 22, 2024
63 checks passed

lorentzenchr deleted the weighted_quantile_inverted_cdf branch January 24, 2024 16:42

lorentzenchr mentioned this pull request Jan 24, 2024

DOC: fix docstring of quantile and percentile #25678

Merged

jakevdp mentioned this pull request Jan 24, 2024

Test: add weights to unsupported arguments google/jax#19503

Merged

lorentzenchr mentioned this pull request Mar 14, 2024

Weighted quantile option in nanpercentile() #8935

Closed

adrinjalali mentioned this pull request Apr 15, 2024

ENH improve _weighted_percentile to provide several interpolation scikit-learn/scikit-learn#17768

Closed

mdhaber mentioned this pull request May 3, 2024

DOC: quantile: correct/simplify documentation #25704

Merged
