ENH: Add edge keyword argument to digitize #16937

alexrockhill · 2020-07-23T18:54:38Z

Adding an edge keyword argument gives a simpler way to pass bins so that values lying exactly on the outermost bin edge are included. Without it, such values lead to slightly surprising behavior, as in

x = [1, 2, 3]
bins = [0, 1.5, 3]

for which np.digitize(x, bins) returns

[1, 2, 3]

when users might prefer

[1, 2, 2]

since they passed only two ranges (three bin edges) and every value lies within those ranges, edges included.
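For reference, the current behavior described above can be reproduced directly. The helper below, digitize_closed_last, is a hypothetical name used only for illustration. It is a minimal sketch of the result the proposed edge keyword is meant to give for increasing bins, not the implementation in this PR.

```python
import numpy as np

x = np.array([1, 2, 3])
bins = np.array([0, 1.5, 3])

# Current behavior: bins[i-1] <= x < bins[i], so a value equal to
# bins[-1] is reported as lying past the last bin.
current = np.digitize(x, bins)


def digitize_closed_last(x, bins):
    """Hypothetical helper (not part of NumPy): treat the last edge as closed.

    Assumes `bins` is monotonically increasing.
    """
    x = np.asarray(x)
    bins = np.asarray(bins)
    idx = np.digitize(x, bins)
    # Values exactly on the last edge are moved back into the last bin.
    return np.where(x == bins[-1], len(bins) - 1, idx)


preferred = digitize_closed_last(x, bins)
```

np.histogram already treats its last bin as closed on the right, which is the convention this sketch follows.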

alexrockhill · 2020-07-23T19:05:06Z

OK, so my solution to the overflow was to cast integer bins to 64-bit. This fails only if the greatest bin is 2 ** 63 - 1 (the int64 maximum), and in that case it raises a RuntimeError.

eric-wieser · 2020-07-23T19:40:26Z

What are your thoughts on my proposal in #16933?

alexrockhill · 2020-07-24T00:33:18Z

> What are your thoughts on my proposal in #16933?

Yeah, that would be a good solution. Then the edge argument could just be converted to the boolean list to pass to searchsorted.

eric-wieser · 2020-07-24T06:43:09Z

numpy/lib/function_base.py

+        idx = -1 if delta == mono else 0
+        if np.issubdtype(bins.dtype, _nx.integer):
+            bins = _nx.int64(bins)
+            if abs(bins[idx]) >= 2 ** 63 - 1:


This rejects uint64 2**64-2, but should not.

alexrockhill · 2020-08-04

_nx.int64(2 ** 63) throws an error but _nx.int64(2 ** 63 - 1) does not, so as far as I know that's the top of the range. That seems like a pretty big range of acceptable numbers; if that's the range of int64, I think it should do. I'm not sure it would be worth extending to larger numbers unless there is a particular reason.

eric-wieser · 2020-08-04

My comment is about uint64.
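The two ranges under discussion can be read off with np.iinfo. This is only a quick check of the limits, not code from the PR.

```python
import numpy as np

int64_max = np.iinfo(np.int64).max    # 2**63 - 1
uint64_max = np.iinfo(np.uint64).max  # 2**64 - 1

# A bin edge such as 2**64 - 2 fits in uint64 but is outside int64,
# so casting uint64 bins to int64 cannot represent it.
value = 2**64 - 2
fits_uint64 = 0 <= value <= uint64_max
fits_int64 = np.iinfo(np.int64).min <= value <= int64_max
```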

alexrockhill · 2020-08-10

I'm running into some really odd behavior with bins = np.uint64([0, 2**64 - 2]): bins[-1] += 1 gives [0, 0], as do bins[-1] -= 1 and bins[-1] = np.uint64(bins[-1] + 1). Very strange.

This should definitely not be figured out in np.digitize and should be its own issue. For digitize, I'd be in favor of just leaving it as is and not supporting the very narrow use-case of unsigned integers above 2 ** 63, at least until that gets figured out elsewhere.

eric-wieser · 2020-08-10

Right, the issue you're seeing is that 1 is of type int64. bins[i] += np.uint64(1) should work for the uint64 case, although it will fail for the int64 case.

> and not supporting the very narrow use-case of unsigned integers above 2 ** 63

I'd wager that anyone using uint64 probably wants this extra range.
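A minimal sketch of the suggestion above: incrementing with a uint64 scalar keeps the arithmetic in uint64. What bins[-1] += 1 does with a plain Python integer depends on the promotion rules of the NumPy version in use, so that case is deliberately left out here.

```python
import numpy as np

bins = np.array([0, 2**64 - 2], dtype=np.uint64)

# Use a uint64 operand so both sides of the addition share one dtype
# and no promotion to another type is involved.
bins[-1] += np.uint64(1)

last = int(bins[-1])  # now the largest representable uint64 value
```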

alexrockhill · 2020-08-10

Thanks for the help. Okay, this should now work for uint64.

eric-wieser

I really don't think we want to take the approach of fiddling with inputs to searchsorted, when it would be far less error-prone to just handle the flag inside searchsorted.
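For context, with increasing bins np.digitize gives the same result as a call to np.searchsorted, which is why a flag handled there would carry over to digitize. The snippet only illustrates that existing equivalence. It does not show the proposed flag, which does not exist in NumPy.

```python
import numpy as np

x = np.array([1, 2, 3])
bins = np.array([0, 1.5, 3])

# right=False in digitize corresponds to side="right" in searchsorted,
# and right=True corresponds to side="left".
left_closed = np.digitize(x, bins, right=False)
left_closed_ss = np.searchsorted(bins, x, side="right")

right_closed = np.digitize(x, bins, right=True)
right_closed_ss = np.searchsorted(bins, x, side="left")
```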

eric-wieser · 2020-08-10T15:35:09Z

numpy/lib/tests/test_function_base.py

@@ -1740,13 +1740,18 @@ def test_large_integers_increasing(self):
        # gh-11022
        x = 2**54  # loses precision in a float
        assert_equal(np.digitize(x, [x - 1, x + 1]), 1)
+        assert_equal(np.digitize(x, [x - 1, x + 1], False, True), 1)
+        with assert_raises(RuntimeError):
+            np.digitize(x, [x - 1, 2 ** 63 - 1], False, True)


While it's better than producing an incorrect result, I see no reason why we should want this to fail; 1 is clearly a correct result. I suppose NotImplementedError would at least convey that this is our fault, not the caller's, but I don't really like that approach.

Additionally, the following should not fail, but does:

np.digitize(np.uint64(x), np.array([x - 1, 2 ** 63 - 1], dtype=np.uint64), False, True)

alexrockhill · 2020-08-10

I agree that it would be nice to support that. I tried implementing branching logic for uint64, but numpy behaved very strangely with the large numbers (see above).
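The "loses precision in a float" comment in the gh-11022 test above can be checked directly. This sketch only illustrates why these tests use integers around 2**54. It does not exercise the new edge argument.

```python
import numpy as np

x = 2**54

# Converted to float64, the neighbouring integers collapse onto x itself,
# so a float-based comparison could not tell the bin edges apart.
edges_collapse = float(x - 1) == float(x) == float(x + 1)

# With integer inputs np.digitize keeps them distinct, as the existing
# assertion in the test expects.
result = int(np.digitize(x, [x - 1, x + 1]))
```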

Alex added 4 commits July 23, 2020 12:09

attempt at fix, change docs to rst

0e4d46c

fixed int overflow

f85efe7

changed release note

17975bc

added tests for RunTimeError

5666880

alexrockhill force-pushed the edge branch from 3324c7d to 5666880 Compare July 23, 2020 19:10

eric-wieser reviewed Jul 23, 2020

Alex added 2 commits July 23, 2020 17:44

fixed overflow issues

b64488b

changed to abs

0099f65

eric-wieser reviewed Jul 24, 2020

charris changed the title ~~MRG, ENH: added edge keyword argument to digitize~~ ENH: Add edge keyword argument to digitize Jul 25, 2020

charris added 01 - Enhancement component: numpy.lib labels Jul 25, 2020

moved int64 conversion so error is in front

7b65a9f

eric-wieser requested changes Aug 10, 2020

added uint64 support

2c3dad8

Base automatically changed from master to main March 4, 2021 02:05
