ENH: Extend np.add ufunc to work with unicode and byte dtypes #24858

lysnikolaou · 2023-10-04T17:30:38Z

No description provided.

lysnikolaou · 2023-10-05T09:34:03Z

One question to ask ourselves here is whether we want to also implement loops for mixed dtypes like unicode + bytes, unicode + object, bytes + object, unicode + void and bytes + void.

mattip · 2023-10-05T10:18:22Z

I don't think the ufunc fastpath has anything to offer once object dtype is used. And I don't know how common it is to add string_ and bytes_, so my first instinct would be to keep this as simple as possible.
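For context, here is a minimal sketch of the same-kind case this PR targets: unicode + unicode and bytes + bytes. It goes through `np.char.add`, which exists in released NumPy versions. Whether `np.add` accepts these arrays directly depends on running a NumPy build that includes this change, so the sketch does not rely on that. The array contents are made up for illustration.

```python
import numpy as np

# Unicode + unicode: elementwise concatenation.
left = np.array(["ab", "cd"], dtype="U2")
right = np.array(["xyz", "w"], dtype="U3")
joined = np.char.add(left, right)

# Bytes + bytes: the same operation on the bytes dtype.
left_b = np.array([b"ab", b"cd"])
right_b = np.array([b"xyz", b"w"])
joined_b = np.char.add(left_b, right_b)

print(joined, joined.dtype)
print(joined_b, joined_b.dtype)
```

The result dtype is wide enough to hold the longest possible concatenation, that is, the sum of the two input widths.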

lysnikolaou · 2023-10-05T10:54:13Z

Here's another run of the benchmarks.

| Change   | Before [64fc516a] <main>   | After [6b970817] <string-ufuncs-add>   |   Ratio | Benchmark (Parameter)                               |
|----------|----------------------------|----------------------------------------|---------|-----------------------------------------------------|
| -        | 18.0±0.3μs                 | 3.04±0.05μs                            |    0.17 | bench_core.NumPyChar.time_add_small_list_big_string |
| -        | 3.29±0.01ms                | 9.77±0.5μs                             |    0    | bench_core.NumPyChar.time_add_big_list_small_string |
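The `0` in the Ratio column of the second row is a rounding artifact of the two-decimal display, not a literal zero. A quick check of the arithmetic, using only the mean timings from the table above (the variable names are illustrative):

```python
# Mean timings from the benchmark table, converted to seconds.
small_list_before = 18.0e-6   # 18.0 μs
small_list_after = 3.04e-6    # 3.04 μs
big_list_before = 3.29e-3     # 3.29 ms
big_list_after = 9.77e-6      # 9.77 μs

ratio_small = small_list_after / small_list_before
ratio_big = big_list_after / big_list_before
speedup_big = big_list_before / big_list_after

print(f"small list, big string: ratio {ratio_small:.2f}")
print(f"big list, small string: ratio {ratio_big:.4f}, speedup {speedup_big:.0f}x")
```

So the second benchmark is roughly 340 times faster after the change, and its ratio of about 0.003 is displayed as 0.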

ngoldbaum · 2023-10-05T16:41:57Z

> And I don't know how common it is to add string_ and bytes_, so my first instinct would be to keep this as simple as possible.

I agree, we shouldn't assume the encoding of bytes_ data, which you would need to do I think.
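To illustrate the encoding concern in plain Python (a sketch of the general problem, not of anything in the PR's code): the same byte decodes differently, or not at all, depending on the encoding chosen, and Python itself refuses to concatenate `str` and `bytes` for that reason.

```python
raw = b"\xe9"

# Under Latin-1 this single byte is a valid character.
as_latin1 = raw.decode("latin-1")

# Under UTF-8 the same byte is an incomplete sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Plain Python declines to guess an encoding when mixing str and bytes.
try:
    "caf" + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(as_latin1, utf8_ok, mixed_ok)
```

A mixed unicode + bytes loop would have to pick one of these interpretations on the user's behalf.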

seberg

Just a couple of fly-by comments, since I had a look. But overall, it seems good.

numpy/core/src/umath/ufunc_type_resolution.c

numpy/core/src/umath/string_ufuncs.cpp

lysnikolaou · 2023-10-16T11:36:57Z

This appears to be ready to merge. Are there any more review comments for me to address?

charris · 2023-10-16T14:16:13Z

It needs a release note. Look in doc/release/upcoming_changes/ for examples.

lysnikolaou · 2023-10-16T14:25:11Z

@charris Added a minimal release note. Should it be more comprehensive than that?

seberg

A few comments; maybe @ngoldbaum can have a quick look. Generally this should be good to go, but I would like the dtype resolution to be clarified as "this doesn't make sense, but we need it for the DType class/kind".
Also, tests for things that were clearly buggy but fixed in later review would be good.

doc/release/upcoming_changes/24858.new_feature.rst

numpy/core/defchararray.py

numpy/core/src/umath/string_ufuncs.cpp

numpy/core/src/umath/ufunc_type_resolution.c

numpy/core/src/umath/string_ufuncs.cpp

ngoldbaum

Just one super minor nit otherwise this looks great!

numpy/core/src/umath/ufunc_type_resolution.c

ngoldbaum · 2023-10-17T17:00:54Z

Thanks @lysnikolaou!

ENH: Extend np.add ufunc to work with unicode and byte dtypes

3ab96bf

github-actions bot added the 01 - Enhancement label Oct 4, 2023

Only allow same argument types in np.char.add for now

d066e85

Rename resolve descriptors function

6b97081

Add benchmarks

1f3a508

lysnikolaou force-pushed the string-ufuncs-add branch from c2a6975 to 1f3a508 Compare October 5, 2023 11:33

seberg reviewed Oct 5, 2023

Address feedback

b49a6df

Add release note

4149c07

seberg reviewed Oct 17, 2023

lysnikolaou and others added 3 commits October 17, 2023 11:15

Update doc/release/upcoming_changes/24858.new_feature.rst

7185a71

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>

Merge branch 'main' into string-ufuncs-add

44e41c0

Add comments that explains type resolution

ddf798a

seberg reviewed Oct 17, 2023

numpy/core/src/umath/string_ufuncs.cpp (outdated, resolved)

numpy/core/src/umath/string_ufuncs.cpp (outdated, resolved)

lysnikolaou added 2 commits October 17, 2023 13:32

Fix resolve_descriptors and add test

a5f0929

Fix lint errors

af8ea4b

ngoldbaum reviewed Oct 17, 2023

numpy/core/src/umath/ufunc_type_resolution.c (outdated, resolved)

lysnikolaou added 3 commits October 17, 2023 17:15

Improve comment in type resolver

43c4e03

Merge branch 'main' into string-ufuncs-add

e188c31

Fix after merge

09839cc

ngoldbaum merged commit 19bfa3f into numpy:main Oct 17, 2023
58 checks passed
