ENH: Update scalar representations as per NEP 51 #22449

seberg · 2022-10-18T10:09:58Z

This is a WIP for implementing NEP 51/gh-22261. There are a few things that go beyond just changing the scalar representation:

- It makes get_formatter() a more central/public API, to also move fmt=... to it.
  - This introduces the distinction that if the dtype is known, we can e.g. represent a longdouble as '3.1' (with quotes) rather than np.longdouble('3.1').
  - fmt is mainly used for fmt="r" (previous point), I think fmt="s" should be supported, but always be identical to str(arr[0]).
  - There could be an argument for the fmt path to be a ufunc (it should not inspect array values, unlike the default printing).
  - fmt should be used more cleanly for the __format__ work. This should be fine, I think it only means pushing the "option" adjustments lower.
- arr.tofile() used repr(arr.item(...)) to print each element. This is weird/problematic:
  - For longdouble, I want that repr to be np.longdouble('3.1').
  - For strings, this means it uses 'string' or b'string'
- Record and MA printing has to be adjusted to use the new functionality. This affects record scalars and the printing of the masked array fill_value.

For arr.tofile(), after some back and forth, I think the default should just be to go back to str() (this was changed silently in "BUG 4381 Longdouble from string without precision loss", #6199). str(arr.item()) is not ideal for float32, for example.
This could use some thought (or should change to use the above get_formatter(fmt="s")), but I hope this can be decoupled from NEP 51.
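
The float32 concern can be illustrated with a short sketch. It is only an illustration using standard NumPy scalar methods, not code from this PR: converting a float32 through .item() widens it to a Python float, so the extra double-precision digits show up, while str() on the NumPy scalar keeps the short float32 form.

```python
import numpy as np

x = np.float32(0.1)

# str() on the NumPy scalar gives the shortest text that round-trips
# at float32 precision.
short_text = str(x)

# .item() first widens to a Python float (double precision), so the
# float32 rounding error becomes visible in the printed digits.
widened_text = str(x.item())

print(short_text)    # 0.1
print(widened_text)  # 0.10000000149011612
```

Both strings parse back to the same float32 value; the difference is only in how many digits end up in the file.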

I plan to update the NEP 51 draft to include these points. Happy to discuss here of course (will make pushing NEP 51 quicker). But for now the draft-PR exists primarily to help with the NEP 51 discussion.

TODOs:

- Refguide check is failing (and may lead to fixups being necessary?)
- Discuss and accept NEP 51
- Decide whether this should be hidden behind a flag or if it should interact with the legacy print-mode (i.e. to allow inclusion in the next release, but only switch with NEP 50 – which would likely be a NumPy 2.0 as well)
- New tests for some paths are needed. Mainly arr.tofile() is not well tested (which is why the change here should not fail tests).

tylerjereddy · 2022-10-20T00:16:04Z

@seberg so far so good for latest main of SciPy alongside this feature branch in my hands (Ubuntu Linux; i9-7900X):

    python dev.py test -j 10
    37669 passed, 2222 skipped, 139 xfailed, 9 xpassed in 83.59s (0:01:23)

seberg · 2022-10-21T13:11:46Z

So, I realized that the empty format string "" already has more or less the meaning of str(), and in that sense I was thinking of using the repr function itself as a singleton for "scalar repr" (rather than "r").

I don't really have an opinion, could even say that str should be used for !s formatting effectively. Or could go back, because why would anyone use {obj:r} for something that is not a repr...

EDIT: Reconsidering this, maybe "" is different, since it would be simpler if that was the same as the default formatting, which is not string. So using str and repr seems actually pretty nice?

The thing to look into now (besides test fixes) is that array float formatting doesn't match scalar float formatting, so I actually need to pass through the formatting information better (should not be hard, but needs a bit of organizing/thought).

seberg · 2022-10-24T12:56:49Z

This is starting to settle, although I am considering changing np.float64(nan) to add the quotes always (np.float64('nan') similar to Python).
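
The difference between the two spellings can be sketched as follows. This is an illustration only and says nothing about which form NumPy ends up printing: the unquoted text does not evaluate unless a name `nan` happens to be defined, while the quoted text is parsed by the float64 constructor.

```python
import numpy as np

namespace = {"np": np}  # names available when evaluating the repr text

# Without quotes, `nan` is looked up as a name, which is undefined here.
try:
    eval("np.float64(nan)", namespace)
    unquoted_round_trips = True
except NameError:
    unquoted_round_trips = False

# With quotes, the string is handed to the float64 constructor.
quoted_value = eval("np.float64('nan')", namespace)

print(unquoted_round_trips)    # False
print(np.isnan(quoted_value))  # True
```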

@rossbar do you know why the refguide check does not seem to notice changes from 0.0 to np.float64(0.0)? In a sense, I like that, making that a followup is probably far more manageable.

h-vetinari · 2022-11-01T02:04:22Z

I saw #22261 only now, so I won't complain if this is too late, but I'd really like to get away from long, longlong, etc. I'd much prefer to just use sized integers throughout. It's a much saner model, and now that C99 is available on all platforms, we could do it (modulo the fact that LAPACK and perhaps some other venerable libs still have those types in their API 😑)

seberg · 2022-11-01T07:19:44Z

I am proposing to print the bit-sized version, even if imprecise sometimes. Yes, it would be nice to change it, but it seems like a big ABI change on the C-side?
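
As a rough sketch of what "bit-sized" means here, the expression used in this PR's tests maps a dtype to its sized type name through its kind and item size. The helper name `bit_sized_name` is hypothetical, and the output for the C-named integer types depends on the platform.

```python
import numpy as np


def bit_sized_name(dtype):
    """Return the sized scalar type name for a dtype (hypothetical helper)."""
    dtype = np.dtype(dtype)
    # e.g. kind "f" and itemsize 8 give "f8", whose scalar type is float64
    return np.dtype(f"{dtype.kind}{dtype.itemsize}").type.__name__


# Show the dtype character next to the sized name for a few C-named types.
for c_named in (np.intc, np.longlong, np.single, np.double):
    print(np.dtype(c_named).char, "->", bit_sized_name(c_named))
```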

h-vetinari · 2022-11-02T03:56:06Z

> I am proposing to print the bit-sized version, even if imprecise sometimes.

Looked more closely, looks fine.

> Yes, it would be nice to change it, but it seems like a big ABI change on the C-side?

I'm afraid so, but I still think it'd be worth it eventually (numpy 2.0 material?).

h-vetinari

There seems to be a mismatch between the implementation and the NEP w.r.t. whether printing is affected or not.

Based on the implementation, I think this PR has pretty high chance of disrupting services where something gets calculated with numpy, and then passed on in some serialized form to another consumer, where numpy (or even python) might not necessarily be installed.

Is there any fallback we could provide so that people can adapt such interfaces (e.g. .value or .raw or ...)?

doc/source/reference/arrays.classes.rst

ev-br · 2022-11-05T04:06:10Z

> the refguide check does not seem to notice changes from 0.0 to np.float64(0.0)

It won't, as long as np.allclose(eval('np.float64(0.0)'), 0.0) holds.
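
A minimal sketch of the comparison described above (not the refguide checker's actual code): both strings are evaluated and compared numerically, so a purely textual difference goes unnoticed.

```python
import numpy as np

namespace = {"np": np}

documented_output = "0.0"          # what the docstring example shows
actual_output = "np.float64(0.0)"  # what the new scalar repr prints

# Evaluate both strings and compare the values rather than the text.
same_value = np.allclose(eval(actual_output, namespace),
                         eval(documented_output, namespace))
same_text = actual_output == documented_output

print(same_value)  # True
print(same_text)   # False
```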

charris · 2023-02-19T18:57:57Z

Needs rebase.

pllim · 2023-07-03T13:31:06Z

@mhvk , how do you think astropy can/should deal with this in our doctest downstream? I can only think of disabling doctest for numpy-dev and then when numpy 2.0 is released, we update the docs and disable doctest for numpy 1.x. However, this comes with the risk that we won't catch any incompatibility if a feature is only tested in the doc examples. Hope you can advise. Thanks! 🙏

mhvk

@seberg - I think this all makes sense now. The only slightly larger remaining comment is about repr_format and str_format -- see there.

It may be good to have a follow-up issue about remaining items. It seems to me that the current array2string is actually fine, but the question is whether there should be some kind of option where all scalars, including void, would round-trip. If we want to support sub-arrays for void, then I think this would imply that having a round-trip version for array would be a simple extension. But it seems to me it would also be fine not to do that - the main goal, of ensuring scalars are recognizable, has been reached here. Anyway, for a separate issue, let's get this one in first!

numpy/core/arrayprint.py

mhvk · 2023-07-18T16:49:50Z

numpy/core/arrayprint.py

@@ -536,7 +544,7 @@ def _array2string(a, options, separator=' ', prefix=""):
        summary_insert = ""

    # find the right formatting function for the array
-    format_function = _get_format_function(data, **options)
+    format_function = _get_format_function(data=data, **options)


Might as well avoid unnecessary changes...

mhvk · 2023-07-18T16:50:07Z

numpy/core/arrayprint.py

@@ -1400,13 +1408,23 @@ def __call__(self, x):
            return "({})".format(", ".join(str_fields))


-def _void_scalar_repr(x):
+def _void_scalar_to_string(x, is_repr=True):


Yes, the name change is fine!

mhvk · 2023-07-18T16:59:58Z

numpy/core/tests/test_arrayprint.py

+        normalized_name = np.dtype(f"{dtype.kind}{dtype.itemsize}").type.__name__
+        assert representation == f"np.{normalized_name}({repr_string})"
+
+    np.set_printoptions(legacy="1.25")


Fine in itself, but context manager safer:

with np.printoptions(legacy="1.25"): ...
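
A small sketch of why the context manager is safer: the print options are restored when the block exits, so the legacy setting cannot leak into other tests. The version guard is an assumption (the "1.25" legacy mode is taken to exist only in NumPy 2.0 and later); on older versions the scalar repr is already the legacy one.

```python
import numpy as np

# Assumption: legacy="1.25" is only accepted by NumPy 2.0 and later.
HAS_LEGACY_125 = int(np.__version__.split(".")[0]) >= 2

scalar = np.float64(0.5)
options_before = np.get_printoptions()

if HAS_LEGACY_125:
    # The legacy mode is active only inside this block.
    with np.printoptions(legacy="1.25"):
        legacy_text = repr(scalar)
else:
    legacy_text = repr(scalar)

options_after = np.get_printoptions()

print(legacy_text)                      # 0.5
print(options_before == options_after)  # True
```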

mhvk · 2023-07-18T17:00:21Z

numpy/core/tests/test_arrayprint.py

+    assert repr(scalar) == representation
+
+    np.set_printoptions(legacy="1.25")
+    assert repr(scalar) == legacy_repr


Again would use context manager.

mhvk · 2023-07-18T17:03:48Z

numpy/core/tests/test_longdouble.py

@@ -316,7 +316,7 @@ class TestCommaDecimalPointLocale(CommaDecimalPointLocale):

    def test_repr_roundtrip_foreign(self):


And here too.

numpy/core/tests/test_records.py

numpy/ma/tests/test_core.py

numpy/ma/core.py

mhvk · 2023-07-18T17:29:39Z

doc/neps/nep-0051-scalar-representation.rst

@@ -232,6 +232,14 @@ found `here <https://github.com/numpy/numpy/pull/22449>`_
 Implementation
 ==============

+.. note::
+    This part has *not* been implemented with the initial changes.


Might as well give a PR number, for easier reference: gh-22449.

seberg · 2023-07-18T22:14:14Z

To be honest, I still think we will eventually go back to what I had (in improved form), although hopefully it's OK to dump the special fmt states and just have r and s, which void can use. Whatever subarray makes of that, I don't care if it checks the printoptions and doesn't round-trip.

Structured void repr should stay integrated with array repr, I think. So any changes there need to be integrated into array2string internals (which are a mess anyway, and in need of reasonable hooks for both user dtypes and format string passing).

seberg · 2023-07-25T13:12:24Z

Can we push this over the finish line? Yes, my doctest changeset isn't quite there yet, but I'm not sure it will be helpful to wait; I expect that the worst-affected projects would use the legacy= option anyway initially?

mhvk

@seberg - sorry, lots of other stuff came up. But yes, this is good to go as far as I'm concerned!

charris · 2023-07-26T00:31:03Z

Thanks Sebastian.

In gh-22449 `descr` was renamed `dtype`, which broke the build as gh-24211 decrefed the old variable to fix a reference leak. This bug slipped in because gh-22449 was not retested after the merge.

larsoner · 2023-07-26T20:20:31Z

I can see a lot of projects hitting this (EDIT: like us)... I'm trying this workaround in mne/conftest.py in pytest_configure after reading the release notes from here:

    try:
        np.set_printoptions(legacy="1.25")
    except Exception:
        pass  # probably missing kwarg

I'll let people know in 1h if it works!

larsoner · 2023-07-26T20:33:13Z

So far I had to change it to legacy=125; using a string broke stuff later (not at set_printoptions time) for released versions of NumPy. I'll update in another hour...

larsoner · 2023-07-26T21:19:33Z

Looks like we're headed to green with legacy="1.25" (set only when the NumPy version is > 1.25), so the suggested workaround is all good (when deployed correctly) 👍
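
A sketch of what such a guarded workaround could look like in a conftest.py. The helper name is hypothetical, and the check on the major version is an assumption standing in for "version > 1.25" (the new representation is taken to exist only in NumPy 2.0 and later).

```python
import numpy as np


def numpy_supports_legacy_125():
    """Hypothetical helper: True when legacy="1.25" should be accepted."""
    # Assumption: the "1.25" legacy mode exists only in NumPy >= 2.0.
    return int(np.__version__.split(".")[0]) >= 2


def pytest_configure(config):
    """pytest hook: keep the older scalar repr for the whole test run."""
    if numpy_supports_legacy_125():
        np.set_printoptions(legacy="1.25")
```

On older NumPy versions the hook does nothing, which avoids passing a legacy value those versions do not understand.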

seberg · 2023-07-27T08:25:27Z

I will note that just setting the legacy mode for the tests will unfortunately hide issues if you use repr(scalar) for serialization/writing to a file (or a dependency does). So it is a good hot-fix for doctests or tests which check repr explicitly, but might cause real issues to be missed.
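
One way to avoid depending on repr(scalar) when writing values out, shown as a hedged sketch rather than a recommendation from this thread, is to convert to a Python scalar first. The text then no longer depends on the NumPy version or on the legacy print option.

```python
import json

import numpy as np

value = np.float64(3.5)

# repr(value) depends on the NumPy version and print options, so it is
# avoided here. A Python float has a stable textual form.
python_value = value.item()
as_text = repr(python_value)
as_json = json.dumps({"result": python_value})

print(as_text)  # 3.5
print(as_json)  # {"result": 3.5}
```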

github-actions bot added the 25 - WIP label Oct 18, 2022

seberg force-pushed the scalar-repr-change branch 2 times, most recently from 342b701 to ff552fc Compare October 18, 2022 11:59

seberg marked this pull request as ready for review October 21, 2022 13:14

seberg changed the title ~~WIP: Update scalar representations as per NEP 51~~ ENH: Update scalar representations as per NEP 51 Oct 21, 2022

seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Oct 21, 2022

github-actions bot added the 01 - Enhancement label Oct 21, 2022

seberg added 25 - WIP and removed 25 - WIP 01 - Enhancement labels Oct 21, 2022

seberg marked this pull request as draft October 21, 2022 13:44

seberg force-pushed the scalar-repr-change branch from 893d77d to 387e26a Compare October 24, 2022 12:34

github-actions bot added the 01 - Enhancement label Oct 24, 2022

seberg marked this pull request as ready for review October 24, 2022 12:56

seberg force-pushed the scalar-repr-change branch 3 times, most recently from ce4166f to 9a64758 Compare October 24, 2022 15:31

seberg mentioned this pull request Oct 25, 2022

NEP: Make NEP 51 to propose changing the scalar representation #22261

Merged

h-vetinari reviewed Nov 2, 2022

seberg force-pushed the scalar-repr-change branch from decdee0 to 08c07ce Compare June 21, 2023 14:26

seberg force-pushed the scalar-repr-change branch from 08c07ce to bf92d6e Compare June 30, 2023 16:01

seberg force-pushed the scalar-repr-change branch from 7ea85db to 7fcf5ca Compare July 18, 2023 11:17

MAINT: Tweak string check (forgot the kind, but maybe tuple is nice)

19ed59e

seberg force-pushed the scalar-repr-change branch from 8bdf97c to 19ed59e Compare July 18, 2023 12:10

Also force repr for object

0329f18

mhvk reviewed Jul 18, 2023

MAINT: A few small fixups from review

51eb71e

seberg removed 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes 25 - WIP labels Jul 24, 2023

mhvk approved these changes Jul 25, 2023

charris merged commit de398b0 into numpy:main Jul 26, 2023
54 of 57 checks passed

charris mentioned this pull request Jul 26, 2023

BUG: Fix use of renamed variable. #24263

Merged

seberg deleted the scalar-repr-change branch July 26, 2023 06:44

mroeschke mentioned this pull request Jul 26, 2023

DEPS: Update reprs with numpy NEP51 scalar repr pandas-dev/pandas#54268

Merged

pllim mentioned this pull request Jul 26, 2023

TST: 400+ doctest failures because numpy 2.0.dev changed string representation astropy/astropy#15095

Closed

seberg mentioned this pull request Jul 26, 2023

ENH: Changed repr of np.bool_ #17592

Closed

larsoner mentioned this pull request Jul 26, 2023

Read raw eyelink fix mne-tools/mne-python#11823

Merged

3 tasks

keewis mentioned this pull request Jul 28, 2023

⚠️ Nightly upstream-dev CI failed ⚠️ pydata/xarray#7977

Closed

This was referenced Aug 5, 2023

[TST] Upcoming dependency test failures matplotlib/matplotlib#26460

Closed

MNT: Adjust for upcoming numpy repr changes matplotlib/matplotlib#26467

Merged

This was referenced Aug 9, 2023

MNT Remove DeprecationWarning for scipy.sparse.linalg.cg tol vs rtol argument scikit-learn/scikit-learn#26814

Merged

MNT Adjust code after NEP 51 numpy scalar formatting changes scikit-learn/scikit-learn#27042

Merged

ngoldbaum mentioned this pull request Jun 17, 2024

DOC: set_printoptions doesn't mention new legacy mode 1.25 #26731

Closed
