ENH: fix 0d array printing using `str` or `formatter`. #9332

ahaldane · 2017-06-29T19:25:20Z

This is split off from #9139, plus has some extra editing of the release notes.

eric-wieser · 2017-06-30T00:00:24Z

numpy/core/arrayprint.py

@@ -521,7 +515,9 @@ def array2string(a, max_line_width=None, precision=None,
    if formatter is None:
        formatter = _formatter

-    if a.size == 0:
+    if style is not None and a.shape == () and a.dtype.names is None:
+        return style(a[()])


Worth adding a comment here that the names is None check is to avoid co-recursion with voidtype_str, which upcasts structured scalars to arrays again

ahaldane · 2017-06-30T15:59:16Z

Added comment, also reorganized voidtype_str.

eric-wieser · 2017-06-30T16:15:09Z

numpy/core/src/multiarray/scalartypes.c.src

+        return ret;
+    }
+    else {
+        PyObject  *item, *item_str;


Too many spaces here

ahaldane · 2017-06-30T16:40:47Z

numpy/core/tests/test_arrayprint.py

        assert_equal(repr(np.datetime64('2005-02-25')[...]),
                     "array('2005-02-25', dtype='datetime64[D]')")

+        # repr of 0d arrays is affected by printoptions
        x = np.array(1)
        np.set_printoptions(formatter={'all':lambda x: "test"})
        assert_equal(repr(x), "array(test)")


Your comment below got lost on github, but I see it in the email.

This test class implements tearDown which resets the printoptions, so we don't need to worry about pollution. Will add the str test.

eric-wieser

LGTM. There's some related stuff in #9201, but we can deal with that later.

eric-wieser · 2017-09-06T07:57:51Z

numpy/core/src/multiarray/scalartypes.c.src

+            ret = PyObject_Str((PyObject *)arr);
+            Py_DECREF(arr);
+        }
+        return ret;


Can we get rid of this, and make a call to something in core.multiarray._internal? This is the recursive point I'd rather not have. The function would look like:

def _void_repr(x): fields = ', '.join(repr(xi) for xi in x) if len(x) == 1: return "(" + fields + ",)" else: return "(" + fields + ")"

Which is similar to StructureFormat.__call__, but doesn't invoke formatters - which is what we want - scalar repr shouldn't be customizable, should it?

eric-wieser · 2017-09-06T07:59:30Z

numpy/core/arrayprint.py

-    if a.size == 0:
+    # the ``dtype.names is None`` check is to avoid co-recursion with
+    # voidtype_str, which upcasts structured scalars to 0d arrays
+    if style is not None and a.shape == () and a.dtype.names is None:


Then we can ditch this check too

ahaldane · 2017-09-07T17:26:47Z

Yeah, eliminating the recursion that way is a good idea. Done in the new commit.

There was a small catch, though, that we have to treat subarray-fields carefully. Previously they were specially printed using the SubarrayFormatter which leaves out newlines, so I made the code here do that too.

ahaldane · 2017-09-07T18:41:29Z

I just pushed an extra commit with an alternate implementation of void_repr, which uses formatter rather than repr to show all the fields of the scalar. This will break a few more doctests, but I think I like it better. You can compare in the two last commits.

eric-wieser · 2017-09-25T06:17:53Z

numpy/core/arrayprint.py

+def _void_scalar_repr(x):
+    format_functions = []
+    for field_name in x.dtype.names:
+        format_function = _get_format_function(ravel(x[field_name]),


How about format_function = repr? I might be contradicting myself here, but I think that repr(void) should not be reconfigurable - isn't that now the case for other scalar reprs?

Sorry, I missed your above comments - you already did this, and wanted me to compare.

eric-wieser · 2017-09-25T06:33:46Z

The implementation of the second commit looks a little tidier in iterating over the dtype rather than the value, but I think we should be using repr inside _void_repr for scalars, instead of looking up the corresponding format function.

I think we should aim to keep the output constant no matter what the print setting sare.

eric-wieser · 2017-09-25T06:35:32Z

numpy/core/arrayprint.py

@@ -521,7 +515,9 @@ def array2string(a, max_line_width=None, precision=None,
    if formatter is None:
        formatter = _formatter

-    if a.size == 0:
+    if style is not None and a.shape == ():
+        return style(a[()])


I think we can kill this if entirely now, and leave the style argument as obsolete?

ahaldane · 2017-09-25T17:05:01Z

How do we want x = np.array([(2., '2005-01-01', 3.)], dtype='f,M8[D],f')[0] to be printed? If we use the formatters, we would get (2., '2005-01-01', 3.), just like when printed as part of an ndarray. If we use the repr, we will get (2.0, numpy.datetime64('2005-01-01'), 3.0). We will have a similar effect with unstructured void fields after #8981 because the void scalars will have a void( prefix.

I can see it both ways.

The formatter version is more streamlined, and I like that the output of print(arr) and print(arr[0]) will look the same, while there can be a big difference if we use repr.

But I admit the repr version is more explicit since it shows the dtypes, which are lost in the formatter version. When we printed a full array using formatters, it was ok to skip the dtypes in the elements since the dtype is printed at the end of the array.

ahaldane · 2017-09-25T17:05:27Z

Also, my original motivation for using formatter instead of repr was that void-scalars behave a lot more like arrays than all other scalars: They are writeable, sliceable, indexable, can be thought of as container-types, and are sequence-like. Therefore we might use formatter which is designed to print elements which are part of a sequence. ~~Should np.ones(2, dtype='f,f,f') print so differently from np.ones((2,3))?~~ (never mind, that sentence doesn't apply to scalars.) Also, If the scalar contains subarrays, the output is already configurable by the print-options.

eric-wieser · 2017-09-26T01:03:39Z

Also, If the scalar contains subarrays, the output is already configurable by the print-options.

Is that true? I would have thought using SubArrayFormat(repr) (which results from my suggested change) would cause the print options to be ignored

eric-wieser · 2017-09-26T01:08:27Z

But I admit the repr version is more explicit

Ultimately, I think this should be the goal of the scalar repr

I like that the output of print(arr) and print(arr[0]) will look the same, while there can be a big difference if we use repr.

This is already untrue of floating point and datetime types, so I don't think it needs to be a goal.

void-scalars behave a lot more like arrays than all other scalars

That's a reasonable argument, that I thought of too. But I think the important thing is that unlike arrays, void.__repr__ does not show any dtype information, so needs to show that by repeatable (ie, not configurable) formatting

eric-wieser · 2017-09-26T01:12:22Z

Either way, no point touching this till we pull the trigger on #9139

eric-wieser · 2017-09-26T05:56:42Z

I've gone ahead and rebased this, and then added a single commit on the end with the repr behaviour - feel free to revert.

eric-wieser · 2017-09-26T05:58:45Z

numpy/core/src/multiarray/scalartypes.c.src

@@ -634,6 +608,34 @@ static PyObject *
 /**end repeat**/

 static PyObject *
+voidtype_str(PyObject *self)
+{
+    if (PyDataType_HASFIELDS(((PyVoidScalarObject*)self)->descr)) {


Extra parens here

No, I think we need them all... compiler errors for if (PyDataType_HASFIELDS((PyVoidScalarObject*)self->descr))

My bad, I was incorrectly reading if ((PyDataType_HASFIELDS(...))) here

eric-wieser · 2017-09-26T05:59:09Z

numpy/core/tests/test_arrayprint.py

+        if sys.version_info[0] >= 3:
+            assert_equal(str(np.array('café', np.unicode_)), 'café')
+            assert_equal(repr(np.array('café', np.unicode_)),
+                         "array('café',\n      dtype='<U4')")


This has an issue number, right?

I don't think so... I think we came up with this one in #9143.

ahaldane · 2017-11-08T23:55:31Z

I actually somewhat dislike truncating floats if they appear on their own. Inside a "array(" + x + ")", or inside [] its fine, but on its own it could be confusing because it looks like a scalar but isn't. Scalars don't truncate, so with your suggestion with a str(0d) you might think you got a unique representation, but didn't.

So I still favor that str(0d) is printed exactly like scalars. Scalar floats print in unambiguous full precision. In general scalars and 0ds are printed on their own so we don't have limited space like when printing an array.

mhvk · 2017-11-09T00:15:08Z

OK, that is a good argument as well. And since with your current implementation I can override this if I wish by using np.set_string_function, I'm happy to just leave it as is (modulo the comments @eric-wieser made). It is a great improvement as is!

The str of 0d arrays now returns `str(a[()])` like scalars, and the repr returns `'array(' + formatter(a[()]) + ')'` like ndarrays. The default implementation of str and repr for user-defined types is removed. Fixes numpy#1415

…changes Fixes numpy#9804

eric-wieser · 2017-11-09T08:31:55Z

numpy/core/arrayprint.py

+    if a.shape == () and not a.dtype.names:
+        if _format_options['legacy']:
+            return str(a.item())
+        return str(a[()])


What effect is guarding this under a.dtype.names in the non-legacy case having?

I'd perhaps be inclined to write this as the slightly longer

if _format_options['legacy'] and a.shape == () and not a.dtype.names: return str(a.item()) # actually justify this behavior in a comment here, where it;s a feature, not deprecated behaviour if a.shape == () and not a.dtype.names: return str(a[()])

just to separate the legacy conditions from the rest

Yeah good idea for clarity.

I had the same thought about a.dtype.names when writing this. I did confirm for myself that in 1.13 the a.dtype.names case was special-cased to use the formatter to represent the value, i.e. it should go through the array2string code, which uses formatter.

Hmm but actualy, here is the implication of excluding a.dtype.names in the non-legacy case.

If we do if a.shape == () and not a.dtype.names:, then the eventual output is the output of StructureFormat, but with the precision specified as argument to array_str, but all other settings from __format_options. I am not sure the precision argument is ever used.

If we only do if a.shape == (), then the eventual output is still the output of StructureFormat, but with all settings, including precision, from __format_options. We go through a different path through void_scalar_repr to get there but the result is almost identical.

I'm going to remove the a.dtype.names test in the non-legacy case, just to make the code simpler. I don't think it has any real effect. I'll leave it in legacy mode since it makes a difference there,

eric-wieser · 2017-11-09T08:32:46Z

numpy/core/arrayprint.py

+    if options['legacy']:
+        if a.shape == () and not a.dtype.names:
+            return style(a.item())
+    elif not (style is np._NoValue):


nit: style is not np._NoValue

eric-wieser · 2017-11-09T16:55:45Z

doc/release/1.14.0-notes.rst

+character for positive values. The new default is '-'.
+
+This new default changes the float output relative to numpy 1.13. The old
+behavior can be obtained in "legacy" printing mode, see compatibility notes.


nit: add the word "above" here, and in the other place you refer to it.

eric-wieser · 2017-11-09T16:56:28Z

numpy/core/arrayprint.py

+        if a.shape == () and not a.dtype.names:
+            return style(a.item())
+    elif style is not np._NoValue:
+        warnings.warn("'style' argument is deprecated and no longer functional"


You lost the deprecation date comment here

eric-wieser · 2017-11-09T16:58:06Z

numpy/core/tests/test_arrayprint.py

@@ -302,11 +317,13 @@ def test_sign_spacing(self):
        assert_equal(repr(np.array(1.)), 'array(+1.)')
        assert_equal(repr(b), 'array([+1.234e+09])')

-        np.set_printoptions(sign='legacy')
+        np.set_printoptions(legacy=True)


Would be nice to have a test that style is not deprecated in legacy mode.

eric-wieser · 2017-11-09T17:00:27Z

Looks great! Nits about release note and tests, but happy with the iteration we've landed on.

charris · 2017-11-09T18:32:29Z

Marking this for 1.14 release. If there are more unmerged PRs involving printing/formatting, it would be helpful to also set their milestones.

ahaldane · 2017-11-09T18:38:07Z

@charris, I just went through and added the last 2 to 1.14

charris · 2017-11-09T18:58:02Z

Great, thanks Allan. I'm going to start pushing off most of the other PRs to 1.15, but I want to make sure all the printing/formatting changes get in.

ahaldane · 2017-11-12T04:14:14Z

I think everything is done.

I'd like to merge this one before rebasing #8981 because of the dependencies.

eric-wieser · 2017-11-12T10:04:13Z

numpy/core/arrayprint.py

+        if a.shape == () and not a.dtype.names:
+            return style(a.item())
+    elif style is not np._NoValue:
+        # Deprecation 11-9-2017  v1.14


Can we make this 2017-11-09 please (like we do elsewhere)? As a European, this format is silly ;)

(yes, I realized that I missed it the first round)

eric-wieser

All looks good to me! I've rebased #9201 with just the unicode changes.

One tiny nit about date formats in deprecation warning.

charris · 2017-11-12T15:03:22Z

Sounds like folks are happy with this, so in it goes. Thanks Allan, and thanks to the rest of the print formatting crew.

ahaldane mentioned this pull request Jun 29, 2017

ENH: remove unneeded spaces in float/bool reprs, fixes 0d str #9139

Merged

eric-wieser reviewed Jun 30, 2017

View reviewed changes

numpy/core/src/multiarray/scalartypes.c.src Outdated

return ret;

}

else {

PyObject *item, *item_str;

Copy link

Member

eric-wieser Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many spaces here

ahaldane commented Jun 30, 2017

View reviewed changes

ahaldane force-pushed the fix_0d_strrepr branch from a49f639 to 5f045dd Compare June 30, 2017 16:45

eric-wieser approved these changes Jun 30, 2017

View reviewed changes

ahaldane mentioned this pull request Jul 1, 2017

BUG: np.void(b'test') enters recursion loop in repr #9345

Closed

charris added 01 - Enhancement component: numpy._core labels Jul 5, 2017

eric-wieser reviewed Sep 6, 2017

View reviewed changes

ahaldane force-pushed the fix_0d_strrepr branch from 5f045dd to efa131b Compare September 7, 2017 17:26

ahaldane force-pushed the fix_0d_strrepr branch 2 times, most recently from 5b8a483 to e8c322d Compare September 7, 2017 19:49

eric-wieser reviewed Sep 25, 2017

View reviewed changes

ahaldane mentioned this pull request Sep 25, 2017

0-d arrays ignore printoptions (Trac #817) #1415

Closed

eric-wieser force-pushed the fix_0d_strrepr branch from e8c322d to 8fae6fc Compare September 26, 2017 05:55

eric-wieser reviewed Sep 26, 2017

View reviewed changes

ahaldane force-pushed the fix_0d_strrepr branch from be1e90a to e3ac30b Compare November 9, 2017 00:15

ahaldane added 2 commits November 8, 2017 19:15

ENH: fix 0d array printing using str or formatter.

2461bc9

The str of 0d arrays now returns `str(a[()])` like scalars, and the repr returns `'array(' + formatter(a[()]) + ')'` like ndarrays. The default implementation of str and repr for user-defined types is removed. Fixes numpy#1415

MAINT: create a "legacy" print mode to account for whitespace and 0d …

734b907

…changes Fixes numpy#9804

ahaldane force-pushed the fix_0d_strrepr branch 2 times, most recently from 136fd06 to e2e14ae Compare November 9, 2017 01:03

eric-wieser reviewed Nov 9, 2017

View reviewed changes

ahaldane force-pushed the fix_0d_strrepr branch 3 times, most recently from 120f0b8 to 8df86b6 Compare November 9, 2017 16:33

eric-wieser reviewed Nov 9, 2017

View reviewed changes

ahaldane added 2 commits November 9, 2017 12:27

MAINT: Move 0d special casing to array_str and array_repr

6fb5327

DOC: Update release notes about legacy mode and 0d arrays

bbf5438

ahaldane force-pushed the fix_0d_strrepr branch from 8df86b6 to bbf5438 Compare November 9, 2017 17:28

mhvk mentioned this pull request Nov 9, 2017

More format changes in numpy-dev astropy/astropy#6824

Closed

charris added this to the 1.14.0 release milestone Nov 9, 2017

eric-wieser reviewed Nov 12, 2017

View reviewed changes

eric-wieser approved these changes Nov 12, 2017

View reviewed changes

charris merged commit 32a58fc into numpy:master Nov 12, 2017

eric-wieser moved this from Pending to Done in Changes to 0d and scalar repr Nov 13, 2017

ENH: fix 0d array printing using str or formatter. #9332

ENH: fix 0d array printing using str or formatter. #9332

Conversation

ahaldane commented Jun 29, 2017

eric-wieser Jun 30, 2017 • edited

Choose a reason for hiding this comment

ahaldane commented Jun 30, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser left a comment

Choose a reason for hiding this comment

eric-wieser Sep 6, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ahaldane commented Sep 7, 2017

ahaldane commented Sep 7, 2017

eric-wieser Sep 25, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser commented Sep 25, 2017 • edited

Choose a reason for hiding this comment

ahaldane commented Sep 25, 2017

ahaldane commented Sep 25, 2017 • edited

eric-wieser commented Sep 26, 2017

eric-wieser commented Sep 26, 2017

eric-wieser commented Sep 26, 2017

eric-wieser commented Sep 26, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ahaldane commented Nov 8, 2017

mhvk commented Nov 9, 2017

Choose a reason for hiding this comment

eric-wieser Nov 9, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser commented Nov 9, 2017

charris commented Nov 9, 2017

ahaldane commented Nov 9, 2017

charris commented Nov 9, 2017

ahaldane commented Nov 12, 2017

Choose a reason for hiding this comment

eric-wieser left a comment • edited

Choose a reason for hiding this comment

charris commented Nov 12, 2017

ENH: fix 0d array printing using `str` or `formatter`. #9332

ENH: fix 0d array printing using `str` or `formatter`. #9332

eric-wieser Jun 30, 2017 •

edited

eric-wieser Sep 6, 2017 •

edited

eric-wieser Sep 25, 2017 •

edited

eric-wieser commented Sep 25, 2017 •

edited

ahaldane commented Sep 25, 2017 •

edited

eric-wieser Nov 9, 2017 •

edited

eric-wieser left a comment •

edited