Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: fix 0d array printing using str or formatter. #9332

Merged
merged 4 commits into from Nov 12, 2017

Conversation

ahaldane
Copy link
Member

This is split off from #9139, plus has some extra editing of the release notes.

@@ -521,7 +515,9 @@ def array2string(a, max_line_width=None, precision=None,
if formatter is None:
formatter = _formatter

if a.size == 0:
if style is not None and a.shape == () and a.dtype.names is None:
return style(a[()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a comment here that the names is None check is to avoid co-recursion with voidtype_str, which upcasts structured scalars to arrays again

@ahaldane
Copy link
Member Author

Added comment, also reorganized voidtype_str.

return ret;
}
else {
PyObject *item, *item_str;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many spaces here

assert_equal(repr(np.datetime64('2005-02-25')[...]),
"array('2005-02-25', dtype='datetime64[D]')")

# repr of 0d arrays is affected by printoptions
x = np.array(1)
np.set_printoptions(formatter={'all':lambda x: "test"})
assert_equal(repr(x), "array(test)")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment below got lost on github, but I see it in the email.

This test class implements tearDown which resets the printoptions, so we don't need to worry about pollution. Will add the str test.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. There's some related stuff in #9201, but we can deal with that later.

ret = PyObject_Str((PyObject *)arr);
Py_DECREF(arr);
}
return ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of this, and make a call to something in core.multiarray._internal? This is the recursive point I'd rather not have. The function would look like:

def _void_repr(x):
    fields = ', '.join(repr(xi) for xi in x)
    if len(x) == 1:
        return "(" + fields + ",)"
    else:
        return "(" + fields + ")"

Which is similar to StructureFormat.__call__, but doesn't invoke formatters - which is what we want - scalar repr shouldn't be customizable, should it?

if a.size == 0:
# the ``dtype.names is None`` check is to avoid co-recursion with
# voidtype_str, which upcasts structured scalars to 0d arrays
if style is not None and a.shape == () and a.dtype.names is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we can ditch this check too

@ahaldane
Copy link
Member Author

ahaldane commented Sep 7, 2017

Yeah, eliminating the recursion that way is a good idea. Done in the new commit.

There was a small catch, though, that we have to treat subarray-fields carefully. Previously they were specially printed using the SubarrayFormatter which leaves out newlines, so I made the code here do that too.

@ahaldane
Copy link
Member Author

ahaldane commented Sep 7, 2017

I just pushed an extra commit with an alternate implementation of void_repr, which uses formatter rather than repr to show all the fields of the scalar. This will break a few more doctests, but I think I like it better. You can compare in the two last commits.

@ahaldane ahaldane force-pushed the fix_0d_strrepr branch 2 times, most recently from 5b8a483 to e8c322d Compare September 7, 2017 19:49
def _void_scalar_repr(x):
format_functions = []
for field_name in x.dtype.names:
format_function = _get_format_function(ravel(x[field_name]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about format_function = repr? I might be contradicting myself here, but I think that repr(void) should not be reconfigurable - isn't that now the case for other scalar reprs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed your above comments - you already did this, and wanted me to compare.

@eric-wieser
Copy link
Member

eric-wieser commented Sep 25, 2017

The implementation of the second commit looks a little tidier in iterating over the dtype rather than the value, but I think we should be using repr inside _void_repr for scalars, instead of looking up the corresponding format function.

I think we should aim to keep the output constant no matter what the print setting sare.

@@ -521,7 +515,9 @@ def array2string(a, max_line_width=None, precision=None,
if formatter is None:
formatter = _formatter

if a.size == 0:
if style is not None and a.shape == ():
return style(a[()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can kill this if entirely now, and leave the style argument as obsolete?

@ahaldane
Copy link
Member Author

How do we want x = np.array([(2., '2005-01-01', 3.)], dtype='f,M8[D],f')[0] to be printed? If we use the formatters, we would get (2., '2005-01-01', 3.), just like when printed as part of an ndarray. If we use the repr, we will get (2.0, numpy.datetime64('2005-01-01'), 3.0). We will have a similar effect with unstructured void fields after #8981 because the void scalars will have a void( prefix.

I can see it both ways.

The formatter version is more streamlined, and I like that the output of print(arr) and print(arr[0]) will look the same, while there can be a big difference if we use repr.

But I admit the repr version is more explicit since it shows the dtypes, which are lost in the formatter version. When we printed a full array using formatters, it was ok to skip the dtypes in the elements since the dtype is printed at the end of the array.

@ahaldane
Copy link
Member Author

ahaldane commented Sep 25, 2017

Also, my original motivation for using formatter instead of repr was that void-scalars behave a lot more like arrays than all other scalars: They are writeable, sliceable, indexable, can be thought of as container-types, and are sequence-like. Therefore we might use formatter which is designed to print elements which are part of a sequence. Should np.ones(2, dtype='f,f,f') print so differently from np.ones((2,3))? (never mind, that sentence doesn't apply to scalars.) Also, If the scalar contains subarrays, the output is already configurable by the print-options.

@eric-wieser
Copy link
Member

Also, If the scalar contains subarrays, the output is already configurable by the print-options.

Is that true? I would have thought using SubArrayFormat(repr) (which results from my suggested change) would cause the print options to be ignored

@eric-wieser
Copy link
Member

But I admit the repr version is more explicit

Ultimately, I think this should be the goal of the scalar repr

I like that the output of print(arr) and print(arr[0]) will look the same, while there can be a big difference if we use repr.

This is already untrue of floating point and datetime types, so I don't think it needs to be a goal.

void-scalars behave a lot more like arrays than all other scalars

That's a reasonable argument, that I thought of too. But I think the important thing is that unlike arrays, void.__repr__ does not show any dtype information, so needs to show that by repeatable (ie, not configurable) formatting

@eric-wieser
Copy link
Member

Either way, no point touching this till we pull the trigger on #9139

@eric-wieser
Copy link
Member

I've gone ahead and rebased this, and then added a single commit on the end with the repr behaviour - feel free to revert.

@@ -634,6 +608,34 @@ static PyObject *
/**end repeat**/

static PyObject *
voidtype_str(PyObject *self)
{
if (PyDataType_HASFIELDS(((PyVoidScalarObject*)self)->descr)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra parens here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think we need them all... compiler errors for if (PyDataType_HASFIELDS((PyVoidScalarObject*)self->descr))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I was incorrectly reading if ((PyDataType_HASFIELDS(...))) here

if sys.version_info[0] >= 3:
assert_equal(str(np.array('café', np.unicode_)), 'café')
assert_equal(repr(np.array('café', np.unicode_)),
"array('café',\n dtype='<U4')")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has an issue number, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so... I think we came up with this one in #9143.

@ahaldane
Copy link
Member Author

ahaldane commented Nov 8, 2017

I actually somewhat dislike truncating floats if they appear on their own. Inside a "array(" + x + ")", or inside [] its fine, but on its own it could be confusing because it looks like a scalar but isn't. Scalars don't truncate, so with your suggestion with a str(0d) you might think you got a unique representation, but didn't.

So I still favor that str(0d) is printed exactly like scalars. Scalar floats print in unambiguous full precision. In general scalars and 0ds are printed on their own so we don't have limited space like when printing an array.

@mhvk
Copy link
Contributor

mhvk commented Nov 9, 2017

OK, that is a good argument as well. And since with your current implementation I can override this if I wish by using np.set_string_function, I'm happy to just leave it as is (modulo the comments @eric-wieser made). It is a great improvement as is!

The str of 0d arrays now returns `str(a[()])` like scalars, and the
repr returns `'array(' + formatter(a[()]) + ')'` like ndarrays.
The default implementation of str and repr for user-defined types is
removed. Fixes numpy#1415
@ahaldane ahaldane force-pushed the fix_0d_strrepr branch 2 times, most recently from 136fd06 to e2e14ae Compare November 9, 2017 01:03
if a.shape == () and not a.dtype.names:
if _format_options['legacy']:
return str(a.item())
return str(a[()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What effect is guarding this under a.dtype.names in the non-legacy case having?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd perhaps be inclined to write this as the slightly longer

if _format_options['legacy'] and a.shape == () and not a.dtype.names:
    return str(a.item())

# actually justify this behavior in a comment here, where it;s a feature, not deprecated behaviour
if a.shape == () and not a.dtype.names:
    return str(a[()])

just to separate the legacy conditions from the rest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good idea for clarity.

I had the same thought about a.dtype.names when writing this. I did confirm for myself that in 1.13 the a.dtype.names case was special-cased to use the formatter to represent the value, i.e. it should go through the array2string code, which uses formatter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm but actualy, here is the implication of excluding a.dtype.names in the non-legacy case.

If we do if a.shape == () and not a.dtype.names:, then the eventual output is the output of StructureFormat, but with the precision specified as argument to array_str, but all other settings from __format_options. I am not sure the precision argument is ever used.

If we only do if a.shape == (), then the eventual output is still the output of StructureFormat, but with all settings, including precision, from __format_options. We go through a different path through void_scalar_repr to get there but the result is almost identical.

I'm going to remove the a.dtype.names test in the non-legacy case, just to make the code simpler. I don't think it has any real effect. I'll leave it in legacy mode since it makes a difference there,

if options['legacy']:
if a.shape == () and not a.dtype.names:
return style(a.item())
elif not (style is np._NoValue):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: style is not np._NoValue

@ahaldane ahaldane force-pushed the fix_0d_strrepr branch 3 times, most recently from 120f0b8 to 8df86b6 Compare November 9, 2017 16:33
character for positive values. The new default is '-'.

This new default changes the float output relative to numpy 1.13. The old
behavior can be obtained in "legacy" printing mode, see compatibility notes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add the word "above" here, and in the other place you refer to it.

if a.shape == () and not a.dtype.names:
return style(a.item())
elif style is not np._NoValue:
warnings.warn("'style' argument is deprecated and no longer functional"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You lost the deprecation date comment here

@@ -302,11 +317,13 @@ def test_sign_spacing(self):
assert_equal(repr(np.array(1.)), 'array(+1.)')
assert_equal(repr(b), 'array([+1.234e+09])')

np.set_printoptions(sign='legacy')
np.set_printoptions(legacy=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a test that style is not deprecated in legacy mode.

@eric-wieser
Copy link
Member

Looks great! Nits about release note and tests, but happy with the iteration we've landed on.

@charris charris added this to the 1.14.0 release milestone Nov 9, 2017
@charris
Copy link
Member

charris commented Nov 9, 2017

Marking this for 1.14 release. If there are more unmerged PRs involving printing/formatting, it would be helpful to also set their milestones.

@ahaldane
Copy link
Member Author

ahaldane commented Nov 9, 2017

@charris, I just went through and added the last 2 to 1.14

@charris
Copy link
Member

charris commented Nov 9, 2017

Great, thanks Allan. I'm going to start pushing off most of the other PRs to 1.15, but I want to make sure all the printing/formatting changes get in.

@ahaldane
Copy link
Member Author

I think everything is done.

I'd like to merge this one before rebasing #8981 because of the dependencies.

if a.shape == () and not a.dtype.names:
return style(a.item())
elif style is not np._NoValue:
# Deprecation 11-9-2017 v1.14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this 2017-11-09 please (like we do elsewhere)? As a European, this format is silly ;)

(yes, I realized that I missed it the first round)

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me! I've rebased #9201 with just the unicode changes.

One tiny nit about date formats in deprecation warning.

@charris charris merged commit 32a58fc into numpy:master Nov 12, 2017
@charris
Copy link
Member

charris commented Nov 12, 2017

Sounds like folks are happy with this, so in it goes. Thanks Allan, and thanks to the rest of the print formatting crew.

@eric-wieser eric-wieser moved this from Pending to Done in Changes to 0d and scalar repr Nov 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants