BUG: numpy.r_ interprets first argument as array element when invoked with unicode #8818

Closed
Dominik1123 opened this issue Mar 23, 2017 · 5 comments

Comments

@Dominik1123

Dominik1123 commented Mar 23, 2017

Consider the following code on Python 2.x:

import numpy

a = numpy.ones((2,))
b = numpy.zeros((2,))
# Passing a unicode string here; this could also happen due to
# `from __future__ import unicode_literals`.
numpy.r_[u'0,2', a, b]

This gives:

array([u'0,2', u'1.0', u'1.0', u'0.0', u'0.0'], 
      dtype='<U32')

Apparently r_ treats u'0,2' as an array element (instead of a concatenation specifier) and casts all other array elements to the corresponding dtype. This is inconsistent, because if the first argument were a str, r_ would use it to deduce how the arrays should be concatenated:

>>> numpy.r_['0,2', a, b]
array([[ 1.,  1.],
       [ 0.,  0.]])

I would expect either the same behavior for str and unicode arguments or a TypeError for unicode (or at least a .. warning:: in the docs). Since the documentation only speaks of "strings" without naming a type, this can lead to subtle bugs in application code.

Note that the same applies to Python 3.x using b'0,2' or encoded strings in general for the first argument (instead of unicode).

Tested on:

  • Python 2.7.12 + numpy==1.12.0
  • Python 3.5.2 + numpy==1.12.0
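
Until the behavior is changed or documented, calling code can guard against the mismatch by normalizing the directive itself. The following is only a minimal sketch for Python 3, where the native string type is str and the non-native one is bytes. The helper name r_with_directive is hypothetical and not part of NumPy, and decoding as ASCII is an assumption about what a directive may contain.

```python
import numpy as np


def r_with_directive(directive, *arrays):
    """Hypothetical helper (not a NumPy API): call np.r_ with a native-str directive."""
    # Assumption: a bytes directive such as b'0,2' is plain ASCII.
    if isinstance(directive, bytes):
        directive = directive.decode("ascii")
    if not isinstance(directive, str):
        raise TypeError("directive must be str or bytes, got %r" % type(directive))
    # Indexing np.r_ with a tuple is the same as writing np.r_[directive, a, b].
    return np.r_[(directive,) + arrays]


a = np.ones((2,))
b = np.zeros((2,))
result = r_with_directive(b"0,2", a, b)
print(result)
```

With the directive normalized, the arrays are stacked as in the str example above instead of being cast to a string dtype.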
@eric-wieser
Member

eric-wieser commented Mar 23, 2017

I really don't understand why the API was chosen as np.r_[value...] or np.r_[config, value...] instead of np.r_[values] or np.r_(config)[values]. There's no sensible out-of-band value to use as the config.

There's a danger here that fixing this would break any user who deliberately wanted to concatenate unicode strings into an array.

Also while we're getting angry at r_, np.r_['0,2', 1, 2] crashes due to carelessness (until #8816, anyway). Also, #8518 is a thing.
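
To make the np.r_(config)[values] shape mentioned above concrete, here is a rough sketch of how such a split could look as a thin wrapper around the existing np.r_. The class name RConfig is hypothetical and nothing like it exists in NumPy; the sketch only illustrates keeping the config out of the value list, assuming Python 3.

```python
import numpy as np


class RConfig:
    """Hypothetical wrapper (not a NumPy API): RConfig(config)[values]."""

    def __init__(self, config):
        # The config is passed out of band, so its type can be checked strictly.
        if not isinstance(config, str):
            raise TypeError("config must be a native str, got %r" % type(config))
        self._config = config

    def __getitem__(self, values):
        # A single value arrives bare; several values arrive as a tuple.
        if not isinstance(values, tuple):
            values = (values,)
        # Delegate to the existing np.r_ with the config in first position.
        return np.r_[(self._config,) + values]


a = np.ones((2,))
b = np.zeros((2,))
stacked = RConfig("0,2")[a, b]
print(stacked)
```

Because the config never shares a position with the values, a bytes or other non-str config fails immediately with a TypeError rather than being silently concatenated.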

@Dominik1123
Author

As a start, adding a warning to the docs would be an appropriate solution, I guess, as it won't break any legacy code.

@eric-wieser
Member

> Note that the same applies to Python 3.x using b'0,2' or encoded strings in general for the first argument (instead of unicode).

I think this is correct behaviour. The only real issue is that from __future__ import unicode_literals causes things to break unexpectedly in user code in Python 2. In Python 3, the user should know better than to pass bytes to random APIs that expect strings.

@Dominik1123 Dominik1123 changed the title from "numpy.r_ interprets first argument as array element when invoked with non-native string type" to "numpy.r_ interprets first argument as array element when invoked with unicode" on Mar 23, 2017
@eric-wieser
Member

eric-wieser commented Mar 23, 2017

It's also worth remembering that all these bugs also apply to numpy.ma.mr_, and need to be fixed independently in both places.

@eric-wieser eric-wieser changed the title from "numpy.r_ interprets first argument as array element when invoked with unicode" to "BUG: numpy.r_ interprets first argument as array element when invoked with unicode" on Mar 23, 2017
@WarrenWeckesser
Member

This was an issue in Python 2. We no longer support Python 2, so closing.
