BUG: np.array fails on a list of arrays with partially matching dimensions #7453

Closed
martinosorb opened this issue Mar 23, 2016 · 9 comments

Comments

@martinosorb martinosorb commented Mar 23, 2016

The functions numpy.array and numpy.asarray have a well-defined behaviour when applied to lists of arrays: if the listed arrays have the same dimensions and size, the list is turned into one of the dimensions of the resulting array (let's call it "mode 1"). If not, an array of arrays is returned ("mode 2").

However, the behaviour of numpy.array and numpy.asarray in "mode 2" seems to depend on the number of items in the arrays. The following code is not very elegant, but it works:

>>> a = np.array([1, 2, 3])
>>> b = np.array([[1, 0], [0, 1]])
>>> np.asarray([a, b])
array([array([1, 2, 3]), array([[1, 0],
       [0, 1]])], dtype=object)

But the following doesn't:

>>> a = np.array([1, 2])
>>> b = np.array([[1, 0], [0, 1]])
>>> np.asarray([a, b])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/numpy/core/numeric.py", line 474, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not broadcast input array from shape (2,2) into shape (2)

Clearly, the problem is that, when numpy.asarray sees the first dimension has the same length in a and b, it tries to go for "mode 1", which is impossible here.
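
A minimal sketch of the partial match described above. It avoids calling np.asarray on the ragged list, since what that call does depends on the NumPy version; the name `stacked` is introduced here only for illustration.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([[1, 0], [0, 1]])

# "Mode 1": the shapes are identical, so the list becomes a new leading axis.
stacked = np.array([a, a])
print(stacked.shape)

# The failing input: the first dimension matches, the full shapes do not.
print(len(a) == len(b))
print(a.shape == b.shape)
```

The first comparison is true and the second is false, which is the combination that the report identifies as triggering the error.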

EDIT: I'm using numpy 1.10.4 and python 3.4.3.

Member

@shoyer shoyer commented Mar 23, 2016

Agreed, this sort of fallback logic is unfortunate. We've discussed not making dtype=object arrays unless the dtype is explicitly provided. IMO np.array([a, b], dtype=object) should be the only way to write either of these -- and it shouldn't need to do any checks on shape.
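
One way to get an object array without any shape checks, independent of the NumPy version, is to allocate it first and fill it element by element. This is only a sketch: `object_array` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def object_array(items):
    """Hypothetical helper: 1-D object array with one slot per item."""
    items = list(items)
    out = np.empty(len(items), dtype=object)
    # Integer-index assignment stores each item as-is; no shape inference.
    for i, item in enumerate(items):
        out[i] = item
    return out

a = np.array([1, 2])
b = np.array([[1, 0], [0, 1]])
result = object_array([a, b])
print(result.shape, result.dtype)
```

Because the output shape is fixed before any item is inspected, the partially matching dimensions of a and b never come into play.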

@roman-kh roman-kh commented Apr 6, 2017

Providing dtype=object does not help - you still get the same error.
However, adding an empty array lets you avoid it:

arr_of_arr = np.array([np.array([]), a, b])[1:]

@ppwwyyxx ppwwyyxx commented Jul 18, 2018

Still exists in 1.14.4

Contributor

@mhvk mhvk commented Jul 18, 2018

@ppwwyyxx - indeed, partly because it is not a trivial change, and partly because it is not something one gets hit with all the time, so the urgency is relatively low (and there are not that many people with time to contribute...).

But what might help here is to make it clearer exactly what the desired behaviour would be. @shoyer mentioned the similarly long-standing request to explicitly require dtype=object if that is in fact wanted, and otherwise raise TypeError for anything that cannot be parsed as a numerical or string array (#5353). I've been pondering recently whether it would be useful to similarly have a dtype='structured', which would strictly enforce a difference between lists as indicating elements of an array and tuples as elements of a structured dtype.

@ppwwyyxx ppwwyyxx commented Jul 19, 2018

I'm willing to contribute if anyone can send me some pointers on what to do. I just glanced at the related code in ctors.c; the constructor logic seems to be quite complicated, as it needs to deal with many different forms of input.

@ppwwyyxx ppwwyyxx commented Jul 19, 2018

Same issue: #8330

Member

@eric-wieser eric-wieser commented Jul 23, 2018

I think this might be fixed by #11601. Edit: It is not.

@eric-wieser eric-wieser changed the title Inconsistent behaviour of numpy.asarray on list of arrays BUG: np.array fails on a list of arrays with partially matching dimensions Jul 23, 2018
Member

@eric-wieser eric-wieser commented Jul 23, 2018

So, what's happening here is roughly:

a = np.array([1, 2])
b = np.array([[1, 0], [0, 1]])
out = np.asarray([a, b])
# translates to
out = np.empty((2, 2))  # shape is correctly inferred
out[0,:] = a
out[1,:] = b  # error comes from here
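
The manual translation can be run on its own to reproduce the error without going through np.asarray. This is a sketch of the rough equivalence given above, not of the actual constructor code; `error_message` is a name introduced here for illustration.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([[1, 0], [0, 1]])

out = np.empty((2, 2))
out[0, :] = a  # fine: shape (2,) fits a row of shape (2,)

try:
    out[1, :] = b  # shape (2, 2) cannot fit a row of shape (2,)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
else:
    error_message = None
```

The row assignment for b raises the same kind of broadcast ValueError as in the original report.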

Member

@seberg seberg commented May 26, 2021

This is fixed in NumPy 1.19 (or 1.20). You have to provide dtype=object, though, or you will get the ragged-array warning. Of course, it may not have been fixed quite as suggested above, since the inferred shape is (2,) and not (2, 2) (we never rip apart an array!)

In [9]: a = np.array([1, 2, 3])
   ...: b = np.array([[1, 0], [0, 1]])
   ...: print(np.asarray([a, b], dtype=object))
   ...: np.asarray([a, b])
   ...: 
[array([1, 2, 3]) array([[1, 0],
                         [0, 1]])]
<ipython-input-9-f2d78f006499>:4: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  np.asarray([a, b])
Out[9]: 
array([array([1, 2, 3]), array([[1, 0],
                                [0, 1]])], dtype=object)

If that doesn't seem like the right fix, please open a new issue.

@seberg seberg closed this May 26, 2021
7 participants