[Discussion]NumPy-like behavior in Array module #16911

kangzhiq · 2019-05-27T23:29:23Z

The idea comes originally from this issue #15464.
I have noticed that the initialization of an Array, in some case, accepts only a sequence of int, rather than a tuple like we do in NumPy.

>>> from sympy.tensor.array import MutableDenseNDimArray
>>> MutableDenseNDimArray.zeros(2, 3)
[[0, 0, 0], [0, 0, 0]]
>>> MutableDenseNDimArray.zeros((2, 3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\CodeHouse\Python\sympy\sympy\tensor\array\dense_ndim_array.py", line 62, in zeros
    return cls._new(([0]*list_length,), shape)
TypeError: can't multiply sequence by non-int of type 'tuple'

@Upabjojr has also noticed a difference of behavior between SymPy and NumPy. I would give the floor to him so that the problem can be illustrated clearly. (Thank you!)

In my opinion, it would be good if we follow the example of NumPy in syntax and behaviors of Array. Besides, as far as I know, Array module is recently created, so it could be eligible to some changes.

Based on this topic, what do you think about it?

The text was updated successfully, but these errors were encountered:

czgdp1807 · 2019-05-28T04:05:50Z

I think, the NumPy syntax should be supported. However, it would be good if you can present some advantages and disadvantages of using the NumPy syntax.
I am saying so because, tuple, (2, 3) can safely be converted to sequence 2, 3 by * operator, so, there shouldn't be any problem in adapting it. We can even allow both types of inputs. Let's see what @Upabjojr has to say over it. May be, there is some difference of principles used in SymPy and NumPy which don't allow us to use tuples. I want to know them.

kangzhiq · 2019-05-28T08:44:20Z

May be, there is some difference of principles used in SymPy and NumPy which don't allow us to use tuples.

Yes, I agree with you. I would also like to know them. :-)
Regarding the tuple thing, if you look at the codes of zeros():

def zeros(cls, *shape):
        list_length = functools.reduce(lambda x, y: x*y, shape, S.One)
        return cls._new(([0]*list_length,), shape)

In fact, the shape is passed as a tuple to a constructor. So I don't see other constraints to use tuple directly.
I think that both tuple and sequence of numbers can work nicely, and there is not(as far as I know) difference between the performance. So I might not be able to point out which one is better.
My point is that we are having both syntax in our API, which can be ambigious. For example, while initialting an array with a list:

>>> a = MutableDenseNDimArray([0, 1, 2, 3], (2, 2))
>>> a
[[0, 1], [2, 3]]

here we use a tuple to represent the shape. It doen't have to be NumPy-like, it is just a standardization of syntax. Of course, we should only apply it if there are no explicit obstacles.

Upabjojr · 2019-05-28T09:19:30Z

We can even allow both types of inputs.

Yes, that would be optimal.

Many users are used to NumPy's syntax, so the difference may be an obstacle to the adoption of SymPy.

kangzhiq · 2019-05-28T20:33:10Z

Ok I would try to make both parameters acceptable!

kangzhiq · 2019-05-28T22:15:04Z

@Upabjojr Regarding the index syntax, do you have some comments?
I reproduced the problem:

>>> from sympy.tensor.array import ImmutableDenseNDimArray
>>> a = ImmutableDenseNDimArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], (2, 2, 3))
>>> a[1]
1
>>> import numpy as np
>>> b = np.array([[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]])
>>> b[1]
array([[ 6,  7,  8],
       [ 9, 10, 11]])

Personnally, I have found several differences between those module:

>>> b[1, 1]
array([ 9, 10, 11])
>>> a[1, 1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\CodeHouse\Python\sympy\sympy\tensor\array\dense_ndim_array.py", line 56, in __getitem__
    index = self._parse_index(index)
  File "D:\CodeHouse\Python\sympy\sympy\tensor\array\ndim_array.py", line 78, in _parse_index
    raise ValueError('Wrong number of array axes')
ValueError: Wrong number of array axes
>>> b[b>3]
array([ 4,  5,  6,  7,  8,  9, 10, 11])
>>> a[a>3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'ImmutableDenseNDimArray' and 'int'
>>> a[1:9]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> b[1:9]
array([[[ 6,  7,  8],
        [ 9, 10, 11]]])

I will test it more properly to list out the differences.

czgdp1807 · 2019-05-29T04:24:08Z

>>> from sympy.tensor.array import ImmutableDenseNDimArray
>>> a = ImmutableDenseNDimArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], (2, 2, 3))
>>> a[1]
1
>>> import numpy as np
>>> b = np.array([[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]])
>>> b[1]
array([[ 6,  7,  8],
       [ 9, 10, 11]])

I think that ImmutableDenseNDimArray should be in line with numpy. At least, the output should be same for same queries.

Upabjojr · 2019-05-31T13:27:21Z

I agree we should take NumPy as a reference. As the array module is still pretty young, we are still on time to change behaviour.

oscarbenjamin added the tensor label May 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Discussion]NumPy-like behavior in Array module #16911

[Discussion]NumPy-like behavior in Array module #16911

kangzhiq commented May 27, 2019

czgdp1807 commented May 28, 2019 •

edited

kangzhiq commented May 28, 2019 •

edited

Upabjojr commented May 28, 2019

kangzhiq commented May 28, 2019

kangzhiq commented May 28, 2019

czgdp1807 commented May 29, 2019

Upabjojr commented May 31, 2019

[Discussion]NumPy-like behavior in Array module #16911

[Discussion]NumPy-like behavior in Array module #16911

Comments

kangzhiq commented May 27, 2019

czgdp1807 commented May 28, 2019 • edited

kangzhiq commented May 28, 2019 • edited

Upabjojr commented May 28, 2019

kangzhiq commented May 28, 2019

kangzhiq commented May 28, 2019

czgdp1807 commented May 29, 2019

Upabjojr commented May 31, 2019

czgdp1807 commented May 28, 2019 •

edited

kangzhiq commented May 28, 2019 •

edited