Can't take max of arrays at least as large as 2 ** 32 #495

wecassidy · 2021-06-27T06:12:21Z

Describe the bug
Calling sparse.COO.max on an array larger than 2 ** 32 - 1 fails with a TypeError like so:

>>> a.shape
(4294967296,)
>>> a.max()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 444, in max
    return np.maximum.reduce(self, out=out, axis=axis, keepdims=keepdims)
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 307, in __array_ufunc__
    result = SparseArray._reduce(ufunc, *inputs, **kwargs)
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 278, in _reduce
    return self.reduce(method, **kwargs)
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 360, in reduce
    out = self._reduce_calc(method, axis, keepdims, **kwargs)
  File "C:\<path_redacted>\sparse\_coo\core.py", line 692, in _reduce_calc
    data, inv_idx, counts = _grouped_reduce(a.data, a.coords[0], method, **kwargs)
  File "C:\<path_redacted>\sparse\_coo\core.py", line 1566, in _grouped_reduce
    result = method.reduceat(x, inv_idx, **kwargs)
TypeError: Cannot cast array data from dtype('uint64') to dtype('int64') according to the rule 'safe'

To Reproduce
Create an array a at least as large as 2 ** 32 with at least one nonzero element, then call a.max(). For example:

>>> b = sparse.DOK((2 ** 32,))
>>> b[0] = 1
>>> a = sparse.COO(b)
>>> a.nnz
1
>>> a.max() # TypeError

Expected behavior
Return the maximum value of the array (1 in the example above).

System

OS and version: Windows 10
sparse version: 0.12.0+44.g765e297 (bug is also present in 0.12.0, installed from pip)
NumPy version: 1.18.5
Numba version: 0.53.1

Additional context
sparse.COO.max works on an array of size 2 ** 32 if it is empty (i.e. a.nnz == 0).

hameerabbasi · 2021-06-27T07:03:49Z

Are you on 32-bit Windows by any chance?

wecassidy · 2021-06-27T16:19:16Z

I'm on 64-bit Windows.

I just checked and this bug is not present on Manjaro 21.0.7 with Linux 5.12.9-1-MANJARO (x86_64).

hameerabbasi · 2021-10-23T09:16:59Z

Mentoring instructions: Replace all uses of np.[as]array(list) with np.[as]array(list, dtype=np.int64).
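
As a rough illustration of the suggestion above (a sketch, not code taken from the library): when no dtype is given, NumPy picks its platform default integer for a list of Python ints, and on Windows with NumPy 1.x that default is 32-bit. The coords list here is made up for the example.

```python
import numpy as np

# Hypothetical coordinate list, for illustration only.
coords = [0, 5, 7]

# Platform-dependent: NumPy chooses its default integer dtype.
implicit = np.asarray(coords)

# Explicit: always 64-bit signed, regardless of platform.
explicit = np.asarray(coords, dtype=np.int64)

print(implicit.dtype, explicit.dtype)
```

Passing the dtype explicitly removes the dependence on the platform default, which would be consistent with the bug showing up on Windows but not on Linux.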

GPhilo · 2022-07-05T11:04:49Z

Hello, I ran into the same problem. Was there any solution to this?

GPhilo · 2022-07-05T11:35:38Z

A quick update since I'm now digging into the library. I see that the COO constructor has an idx_dtype parameter that, I believe, should force COO to use a specific dtype for its indices. However, if data is None in the constructor call, the array is converted via as_coo, which in turn relies on DOK's as_format, which calls COO.from_iter. from_iter doesn't take idx_dtype, so it never forwards it to the final call to COO's constructor.

The result is, effectively, that idx_dtype gets ignored.

A proposal for improving this would be:

as_coo should take idx_dtype (and possibly more parameters of the constructor, maybe directly **kwargs?) and forward them down as appropriate.
as_format should take **kwargs and should forward them to whichever constructor/factory it uses internally
from_iter should take **kwargs and forward them to the COO constructor.

I don't know which, if any, parameter combinations should be forbidden to ensure there is no infinite recursion in the constructor, but I believe someone with more knowledge of the codebase might know what and where to check so this doesn't happen.
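
A minimal sketch of the forwarding idea, assuming toy stand-ins: ToyCOO and toy_as_coo are hypothetical names invented for this example and do not mirror the real sparse signatures. It only shows how **kwargs can carry idx_dtype from the outermost helper down to the constructor.

```python
import numpy as np


class ToyCOO:
    """Hypothetical stand-in for a COO-like container; not sparse.COO."""

    def __init__(self, coords, data, shape, idx_dtype=None):
        # idx_dtype=None lets NumPy choose; an explicit dtype is honored.
        self.coords = np.asarray(coords, dtype=idx_dtype)
        self.data = np.asarray(data)
        self.shape = shape

    @classmethod
    def from_iter(cls, items, shape, **kwargs):
        # Forward extra keyword arguments (e.g. idx_dtype) to the constructor.
        coords = [c for c, _ in items]
        data = [v for _, v in items]
        return cls(coords, data, shape, **kwargs)


def toy_as_coo(mapping, shape, **kwargs):
    # Pass keyword arguments down the chain instead of dropping them.
    return ToyCOO.from_iter(sorted(mapping.items()), shape, **kwargs)


a = toy_as_coo({0: 1.0, 3: 2.0}, shape=(2 ** 32,), idx_dtype=np.int64)
print(a.coords.dtype)  # int64
```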

GPhilo · 2022-07-05T11:52:57Z

I traced the issue to its source and came up with a hack to make this work, should anyone else also run into this problem.
Basically, when reshape is called, idx_dtype has been ignored (as mentioned in the comment above), so the default int32 index dtype is used. Since int32 can't store the new shape, the check for this triggers and idx_type gets converted to the result of np.min_scalar_type(max(shape)), which is np.uint64, and that's what causes the problem.

My hack to solve this is to hardcode np.int64 instead of letting numpy choose:

idx_type = np.int64

This solves the problem when calling max().
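
The dtype selection described above can be checked with NumPy alone. This is a small sketch, not the library's code; reduceat_accepts is a hypothetical helper written for this example. On a 64-bit build, the uint64 case is expected to hit the same "safe" casting rule reported in the traceback.

```python
import numpy as np

# For non-negative values, np.min_scalar_type picks an unsigned dtype.
print(np.min_scalar_type(2 ** 32 - 1))  # uint32
print(np.min_scalar_type(2 ** 32))      # uint64

# uint64 cannot be cast to int64 under the 'safe' rule.
print(np.can_cast(np.uint64, np.int64, casting="safe"))  # False


def reduceat_accepts(index_dtype):
    """Return True if np.maximum.reduceat accepts indices of this dtype."""
    x = np.array([3, 1, 4, 1, 5])
    idx = np.array([0, 2], dtype=index_dtype)
    try:
        np.maximum.reduceat(x, idx)
    except TypeError:
        return False
    return True


print(reduceat_accepts(np.int64))
print(reduceat_accepts(np.uint64))
```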

hameerabbasi · 2022-07-05T15:47:44Z

Thanks @GPhilo for digging into this, I'll try to set some time aside this weekend to fix it and cut a release.

Violin9906 · 2024-03-04T10:37:31Z

It has been more than 2 years and this issue still seems to exist. Any update on this?

hameerabbasi · 2024-03-04T11:01:32Z

This doesn't happen anymore on sparse 0.15.1, which is the latest release. Closing.

wecassidy added the type:bug label Jun 27, 2021

hameerabbasi closed this as completed Mar 4, 2024
