CSR matrix should infer data type consistent with what COO does #5353

@jni

When using list inputs for (data, (i, j)), COO matrix infers the data type from the type of the items in data, just as np.array would.

In contrast, when constructing a CSR matrix from a (data, indices, indptr) tuple, either data must already be a NumPy array or the dtype= keyword argument must be passed to the constructor. With a plain list and no dtype=, the constructor raises a TypeError:

In [30]: i = [0, 1, 1, 1, 1, 2, 3, 4, 4]

In [31]: j = [2, 0, 1, 3, 4, 1, 0, 3, 4]

In [32]: data = [6, 1, 2, 4, 5, 1, 9, 6, 7]

In [33]: indptr = [0, 1, 5, 6, 7, 9]

In [34]: coo = sparse.coo_matrix((data, (i, j)))

In [35]: csr = sparse.csr_matrix((data, j, indptr))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Users/nuneziglesiasj/anaconda/envs/elegant/lib/python3.4/site-packages/scipy/sparse/sputils.py in getdtype(dtype, a, default)
    116         try:
--> 117             newdtype = a.dtype
    118         except AttributeError:

AttributeError: 'list' object has no attribute 'dtype'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-35-8de58fac1a34> in <module>()
----> 1 csr = sparse.csr_matrix((data, j, indptr))

/Users/nuneziglesiasj/anaconda/envs/elegant/lib/python3.4/site-packages/scipy/sparse/compressed.py in __init__(self, arg1, shape, dtype, copy)
     54                     self.indices = np.array(indices, copy=copy, dtype=idx_dtype)
     55                     self.indptr = np.array(indptr, copy=copy, dtype=idx_dtype)
---> 56                     self.data = np.array(data, copy=copy, dtype=getdtype(dtype, data))
     57                 else:
     58                     raise ValueError("unrecognized %s_matrix constructor usage" %

/Users/nuneziglesiasj/anaconda/envs/elegant/lib/python3.4/site-packages/scipy/sparse/sputils.py in getdtype(dtype, a, default)
    121                 canCast = False
    122             else:
--> 123                 raise TypeError("could not interpret data type")
    124     else:
    125         newdtype = np.dtype(dtype)

TypeError: could not interpret data type

In [36]: sparse.csr_matrix?

In [37]: csr = sparse.csr_matrix((data, j, indptr), dtype=int)

In [38]: data2 = np.array(data)

In [39]: csr = sparse.csr_matrix((data2, j, indptr))

In [40]: coo.todense()
Out[40]:
matrix([[0, 0, 6, 0, 0],
        [1, 2, 0, 4, 5],
        [0, 1, 0, 0, 0],
        [9, 0, 0, 0, 0],
        [0, 0, 0, 6, 7]])

In [41]: csr.todense()
Out[41]:
matrix([[0, 0, 6, 0, 0],
        [1, 2, 0, 4, 5],
        [0, 1, 0, 0, 0],
        [9, 0, 0, 0, 0],
        [0, 0, 0, 6, 7]])

I would argue that CSR should infer the data type in the same way that COO does, even if the input is a list (or other array-like).
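
Until the constructor does this itself, the behaviour requested above can be approximated on the caller's side. The following is a minimal sketch, not a proposed patch: `csr_from_lists` is a hypothetical helper name (it is not part of scipy.sparse), and it simply passes data through np.asarray so that the dtype is inferred the same way np.array would infer it before csr_matrix sees it.

```python
import numpy as np
from scipy import sparse


def csr_from_lists(data, indices, indptr, shape=None):
    """Hypothetical helper (not part of SciPy): build a CSR matrix from
    array-likes, letting NumPy infer the dtype of `data`."""
    # np.asarray infers the dtype from the list items, as np.array would,
    # and returns the input unchanged if it is already an ndarray.
    data = np.asarray(data)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)


# Same inputs as in the session above.
i = [0, 1, 1, 1, 1, 2, 3, 4, 4]
j = [2, 0, 1, 3, 4, 1, 0, 3, 4]
data = [6, 1, 2, 4, 5, 1, 9, 6, 7]
indptr = [0, 1, 5, 6, 7, 9]

coo = sparse.coo_matrix((data, (i, j)))
csr = csr_from_lists(data, j, indptr)

# Both constructions should agree on dtype and on contents.
same_dtype = csr.dtype == coo.dtype
same_values = bool((csr.toarray() == coo.toarray()).all())
```

This is equivalent to the data2 = np.array(data) step in In [38], just wrapped so that call sites can keep passing lists. Passing dtype= explicitly, as in In [37], remains the other option when a specific type is wanted.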
