Operations between numpy.array and scipy.sparse matrix return inconsistent array type #7510

zietzm · 2017-06-20T16:00:11Z

It appears that adding or subtracting numpy.ndarrays with scipy.sparse matrices returns a numpy.matrix.

Is the inconsistency in returned array type (see code below) a bug or is it intentional?

It is confusing because different operations return either numpy arrays or numpy matrices, and since we are working with arrays that are often inter-converted between sparse and dense, it would be very helpful to have consistency in output array type (ie. operations between sparse and dense always return either a numpy array or always a numpy matrix.)

It makes sense that operations between like types return the same type as the input type. However, it would be nice if operations between numpy.arrays and scipy.sparse matrices would return always the same array type.

Reproducing code example:

import numpy as np
from scipy.sparse import csc_matrix

mat = np.array([[1,1],
                [1,0]])
sp_mat = csc_matrix([[1,1],
                     [1,0]])

mat + mat
>>> array([[2, 2],
           [2, 0]])

sp_mat + sp_mat
>>> <2x2 sparse matrix of type '<class 'numpy.int64'>'
	with 3 stored elements in Compressed Sparse Column format>

mat + sp_mat
>>> matrix([[2, 2],
            [2, 0]], dtype=int64)

sp_mat * mat
>>> array([[2, 1],
           [1, 1]], dtype=int64)

sp_mat == mat
>>> matrix([[ True,  True],
            [ True,  True]], dtype=bool)

Scipy/Numpy/Python version information:

print(scipy.__version__, numpy.__version__, sys.version_info)
>>> 0.19.0 1.12.1 sys.version_info(major=3, minor=6, micro=0, releaselevel='final', serial=0)

The text was updated successfully, but these errors were encountered:

dhimmel · 2017-06-20T16:57:56Z

For me, the issue here is that numpy.array + scipy.sparse returns a numpy.matrix. I think the ideal behavior is:

numpy.array + scipy.sparse returns a numpy.array
numpy.matrix + scipy.sparse returns a numpy.matrix

This issue also applies to subtraction. Actually the way we noticed it was from code like:

numpy_array -= scipy_sparse_matrix

This command changed the type of numpy_array to numpy.matrix which caused downstream problems.

perimosocordiae · 2017-06-21T13:58:02Z

In general, the rule we try to follow is: if you replace the sparse matrix with a numpy matrix, the resulting type should not change (unless the result itself is sparse).

Following this rule:

array -= sparse converting to matrix is certainly a bug (and I can reproduce it locally).
array + sparse returning matrix is working as intended (even though it's a bit strange).
array * sparse returning array is a bug.

dhimmel · 2017-06-21T14:39:46Z

In general, the rule we try to follow is: if you replace the sparse matrix with a numpy matrix, the resulting type should not change (unless the result itself is sparse).

@perimosocordiae so it sounds like goal is to imitate the type behavior of numpy.matrix. Let's work with the following definitions:

import numpy
from scipy.sparse import csc_matrix

data = [[1,1],
        [1,0]]

array = numpy.array(data)
matrix = numpy.matrix(data)
sparse = csc_matrix(data)

The following pure numpy operations all return a matrix: array + matrix, matrix + array, array - matrix, matrix - array. Hence according to @perimosocordiae's rule, array + sparse, sparse + array, array - sparse, and sparse - array should all return a matrix (which they do). However, since array += matrix and array -= matrix, keep array as an array, so should array += sparse and array -= sparse.

For the dot product case, I will use @ rather than * for clarity. array @ matrix and matrix @ array both return a matrix. Hence, array @ sparse and sparse @ array should return a matrix, but instead they return an array.

(unless the result itself is sparse).

@perimosocordiae how do you know whether the output should be sparse? For example, why doesn't matrix @ sparse return sparse?

From the scipy.linalg docs:

Despite its convenience, the use of the numpy.matrix class is discouraged, since it adds nothing that cannot be accomplished with 2D numpy.ndarray objects, and may lead to a confusion of which class is being used.

This makes a lot of sense to me. In fact, for our project, we've tried to avoid using matrix at all. Primarily because we sometimes have to do elementwise multiplication. The separate operators for dot product and elementwise multiplication is helpful. However, scipy.sparse seems pretty committed to the matrix rather than ndarray design.

So my question is, for a project that mixes 2d-arrays and scipy.sparse matrices, should we just migrate from ndarray to matrix entirely? While this forces us to use the "discouraged" class, it's an uphill battle to use ndarrays when everything is always getting converted to matrices?

perimosocordiae · 2017-06-21T15:24:53Z

Your analysis looks correct to me.

We determine if the output is sparse without considering the contents of the operands, based on the worst-case behavior. Thus, sparse - dense is always dense, even if the particular dense operand happens to be mostly zeros. In the case of dot products, the worst-case is that the dense operand has no zeros and the sparse operand is full-rank, which will result in an entirely dense result.

In general, I agree that users should avoid numpy.matrix whenever possible. Unfortunately, the scipy.sparse API design decision to mimic numpy.matrix can't be changed now. There's a growing list of efforts to produce a sparse ndarray-like library that may someday make it into scipy, but that will take considerable time.

To answer your question, I suggest taking a close look at the operations producing your arrays/matrices, and deciding what the appropriate type is on a case-by-case basis. It's fairly common to have separate code paths for sparse vs dense inputs, as the time/space complexity concerns often change pretty significantly between these cases. If you find that you often have sparse matrices that end up densifying, it might be faster/cheaper to avoid the sparse representation altogether.

evfro · 2020-04-14T12:51:54Z

@perimosocordiae
Hi, is there any work going on to fix the bug you outlined above?

array -= sparse converting to matrix is certainly a bug

perimosocordiae · 2020-04-14T16:50:57Z

gh-7826 has more discussion of that particular problem, and why it's not trivial to fix. I like the suggestion in #7826 (comment) about raising a warning, though.

As far as I know there aren't any open PRs along those lines yet.

evfro · 2020-04-18T12:33:51Z

@perimosocordiae thanks!

perimosocordiae added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse labels Jun 21, 2017

perimosocordiae added this to the 1.0 milestone Jun 21, 2017

perimosocordiae closed this as completed Sep 14, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Operations between numpy.array and scipy.sparse matrix return inconsistent array type #7510

Operations between numpy.array and scipy.sparse matrix return inconsistent array type #7510

zietzm commented Jun 20, 2017

dhimmel commented Jun 20, 2017 •

edited

Loading

perimosocordiae commented Jun 21, 2017

dhimmel commented Jun 21, 2017 •

edited

Loading

perimosocordiae commented Jun 21, 2017

evfro commented Apr 14, 2020

perimosocordiae commented Apr 14, 2020

evfro commented Apr 18, 2020

Operations between numpy.array and scipy.sparse matrix return inconsistent array type #7510

Operations between numpy.array and scipy.sparse matrix return inconsistent array type #7510

Comments

zietzm commented Jun 20, 2017

Reproducing code example:

Scipy/Numpy/Python version information:

dhimmel commented Jun 20, 2017 • edited Loading

perimosocordiae commented Jun 21, 2017

dhimmel commented Jun 21, 2017 • edited Loading

perimosocordiae commented Jun 21, 2017

evfro commented Apr 14, 2020

perimosocordiae commented Apr 14, 2020

evfro commented Apr 18, 2020

dhimmel commented Jun 20, 2017 •

edited

Loading

dhimmel commented Jun 21, 2017 •

edited

Loading