Operations between numpy.array and scipy.sparse matrix return inconsistent array type #7510
Comments
For me, the issue here is that
This issue also applies to subtraction. Actually the way we noticed it was from code like:
This command changed the type of
In general, the rule we try to follow is: if you replace the sparse matrix with a numpy matrix, the resulting type should not change (unless the result itself is sparse). Following this rule:
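A minimal sketch of how this rule could be checked, assuming a small 2x2 example and the `csc_matrix` class; `follows_rule` is a hypothetical helper written for illustration, not part of NumPy or SciPy. It only covers operations whose result is dense, since the rule exempts sparse results.

```python
import operator

import numpy as np
from scipy.sparse import csc_matrix

data = [[1, 1], [1, 0]]
array = np.array(data)
matrix = np.matrix(data)    # dense stand-in for the sparse operand
sparse = csc_matrix(data)


def follows_rule(op):
    """Hypothetical check: does op(array, sparse) give the same kind of
    dense result (numpy.matrix or not) as op(array, matrix)?"""
    with_sparse = op(array, sparse)
    with_matrix = op(array, matrix)
    return isinstance(with_sparse, np.matrix) == isinstance(with_matrix, np.matrix)


# Print the outcome for each operator rather than assuming it.
for name in ("add", "sub", "mul"):
    print(name, follows_rule(getattr(operator, name)))
```

Swapping in other operand orders or other sparse formats is a quick way to see which combinations follow the rule on a given SciPy version.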
@perimosocordiae so it sounds like the goal is to imitate the type behavior of

import numpy
from scipy.sparse import csc_matrix
data = [[1, 1],
        [1, 0]]
array = numpy.array(data)
matrix = numpy.matrix(data)
sparse = csc_matrix(data)

The following pure numpy operations all return a matrix:

For the dot product case, I will use
@perimosocordiae how do you know whether the output should be sparse? For example, why doesn't

From the scipy.linalg docs:
This makes a lot of sense to me. In fact, for our project, we've tried to avoid using

So my question is: for a project that mixes 2-D arrays and scipy.sparse matrices, should we just migrate from ndarray to matrix entirely? That would force us to use the "discouraged" class, but it's an uphill battle to use ndarrays when everything keeps getting converted to matrices.
Your analysis looks correct to me. We determine whether the output is sparse without considering the contents of the operands, based on the worst-case behavior. Thus,

In general, I agree that users should avoid

To answer your question, I suggest taking a close look at the operations producing your arrays/matrices, and deciding what the appropriate type is on a case-by-case basis. It's fairly common to have separate code paths for sparse vs dense inputs, as the time/space complexity concerns often change significantly between these cases. If you find that you often have sparse matrices that end up densifying, it might be faster/cheaper to avoid the sparse representation altogether.
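A minimal sketch of both points, assuming `csc_matrix` and a 2x2 example; `as_ndarray` is a hypothetical helper, not a SciPy function. It shows that adding a dense operand yields a dense result even when that operand is all zeros, and one way to normalise results to a plain `ndarray` at the boundary between sparse and dense code paths.

```python
import numpy as np
from scipy.sparse import csc_matrix, issparse


def as_ndarray(x):
    """Hypothetical helper: return x as a plain numpy.ndarray."""
    if issparse(x):
        return x.toarray()   # sparse -> dense ndarray
    return np.asarray(x)     # numpy.matrix -> base ndarray; ndarray passes through


sparse = csc_matrix([[1, 1], [1, 0]])
zeros = np.zeros((2, 2), dtype=int)

# The result type is chosen from the operand types, not their contents:
# the sum is dense even though `zeros` contributes nothing.
total = sparse + zeros
print(issparse(total), type(total).__name__)

# Normalising at the boundary gives downstream code one predictable type.
print(type(as_ndarray(total)).__name__)
print(type(as_ndarray(sparse)).__name__)
```

The conversion copies a sparse input into dense storage, so it is only reasonable where the data is small enough to densify anyway.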
gh-7826 has more discussion of that particular problem, and why it's not trivial to fix. I like the suggestion in #7826 (comment) about raising a warning, though. As far as I know there aren't any open PRs along those lines yet.
@perimosocordiae thanks!
It appears that adding or subtracting numpy.ndarrays with scipy.sparse matrices returns a numpy.matrix.
Is the inconsistency in returned array type (see code below) a bug or is it intentional?
This is confusing because different operations return either numpy arrays or numpy matrices. Since we work with arrays that are often converted between sparse and dense, a consistent output type would be very helpful (i.e., operations between sparse and dense operands would always return a numpy array, or always a numpy matrix).
It makes sense that operations between like types return the same type as the inputs. However, it would be nice if operations between numpy.arrays and scipy.sparse matrices always returned the same array type.
Reproducing code example:
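A minimal sketch of the kind of check that shows the behavior described above. This is an illustration with assumed names and a 2x2 example, not the snippet originally posted. It prints the type returned by each mixed operation so the results can be compared on a given SciPy version.

```python
import numpy as np
from scipy.sparse import csc_matrix

data = [[1, 1], [1, 0]]
array = np.array(data)
sparse = csc_matrix(data)

# Mixed dense/sparse operations whose return types are being compared.
results = {
    "array + sparse": array + sparse,
    "array - sparse": array - sparse,
    "sparse.dot(array)": sparse.dot(array),
}

for expr, value in results.items():
    print(f"{expr}: {type(value).__name__}")
```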
Scipy/Numpy/Python version information: