
ENH function to return nnz per row/column in some sparse matrices #3247

Merged
merged 1 commit into scipy:master on Jan 29, 2014

5 participants

@jnothman

I often use np.diff(mat.indptr) to get the number of nonzeros in each row of a CSR (or each column of a CSC), and bincount can be used for the minor axis. These expressions are unfriendly to read and not polymorphic, so I propose a method to get nnz by axis. This PR overloads getnnz for that purpose in the appropriate matrix types.
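
For concreteness, a minimal sketch of both the old idioms and the proposed call (the axis semantics follow sum: axis=0 counts down each column, axis=1 across each row; the density argument is just for the example):

import numpy as np
import scipy.sparse as sp

A = sp.rand(5, 5, density=0.3, format='csr')

# Old idioms: per-row counts from the CSR index pointer,
# per-column counts by histogramming the column indices.
nnz_per_row = np.diff(A.indptr)
nnz_per_col = np.bincount(A.indices, minlength=A.shape[1])

# With this PR, the same results, readably and polymorphically:
assert (A.getnnz(axis=1) == nnz_per_row).all()
assert (A.getnnz(axis=0) == nnz_per_col).all()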

@jaimefrio
SciPy member

If only for consistency with other 2D arrays, it is probably a good idea to support axis=-1, either explicitly or with the usual if axis < 0: axis += 2.

Also, the usual error raised by numpy is ValueError: 'axis' entry is out of bounds.
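
A minimal sketch of that normalization (this helper is hypothetical, not code from the PR; the error message copies numpy's):

def _normalize_axis(axis):
    # Fold a negative axis into the 0/1 range a 2-D sparse matrix
    # expects, the way numpy does, then validate it.
    if axis < 0:
        axis += 2
    if axis not in (0, 1):
        raise ValueError("'axis' entry is out of bounds")
    return axis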

@jnothman

I thought that too, but followed the example of sum, min, max:

>>> import scipy.sparse as sp
>>> sp.rand(5, 5).sum(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../scipy/sparse/base.py", line 682, in sum
    raise ValueError("axis out of bounds")
ValueError: axis out of bounds
>>> sp.rand(5, 5).min(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../scipy/sparse/data.py", line 191, in min
    raise ValueError("invalid axis, use 0 for rows, or 1 for columns")
ValueError: invalid axis, use 0 for rows, or 1 for columns
@coveralls

Coverage Status

Coverage remained the same when pulling 74c249a on jnothman:nnz_axis into 8ad48cd on scipy:master.

@jaimefrio
SciPy member

Ah, I didn't know that... If it is a conscious design choice for the whole sparse submodule, then I guess it is better to keep consistency within the module. But I think it is a very poor design choice: if the long-term vision is to make sparse matrices interchangeable with dense arrays, this definitely doesn't help.


@jnothman

> Ah, I didn't know that... If it is a conscious design choice for the whole sparse submodule, then I guess it is better to keep consistency within the module. But I think it is a very poor design choice: if the long-term vision is to make sparse matrices interchangeable with dense arrays, this definitely doesn't help.

I agree, but perhaps it should be changed in all methods and tests in one PR if possible...

@argriffing

Sparse matrices have different interfaces from both numpy ndarrays and numpy matrices, but they are pretty close to numpy matrices. Because numpy matrices can handle axis=-1, I think it would be nice if scipy sparse matrices could too. But I'm not trying to slow down this PR.
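
For reference, numpy matrices do accept a negative axis here (a quick interactive check):

>>> import numpy as np
>>> np.matrix([[1, 0], [0, 2]]).sum(axis=-1)
matrix([[1],
        [2]])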

@jaimefrio
SciPy member

Yes indeed, this is a great addition; I too have written the diff-and-bincount approach way too many times, so +1 on this. The negative axis indexing can be fixed everywhere in a different PR.

@pv
SciPy member
pv commented Jan 28, 2014

Not supporting axis=-1 is probably a bug rather than a feature... (and it might as well be fixed here; gh-3249 tracks it).

@pv pv and 1 other commented on an outdated diff Jan 28, 2014
scipy/sparse/tests/test_base.py
@@ -3154,9 +3172,10 @@ def test_todia_all_zeros(self):
         assert_array_equal(dia.A, zeros)
+        print('TestDIA')
@pv
SciPy member
pv added a note Jan 28, 2014

debug print

@jnothman
jnothman added a note Jan 28, 2014

Not again!

@jnothman
jnothman added a note Jan 28, 2014

I've removed the debug statement and squashed the fix into the same commit.
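
(One way to do that, assuming the branch name from the Coveralls reports; the commands are illustrative, not taken from the PR:)

# Amend the fix into the previous commit, then rewrite the remote branch.
git add scipy/sparse/tests/test_base.py
git commit --amend --no-edit
git push --force origin nnz_axis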

@coveralls

Coverage Status

Coverage remained the same when pulling d171c00 on jnothman:nnz_axis into 8ad48cd on scipy:master.

@pv pv merged commit c683ff9 into scipy:master Jan 29, 2014

1 check passed: the Travis CI build passed.
@pv
SciPy member
pv commented Jan 29, 2014

Thanks, looks OK to me.
