# ENH function to return nnz per row/column in some sparse matrices#3247

Merged
merged 1 commit into from Jan 29, 2014
+91 −18

### 5 participants

I often use `np.diff(mat.indptr)` to get the number of nonzeros in each row of a CSR or each column of a CSC. `bincount` can be used for the minor axis. Reading these expressions is unfriendly and un-polymorphic, so I propose a method to get `nnz` by axis. This PR overloads `getnnz` to that purpose in appropriate matrices.

SciPy member

If only for consistency with other 2D arrays, it is probably a good idea to support `axis=-1`, either explicitly, or with the usual `if axis < 0: axis += 2`.

Also, the usual error returned by numpy is `ValueError: 'axis' entry is out of bounds`.

I thought that too, but followed the example of `sum`, `min`, `max`:

``````>>> import scipy.sparse as sp
>>> sp.rand(5, 5).sum(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../scipy/sparse/base.py", line 682, in sum
raise ValueError("axis out of bounds")
ValueError: axis out of bounds
>>> sp.rand(5, 5).min(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../scipy/sparse/data.py", line 191, in min
raise ValueError("invalid axis, use 0 for rows, or 1 for columns")
ValueError: invalid axis, use 0 for rows, or 1 for columns
``````

Coverage remained the same when pulling 74c249a on jnothman:nnz_axis into 8ad48cd on scipy:master.

SciPy member

Ah, didn't know that... If it is a conscious design choice for the whole `sparse` submodule then I guess it is better to keep consistency within the module, but I think that it is a very poor design choice: if the long term vision is to make sparse matrices interchangeable with dense arrays, then this definitely doesn't help.

Coverage remained the same when pulling 74c249a on jnothman:nnz_axis into 8ad48cd on scipy:master.

Ah, didn't know that... If it is a conscious design choice for the whole sparse submodule then I guess it is better to keep consistency within the module, but I think that it is a very poor design choice: if the long term vision is to make sparse matrices interchangeable with dense arrays, then this definitely doesn't help.

I agree, but perhaps it should be changed in all methods and tests in one PR if possible...

Sparse matrices have different interfaces than either numpy ndarrays or numpy matrices, but they are pretty close to numpy matrices. Because numpy matrices can deal with -1 axis, I think it would be nice if scipy sparse matrices could do this too. But I'm not trying to slow down this PR.

SciPy member

Yes indeed, this is a great addition, I too have written the `diff` and `bincount` approach to this way too many times, so +1 on this, the axis indexing can be fit everywhere in a different PR.

SciPy member
commented Jan 28, 2014

Not supporting axis=-1 is probably a bug, rather than a feature... (And might as well be fixed already here. gh-3249 to track it)

and 1 other commented on an outdated diff Jan 28, 2014
scipy/sparse/tests/test_base.py
 @@ -3154,9 +3172,10 @@ def test_todia_all_zeros(self): assert_array_equal(dia.A, zeros) +print('TestDIA')
 SciPy member pv added a note Jan 28, 2014 debug print jnothman added a note Jan 28, 2014 Not again! jnothman added a note Jan 28, 2014 I've removed the debug statement and forced it into the same commit. to join this conversation on GitHub. Already have an account? Sign in to comment
 jnothman `ENH function to return nnz per row/column in some sparse matrices` `d171c00`

Coverage remained the same when pulling d171c00 on jnothman:nnz_axis into 8ad48cd on scipy:master.

merged commit `c683ff9` into scipy:master Jan 29, 2014

#### 1 check passed

Details default The Travis CI build passed
SciPy member
commented Jan 29, 2014

Thanks, looks OK to me.