TST tests for non-canonical input to sparse matrix operations #3254

Merged
merged 5 commits into from Feb 1, 2014

Conversation

Projects
None yet
3 participants
Contributor

jnothman commented Jan 29, 2014

A number of sparse matrix formats are designed to treat duplicate values as their sum; they prefer indices in sorted order, but should be capable when unsorted. The current test suite largely compares functionality to numpy arrays/matrices, and so constructs sparse matrices from those, producing only canonical sparse forms.

This introduces tests for non-canonical forms, but finds many test failures. I don't intend to fix them all here, but we can perhaps add known failure decorations.

@pv pv and 1 other commented on an outdated diff Jan 30, 2014

scipy/sparse/tests/test_base.py
+ if np.issubdtype(data.dtype, np.bool_):
+ if indptr is None:
+ return (data,) + inds
+ else:
+ return (data,) + inds + (indptr,)
+
+ data = data.repeat(2, axis=0)
+ data[::2] *= 2
+ data[1::2] *= -1
+
+ inds = tuple(indices.repeat(2) for indices in inds)
+
+ if indptr is None:
+ return (data,) + inds
+ else:
+ return (data,) + inds + 2 * indptr
@pv

pv Jan 30, 2014

Owner

This probably should read 2 * (indptr,)

@jnothman

jnothman Jan 30, 2014

Contributor

No, we're duplicating the data entries; 2*indptr fixes the indptr to point correctly.

@jnothman

jnothman Jan 30, 2014

Contributor

And I'd created this helper function in part to work with LIL as well, until I realised (I suppose) that LIL's not meant to handle duplicates like COO and CSR.

@pv

pv Jan 30, 2014

Owner

Then it probably should be (data,) + inds + (2 * indptr,)?
indptr is an array and cannot be added to a tuple (check the Travis-CI output)

@jnothman

jnothman Jan 30, 2014

Contributor

Yes, that ;)

@pv pv and 1 other commented on an outdated diff Jan 30, 2014

scipy/sparse/tests/test_base.py
+ indices[start:stop] = indices[start:stop][::-1]
+ data[start:stop] = data[start:stop][::-1]
+
+ if 'shape' not in kwargs:
+ kwargs['shape'] = M.shape
+
+ NC = construct((data, indices, indptr), **kwargs)
+ assert_array_almost_equal(M.A, NC.A)
+ return NC
+
+
+class TestCSRNonCanonical(_NonCanonicalCompressedMixin, TestCSR):
+ pass
+
+
+class TestCSCNonCanonical(_NonCanonicalCompressedMixin, TestCSR):
@pv

pv Jan 30, 2014

Owner

Inherits from TestCSR, not from TestCSC

@jnothman

jnothman Jan 30, 2014

Contributor

Whoops. Thanks.

@pv pv commented on an outdated diff Jan 30, 2014

scipy/sparse/tests/test_base.py
+ kwargs['shape'] = M.shape
+
+ NC = construct((data, indices, indptr), **kwargs)
+ assert_array_almost_equal(M.A, NC.A)
+ return NC
+
+
+class TestCSRNonCanonical(_NonCanonicalCompressedMixin, TestCSR):
+ pass
+
+
+class TestCSCNonCanonical(_NonCanonicalCompressedMixin, TestCSR):
+ pass
+
+
+class TestBSRNonCanonical(_NonCanonicalCompressedMixin, TestCSR):
@pv

pv Jan 30, 2014

Owner

As above

Owner

pv commented Jan 30, 2014

This test would indeed be useful to add.
For the compressed formats, one could also check if things work when the indices array is too big and contains crap beyond indptr[-1].

knownfailures can be added by overriding the corresponding functions in the Test*NonCanonical classes.

Contributor

jnothman commented Jan 30, 2014

For the compressed formats, one could also check if things work when the indices array is too big and contains crap beyond indptr[-1].

Do we want this case to work? Or do you mean that we should test that every method throws an error (do we need to be validating that often?)?

Owner

pv commented Jan 30, 2014

I was wondering whether len(indices) = len(data) = indptr[-1] should be taken as an invariant in the code or not. But maybe it's clearest to assume it's an invariant (doesn't need to be checked, except maybe in __init__ and in self.check_format).

Contributor

jnothman commented Jan 30, 2014

I think we can assume len(indices) == len(data) == indptr[-1] except where
there are functions for the user to set these (init). If the user
manually changes these things, it's their problem.

I'm pushing some known failures...

On 31 January 2014 07:05, Pauli Virtanen notifications@github.com wrote:

I was wondering whether len(indices) = len(data) = indptr[-1] should be
taken as an invariant in the code or not. But maybe it's clearest to assume
it's an invariant (doesn't need to be checked, except maybe in __init__and in
self.check_format).

Reply to this email directly or view it on GitHubhttps://github.com/scipy/scipy/pull/3254#issuecomment-33727268
.

Contributor

jnothman commented Jan 31, 2014

That's a whole lot of failures, and some are truly broken (abs, add_sub, bool, minmax, sparse_format_conversions, unary_ufunc_overrides; in CSR/C: sparse boolean indexing, broadcast element-wise multiply, inverse, solve and getnnz_axis).

I'll rebase on master and try the changes to min/max.

Contributor

jnothman commented Jan 31, 2014

It'll be nice to remove many of the known failures when #3233 is merged, but those cases largely throw an error at the moment, while other cases will silently return the wrong values, and at a minimum should have comments to note this fact.

Coverage Status

Coverage remained the same when pulling bd391ab on jnothman:test_noncanonical_sparse into 4844c63 on scipy:master.

Contributor

jnothman commented Jan 31, 2014

It turns out add_sub and mu were my fault for not handling uints.

Owner

pv commented Jan 31, 2014

The remaining test_mu failures are due to use of assert_array_almost_equal. This function uses absolute tolerances, and it is best to never use it.
assert_allclose is a better alternative.

Coverage Status

Coverage remained the same when pulling faedf01 on jnothman:test_noncanonical_sparse into 233ad82 on scipy:master.

Coverage Status

Coverage remained the same when pulling faedf01 on jnothman:test_noncanonical_sparse into 233ad82 on scipy:master.

pv merged commit 2b1c323 into scipy:master Feb 1, 2014

1 check passed

default The Travis CI build passed
Details
Owner

pv commented Feb 1, 2014

Thanks, merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment