Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: sparse: Add copy parameter to all .toXXX() methods in sparse matrices #5829

Merged

Conversation

musically-ut
Copy link
Contributor

This closes #5822.

I have added copy=False as the default arguments for all .to??? conversion methods in sparse matrices. The only places where I left the defaults in place were bsr.py and csr.py's .tobsr method.

All tests pass and no extra functionality has been added, requiring no additional unit-tests.

@codecov-io
Copy link

@@            master   #5829   diff @@
======================================
  Files          238     238       
  Stmts        43658   43688    +30
  Branches      8197    8197       
  Methods          0       0       
======================================
+ Hit          34086   34114    +28
  Partial       2602    2602       
- Missed        6970    6972     +2

Review entire Coverage Diff as of a5c0f52

Powered by Codecov. Updated on successful CI builds.

@perimosocordiae
Copy link
Member

Looks good to me. I don't like that there are some copy=True defaults, but those will be harder to change without breaking old code.

I'm +1 to merge, but I'll wait a bit to see if anyone else has comments first.

@larsmans
Copy link
Contributor

Some of these methods have no docstrings, and the ones that do promise a copy. We should document that copy=True might cause sharing of data/indices.

To prevent copy-pasting docstrings everywhere, I guess we could...

  • rename all these methods to _tocsr etc.
  • put the public interfaces with docstrings on the base class, spmatrix, and call the underscore versions from there.

@perimosocordiae perimosocordiae changed the title EHN: Add copy parameter to all .to methods in sparse matrices. ENH: sparse: Add copy parameter to all .toXXX() methods in sparse matrices Feb 11, 2016
@musically-ut
Copy link
Contributor Author

@larsmans Unfortunately, not all matrices support all the .toXXX methods, e.g. BSR matrices do not have a .tolil method. I'd rather not throw a run-time error when that happens.

@perimosocordiae
Copy link
Member

I'd rather not add an extra layer of method call indirection to all of these methods. Perhaps we can do something like the suggestions in http://stackoverflow.com/q/8100166/10601 ?

@perimosocordiae
Copy link
Member

Specifically, we could do something as simple as:

# in coo.py, for example, after the class definition
vars(coo_matrix)['tocsr'].__doc__ = spmatrix.tocsr.__doc__

There's probably a better way to do this, but that's the general idea. I wouldn't be surprised if numpy already has a mechanism for this sort of thing.

@musically-ut
Copy link
Contributor Author

A cursory look at the documentation guide does not indicate that such a thing is possible or usual.

Also, I may be flawed in my understanding, but if we define .toXXX methods to spmatrix to copy over documentation, then aren't we adding an extra layer of indirection?

In any case, what should be the next steps?

@larsmans
Copy link
Contributor

Also, I may be flawed in my understanding, but if we define .toXXX methods to spmatrix to copy over documentation, then aren't we adding an extra layer of indirection?

We are. I guess @perimosocordiae's solution is simpler. The sparse matrix classes already have some example of __doc__ copying, I think.

@musically-ut
Copy link
Contributor Author

@larsmans That includes stubbing methods which do not exist across all the matrices (e.g. .tolil() on BSR matrices)? I think throwing a NotImplementedError by default would be the best option in such cases if we are going that route.

Also, indeed, doc copying happens in spmatrix; I have found an example.

@perimosocordiae
Copy link
Member

@musically-ut That looks like a good template to follow.

def tocsr(self):
return self.tocoo(copy=False).tocsr()
def tocsr(self, copy=False):
return self.tocoo(copy=copy).tocsr()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter tocsr always makes a copy, so it seems to me the call to tocoo can always have copy=False as it did before?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iow, return self.tocoo(copy=False).tocsr(copy=copy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually was tempted to change this to:

if copy:
    # One of the following depending on which is more efficient:
    # return self.tocoo(copy=False).tocsr(copy=True)
    return self.tocoo(copy=True).tocsr(copy=False)
else:
    return self.tocoo(copy=False).tocsr(copy=False)

Also, I am not sure I see a reason why copy=False should be the default.
I'll be very surprised if it effects the correctness of the copying. (Does it?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @pv is saying here is that due to the current implementation of coo_matrix.tocsr(), it will always produce a copy no matter what. So doing self.tocoo(copy=False) is always acceptable, so long as you then convert to CSC/CSR format.

He also suggests using the copy kwarg in the final .tocsr() call, because that will protect this method from causing trouble in the future if the CSR/CSC conversion changes to allow non-copying behavior.

The same goes for the BSR->COO->CSC conversion below.

@perimosocordiae
Copy link
Member

@musically-ut I'd like to merge this soon, but I think it should include the suggested changes for BSR's tocsc and tocsr methods. Do you have time to make those changes?

@musically-ut
Copy link
Contributor Author

@perimosocordiae Sorry about the unclean rebase, but could you tell me if the last two commits are in line the kind of changes you had in mind?

If so, I can base it off master, create a new branch and make another PR.

~
ut

PS: I was seeing an odd test failure with lgmres on my local machine after merging master back in.

@perimosocordiae
Copy link
Member

Yeah, the last two commits look pretty good, minus some small nitpicks. It should be possible to clean up the history in the existing branch and force push, but you're free to close this and make a new PR if that's easier on your end.

@musically-ut
Copy link
Contributor Author

Great! I have fixed the typos (except one indicies in csr.h), squashed the two into a single commit and redone the rebase. This should be ready to merge.

PS: There was one other marginally unrelated test failure on my local computer in test_basic.TestLstsq after the rebase.

With copy=False, the data/indices may be shared between this matrix and
the resultant dia_matrix.
"""
return self.tocoo(copy=False).todia(copy=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a bug: one of these should have copy=copy

@musically-ut
Copy link
Contributor Author

I've taken care of the issues raised. test_sparse_format_conversions actually tests out all the conversions, so there isn't need for more tests. Otherwise, I would like some ideas on how should this code be tested in isolation.

Also, should I work on including the benchmarks as a part of this PR itself?

Note that I am planning on having a more comprehensive set of benchmarks at musically-ut/scipy-sparse-benchmarks anyway.

perimosocordiae added a commit that referenced this pull request Mar 7, 2016
ENH: sparse: Add copy parameter to all .toXXX() methods in sparse matrices
@perimosocordiae perimosocordiae merged commit ce4d74a into scipy:master Mar 7, 2016
@perimosocordiae
Copy link
Member

Thanks @musically-ut, merged. We can add benchmarks in a separate PR.

@rgommers rgommers added this to the 0.18.0 milestone Mar 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EHN: Making .to*() functions for sparse matrices consistent.
6 participants