Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make COO canonical in the constructor. #141

Merged
merged 5 commits into from Apr 26, 2018

Conversation

hameerabbasi
Copy link
Collaborator

@hameerabbasi hameerabbasi commented Apr 23, 2018

Based on recommendations in scipy/scipy#8162 (comment) I believe that we should make COO always be canonical.

This has several advantages:

  • Nobody really wants the non-canonical version.
  • We don't have to keep track of when or if it's ever canonical.
  • The x.sum_duplicates() calls just disappear.
  • The string representation becomes simpler. The sorted=?, duplicates=? was just confusing to someone who didn't know how COO really worked.
  • The object is now truly immutable.

@codecov-io
Copy link

codecov-io commented Apr 23, 2018

Codecov Report

Merging #141 into master will decrease coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #141     +/-   ##
=========================================
- Coverage   95.91%   95.81%   -0.1%     
=========================================
  Files          10       10             
  Lines        1223     1195     -28     
=========================================
- Hits         1173     1145     -28     
  Misses         50       50
Impacted Files Coverage Δ
sparse/dok.py 93.57% <ø> (ø) ⬆️
sparse/coo/indexing.py 100% <ø> (ø) ⬆️
sparse/coo/umath.py 96.66% <100%> (-0.05%) ⬇️
sparse/coo/common.py 96.11% <100%> (-0.11%) ⬇️
sparse/utils.py 97.14% <100%> (-0.06%) ⬇️
sparse/coo/core.py 93.67% <100%> (-0.31%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2913d7...8fc71e7. Read the comment docs.

@hameerabbasi
Copy link
Collaborator Author

@nils-werner This should address your concerns in #58. You're welcome to review as well.

@mrocklin
Copy link
Collaborator

I don't have enough experience with performance applications to know if this is a good or a bad idea performance-wise in the common case. I recommend that we ping the original author of the comment, @perimosocordiae , for his thoughts. It might also be good to ask for more information and possibly for a wider viewpoint from some of the author maintainers of the library. Perhaps this is a good use of the scipy-dev mailing list? cc @stefanv

@hameerabbasi
Copy link
Collaborator Author

This isn't meant to improve performance (since we already skip these steps when already done for an object). I believe @perimosocordiae was arguing maintenance wise. We don't have all the extra fluff in the object, and don't have to think "is it canonical now?" because it always will be.

@mrocklin
Copy link
Collaborator

mrocklin commented Apr 23, 2018 via email

@perimosocordiae
Copy link

perimosocordiae commented Apr 23, 2018

I think this is a good idea. At least with scipy's sparse matrices, so many operations require canonical (or at least duplicate-summed) matrices that the odds of constructing a COO matrix with dups and never indirectly calling sum_duplicates are very low. This approach front-loads that work, which makes performance a lot simpler to reason about. It also makes maintenance easier, like @hameerabbasi commented.

The downside is that you're explicitly disallowing some uses of the object, like representing a multi-graph. I'd argue that those users are better served by an object designed for that purpose, though, and since this package is very new you're unlikely to break anyone's existing code.

Pinging the scipy-dev list for objections is a good idea.

@hameerabbasi
Copy link
Collaborator Author

I posted it to the Scipy-Dev mailing list. Link to thread.

@hameerabbasi
Copy link
Collaborator Author

Allowing another day for comments and then merging, in the absence of objections.

@mrocklin
Copy link
Collaborator

mrocklin commented Apr 25, 2018 via email

@shoyer
Copy link
Member

shoyer commented Apr 26, 2018

Just to be clear, this "canonicalizes" COO arrays, right? So you an still provide non-canonical inputs to the COO constructor, and this will convert them into canonical form?

If so, this makes complete sense to me.

@hameerabbasi
Copy link
Collaborator Author

hameerabbasi commented Apr 26, 2018

Yes, that's what it does. In addition, you can pass two flags (as before), sorted and has_duplicates which signify that canonicalization should be skipped. Technically, you could pass invalid values to keep things in non-canonical form, but it might break things (as it already did before).

@hameerabbasi hameerabbasi changed the title Make COO always canonical Make COO canonical in the constructor. Apr 26, 2018
@hameerabbasi hameerabbasi merged commit 8f2a9ae into pydata:master Apr 26, 2018
@hameerabbasi hameerabbasi deleted the always-canonical branch April 26, 2018 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants