Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make COO canonical in the constructor. #141

Merged
merged 5 commits into from Apr 26, 2018

Conversation

@hameerabbasi
Copy link
Collaborator

@hameerabbasi hameerabbasi commented Apr 23, 2018

Based on recommendations in scipy/scipy#8162 (comment) I believe that we should make COO always be canonical.

This has several advantages:

  • Nobody really wants the non-canonical version.
  • We don't have to keep track of when or if it's ever canonical.
  • The x.sum_duplicates() calls just disappear.
  • The string representation becomes simpler. The sorted=?, duplicates=? was just confusing to someone who didn't know how COO really worked.
  • The object is now truly immutable.
@hameerabbasi hameerabbasi requested a review from mrocklin Apr 23, 2018
@codecov-io
Copy link

@codecov-io codecov-io commented Apr 23, 2018

Codecov Report

Merging #141 into master will decrease coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #141     +/-   ##
=========================================
- Coverage   95.91%   95.81%   -0.1%     
=========================================
  Files          10       10             
  Lines        1223     1195     -28     
=========================================
- Hits         1173     1145     -28     
  Misses         50       50
Impacted Files Coverage Δ
sparse/dok.py 93.57% <ø> (ø) ⬆️
sparse/coo/indexing.py 100% <ø> (ø) ⬆️
sparse/coo/umath.py 96.66% <100%> (-0.05%) ⬇️
sparse/coo/common.py 96.11% <100%> (-0.11%) ⬇️
sparse/utils.py 97.14% <100%> (-0.06%) ⬇️
sparse/coo/core.py 93.67% <100%> (-0.31%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2913d7...8fc71e7. Read the comment docs.

@hameerabbasi hameerabbasi force-pushed the hameerabbasi:always-canonical branch from 2dc7322 to d4564f8 Apr 23, 2018
@hameerabbasi
Copy link
Collaborator Author

@hameerabbasi hameerabbasi commented Apr 23, 2018

@nils-werner This should address your concerns in #58. You're welcome to review as well.

@mrocklin
Copy link
Collaborator

@mrocklin mrocklin commented Apr 23, 2018

I don't have enough experience with performance applications to know if this is a good or a bad idea performance-wise in the common case. I recommend that we ping the original author of the comment, @perimosocordiae , for his thoughts. It might also be good to ask for more information and possibly for a wider viewpoint from some of the author maintainers of the library. Perhaps this is a good use of the scipy-dev mailing list? cc @stefanv

@hameerabbasi
Copy link
Collaborator Author

@hameerabbasi hameerabbasi commented Apr 23, 2018

This isn't meant to improve performance (since we already skip these steps when already done for an object). I believe @perimosocordiae was arguing maintenance wise. We don't have all the extra fluff in the object, and don't have to think "is it canonical now?" because it always will be.

@mrocklin
Copy link
Collaborator

@mrocklin mrocklin commented Apr 23, 2018

@perimosocordiae
Copy link

@perimosocordiae perimosocordiae commented Apr 23, 2018

I think this is a good idea. At least with scipy's sparse matrices, so many operations require canonical (or at least duplicate-summed) matrices that the odds of constructing a COO matrix with dups and never indirectly calling sum_duplicates are very low. This approach front-loads that work, which makes performance a lot simpler to reason about. It also makes maintenance easier, like @hameerabbasi commented.

The downside is that you're explicitly disallowing some uses of the object, like representing a multi-graph. I'd argue that those users are better served by an object designed for that purpose, though, and since this package is very new you're unlikely to break anyone's existing code.

Pinging the scipy-dev list for objections is a good idea.

@hameerabbasi
Copy link
Collaborator Author

@hameerabbasi hameerabbasi commented Apr 23, 2018

I posted it to the Scipy-Dev mailing list. Link to thread.

@hameerabbasi
Copy link
Collaborator Author

@hameerabbasi hameerabbasi commented Apr 25, 2018

Allowing another day for comments and then merging, in the absence of objections.

@mrocklin
Copy link
Collaborator

@mrocklin mrocklin commented Apr 25, 2018

@shoyer
Copy link
Member

@shoyer shoyer commented Apr 26, 2018

Just to be clear, this "canonicalizes" COO arrays, right? So you an still provide non-canonical inputs to the COO constructor, and this will convert them into canonical form?

If so, this makes complete sense to me.

@hameerabbasi
Copy link
Collaborator Author

@hameerabbasi hameerabbasi commented Apr 26, 2018

Yes, that's what it does. In addition, you can pass two flags (as before), sorted and has_duplicates which signify that canonicalization should be skipped. Technically, you could pass invalid values to keep things in non-canonical form, but it might break things (as it already did before).

@hameerabbasi hameerabbasi changed the title Make COO always canonical Make COO canonical in the constructor. Apr 26, 2018
@hameerabbasi hameerabbasi merged commit 8f2a9ae into pydata:master Apr 26, 2018
4 checks passed
4 checks passed
ci/circleci: build_27 Your tests passed on CircleCI!
Details
ci/circleci: build_36 Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.91%)
Details
codecov/project Absolute coverage decreased by -0.09% but relative coverage increased by +4.08% compared to b2913d7
Details
@hameerabbasi hameerabbasi deleted the hameerabbasi:always-canonical branch Apr 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants