New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make COO canonical in the constructor. #141

Merged
merged 5 commits into from Apr 26, 2018

Conversation

Projects
None yet
5 participants
@hameerabbasi
Collaborator

hameerabbasi commented Apr 23, 2018

Based on recommendations in scipy/scipy#8162 (comment) I believe that we should make COO always be canonical.

This has several advantages:

  • Nobody really wants the non-canonical version.
  • We don't have to keep track of when or if it's ever canonical.
  • The x.sum_duplicates() calls just disappear.
  • The string representation becomes simpler. The sorted=?, duplicates=? was just confusing to someone who didn't know how COO really worked.
  • The object is now truly immutable.

@hameerabbasi hameerabbasi requested a review from mrocklin Apr 23, 2018

@codecov-io

This comment has been minimized.

codecov-io commented Apr 23, 2018

Codecov Report

Merging #141 into master will decrease coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #141     +/-   ##
=========================================
- Coverage   95.91%   95.81%   -0.1%     
=========================================
  Files          10       10             
  Lines        1223     1195     -28     
=========================================
- Hits         1173     1145     -28     
  Misses         50       50
Impacted Files Coverage Δ
sparse/dok.py 93.57% <ø> (ø) ⬆️
sparse/coo/indexing.py 100% <ø> (ø) ⬆️
sparse/coo/umath.py 96.66% <100%> (-0.05%) ⬇️
sparse/coo/common.py 96.11% <100%> (-0.11%) ⬇️
sparse/utils.py 97.14% <100%> (-0.06%) ⬇️
sparse/coo/core.py 93.67% <100%> (-0.31%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2913d7...8fc71e7. Read the comment docs.

@hameerabbasi hameerabbasi force-pushed the hameerabbasi:always-canonical branch from 2dc7322 to d4564f8 Apr 23, 2018

@hameerabbasi

This comment has been minimized.

Collaborator

hameerabbasi commented Apr 23, 2018

@nils-werner This should address your concerns in #58. You're welcome to review as well.

@mrocklin

This comment has been minimized.

Collaborator

mrocklin commented Apr 23, 2018

I don't have enough experience with performance applications to know if this is a good or a bad idea performance-wise in the common case. I recommend that we ping the original author of the comment, @perimosocordiae , for his thoughts. It might also be good to ask for more information and possibly for a wider viewpoint from some of the author maintainers of the library. Perhaps this is a good use of the scipy-dev mailing list? cc @stefanv

@hameerabbasi

This comment has been minimized.

Collaborator

hameerabbasi commented Apr 23, 2018

This isn't meant to improve performance (since we already skip these steps when already done for an object). I believe @perimosocordiae was arguing maintenance wise. We don't have all the extra fluff in the object, and don't have to think "is it canonical now?" because it always will be.

@mrocklin

This comment has been minimized.

Collaborator

mrocklin commented Apr 23, 2018

@perimosocordiae

This comment has been minimized.

perimosocordiae commented Apr 23, 2018

I think this is a good idea. At least with scipy's sparse matrices, so many operations require canonical (or at least duplicate-summed) matrices that the odds of constructing a COO matrix with dups and never indirectly calling sum_duplicates are very low. This approach front-loads that work, which makes performance a lot simpler to reason about. It also makes maintenance easier, like @hameerabbasi commented.

The downside is that you're explicitly disallowing some uses of the object, like representing a multi-graph. I'd argue that those users are better served by an object designed for that purpose, though, and since this package is very new you're unlikely to break anyone's existing code.

Pinging the scipy-dev list for objections is a good idea.

@hameerabbasi

This comment has been minimized.

Collaborator

hameerabbasi commented Apr 23, 2018

I posted it to the Scipy-Dev mailing list. Link to thread.

@hameerabbasi

This comment has been minimized.

Collaborator

hameerabbasi commented Apr 25, 2018

Allowing another day for comments and then merging, in the absence of objections.

@mrocklin

This comment has been minimized.

Collaborator

mrocklin commented Apr 25, 2018

@shoyer

This comment has been minimized.

Member

shoyer commented Apr 26, 2018

Just to be clear, this "canonicalizes" COO arrays, right? So you an still provide non-canonical inputs to the COO constructor, and this will convert them into canonical form?

If so, this makes complete sense to me.

@hameerabbasi

This comment has been minimized.

Collaborator

hameerabbasi commented Apr 26, 2018

Yes, that's what it does. In addition, you can pass two flags (as before), sorted and has_duplicates which signify that canonicalization should be skipped. Technically, you could pass invalid values to keep things in non-canonical form, but it might break things (as it already did before).

@hameerabbasi hameerabbasi changed the title from Make COO always canonical to Make COO canonical in the constructor. Apr 26, 2018

@hameerabbasi hameerabbasi merged commit 8f2a9ae into pydata:master Apr 26, 2018

4 checks passed

ci/circleci: build_27 Your tests passed on CircleCI!
Details
ci/circleci: build_36 Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.91%)
Details
codecov/project Absolute coverage decreased by -0.09% but relative coverage increased by +4.08% compared to b2913d7
Details

@hameerabbasi hameerabbasi deleted the hameerabbasi:always-canonical branch Apr 26, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment