New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coindicent tests #554
Coindicent tests #554
Conversation
It should have been complete. The result wasn't incomplete on the test set I posted in the PR? I don't quite know what would be missing if it's now not working. |
@@ -230,7 +230,6 @@ def _distance_band(coordinates, threshold, ids=None): | |||
tree = spatial.KDTree(coordinates) | |||
dist = tree.sparse_distance_matrix(tree, threshold, output_type="ndarray") | |||
sp = sparse.csr_array((dist["v"], (dist["i"], dist["j"]))) | |||
sp.eliminate_zeros() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be completely removed or perhaps skipped by default and performed on conditional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed from here. We deal with the diagonal later. Though that logic should maybe happen here...
You would add a keyword specifying if self-weight shall be included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean something like the taper
keyword that is happening in _kernel()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
i'm with you, but maybe we should revisit our design goals now that this has come full circle. When we originally sketched this out, we were trying to hide issues of sorting to the user, since conceptually the graph is unordered. The goal was that
but it's clear there are cases where we need to sort because the input order is sometimes broken. So we've implemented sorting in cases like inducing cliques, where we need to know the sort order to expand the neighbor set. In that case, the goal is to respect the focal index and then reorder the neighbors slightly, so we can manage the data internally. The neighbor sorting is invisible to the user, and it happens without notification, because the presumption is the user really doesnt care about the neighbor sorting (within each focal)... this part of the graph is unordered from the users' perspective because the only time alignment matters is when converting to a matrix representation, and they dont really need to encounter that part. So what i'm getting at is, in some cases we're saying the user doesnt need to know about this part, and we'll handle it the best way, but in principle "[we] will need to be convinced that we always need to sort by the use case where not sorting everything by default causes trouble." so, in some cases "we're handling it, because it doesnt matter to the user", but in other cases, "the user is correct and their data should not be manipulated". personally, i think its preferable to say "we're always handling the inner index (order of the neighbors)" because the graph is unordered, except with respect to focal. If it's unordered from the user's perspective, then internally we should just be consistent about reshaping into the best representation (in which case, i'd say we just commit to sorting the neighbors so we have an actual policy) |
(to me, the best way to resolve the ordered/unordered compromise is to say "the focal index is the only order that matters". Since the neighbor order is arbitrary, both levels of the graphy are always ordered along the focal index. That gives a consistent policy with the best data layout) |
I have added sorting on |
Codecov Report
@@ Coverage Diff @@
## geographs #554 +/- ##
==========================================
Coverage 81.1% 81.2%
==========================================
Files 127 127
Lines 14724 14839 +115
==========================================
+ Hits 11947 12047 +100
- Misses 2777 2792 +15
|
im gonna go ahead and merge this and explore the sparse input stuff |
This includes proper sorting within triangulation and tests for that.
I believe that we need to sort only by focal and only in cases where the constructor has a tendency to mess up the order. The rest shall always give equal results, so no more sorting shall be needed.
@ljwolf are cliques finished? The implementation does not seem to work, the resulting adjacency is always much shorter than it should be (even shorter than the one before inducing cliques, so something is wrong).
I have also included tests for cliques but since those are failing, they're now marked with xfail. When the implementation is fixed, they should pass (maybe a minor reorder of expected tails will be needed).
@knaaptime on the sorting, I will need to be convinced that we always need to sort by the use case where not sorting everything by default causes trouble.