Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coindicent tests #554

Merged
merged 3 commits into from Sep 1, 2023
Merged

Coindicent tests #554

merged 3 commits into from Sep 1, 2023

Conversation

martinfleis
Copy link
Member

This includes proper sorting within triangulation and tests for that.

I believe that we need to sort only by focal and only in cases where the constructor has a tendency to mess up the order. The rest shall always give equal results, so no more sorting shall be needed.

@ljwolf are cliques finished? The implementation does not seem to work, the resulting adjacency is always much shorter than it should be (even shorter than the one before inducing cliques, so something is wrong).

I have also included tests for cliques but since those are failing, they're now marked with xfail. When the implementation is fixed, they should pass (maybe a minor reorder of expected tails will be needed).

@knaaptime on the sorting, I will need to be convinced that we always need to sort by the use case where not sorting everything by default causes trouble.

@ljwolf
Copy link
Member

ljwolf commented Aug 28, 2023

It should have been complete. The result wasn't incomplete on the test set I posted in the PR? I don't quite know what would be missing if it's now not working.

@@ -230,7 +230,6 @@ def _distance_band(coordinates, threshold, ids=None):
tree = spatial.KDTree(coordinates)
dist = tree.sparse_distance_matrix(tree, threshold, output_type="ndarray")
sp = sparse.csr_array((dist["v"], (dist["i"], dist["j"])))
sp.eliminate_zeros()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be completely removed or perhaps skipped by default and performed on conditional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed from here. We deal with the diagonal later. Though that logic should maybe happen here...

You would add a keyword specifying if self-weight shall be included?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like the taper keyword that is happening in _kernel()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@knaaptime
Copy link
Member

@knaaptime on the sorting, I will need to be convinced that we always need to sort by the use case where not sorting everything by default causes trouble.

i'm with you, but maybe we should revisit our design goals now that this has come full circle. When we originally sketched this out, we were trying to hide issues of sorting to the user, since conceptually the graph is unordered. The goal was that

not supporting any sorting whatsoever will make sure that the input order is never broken, so long as we can avoid pandas's predilection for it... pysal/libpysal@geographs/libpysal/weights/experimental/base.py#L216

but it's clear there are cases where we need to sort because the input order is sometimes broken. So we've implemented sorting in cases like inducing cliques, where we need to know the sort order to expand the neighbor set. In that case, the goal is to respect the focal index and then reorder the neighbors slightly, so we can manage the data internally. The neighbor sorting is invisible to the user, and it happens without notification, because the presumption is the user really doesnt care about the neighbor sorting (within each focal)... this part of the graph is unordered from the users' perspective because the only time alignment matters is when converting to a matrix representation, and they dont really need to encounter that part.

So what i'm getting at is, in some cases we're saying the user doesnt need to know about this part, and we'll handle it the best way, but in principle "[we] will need to be convinced that we always need to sort by the use case where not sorting everything by default causes trouble."

so, in some cases "we're handling it, because it doesnt matter to the user", but in other cases, "the user is correct and their data should not be manipulated".

personally, i think its preferable to say "we're always handling the inner index (order of the neighbors)" because the graph is unordered, except with respect to focal. If it's unordered from the user's perspective, then internally we should just be consistent about reshaping into the best representation (in which case, i'd say we just commit to sorting the neighbors so we have an actual policy)

@knaaptime
Copy link
Member

(to me, the best way to resolve the ordered/unordered compromise is to say "the focal index is the only order that matters". Since the neighbor order is arbitrary, both levels of the graphy are always ordered along the focal index. That gives a consistent policy with the best data layout)

@martinfleis
Copy link
Member Author

I have added sorting on ['focal', 'neighbor'] to ensure this passes and can be merged. We can potentially refactor depending on the result of discussions in #534.

@codecov
Copy link

codecov bot commented Aug 28, 2023

Codecov Report

Merging #554 (4c46e9c) into geographs (6a4c004) will increase coverage by 0.0%.
The diff coverage is 80.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           geographs    #554    +/-   ##
==========================================
  Coverage       81.1%   81.2%            
==========================================
  Files            127     127            
  Lines          14724   14839   +115     
==========================================
+ Hits           11947   12047   +100     
- Misses          2777    2792    +15     
Files Changed Coverage Δ
libpysal/graph/_kernel.py 65.1% <ø> (-0.3%) ⬇️
libpysal/graph/tests/test_triangulation.py 84.4% <78.4%> (-8.2%) ⬇️
libpysal/graph/_triangulation.py 98.0% <100.0%> (ø)
libpysal/graph/base.py 95.9% <100.0%> (+0.1%) ⬆️
libpysal/graph/tests/test_base.py 100.0% <100.0%> (ø)

... and 1 file with indirect coverage changes

@knaaptime
Copy link
Member

im gonna go ahead and merge this and explore the sparse input stuff

@knaaptime knaaptime merged commit bdacbea into pysal:geographs Sep 1, 2023
9 checks passed
@martinfleis martinfleis deleted the coindicent_tests branch September 1, 2023 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants