Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expand graph.Graph constructors and other functionality #544

Merged
merged 15 commits into from
Aug 11, 2023

Conversation

martinfleis
Copy link
Member

@martinfleis martinfleis commented Aug 7, 2023

Very much WIP but wanted to open this for discussion. This partially handles isolates but not always. Distance band and knn are used to compute distance matrix that is then (optionally) passed via kernel.

@ljwolf I am not sure if this is the implementation you had in mind, but this seems to also work well.

includes:

  • Graph.build_distance_band
  • reworked kernels

edit: Also includes:

  • Graph.build_block_contiguity()
  • Parquet IO as graph.read_parquet and Graph.to_parquet mirroring (geo)pandas
  • Graph.lag()
  • __getitem__

@martinfleis martinfleis added WIP Work in progress, do not merge. Discussion only. graph labels Aug 7, 2023
@martinfleis martinfleis self-assigned this Aug 7, 2023
Comment on lines +175 to +176
dist = tree.sparse_distance_matrix(tree, threshold, output_type="ndarray")
return sparse.csr_array((dist["v"], (dist["i"], dist["j"])))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a trick how to get a sparse array instead of a sparse matrix out of sparse_distance_matrix. I suppose that scipy will have sparse_distance_array eventually but this is fairly cheap anyway.

Comment on lines 566 to 578
adjacency["weight"] = (
adjacency["weight"].fillna(0).replace([np.inf, -np.inf], 0)
) # handle isolates

# drop diagnoal
counts = adjacency.index.value_counts()
no_isolates = counts[counts > 1]
adjacency = adjacency[
~(
adjacency.index.isin(no_isolates.index)
& (adjacency.index == adjacency.neighbor)
)
]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These bits should ideally be handled within _kernel and properly. This does not always work.

@codecov
Copy link

codecov bot commented Aug 7, 2023

Codecov Report

Merging #544 (4d01f41) into geographs (c6d8e2a) will increase coverage by 0.3%.
Report is 2 commits behind head on geographs.
The diff coverage is 97.3%.

Impacted file tree graph

@@             Coverage Diff             @@
##           geographs    #544     +/-   ##
===========================================
+ Coverage       80.8%   81.1%   +0.3%     
===========================================
  Files            125     127      +2     
  Lines          14351   14549    +198     
===========================================
+ Hits           11595   11803    +208     
+ Misses          2756    2746     -10     
Files Changed Coverage Δ
libpysal/graph/_utils.py 42.7% <ø> (ø)
libpysal/graph/_parquet.py 84.0% <84.0%> (ø)
libpysal/graph/_kernel.py 69.2% <91.3%> (+6.6%) ⬆️
libpysal/graph/_contiguity.py 97.3% <100.0%> (+0.4%) ⬆️
libpysal/graph/_spatial_lag.py 100.0% <100.0%> (ø)
libpysal/graph/base.py 98.2% <100.0%> (+0.3%) ⬆️
libpysal/graph/tests/test_base.py 100.0% <100.0%> (ø)
libpysal/graph/tests/test_builders.py 100.0% <100.0%> (ø)
libpysal/graph/tests/test_contiguity.py 100.0% <100.0%> (ø)
libpysal/graph/tests/test_spatial_lag.py 100.0% <100.0%> (ø)

... and 1 file with indirect coverage changes

@ljwolf
Copy link
Member

ljwolf commented Aug 7, 2023

Yep, this is exactly what I had in mind!

@martinfleis martinfleis changed the title expand graph.Graph kernel-based constructors expand graph.Graph constructors Aug 10, 2023
@martinfleis martinfleis changed the title expand graph.Graph constructors expand graph.Graph constructors and other functionality Aug 10, 2023
@martinfleis martinfleis removed the WIP Work in progress, do not merge. Discussion only. label Aug 10, 2023
@martinfleis martinfleis marked this pull request as ready for review August 10, 2023 18:45
@martinfleis
Copy link
Member Author

This one is ready for review. Not everything properly works (isolates in kernel-based stuff is sketchy in some and plain wrong in other cases) but I'd leave that for follow-ups. I've left TODO comments for those.

I will be leaving for a week on Saturday and will not be able to respond to anything, so it would be best if this is merged by Saturday morning (my time) to ensure I am not a blocker.

Tests are mostly checking dtypes and expected shapes but not necessarily values and all possible paths. That will need to be done as follow-ups, especially for constructors.

Copy link
Member

@jGaboardi jGaboardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as far as I can tell, considering your comments above.

libpysal/graph/base.py Outdated Show resolved Hide resolved
libpysal/graph/base.py Outdated Show resolved Hide resolved
libpysal/graph/base.py Outdated Show resolved Hide resolved
neighbor_ix_flat = neighbor_ix.flatten()
D_linear_flat = D_linear.flatten()
if metric == "haversine":
D_linear_flat * 6371 # express haversine distances in kilometers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a note to the docstrings of exposed functions that the returned haversine distance is in kilometers (as opposed to meters)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For haversine, we need more than that. It requires lat lan coords so we need to check if the gdf is in the correct CRS (or None) and if an array is given, that it is bounded by -180,180 and -90,90. I'll add this as a todo note.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I implemented this so it would do the "right" thing when a geographically-literate user sends lon,lat coordinates, since sklearn haversine expects coordinates in radian lat,lon.

Thinking of it now, though, it'd be great to have a way to say "take my coordinates as is and use haversine" since deg2rad() will instantiate a new array if the input coordinates are a numpy.memmap... maybe the way around this is to allow for tree-based input in coordinates.

neighbor_ix_flat = neighbor_ix.flatten()
D_linear_flat = D_linear.flatten()
if metric == "haversine":
D_linear_flat * 6371 # express haversine distances in kilometers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I implemented this so it would do the "right" thing when a geographically-literate user sends lon,lat coordinates, since sklearn haversine expects coordinates in radian lat,lon.

Thinking of it now, though, it'd be great to have a way to say "take my coordinates as is and use haversine" since deg2rad() will instantiate a new array if the input coordinates are a numpy.memmap... maybe the way around this is to allow for tree-based input in coordinates.

ids=ids,
bandwidth=alpha,
)
sp.setdiag(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessarily valid, but I can understand why this is here. Many kernel weights require self-weighting. It would be good to provide a way to control this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was likely related to some isolate handling. That is still to-do as a follow up so I'll just add a note here to reconsider this.

self._adjacency.index.map(self.id2i),
self._adjacency.neighbor.map(self.id2i),
self._adjacency.index.map(self._id2i),
self._adjacency.neighbor.map(self._id2i),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is id2i how we're doing the lookups now from name to index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. do you have a better suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no! I was just wondering, given I'll need a way to deal with this when re-ordering the adjacency list after expanding cliques.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 things here from my end:

  • Regarding the public vs. private discussion for id2i, was this talked about anywhere else besides here? Since id2i is currently exposed in the W implementation, I am curious what the argument is for making it private in the new Graph implementation.
  • Also, I wanted to bring up the idea again to discuss the potential usefulness of an i2id property (or _i2id if we stick with the current private schema). Currently that translated lookup is only used within asymmetry, but I am wondering if anyone can think of other situations where it might be helpful? If we will be using a translated lookup anywhere else, then another property is surely warranted IMHO.

@ljwolf
Copy link
Member

ljwolf commented Aug 11, 2023

I would like to merge this so that I can rebase #548... every triangulation can pass through kernel, so we need the kernel implementation stabilised before I can finish up the triangulation stuff.

@jGaboardi your review doesn't have any further outstanding requests, right? just questions about id2i? If so, can I merge this?

@martinfleis
Copy link
Member Author

@ljwolf Go ahead. We can open issues for API discussions.

@ljwolf ljwolf merged commit bd228c7 into pysal:geographs Aug 11, 2023
9 checks passed
@martinfleis martinfleis deleted the more_constructors branch August 11, 2023 17:23
@ljwolf ljwolf mentioned this pull request Aug 11, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants