expand graph.Graph constructors and other functionality #544
dist = tree.sparse_distance_matrix(tree, threshold, output_type="ndarray")
return sparse.csr_array((dist["v"], (dist["i"], dist["j"])))
This is a trick to get a sparse array instead of a sparse matrix out of sparse_distance_matrix. I suppose that scipy will have sparse_distance_array eventually, but this is fairly cheap anyway.
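A minimal, self-contained sketch of the trick described above, using made-up points and a made-up threshold (both are illustrative assumptions, not values from the PR). Asking `KDTree.sparse_distance_matrix` for `output_type="ndarray"` returns a record array with fields `i`, `j`, and `v`, which can be fed straight to the `csr_array` constructor:

```python
import numpy as np
from scipy import sparse
from scipy.spatial import KDTree

# Illustrative inputs (assumptions): three points, two of them close together.
points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
threshold = 1.5

tree = KDTree(points)
# Record array with fields "i", "j" (point indices) and "v" (distance).
dist = tree.sparse_distance_matrix(tree, threshold, output_type="ndarray")

# Build a sparse *array* (not a sparse matrix) from the COO-style triplets.
# Passing shape explicitly keeps trailing points without neighbours in place.
distances = sparse.csr_array(
    (dist["v"], (dist["i"], dist["j"])), shape=(len(points), len(points))
)
```

The explicit `shape` is an addition of this sketch; without it the shape is inferred from the largest index present.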
libpysal/graph/base.py
Outdated
adjacency["weight"] = (
    adjacency["weight"].fillna(0).replace([np.inf, -np.inf], 0)
)  # handle isolates

# drop diagonal
counts = adjacency.index.value_counts()
no_isolates = counts[counts > 1]
adjacency = adjacency[
    ~(
        adjacency.index.isin(no_isolates.index)
        & (adjacency.index == adjacency.neighbor)
    )
]
These bits should ideally be handled within _kernel and properly. This does not always work.
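To make the intent of the snippet above concrete, here is a small sketch on a toy adjacency table. The table layout (focal ids in the index, a `neighbor` and a `weight` column) follows the quoted code; the data are invented for illustration. Self-loops are dropped for every focal that has more than one row, while a focal whose only row is a self-loop is kept as an isolate with zero weight:

```python
import numpy as np
import pandas as pd

# Toy adjacency (assumed data): "a" and "b" are neighbours, "c" is an isolate
# that only appears as a self-loop with a non-finite weight.
adjacency = pd.DataFrame(
    {
        "neighbor": ["a", "b", "b", "a", "c"],
        "weight": [1.0, 0.5, 1.0, 0.5, np.inf],
    },
    index=pd.Index(["a", "a", "b", "b", "c"], name="focal"),
)

# Isolates end up with NaN or infinite weights; set those to zero.
adjacency["weight"] = adjacency["weight"].fillna(0).replace([np.inf, -np.inf], 0)

# A focal with more than one row is treated as a non-isolate.
counts = adjacency.index.value_counts()
no_isolates = counts[counts > 1]

# Drop the diagonal (self-loops), but only for non-isolates.
is_self = adjacency.index.to_numpy() == adjacency["neighbor"].to_numpy()
adjacency = adjacency[~(adjacency.index.isin(no_isolates.index) & is_self)]
```

The row-count heuristic is what makes this fragile: a focal whose only row is a self-loop is always treated as an isolate, whatever produced that row, which is consistent with the remark that this does not always work.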
Codecov Report
@@ Coverage Diff @@
## geographs #544 +/- ##
===========================================
+ Coverage 80.8% 81.1% +0.3%
===========================================
Files 125 127 +2
Lines 14351 14549 +198
===========================================
+ Hits 11595 11803 +208
+ Misses 2756 2746 -10
Yep, this is exactly what I had in mind!
This one is ready for review. Not everything works properly (isolate handling in kernel-based stuff is sketchy in some cases and plain wrong in others) but I'd leave that for follow-ups. I've left TODO comments for those. I will be leaving for a week on Saturday and will not be able to respond to anything, so it would be best if this is merged by Saturday morning (my time) to ensure I am not a blocker. Tests are mostly checking dtypes and expected shapes but not necessarily values and all possible paths. That will need to be done as follow-ups, especially for constructors.
Looks good as far as I can tell, considering your comments above.
Co-authored-by: James Gaboardi <jgaboardi@gmail.com>
neighbor_ix_flat = neighbor_ix.flatten()
D_linear_flat = D_linear.flatten()
if metric == "haversine":
    D_linear_flat * 6371  # express haversine distances in kilometers
Should we add a note to the docstrings of exposed functions that the returned haversine distance is in kilometers (as opposed to meters)?
For haversine, we need more than that. It requires lat/lon coords, so we need to check that the gdf is in the correct CRS (or None) and, if an array is given, that it is bounded by -180,180 and -90,90. I'll add this as a todo note.
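The array-bounds part of that check could look roughly like the sketch below. `check_lonlat` is a hypothetical helper named here for illustration only; it is not part of libpysal, and it assumes the array holds lon, lat pairs in degrees:

```python
import numpy as np


def check_lonlat(coordinates):
    """Hypothetical helper: validate an (n, 2) array of lon, lat degrees."""
    coordinates = np.asarray(coordinates, dtype=float)
    if coordinates.ndim != 2 or coordinates.shape[1] != 2:
        raise ValueError("expected an (n, 2) array of lon, lat pairs")
    lon, lat = coordinates[:, 0], coordinates[:, 1]
    # Longitude must lie in [-180, 180] and latitude in [-90, 90].
    if (np.abs(lon) > 180).any() or (np.abs(lat) > 90).any():
        raise ValueError("coordinates fall outside [-180, 180] x [-90, 90]")
    return coordinates
```

Passing the bounds check does not prove the data are geographic (projected coordinates near the origin would also pass), so the CRS check on a GeoDataFrame remains the stronger test.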
Yes, I implemented this so it would do the "right" thing when a geographically-literate user sends lon,lat coordinates, since sklearn haversine expects coordinates in radian lat,lon.
Thinking of it now, though, it'd be great to have a way to say "take my coordinates as is and use haversine" since deg2rad() will instantiate a new array if the input coordinates are a numpy.memmap... maybe the way around this is to allow for tree-based input in coordinates.
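As an illustration of the conversion being discussed, here is a NumPy-only sketch (it does not use sklearn, and `haversine_km` is a hypothetical name). It takes lon, lat in degrees, converts to radians, computes the unit-sphere haversine distance, and scales by an Earth radius of 6371 km. Note that the scaled value is returned explicitly: a bare expression such as `D_linear_flat * 6371` computes the product but leaves the original array unchanged.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same constant as in the snippet under review


def haversine_km(lonlat_a, lonlat_b):
    """Hypothetical helper: great-circle distance in km between lon, lat pairs."""
    # deg2rad allocates a new array, which is the memmap concern raised above.
    lon1, lat1 = np.deg2rad(np.asarray(lonlat_a, dtype=float)).T
    lon2, lat2 = np.deg2rad(np.asarray(lonlat_b, dtype=float)).T
    h = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    # Distance on the unit sphere, clipped to guard against rounding above 1.
    unit_distance = 2 * np.arcsin(np.sqrt(np.clip(h, 0, 1)))
    return unit_distance * EARTH_RADIUS_KM
```

For example, `haversine_km((0, 0), (0, 90))` is a quarter of a great circle, i.e. `np.pi / 2 * 6371` km.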
    ids=ids,
    bandwidth=alpha,
)
sp.setdiag(0)
This is not necessarily valid, but I can understand why this is here. Many kernel weights require self-weighting. It would be good to provide a way to control this.
This was likely related to some isolate handling. That is still to-do as a follow up so I'll just add a note here to reconsider this.
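One possible way to make the self-weighting controllable, sketched with a hypothetical `self_weight` flag and a Gaussian kernel on a dense distance matrix. Neither the function name nor the flag comes from the PR; this only illustrates the idea of making `setdiag(0)` optional:

```python
import numpy as np
from scipy import sparse


def gaussian_kernel_weights(distance, bandwidth, self_weight=False):
    """Hypothetical sketch: Gaussian kernel weights with optional self-weights."""
    distance = np.asarray(distance, dtype=float)
    # The kernel evaluated at distance 0 is 1, so the diagonal is non-zero.
    weights = sparse.lil_array(np.exp(-0.5 * (distance / bandwidth) ** 2))
    if not self_weight:
        # Same effect as the sp.setdiag(0) call under review, but opt-out.
        weights.setdiag(0)
    weights = weights.tocsr()
    weights.eliminate_zeros()  # drop explicitly stored zeros
    return weights
```

With `self_weight=True` the diagonal keeps the kernel value at distance zero; with the default it is removed.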
- self._adjacency.index.map(self.id2i),
- self._adjacency.neighbor.map(self.id2i),
+ self._adjacency.index.map(self._id2i),
+ self._adjacency.neighbor.map(self._id2i),
is id2i how we're doing the lookups now from name to index?
yes. do you have a better suggestion?
No no! I was just wondering, given I'll need a way to deal with this when re-ordering the adjacency list after expanding cliques.
2 things here from my end:
- Regarding the public vs. private discussion for id2i, was this talked about anywhere else besides here? Since id2i is currently exposed in the W implementation, I am curious what the argument is for making it private in the new Graph implementation.
- Also, I wanted to bring up the idea again to discuss the potential usefulness of an i2id property (or _i2id if we stick with the current private schema). Currently that translated lookup is only used within asymmetry, but I am wondering if anyone can think of other situations where it might be helpful? If we will be using a translated lookup anywhere else, then another property is surely warranted IMHO.
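For readers unfamiliar with the lookups being discussed, this is a rough sketch of what an id-to-position mapping and its inverse might look like on a toy adjacency table. The variable names mirror the thread, but the construction is an assumption and not the actual Graph implementation:

```python
import pandas as pd
from scipy import sparse

# Toy adjacency (assumed data) with focal ids in the index.
adjacency = pd.DataFrame(
    {"neighbor": ["b", "a", "c", "b"], "weight": [1.0, 1.0, 0.5, 0.5]},
    index=pd.Index(["a", "b", "b", "c"], name="focal"),
)

# id2i: name -> integer position, in order of first appearance.
ids = adjacency.index.unique()
id2i = {name: i for i, name in enumerate(ids)}
# i2id: the translated (inverse) lookup, integer position -> name.
i2id = {i: name for name, i in id2i.items()}

# Map names to positions to build the sparse representation.
rows = adjacency.index.map(id2i).to_numpy()
cols = adjacency["neighbor"].map(id2i).to_numpy()
sp = sparse.csr_array(
    (adjacency["weight"].to_numpy(), (rows, cols)), shape=(len(ids), len(ids))
)
```

The inverse mapping is what lets results computed on the sparse array (row and column positions) be reported back in terms of the original ids.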
I would like to merge this so that I can rebase #548... every triangulation can pass through kernel, so we need the kernel implementation stabilised before I can finish up the triangulation stuff. @jGaboardi your review doesn't have any further outstanding requests, right? just questions about
@ljwolf Go ahead. We can open issues for API discussions.
Very much WIP but wanted to open this for discussion. This partially handles isolates but not always. Distance band and knn are used to compute a distance matrix that is then (optionally) passed through the kernel.
@ljwolf I am not sure if this is the implementation you had in mind, but this seems to also work well.
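The "distance matrix first, kernel second" design described above can be sketched as follows. This is an illustration with scipy only; `distance_band_then_kernel` and its parameters are hypothetical and do not reproduce the PR's `_kernel` function:

```python
import numpy as np
from scipy import sparse
from scipy.spatial import KDTree


def distance_band_then_kernel(points, threshold, kernel=None):
    """Hypothetical sketch: sparse distance band, optionally kernel-weighted."""
    points = np.asarray(points, dtype=float)
    tree = KDTree(points)
    dist = tree.sparse_distance_matrix(tree, threshold, output_type="ndarray")
    # Ignore self-pairs so only true neighbours within the band remain.
    off_diagonal = dist["i"] != dist["j"]
    i, j, d = dist["i"][off_diagonal], dist["j"][off_diagonal], dist["v"][off_diagonal]
    # Without a kernel the result is a binary band; otherwise the kernel is
    # applied only to the stored distances.
    weights = np.ones_like(d) if kernel is None else kernel(d)
    return sparse.csr_array((weights, (i, j)), shape=(len(points), len(points)))
```

For example, `distance_band_then_kernel(points, 1.5)` gives binary weights, while passing `kernel=lambda d: 1 - d / 1.5` gives triangular weights over the same neighbour set.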
includes:
- Graph.build_distance_band
edit: Also includes:
- Graph.build_block_contiguity()
- graph.read_parquet and Graph.to_parquet mirroring (geo)pandas
- Graph.lag()
- __getitem__