Intercept Coincident points #548
Conversation
I've implemented the jitter solution using displacements in a circle with a radius of the … After some simulations, I don't see any improvement in the agreement of …

```python
import geopandas, geodatasets, numpy
from libpysal.graph._triangulation import _delaunay
from tqdm import trange

gdf = (
    geopandas.read_file(geodatasets.get_path("geoda groceries"))
    .explode(index_parts=False)
    .reset_index(drop=True)
)

# duplicate a few rows to create coincident points
coincident = geopandas.pd.concat(
    (gdf.head(5), gdf.head(2), gdf.head(1)), axis=0
).reset_index(drop=True)
coincident.index = (
    coincident.index.astype(str).str.rjust(3, "0")
    + "-"
    + coincident.Chain.str.replace(" ", "").str.replace("/", "")
)

jgraph = _delaunay(coincident.geometry, coincident="jitter")

def iou(a, b, normalized=True):
    """Jaccard similarity between two edge sets."""
    n_together = len(a.intersection(b))
    if normalized:
        return n_together / len(a.union(b))
    return n_together

# 1,000 jittered replications, then pairwise edge-set similarity
reps = [set(zip(*jgraph[0:2]))] + [
    set(zip(*_delaunay(coincident.geometry, coincident="jitter")[0:2]))
    for _ in trange(999)
]
sims_lt = [iou(reps[i], reps[j]) for i in trange(1000) for j in range(i + 1)]
sims = numpy.empty((1000, 1000))
sims[numpy.tril_indices(1000)] = sims_lt
sims += sims.T
numpy.fill_diagonal(sims, 1)
```
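The agreement measure used above is just the Jaccard index over edge sets. A minimal standalone sketch with synthetic edge sets (no geopandas or libpysal required) shows the computation:

```python
# Jaccard (IoU) similarity between two graphs represented as sets of
# (focal, neighbor) edge tuples, standing in for the _delaunay output.
def iou(a, b, normalized=True):
    n_together = len(a & b)
    return n_together / len(a | b) if normalized else n_together

rep_a = {(0, 1), (1, 2), (2, 3)}
rep_b = {(0, 1), (1, 2), (3, 4)}

print(iou(rep_a, rep_b))  # 2 shared edges out of 4 distinct -> 0.5
```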
Codecov Report

```diff
@@             Coverage Diff             @@
##           geographs     #548    +/-  ##
============================================
- Coverage       80.8%     80.7%    -0.1%
============================================
  Files            125       125
  Lines          14351     14405      +54
============================================
+ Hits           11595     11628      +33
- Misses          2756      2777      +21
```
Great that we're doing …
Co-authored-by: James Gaboardi <jgaboardi@gmail.com>
Just for reference, this is my solution to sorting:

```python
df = pd.DataFrame({"focal": heads, "neighbor": tails})
# df = df.sample(frac=1).reset_index(drop=True)
sorted_index = (
    df.applymap(coincident.index.unique().tolist().index)
    .sort_values(["focal", "neighbor"])
    .index
)
return heads[sorted_index], tails[sorted_index], weights[sorted_index]
```
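A self-contained illustration of that sorting trick, using hypothetical `heads`/`tails`/`weights` arrays and per-column `Series.map` (equivalent to the `applymap` call above, and not deprecated in newer pandas): each label is mapped to its position in the original index order, and the adjacency is sorted on those positions.

```python
import numpy
import pandas as pd

# Hypothetical unsorted adjacency with string ids.
heads = numpy.array(["b", "a", "b", "a"])
tails = numpy.array(["a", "b", "a", "b"])
weights = numpy.array([1.0, 1.0, 2.0, 2.0])
original_order = ["b", "a"]  # stand-in for coincident.index.unique().tolist()

df = pd.DataFrame({"focal": heads, "neighbor": tails})
sorted_index = (
    df.apply(lambda s: s.map(original_order.index))  # label -> original position
    .sort_values(["focal", "neighbor"])
    .index
)
print(heads[sorted_index])  # focals grouped in original ("b"-first) order
```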
Yep. I think a similar approach would be useful for KNN/distance band. Rather than doing KNN truncation in kernel for precomputed weights, we make KNN and distance band similar to the triangulators: ways to build a generic distance matrix, which then gets passed to kernel for weighting. That keeps the behaviour very consistent and reduces the amount of unique class/constructor-based code.
That's ideal because it allows …
where `distance_matrix` is a travel-time matrix from routingpy or pandana.
Indeed, you already can with `_kernel(distance_matrix, metric="precomputed", kernel="gaussian")`, so I think `Graph.build_kernel(distance_matrix, metric="precomputed", kernel="gaussian")` would work. The simplification would be that kernel wouldn't take a `k` argument, because that is a KNN-style operation, so `Graph.build_kernel(distance_matrix, k=5, kernel="gaussian")` becomes `Graph.build_knn(distance_matrix, k=5, kernel="gaussian")`.
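The kernel step itself is just elementwise weighting of a precomputed distance matrix. A rough numpy-only sketch of Gaussian weighting with a bandwidth cutoff (an illustration of the idea, not the libpysal `_kernel` implementation, whose normalization may differ):

```python
import numpy

def gaussian_kernel(distances, bandwidth):
    """Weight a precomputed distance matrix with a Gaussian kernel.

    Pairs farther apart than the bandwidth get zero weight; the diagonal
    (self-distance) is zeroed so a point is not its own neighbor.
    """
    z = distances / bandwidth
    w = numpy.exp(-0.5 * z**2)
    w[distances > bandwidth] = 0.0
    numpy.fill_diagonal(w, 0.0)
    return w

# e.g. a travel-time matrix from routingpy or pandana would slot in here
d = numpy.array([[0.0, 10.0, 40.0],
                 [10.0, 0.0, 25.0],
                 [40.0, 25.0, 0.0]])
w = gaussian_kernel(d, bandwidth=30.0)
```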
I am not sure I am a big fan of this decorator approach. It does not result in the most legible code, and it would result in an inconsistent API.
Got it, so the appropriate way is …
equivalent to …
building the kernel from the sparse versus building the kernel from the implied distances in `gdf.coordinates`.
Yes, that is how it currently works. But I can also imagine `Graph.from_sparse(distance_matrix).transform(transformation="kernel", threshold=2000, kernel="gaussian")`.
I think we need a …
or using …
That assumes they are sorted the same way.
(The within-focal sorting of the neighbors may be different for each focal, so you won't get the right sparse.)
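A toy illustration of that pitfall, with hypothetical adjacency arrays: two adjacency lists describing the same graph, but with different within-focal neighbor order, are not elementwise comparable (so zipping them into one sparse matrix would misalign values) until both are put in a canonical sort.

```python
import numpy

# Same graph, two different orderings of each focal's neighbors.
heads_a = numpy.array([0, 0, 1]); tails_a = numpy.array([1, 2, 2])
heads_b = numpy.array([0, 0, 1]); tails_b = numpy.array([2, 1, 2])

# Elementwise the arrays disagree...
print((tails_a == tails_b).all())  # False

# ...but as edge sets they are identical, so sorting by (focal, neighbor)
# aligns them again.
order_b = numpy.lexsort((tails_b, heads_b))
print((tails_a == tails_b[order_b]).all())  # True
```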
True. Then the constructor method may be a wise thing to do.
Update: it looks like I was using old geographs code. This mostly works now (see the updated gist; the preview refuses to update, so click the actual link): https://gist.github.com/knaaptime/60bcf942508a47f6028ab1bfd1f3fa26. The only thing is that the user has to know how to reshape the adjlist to a matrix and reindex the rows/cols.
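The reshape-and-reindex step the user currently has to do by hand might look roughly like this, using a pandas pivot on a hypothetical long-form adjacency list (the column names and `ids` order here are illustrative):

```python
import pandas as pd

# Hypothetical long-form adjacency list, e.g. network distances per pair.
adj = pd.DataFrame({
    "focal":    ["a", "a", "b", "c"],
    "neighbor": ["b", "c", "c", "a"],
    "distance": [10.0, 40.0, 25.0, 40.0],
})
ids = ["a", "b", "c"]  # the single ordering the Graph expects

matrix = (
    adj.pivot(index="focal", columns="neighbor", values="distance")
    .reindex(index=ids, columns=ids)  # align rows AND columns to one order
    .fillna(0.0)                      # missing pairs -> no connection
)
print(matrix.loc["a", "b"])  # 10.0
```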
I think it is a good idea to have a constructor where we do this sorting ourselves. We just need to be clear that this happens, as we won't have the original df as a reference as with the …
I'm happy to add a … But this still doesn't solve the problem raised above. To get a kernel weight based on precomputed network distances, a user still has to manually convert to a matrix from an adjacency. Technically you can do the signature …
With the current API, it's hard for us to know whether an adjlist has been passed that needs to be reindexed.
As of this commit, we're always reordering the inner index (we take the row index as given and reorder the column index in the sparse), right? So I think that's just a stipulation of the Graph in general?
That function is not used yet. I need to look into that and figure out if that's actually true in all cases. I think it is but don't want to say it for sure before testing. |
OK, let me know what you find. After working through this in #513, I ended up pretty confident that we always need to reorder columns to match row order.
This is my start at intercepting coincident points. I still need to:

- sort the `new_adj` table according to the derived original ids (`_triangulation.py`)
- … as a wrapper function? I think we could do this easily by intercepting the input & output of the wrapped function.
- implement for KNN weights

I will hold off on implementing these for `_distance.py` until @martinfleis's changes land and we decide if the decorator strategy is the way to go.