
Intercept Coincident points #548

Merged
merged 37 commits into pysal:geographs on Aug 25, 2023

Conversation

ljwolf
Member

@ljwolf ljwolf commented Aug 10, 2023

This is my start at intercepting coincident points. I still need to

  • implement the jitter solution
  • re-order the new_adj table according to the derived original ids
  • (possibly) implement these checks in _triangulation.py as a wrapper function? I think we could do this easily by intercepting the input & output of the wrapped function.
  • implement for knn weights
  • tests

I will hold off on implementing these for _distance.py until @martinfleis's changes land and we decide if the decorator strategy is the way to go.
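The decorator strategy could look roughly like the sketch below: collapse coincident points to unique locations, triangulate those, then re-expand the adjacency onto the original ids. Every name here (`intercept_coincident`, `fake_triangulator`) is a hypothetical illustration, not libpysal code:

```python
import functools
import numpy

def intercept_coincident(func):
    """Sketch: dedupe coincident points around the wrapped triangulator."""
    @functools.wraps(func)
    def wrapper(coordinates, **kwargs):
        coordinates = numpy.asarray(coordinates)
        unique, inverse = numpy.unique(coordinates, axis=0, return_inverse=True)
        inverse = inverse.ravel()  # guard against numpy 2.0's shaped inverse
        heads, tails = func(unique, **kwargs)
        # re-expand: each original point inherits the links of its unique
        # representative (a fuller version would also link duplicates together)
        new_heads, new_tails = [], []
        for i, u in enumerate(inverse):
            for h, t in zip(heads, tails):
                if h == u:
                    for j in numpy.flatnonzero(inverse == t):
                        new_heads.append(i)
                        new_tails.append(j)
        return numpy.asarray(new_heads), numpy.asarray(new_tails)
    return wrapper

@intercept_coincident
def fake_triangulator(coords):
    # stand-in for _delaunay: link every unique point to every other one
    n = len(coords)
    return numpy.nonzero(~numpy.eye(n, dtype=bool))
```

With two coincident points at the origin, the wrapper triangulates the two unique locations and both duplicates inherit the resulting links.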

@ljwolf
Member Author

ljwolf commented Aug 11, 2023

I've implemented the jitter solution using displacements in a circle with a radius equal to the resolution of the input dtype (if a float), or the resolution of float32 (1e-6) if the inputs are integers or float16. This should be the smallest possible movement required to de-dupe points.

After some simulations, I don't see any improvement in the agreement of _delaunay() for smaller displacements (down to 1e-22): for anything at or below this resolution value, the agreement between any two jittered tessellations is about 68%, which is not very stable.
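The displacement scheme described above might be sketched like this (a hypothetical illustration, not the actual libpysal code; `numpy.finfo(numpy.float32).resolution` is 1e-6, matching the fallback radius mentioned):

```python
import numpy

def jitter(coordinates, seed=None):
    """Sketch (not libpysal's implementation): displace each point uniformly
    within a circle whose radius is the resolution of the input dtype if it
    is float32/float64, falling back to float32's resolution (1e-6) for
    integer or float16 input."""
    coordinates = numpy.asarray(coordinates)
    is_wide_float = (
        numpy.issubdtype(coordinates.dtype, numpy.floating)
        and coordinates.dtype.itemsize >= 4
    )
    radius = numpy.finfo(coordinates.dtype if is_wide_float else numpy.float32).resolution
    rng = numpy.random.default_rng(seed)
    angles = rng.uniform(0, 2 * numpy.pi, len(coordinates))
    # sqrt makes the displacement uniform over the disc, not clustered at the center
    radii = radius * numpy.sqrt(rng.uniform(0, 1, len(coordinates)))
    offsets = numpy.column_stack((radii * numpy.cos(angles), radii * numpy.sin(angles)))
    return coordinates + offsets
```

For float64 input, every point moves by at most 1e-15, so two coincident points become distinct with probability one while everything else is numerically unchanged.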

import geopandas, geodatasets, numpy
from libpysal.graph._triangulation import _delaunay
from tqdm import trange

gdf = (
    geopandas.read_file(geodatasets.get_path("geoda groceries"))
    .explode(index_parts=False)
    .reset_index(drop=True)
)

coincident = geopandas.pd.concat((gdf.head(5), gdf.head(2), gdf.head(1)), axis=0).reset_index(drop=True)
coincident.index = (
    coincident.index.astype(str).str.rjust(3, '0') 
    + '-' 
    + coincident.Chain.str.replace(" ", "").str.replace("/", "")
    )

jgraph = _delaunay(coincident.geometry, coincident='jitter')

def iou(a,b, normalized=True):
    n_together = len(a.intersection(b))
    if normalized:
        return n_together / len(a.union(b))
    else:
        return n_together

reps = [set(zip(*jgraph[0:2]))] + [set(zip(*_delaunay(coincident.geometry, coincident='jitter')[0:2])) for _ in trange(999)]
sims_lt = [iou(reps[i], reps[j]) for i in trange(1000) for j in range(i+1)]
sims = numpy.empty((1000,1000))
sims[numpy.tril_indices(1000)] = sims_lt
sims += sims.T
numpy.fill_diagonal(sims, 1)

[Figure 1: pairwise agreement (IoU) between the 1000 jittered tessellations]

@ljwolf ljwolf changed the title [WIP] start intercepting coincident points Intercept Coincident points Aug 11, 2023
@ljwolf ljwolf added the graph label Aug 11, 2023
@codecov

codecov bot commented Aug 11, 2023

Codecov Report

Merging #548 (ca5ec9a) into geographs (e579820) will decrease coverage by 0.1%.
The diff coverage is 56.5%.


@@             Coverage Diff             @@
##           geographs    #548     +/-   ##
===========================================
- Coverage       80.8%   80.7%   -0.1%     
===========================================
  Files            125     125             
  Lines          14351   14405     +54     
===========================================
+ Hits           11595   11628     +33     
- Misses          2756    2777     +21     
Files Changed Coverage Δ
libpysal/graph/_utils.py 41.0% <35.7%> (-1.7%) ⬇️
libpysal/graph/_triangulation.py 91.8% <62.5%> (-7.4%) ⬇️
libpysal/graph/base.py 97.9% <100.0%> (ø)
libpysal/graph/tests/test_base.py 100.0% <100.0%> (ø)
libpysal/graph/tests/test_builders.py 100.0% <100.0%> (ø)

... and 1 file with indirect coverage changes

@ljwolf
Member Author

ljwolf commented Aug 11, 2023

Great that we're doing kernel as a thin class. This makes basically all of _triangulation.py possible to implement as just building a sparse distance matrix. I have very good feelings about this API design!

ljwolf and others added 4 commits August 25, 2023 16:41
Co-authored-by: James Gaboardi <jgaboardi@gmail.com>
@martinfleis
Member

Just for reference, this is my solution to sorting.

df = pd.DataFrame({"focal": heads, "neighbor": tails})
# df = df.sample(frac=1).reset_index(drop=True)
sorted_index = (
    df.applymap(coincident.index.unique().tolist().index)  # label -> position in the unique index
    .sort_values(["focal", "neighbor"])
    .index
)
return heads[sorted_index], tails[sorted_index], weights[sorted_index]
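For reference, a vectorized equivalent that avoids per-element applymap (which is deprecated in recent pandas) could use categorical codes; the `order`, `heads`, `tails`, and `weights` below are toy stand-ins, not the PR's data:

```python
import numpy
import pandas as pd

order = ["b", "a", "c"]  # stand-in for coincident.index.unique()
heads = numpy.array(["a", "c", "b", "a"])
tails = numpy.array(["b", "a", "a", "c"])
weights = numpy.array([1.0, 2.0, 3.0, 4.0])

# map each label to its position in `order`, then lexsort by (focal, neighbor)
focal_codes = pd.Categorical(heads, categories=order).codes
neighbor_codes = pd.Categorical(tails, categories=order).codes
sorted_index = numpy.lexsort((neighbor_codes, focal_codes))
heads, tails, weights = heads[sorted_index], tails[sorted_index], weights[sorted_index]
```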

@ljwolf ljwolf merged commit 3c686b7 into pysal:geographs Aug 25, 2023
0 of 7 checks passed
@martinfleis
Member

martinfleis commented Aug 25, 2023

@ljwolf have you intentionally removed all the kernel functionality from triangulation here? it is all in the decorator...

@ljwolf
Member Author

ljwolf commented Aug 25, 2023

it's all in the decorator

Yep. I think a similar approach would be useful for KNN/distance band. Rather than doing knn truncation in kernel for precomputed weights, we make knn and distance band similar to the triangulators: ways to build a generic distance matrix, which then gets passed to kernel for weighting.

it keeps the behaviour very consistent, and reduces the amount of unique class/constructor-based code.
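That pipeline (distance builder feeding a kernel weighter) can be sketched with scipy; the function names and signatures here are hypothetical illustrations, not the libpysal API:

```python
import numpy
from scipy import sparse
from scipy.spatial import cKDTree

def knn_distances(coordinates, k=2):
    """Hypothetical 'distance builder' step: a sparse matrix holding the
    distance from each point to its k nearest neighbours."""
    coordinates = numpy.asarray(coordinates)
    tree = cKDTree(coordinates)
    dist, idx = tree.query(coordinates, k=k + 1)  # k+1 because each point matches itself
    n = len(coordinates)
    rows = numpy.repeat(numpy.arange(n), k)
    return sparse.csr_array(
        (dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())), shape=(n, n)
    )

def gaussian_kernel(distance_matrix, bandwidth=1.0):
    """Hypothetical weighting step: apply a kernel to the stored distances,
    leaving the sparsity structure untouched."""
    weighted = distance_matrix.copy()
    weighted.data = numpy.exp(-0.5 * (weighted.data / bandwidth) ** 2)
    return weighted

# the two steps compose: any distance builder can feed any kernel
coords = numpy.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [6.0, 0.0]])
weights = gaussian_kernel(knn_distances(coords, k=1), bandwidth=1.0)
```

The same `gaussian_kernel` step would accept a precomputed sparse distance matrix directly, which is the consistency the comment above is after.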

@knaaptime
Member

that's ideal because it allows

Graph.build_kernel(gdf, threshold=2000, kernel='gaussian') and

Graph.from_sparse(distance_matrix).build_kernel(threshold=2000, kernel='gaussian'),

where distance_matrix is a travel time matrix from routingpy or pandana

@ljwolf
Member Author

ljwolf commented Aug 25, 2023

Indeed, already can with _kernel(distance_matrix, metric="precomputed", kernel="gaussian") so I think Graph.build_kernel(distance_matrix, metric="precomputed", kernel="gaussian") would work.

The simplification would be that kernel wouldn't take a "k" argument, because that is a knn-style operation, so Graph.build_kernel(distance_matrix, k=5, kernel="gaussian") becomes Graph.build_knn(distance_matrix, k=5, kernel="gaussian")

@martinfleis
Member

I am not sure if I am a big fan of this decorator approach. It does not result in the most legible code.

Graph.from_sparse(distance_matrix).build_kernel(threshold=2000, kernel='gaussian')

This would result in an inconsistent API. build_kernel builds a Graph from an external object, not self. What you are looking for is transform(), which should support a custom kernel. What we have there now is a subset of kernels anyway.

@knaaptime
Member

got it, so the appropriate way is

Graph.build_kernel(distance_matrix, metric="precomputed", threshold=2000, kernel="gaussian")

equivalent to

Graph.build_kernel(gdf, threshold=2000, kernel="gaussian")

building kernel from the sparse versus building kernel from the implied distances in the gdf.coordinates

@martinfleis
Member

Yes, that is how it currently works. But I can also imagine

Graph.from_sparse(distance_matrix).transform(transformation="kernel", threshold=2000, kernel='gaussian')

@knaaptime
Member

I think we need a from_adjacency. Currently the only way to build from a pandas adjlist is to give that directly to the init.

@martinfleis
Member

I think we need a from_adjacency. Currently the only way to build from a pandas adjlist is to give that directly to the init.

or using from_arrays(df['focal'], df['neighbor'], df['weight']) but feel free to write from_adjacency as well. That surely won't hurt.

@knaaptime
Member

or using from_arrays(df['focal'], df['neighbor'], df['weight'])

that assumes they are sorted the same way

@knaaptime
Member

(the within-focal sorting of the neighbors may be different for each focal, so you won't get the right sparse)
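A tiny illustration of that pitfall (toy data, not the PR's): the same adjacency listed with a different within-focal neighbor order disagrees positionally, but sorting on (focal, neighbor) canonicalizes it:

```python
import pandas as pd

# the same adjacency, with neighbors of focal "a" listed in a different order
adj1 = pd.DataFrame(
    {"focal": ["a", "a", "b"], "neighbor": ["b", "c", "a"], "weight": [1.0, 2.0, 3.0]}
)
adj2 = pd.DataFrame(
    {"focal": ["a", "a", "b"], "neighbor": ["c", "b", "a"], "weight": [2.0, 1.0, 3.0]}
)

# naive positional comparison of the weight columns disagrees...
assert not adj1["weight"].equals(adj2["weight"])

# ...but canonical (focal, neighbor) sorting recovers the same adjacency
canon1 = adj1.sort_values(["focal", "neighbor"]).reset_index(drop=True)
canon2 = adj2.sort_values(["focal", "neighbor"]).reset_index(drop=True)
assert canon1.equals(canon2)
```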

@martinfleis
Member

True. Then the constructor method may be a wise thing to do.

martinfleis added a commit that referenced this pull request Aug 25, 2023
@knaaptime
Member

knaaptime commented Aug 25, 2023

update: looks like I was using old geographs code. This mostly works now (see the updated gist; the preview refuses to update, so click the actual link)

https://gist.github.com/knaaptime/60bcf942508a47f6028ab1bfd1f3fa26

The only thing is that the user has to know how to reshape the adjlist into a matrix and reindex the rows/cols.
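That manual reshaping might look something like the following (hypothetical data and labels; `pivot` + `reindex` + `fillna` is one way to go from an adjlist to a dense matrix aligned on a single label order):

```python
import numpy
import pandas as pd

# hypothetical travel-time adjacency list, e.g. from routingpy or pandana
adj = pd.DataFrame({
    "origin":      ["b", "a", "a", "c", "b", "c"],
    "destination": ["a", "b", "c", "a", "c", "b"],
    "cost":        [10.0, 10.0, 25.0, 25.0, 15.0, 15.0],
})

labels = ["a", "b", "c"]  # the row/column order the Graph expects
distance_matrix = (
    adj.pivot(index="origin", columns="destination", values="cost")
       .reindex(index=labels, columns=labels)  # align both axes to one order
       .fillna(0.0)                            # missing pairs -> no edge
       .to_numpy()
)
```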

@martinfleis
Member

I think it is a good idea to have a constructor where we do this sorting ourselves. We just need to be clear that this happens, as we won't have the original df as a reference as with the build_* constructors.

@knaaptime
Member

knaaptime commented Aug 25, 2023

I'm happy to add a _from_adjlist constructor, because I think we should have one anyway, and it can include the reindexing logic, which should be pretty costless.

But this still doesn't solve the problem raised above. To get a kernel weight based on precomputed network distances, a user still has to manually convert an adjacency to a matrix. Technically you can do the signature

Graph.build_kernel(distance_matrix, metric="precomputed", threshold=2000, kernel="gaussian"), but it's hard for us to help you get an adjlist into the right distance_matrix unless we allow something like Martin's transform

Graph.from_sparse(distance_matrix).transform(transformation="kernel", threshold=2000, kernel='gaussian'). I'm not sure I love that convention, but I don't have a better idea at the moment.

With the current API, it's hard for us to know that an adjlist that needs to be reindexed has been passed.

@knaaptime
Member

I think it is a good idea to have a constructor where we do this sorting ourselves. We just need to be clear that this happens, as we won't have the original df as a reference as with the build_* constructors.

As of this commit, we're always reordering the inner index (we take the row index as given and reorder the column index in the sparse), right? So I think that's just a stipulation of the Graph in general?

@martinfleis
Member

That function is not used yet. I need to look into that and figure out if that's actually true in all cases. I think it is but don't want to say it for sure before testing.

@knaaptime
Member

OK, let me know what you find. After working through this in #513, I ended up pretty confident that we always need to reorder columns to match row order.
