Adopt numpy-hash approach to find duplicates #82

mattijn · 2020-05-15T19:17:46Z

This PR implements the method as prototyped in #78.

The method collects hash identifiers of the sorted linestring coordinates, converted to tuples. With this approach every linestring is represented by a single integer of fixed size.
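A minimal sketch of the idea, not the code merged in this PR: the helper names `hash_linestring` and `find_duplicates` are hypothetical, and the sketch assumes 2D coordinates and that duplicates should match regardless of direction. Because the coordinates are sorted before hashing, two different linestrings that visit the same set of points in a different order would also get the same hash.

```python
import numpy as np


def hash_linestring(coords):
    """Return one integer for a linestring, independent of its direction.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(coords, dtype=float)
    # Sort rows by x, then y, so a linestring and its reverse give the
    # same sequence (np.lexsort uses the last key as the primary key).
    order = np.lexsort((arr[:, 1], arr[:, 0]))
    sorted_rows = arr[order]
    # Tuples are hashable; lists and arrays are not.
    return hash(tuple(map(tuple, sorted_rows.tolist())))


def find_duplicates(linestrings):
    """Return groups of indices whose linestrings share the same hash."""
    hashes = np.array([hash_linestring(ls) for ls in linestrings], dtype=np.int64)
    _, inverse, counts = np.unique(hashes, return_inverse=True, return_counts=True)
    groups = []
    for group_id in np.flatnonzero(counts > 1):
        groups.append(np.flatnonzero(inverse == group_id).tolist())
    return groups


lines = [
    [(0, 0), (1, 1), (2, 0)],
    [(2, 0), (1, 1), (0, 0)],  # same line, reversed
    [(0, 0), (5, 5)],
]
duplicates = find_duplicates(lines)
```

Here `duplicates` groups the first two linestrings together, since they differ only in direction.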

One side effect concerns rotated rings. When the shapely shared_paths function is used for junction detection, rotated rings are no longer treated as equal because their hashes differ. In a ring the first and last coordinates are the same, but if the ring is rotated, a different coordinate becomes the repeated first/last one, so the hash changes as well.
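The effect can be illustrated with a small sketch. It assumes the hash is taken over the sorted coordinate tuples as described above; `sorted_key` is a hypothetical helper, not a function from this PR.

```python
def sorted_key(coords):
    """Sorted coordinate tuples of a ring; the hash would be taken over this.

    Hypothetical helper for illustration only.
    """
    return tuple(sorted(tuple(c) for c in coords))


# A closed ring: the first and last coordinate are both (0, 0).
ring = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
# The same ring rotated by one position: it now starts and ends at (1, 0).
rotated = ring[1:-1] + [ring[0], ring[1]]

key_ring = sorted_key(ring)        # contains (0, 0) twice
key_rotated = sorted_key(rotated)  # contains (1, 0) twice
same_key = key_ring == key_rotated
```

The two rings describe the same geometry, but the repeated closing coordinate differs, so the sorted tuples differ and `same_key` is `False`; the hashes taken over these tuples will then differ as well.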

With the dictlist coords approach the first/last coordinate of a ring is seen as a junction, so a rotated polygon is split into two duplicate segments.

E.g. the test at https://github.com/mattijn/topojson/blob/master/tests/test_cut.py#L58:L66 and two others were changed to capture this new behavior.

adopt numpy-hash function to find duplicates

d33bcbb

mattijn merged commit 945d23b into master May 15, 2020

mattijn deleted the replace-find-duplicates branch May 15, 2020 19:20
