Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: duplicates with collinear points sometimes not found with shared_coords=False #211

Conversation

theroggy
Copy link
Contributor

@theroggy theroggy commented Oct 20, 2023

In the following case 2 linestrings were not correctly identified as duplicate with with shared_coords=False:
- one of them has extra point(s) that disappear with simplify(0).
- there is another line combination in the dataset that creates junctions for
shared_coords=False.

Practical case where this showed up. The combination of the above ingredients triggered the bug like this:

  • At the top right the smaller polygon touches the outer boundary of the largest polygon, so both those polygons need to be split to create the relevant topologies -> this leads to junctions being created.
  • The small polygon at the bottom left fills up an "island" in the large polygon contained an extra collinear point compared to the linearring describing the island.
    image

@theroggy
Copy link
Contributor Author

Reference orthoseg/orthoseg#134

@mattijn
Copy link
Owner

mattijn commented Oct 20, 2023

I stared at this for a while, I ran a debugger and I tried to understand why the fix in your PR solves the issue, you basically go from

lines_split.append(remove_collinear_points([np.array(linestring.coords)]))

to

lines_split.append([remove_collinear_points(np.array(linestring.coords))])

While I cannot really explain why this results in another linestring being detected as a duplicate; it resolves the issue that you have and the PR includes a nice test. So thank you!

@mattijn mattijn merged commit 894650b into mattijn:main Oct 20, 2023
11 checks passed
@theroggy
Copy link
Contributor Author

theroggy commented Oct 20, 2023

Yes, it has taken me quite some hours going from the real-world problem to that specific code change, and the only thing to show for it are 2 square brackets being moved a few characters :-).

The difference is in what remove_collinear_points receives as input:

  • before something like this: [ [ [0, 1], [0, 2], [0, 3] ] ]
  • versus now it will get this: [ [0, 1], [0, 2], [0, 3] ]

One of the first lines of remove_collinear_points is len(line). For the first array will always be 1... so there are never collinear points to be removed :-(....

    # If only 2 points, no use checking
    if len(line) <= 2:
        return line

@theroggy theroggy deleted the Fix-bug-with-shared_coords=True-where-duplicates-are-found branch October 20, 2023 15:10
@mattijn
Copy link
Owner

mattijn commented Oct 20, 2023

Thanks for the explanation! I just released a new version of topojson, see https://pypi.org/project/topojson/1.7/

@theroggy theroggy changed the title Fix bug with shared_coords=True where duplicates are found Fix bug with shared_coords=True where duplicates are not found Oct 21, 2023
@theroggy theroggy changed the title Fix bug with shared_coords=True where duplicates are not found Fix: duplicates with collinear points sometimes not found with shared_coords=False Oct 21, 2023
@theroggy
Copy link
Contributor Author

By reading the change notes I noticed that the title of the PR was quite off... so I updated it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants