Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better filtering for irrelevant address parts for POIs #2658

Open
lonvia opened this issue Apr 6, 2022 · 5 comments
Open

Better filtering for irrelevant address parts for POIs #2658

lonvia opened this issue Apr 6, 2022 · 5 comments

Comments

@lonvia
Copy link
Member

lonvia commented Apr 6, 2022

#2082 has implemented filtering of address parts, where the POI is outside the area of the address part. This solves the issue for bad addresses when the parent street goes through multiple administrative area in most cases. For performance reasons we do this only, when multiple areas of the same address level are part of the address list of a street. This leaves some corner cases, where there is only one area of a certain address level that is partially touched by the street, see discussion in #2649.

Instead of guessing, which areas need rechecking, it would also be possible to explicitly mark areas that need rechecking in the place_addressline table. This would catch that case and make the rechecking a bit easier in general.

@champagne-cmd
Copy link

@lonvia I would be interested in working on this issue. Do you think someone new to the Nominatim project could handle this issue?

@lonvia
Copy link
Member Author

lonvia commented Apr 10, 2022

It's challenging but doable if you are not afraid to dive head-deep into pl/PgSQL code.

Here are some pointers to get you started:

  • The basic idea would be to add a boolean column to place_addressline which is set to true when the area of the parent place does not cover a street completely. A good approximation for being a street object is probably: it is of type LineString.
  • place_addressline is filled during indexing of places. The function insert_addressline is the relevant code. The tricky part here is that the parent objects are searched in the location_area_large_* tables. These tables contain chopped up geometries, so you can't use them. There is already a hack to load full geometries in the function here. That's the way to go.
  • The recheck if a POI point is really inside the area is hidden in this insane order statement.
  • Tests for the dynamic address computation are in test/bdd starting here.

I recommend to start with writing a BDD test that describes the problem from #2649 and initially fails. Feel free to make a PR with the test only, if you want to have a second opinion if the test tests the right thing.

@champagne-cmd
Copy link

@lonvia I have written a test case, but I'm having trouble running some of the tests - please see #2697

@champagne-cmd
Copy link

^ @lonvia See the PR above for my first crack at the test case

@lonvia
Copy link
Member Author

lonvia commented Sep 28, 2022

Tried this in https://github.com/lonvia/Nominatim/tree/drop-outside-address-parts-backup and it turns out to be far too slow to check for containment. This first needs a clever idea how to reduce the number of geometry checks that have to be made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants