Add option for polygon/linestring in results #823

Open: wants to merge 7 commits into base: master

red-fenix commented Jul 11, 2024

At the moment, Photon only returns the point of a location, not its polygon (see #259). This PR adds an option to store the polygon (i.e. the geometry) in the Elasticsearch index, plus an API parameter polygon to return it. If no polygon exists, the point is returned.

WARNING: This will increase the Elasticsearch Index size! (~575GB for a Planet import).

To enable: add the command line argument -use-geometry-column whilst importing and add &polygon=true to the API call.
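
As a rough illustration of the API side, a client could build the request like this. This is only a sketch: the /api endpoint with a q parameter and the local port 2322 are assumed from a default Photon setup, the helper name build_search_url is made up for this example, and polygon is the parameter proposed in this PR.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, polygon=False):
    """Build a search URL, optionally asking for the full geometry.

    `polygon` is the parameter proposed in this PR; everything else
    (endpoint path, `q` parameter) is assumed from a default setup.
    """
    params = {"q": query}
    if polygon:
        # Only add the parameter when the caller asks for the geometry.
        params["polygon"] = "true"
    return f"{base_url}/api?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("http://localhost:2322", "berlin", polygon=True))
```

The parameter is left out entirely when it is not requested, so the same helper works against a server that does not know about it.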

@red-fenix red-fenix changed the title Add option for polygons in results Add option for polygon/linestring in results Jul 11, 2024
lonvia (Collaborator) commented Jul 14, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

  • This really needs to be implemented for the OpenSearch version because the ES version of Photon is on its way out. Note that the OpenSearch variant does not use mappings.json. It defines its mapping in https://github.com/komoot/photon/blob/master/app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java.
  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.
  • We already have a field extent which contains the bounding box. The geometry should replace this field, i.e. once full geometries are enabled, do not save extent and derive the extent from the geometry field when returning a result. We need to keep the centroid because it is not necessarily the geometric centroid of the geometry. (Note that extent was missing from the mapping specification so far. So likely it was in fact saved as a text field, which really is an oversight but can't really be fixed right now without creating an incompatible database version.)
  • This would be the second optional database feature after #815 (add support for structured queries, OpenSearch only). Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).
  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io asking why the results return a point instead of a polygon. You can save the state of the feature in the property table and load it from there on start; see #815 (add support for structured queries, OpenSearch only) for an example of how to do it. This property also comes in handy during updates of the database: it would be useful to have exactly the same behaviour as on import then.
  • This needs some tests for the import.
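
To make the point about deriving the extent from the geometry concrete, here is a minimal sketch of the idea, not Photon code. It assumes the stored geometry is a GeoJSON-style dict with nested [lon, lat] positions; the function names and the (min_lon, min_lat, max_lon, max_lat) ordering are chosen for the example and are not necessarily the order Photon uses in its output.

```python
def iter_positions(coords):
    """Yield every [lon, lat] position from arbitrarily nested coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position such as [13.4, 52.5].
        yield coords
        return
    for part in coords:
        yield from iter_positions(part)


def extent_from_geometry(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON-style geometry.

    Assumes a geometry type that carries a "coordinates" member
    (Point, LineString, Polygon and their Multi* variants).
    """
    positions = list(iter_positions(geometry["coordinates"]))
    if not positions:
        raise ValueError("geometry has no coordinates")
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return (min(lons), min(lats), max(lons), max(lats))


if __name__ == "__main__":
    square = {
        "type": "Polygon",
        "coordinates": [[[13.0, 52.0], [13.5, 52.0], [13.5, 52.4], [13.0, 52.4], [13.0, 52.0]]],
    }
    print(extent_from_geometry(square))
```

The sketch ignores geometries that cross the antimeridian, which would need separate handling.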

Two other considerations come to mind, but they are easily deferred to follow-up PRs:

  • Once we have full geometries, we'd want to use them for reverse lookup. See #357 (Discussion for PR to improve accuracy of reverse geocoding).
  • It might be worth slightly simplifying the geometries before importing them, or at least making that an option. Nominatim always keeps the original OSM geometries, which sometimes have a lot more support points than necessary. Simplification might help to further reduce database size.
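
For reference, one common way to drop unnecessary support points is the Douglas-Peucker algorithm. The sketch below is a plain illustration of that algorithm for a single linestring, not something this PR implements. It treats coordinates as planar (x, y) pairs, so the tolerance is in coordinate units rather than metres, and the function names are invented for the example.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, in coordinate units."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        # Every intermediate point is close enough to the chord: drop them.
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


if __name__ == "__main__":
    line = [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)]
    print(simplify(line, 0.1))
```

Simplifying polygons this way needs extra care (rings must stay closed and valid, and neighbouring areas can end up with gaps or overlaps), which is one reason to make it optional.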

red-fenix (Author) commented Jul 17, 2024

I haven't done a full review yet but I do have some general thoughts on the implementation:

Thanks. I will update the PR with the changes in this file soon.

One question, though, about 'Elasticsearch is on its way out': I've been planning to update the Elastic client to the Java API so you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is my new PR still a good idea?

  • As long as the geometry isn't used for lookup, indexes should be disabled on the new field to save a bit of disk space.

Will take this along as well.

  • This would be the second optional database feature after #815 (add support for structured queries, OpenSearch only). Before we become too prolific with command-line arguments, I'd lean towards adding a single parameter -extra-db-features which takes a list of features (right now: structured, geometries).

Agreed
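
Purely as an illustration of the list-valued option, parsing and validating it could look roughly like the sketch below. The comma-separated format, the function name and the error handling are assumptions made for this example; only the option name and the two feature names come from the suggestion above.

```python
# Feature names mentioned in the review; anything else is rejected.
KNOWN_FEATURES = {"structured", "geometries"}


def parse_extra_db_features(value):
    """Parse a comma-separated feature list (assumed format) into a set."""
    features = {name.strip() for name in value.split(",") if name.strip()}
    unknown = features - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown database feature(s): {sorted(unknown)}")
    return features


if __name__ == "__main__":
    print(sorted(parse_extra_db_features("structured, geometries")))
```

Rejecting unknown names early means a typo on the command line fails the import instead of silently producing a database without the feature.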

  • I agree on the introduction of the polygon parameter but would prefer to disable it when the extended geometries are not available in the database. Otherwise there will be an endless stream of bug reports on photon.komoot.io asking why the results return a point instead of a polygon. You can save the state of the feature in the property table and load it from there on start; see #815 (add support for structured queries, OpenSearch only) for an example of how to do it. This property also comes in handy during updates of the database: it would be useful to have exactly the same behavior as on import then.

OK. I will make returning the polygon the default when it's available in the index. Setting polygon=false will return the centroid instead.
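
One possible reading of the combined behaviour, as a sketch only: the parameter is rejected when the database was imported without geometries, the geometry is returned by default when the feature is on, and polygon=false falls back to the centroid. The document fields centroid and geometry, the geometries_enabled flag and the function name are hypothetical names for this example.

```python
def select_geometry(doc, polygon_param=None, geometries_enabled=False):
    """Pick which geometry to return for one result document.

    polygon_param is None when the request did not set the parameter.
    geometries_enabled stands for the feature state saved at import time.
    """
    if not geometries_enabled:
        if polygon_param is not None:
            # The parameter is disabled for databases without geometries.
            raise ValueError("polygon parameter is not supported by this database")
        return doc["centroid"]
    if polygon_param is False:
        # Explicit opt-out: return the point even though a geometry exists.
        return doc["centroid"]
    # Default: the full geometry when the document has one, else the point.
    return doc.get("geometry") or doc["centroid"]


if __name__ == "__main__":
    place = {
        "centroid": {"type": "Point", "coordinates": [13.4, 52.5]},
        "geometry": {"type": "LineString", "coordinates": [[13.3, 52.5], [13.5, 52.5]]},
    }
    print(select_geometry(place, geometries_enabled=True)["type"])
```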

  • This needs some tests for the import.

Will do.

Two other considerations come to mind, but they are easily deferred to follow-up PRs:

I have to look into this issue.

  • It might be worth slightly simplifying the geometries before importing them, or at least making that an option. Nominatim always keeps the original OSM geometries, which sometimes have a lot more support points than necessary. Simplification might help to further reduce database size.

I will look into this.
Another related point: when someone searches for a street, Nominatim (and thus Photon) returns the street in parts because the parts have separate OSM IDs. I'm still looking for a method to merge multiple ways (i.e. linestrings) into one linestring so that the whole street is displayed instead of just a part (example).
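
As a starting point for the merging question, a simple greedy approach is to join parts that share an end point. The sketch below assumes each part is a list of (lon, lat) tuples and that connected ways have exactly equal end coordinates (as ways sharing an OSM node would); the function name is made up, and this is not an existing Nominatim or Photon feature.

```python
def merge_linestrings(lines):
    """Greedily join linestrings that share end points.

    Returns (merged, leftover): the chain grown from the first usable line
    and the lines that could not be attached to it.
    """
    remaining = [list(line) for line in lines if len(line) >= 2]
    if not remaining:
        return [], []
    merged = remaining.pop(0)
    progress = True
    while remaining and progress:
        progress = False
        for i, line in enumerate(remaining):
            if line[0] == merged[-1]:
                merged = merged + line[1:]          # append as is
            elif line[-1] == merged[-1]:
                merged = merged + line[-2::-1]      # append reversed
            elif line[-1] == merged[0]:
                merged = line[:-1] + merged         # prepend as is
            elif line[0] == merged[0]:
                merged = line[:0:-1] + merged       # prepend reversed
            else:
                continue
            remaining.pop(i)
            progress = True
            break
    return merged, remaining


if __name__ == "__main__":
    parts = [[(0, 0), (1, 0)], [(2, 0), (1, 0)], [(2, 0), (3, 0)], [(9, 9), (8, 8)]]
    print(merge_linestrings(parts))
```

Streets that branch or consist of disconnected pieces cannot be represented as a single linestring, so anything left over would still have to be returned separately (for example as a MultiLineString).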

lonvia (Collaborator) commented Jul 21, 2024

One question, though, about 'Elasticsearch is on its way out': I've been planning to update the Elastic client to the Java API so you can use an existing Elasticsearch cluster instead of the internal one (newer versions of Elasticsearch don't support the Transport client). Is my new PR still a good idea?

We'll drop ES support completely and go with OpenSearch. Note that the OS version already supports an external OpenSearch cluster. The support is just somewhat rudimentary and HTTP-only.
