[RFC] - Implement Aggregations on Geo Shape Field #84

Closed
navneet1v opened this issue Jun 27, 2022 · 10 comments

@navneet1v
Collaborator

navneet1v commented Jun 27, 2022

The purpose of this RFC (request for comments) is to gather community feedback on a proposal to let OpenSearch users run aggregations on the geo_shape data type.

Background

The geo_shape data type supports indexing and searching arbitrary geo shapes such as rectangles and polygons. It is used when we want to index shapes other than points (lines, multi-lines, polygons, etc.) in a geographical context.

Aggregations

As of OpenSearch 1.0 (derived from Elasticsearch 7.10.x), aggregations are supported on the geo_point data type but not on geo_shape. We will add four aggregations on the geo_shape data type in OpenSearch:

  1. Geo Bounds Aggregation
  2. Geo Centroid Aggregation
  3. Geo Hash Aggregation
  4. Geo Tile Aggregation

Geo Bounds Aggregation

A metric aggregation that computes the bounding box containing all geo values for a field.
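
To illustrate what this computes for shapes, here is a minimal Python sketch. It is not the plugin's implementation: it assumes each shape is given as a flat list of (lon, lat) vertices, ignores shapes that cross the antimeridian, and `geo_bounds` is a hypothetical helper name. The `top_left`/`bottom_right` layout mirrors the existing geo_point response.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), an assumed input layout


def geo_bounds(shapes: Sequence[List[Point]]) -> Dict[str, Dict[str, float]]:
    """Smallest lat/lon box containing every vertex of every shape."""
    points = [p for shape in shapes for p in shape]
    if not points:
        raise ValueError("no coordinates to aggregate")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


# Illustrative data only: one polygon ring and one line.
polygon = [(10.0, 50.0), (12.0, 50.0), (12.0, 52.0), (10.0, 52.0)]
line = [(8.0, 49.0), (11.0, 51.0)]
print(geo_bounds([polygon, line]))
```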

Geo Centroid Aggregation

A metric aggregation that computes the centroid from all coordinate values for geo fields. The centroid calculation implements the ISO/IEC 13249-3:2016 SQL/MM specification, section 8.1.6.
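
For intuition about centroids of shapes rather than points, here is a small Python sketch of an area-weighted polygon centroid using the standard shoelace formula. It is an assumption-laden illustration, not a rendering of the ISO text or of the plugin code: coordinates are treated as planar, rings are simple with no holes, and both function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in planar coordinates (assumption)


def ring_area_and_centroid(ring: List[Point]) -> Tuple[float, Point]:
    """Area and centroid of one simple ring via the shoelace formula."""
    twice_area = cx = cy = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate ring")
    return abs(twice_area) / 2, (cx / (3 * twice_area), cy / (3 * twice_area))


def centroid_of_polygons(rings: List[List[Point]]) -> Point:
    """Combine per-polygon centroids, weighting each by its area."""
    total = sum_x = sum_y = 0.0
    for ring in rings:
        area, (x, y) = ring_area_and_centroid(ring)
        total += area
        sum_x += area * x
        sum_y += area * y
    return sum_x / total, sum_y / total


square_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
square_b = [(4.0, 0.0), (6.0, 0.0), (6.0, 2.0), (4.0, 2.0)]
print(centroid_of_polygons([square_a, square_b]))
```

Weighting by area means a large polygon pulls the result further than a small one, which is the main difference from averaging points.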

Geo Hash Aggregation

Geo Hash is a multi-bucket aggregation that groups geo shapes into buckets representing cells in a grid. Each cell is labeled with a geohash of user-definable precision. The internal workings and output will be very similar to the geo_point version, except that a single shape can be counted in multiple cells if any part of the shape intersects them.
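
To make the "one shape, many cells" behavior concrete, here is a Python sketch that lists the geohash cells touched by a shape's bounding box. It is only an approximation under stated assumptions: the shape is reduced to a (min_lon, min_lat, max_lon, max_lat) box, which over-counts for non-rectangular shapes, and `geohash_encode` and `geohash_cells_for_bbox` are hypothetical helpers, not plugin APIs.

```python
import math
from typing import Set, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per char."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits = value = 0
    return "".join(chars)


def geohash_cells_for_bbox(
    bbox: Tuple[float, float, float, float], precision: int
) -> Set[str]:
    """Every geohash cell of the given precision that the box overlaps."""
    min_lon, min_lat, max_lon, max_lat = bbox
    n_lon = 1 << math.ceil(5 * precision / 2)  # longitude gets the extra bit
    n_lat = 1 << (5 * precision // 2)
    width, height = 360.0 / n_lon, 180.0 / n_lat

    def index(value: float, origin: float, size: float, count: int) -> int:
        return min(count - 1, max(0, int((value - origin) // size)))

    cells = set()
    for i in range(index(min_lon, -180.0, width, n_lon),
                   index(max_lon, -180.0, width, n_lon) + 1):
        for j in range(index(min_lat, -90.0, height, n_lat),
                       index(max_lat, -90.0, height, n_lat) + 1):
            # Encode the cell's center point to get its label.
            cells.add(geohash_encode(-180.0 + (i + 0.5) * width,
                                     -90.0 + (j + 0.5) * height, precision))
    return cells


# A box straddling (0, 0) lands in four precision-1 cells.
print(sorted(geohash_cells_for_bbox((-10.0, -10.0, 10.0, 10.0), 1)))
```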

Geo Tile Aggregation

Geo Tile is a multi-bucket aggregation that groups geo shapes into buckets representing map tiles, as used by many online map sites. Each tile is labeled using the format {zoom}/{x}/{y}, where zoom equals the user-specified precision. The internal workings and output will be very similar to the geo_point version, except that a single shape can be counted in multiple tiles if any part of the shape intersects them.
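
As a rough sketch of the {zoom}/{x}/{y} labeling, the Python below lists the tiles overlapped by a shape's bounding box using the usual Web Mercator tile formulas. Assumptions: the shape is approximated by a (min_lon, min_lat, max_lon, max_lat) box that does not cross the antimeridian, and `tile_index` and `geotile_buckets_for_bbox` are hypothetical names, not plugin APIs.

```python
import math
from typing import List, Tuple

MAX_LAT = 85.05112878  # Web Mercator latitude limit


def tile_index(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Tile (x, y) containing a point at the given zoom level."""
    n = 1 << zoom  # number of tiles along each axis
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or lat = -MAX_LAT stays inside the grid.
    return min(n - 1, max(0, x)), min(n - 1, max(0, y))


def geotile_buckets_for_bbox(
    bbox: Tuple[float, float, float, float], zoom: int
) -> List[str]:
    """Labels of all tiles the box overlaps, in {zoom}/{x}/{y} form."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x_min, y_min = tile_index(min_lon, max_lat, zoom)  # y grows southward
    x_max, y_max = tile_index(max_lon, min_lat, zoom)
    return [
        f"{zoom}/{x}/{y}"
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


# A box straddling (0, 0) is counted in all four zoom-1 tiles.
print(geotile_buckets_for_bbox((-10.0, -10.0, 10.0, 10.0), 1))
```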

Plan

We will launch the metric aggregations (Geo Bounds and Geo Centroid) first, then move on to the Geo Tile and Geo Hash aggregations.

Next Steps

As a next step, we will open one RFC for each of the above aggregations with more details on each.

Customer Request

  1. https://forum.opensearch.org/t/support-for-geo-distance-queries-sorting-on-geo-shape-fields/7777/6
  2. [FEATURE] Add support for geo_shape fields #55

References

  1. Geo Bounds on Geo Points: https://opensearch.org/docs/latest/opensearch/metric-agg/#geo_bound
  2. Zoom Level Documentation: https://wiki.openstreetmap.org/wiki/Zoom_levels
  3. Centroid: https://en.wikipedia.org/wiki/Centroid
  4. Geo Hash: https://en.wikipedia.org/wiki/Geohash
  5. Map Tile: https://en.wikipedia.org/wiki/Tiled_web_map
  6. ISO/IEC: https://www.iso.org/obp/ui/#iso:std:iso-iec:13249:-3:ed-5:v1:en
@AdrienF

AdrienF commented Jul 19, 2022

Hello, I just want to support this issue by explaining our use case, in case it helps you prioritize it:

We want to build a "heatmap" of objects detected on images. Objects are modeled as axis-aligned bounding boxes (stored as geo_shape of type Polygon or envelope) in our index, and scaled from normalized coordinates to fit the geo bounds [-180,180]x[-90,90].
The heatmap is then just a simple geotile_grid aggregation.

The problem is that it is not yet possible in OpenSearch to aggregate on a geo_shape field.

One workaround we currently have is to aggregate on detection centers, mapped as geo_point. It is not fully satisfactory as it does not take the detection area into account.

The ideal implementation for this feature would be an aggregation where geo_shape would be counted in each intersecting tile proportionally to its relative intersection area.

Additionally, the geographic reference system is useless for this application, so doing all this on arbitrary coordinates would be perfect.
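
To make the proportional-counting idea concrete, here is a minimal Python sketch. It assumes axis-aligned boxes in arbitrary planar coordinates and a square grid with cell size `cell_size`; the function name and the weighting scheme are illustrative only and describe the requested behavior, not anything OpenSearch provides.

```python
import math
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def proportional_weights(box: Box, cell_size: float) -> Dict[Tuple[int, int], float]:
    """Split a count of 1 across grid cells by share of the box's area."""
    min_x, min_y, max_x, max_y = box
    area = (max_x - min_x) * (max_y - min_y)
    if area <= 0:
        raise ValueError("box must have positive area")
    weights = {}
    for col in range(math.floor(min_x / cell_size), math.ceil(max_x / cell_size)):
        for row in range(math.floor(min_y / cell_size), math.ceil(max_y / cell_size)):
            # Overlap between the box and this cell along each axis.
            w = min(max_x, (col + 1) * cell_size) - max(min_x, col * cell_size)
            h = min(max_y, (row + 1) * cell_size) - max(min_y, row * cell_size)
            if w > 0 and h > 0:
                weights[(col, row)] = w * h / area
    return weights


# A unit box centered on a grid corner contributes a quarter to each cell.
print(proportional_weights((0.5, 0.5, 1.5, 1.5), 1.0))
```

Summing these weights per cell over all detections would give the heatmap, with each detection contributing a total of 1 regardless of how many cells it spans.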

Best regards.

Adrien

@navneet1v
Collaborator Author

Hi Adrien,
Thanks for providing the detailed use case. These aggregations are prioritized on the roadmap, and we are currently working on the GeoBounds aggregation (#93) to start with. We will keep this GitHub issue updated with the release version we are targeting.

@navneet1v
Collaborator Author

Umbrella implementation issue: #104

@philvarner

I'd also like to see the GeoHexGrid aggregation on geo_shape fields; it is already supported for geo_point.

@navneet1v
Collaborator Author

@philvarner are you looking for a specific use case which can only be accomplished using GeoHexGrid aggregation on GeoShapes?

@philvarner

My specific use case is that I have ~50M [STAC Items](https://github.com/radiantearth/stac-spec/tree/master/item-spec) in my OpenSearch instance that I'd like to do GeoHex aggregation over, and these only have a GeoJSON geometry (Polygon or MultiPolygon) indexed as a geo_shape. For new data, we're adding a geo_point field that can be used by the existing GeoHex aggregation, but most of our data doesn't have that field.

@navneet1v
Collaborator Author

Thanks for explaining the use case. Just want to understand one more thing: do you also need to visualize these GeoHexGrid aggregations in OpenSearch Dashboards, or do you have your own UI for that?

@philvarner

We're building our own UI for that.

@nandi-github

Let's plan this for the next release even if the UX is not ready. It looks like there are people who want the backend.

@navneet1v
Collaborator Author

@philvarner the code is merged and will be released with version 2.9 of OpenSearch.

I am closing this issue. Please create a GitHub issue for any further questions.
