Aggregate/query by geometry-type with geo_shape fields #49569

Closed
thomasneirynck opened this issue Nov 25, 2019 · 19 comments
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@thomasneirynck
Contributor

thomasneirynck commented Nov 25, 2019

For geometries indexed into a geo_shape field, it would be helpful to be able to aggregate on the geometry type.

Example use cases:

(1) for UX-applications that need to present a different UX based on the type-of geometries stored in the index.

  • e.g. for styling,
    • show an icon-editor for indices that have points in geo_shape
    • show a fill/outline-editor for indices that have polygons/multipolygons stored.

(2) Counts and unique counts are especially relevant, but geometry type could be useful in any aggregation.

(3) Similarly, it would be great to be able to specify filters on the data based on geometry-type
- e.g. only query for points for POI-type data.

This would be similar to the ST_GeometryType function in SQL.

@thomasneirynck thomasneirynck changed the title Aggregate/query by geometry-type in geo_shape fields Aggregate/query by geometry-type with geo_shape fields Nov 25, 2019
@jtibshirani jtibshirani added :Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement labels Nov 25, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Geo)

@thomasneirynck
Contributor Author

This enhancement would also be useful for vector tiling (elastic/kibana#58519). When a geo_shape field contains only points, we can run geotile_grid and geo_centroid aggs. When the grid cells are small enough, this combination gives good approximate results for the location of the points, especially when zoomed out.
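Here is a minimal Python sketch of that point-only path. It builds a search body that nests a `geo_centroid` metric inside a `geotile_grid` bucket agg, then reads the per-cell centroids back out of a response. The field name `location`, the precision value, and the helper names are illustrative assumptions. The body would be sent to the index's `_search` endpoint.

```python
from typing import List, Tuple


def point_cluster_request(field: str, precision: int) -> dict:
    """Search body: one geotile_grid bucket per tile, each with a geo_centroid sub-agg."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "grid": {
                "geotile_grid": {"field": field, "precision": precision},
                "aggs": {"centroid": {"geo_centroid": {"field": field}}},
            }
        },
    }


def cell_centroids(response: dict) -> List[Tuple[str, float, float, int]]:
    """Return (tile key, lat, lon, doc_count) for every grid cell in a search response."""
    cells = []
    for bucket in response["aggregations"]["grid"]["buckets"]:
        location = bucket["centroid"]["location"]
        cells.append((bucket["key"], location["lat"], location["lon"], bucket["doc_count"]))
    return cells
```

The centroid approximates the points well only when each tile is small relative to the map view. For lines or polygons, a centroid would hide the shapes' extent, so this is only safe once the field is known to hold points.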

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@thomasneirynck
Contributor Author

Also useful to determine if Maps can construct point-2-point layers elastic/kibana#68540

@nknize
Contributor

nknize commented Jun 8, 2020

/cc @talevy @iverase @jpountz

Let's make sure we capture the "ask". I don't know of many customers who have specifically requested this capability (even though it is a standard function in Oracle and PostGIS), so let's make sure we document whether this is beneficial for either a. a performance boost, or b. enabling "power play" functionality in Maps.

Technically we can split this into two issues:

  1. geometry_type aggregation - (license: Gold) the geo doc value records the geometry type so I think we can support this relatively easily in aggregations. We should decide the "class" of aggregation (bucket or metric). I think it makes most sense as a bucket agg with bucket key being the geometry_type. e.g.,
            {
                "buckets": [
                    { "key": "POLYGON", "doc_count": 3 },
                    { "key": "LINE", "doc_count": 2 },
                    { "key": "POINT", "doc_count": 100 }
                ]
            }

This way we could nest other metric aggs and run interesting analysis (e.g., attribute stats by geometry type). Note that we won't be able to support Multi types since those are split into multi value documents with individual geometries (can you confirm @iverase?).
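Suppose the proposed `geometry_type` agg returned buckets shaped like the example above. That agg is only a proposal, so the response shape is assumed. A client could then turn the buckets into counts and check, for example, whether a geo_shape field holds only points. The function names are hypothetical.

```python
from typing import Dict, List


def geometry_type_counts(buckets: List[dict]) -> Dict[str, int]:
    """Map each bucket key (e.g. "POINT") to its doc_count."""
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def is_points_only(counts: Dict[str, int]) -> bool:
    """True when at least one document was counted and every counted type is POINT."""
    non_empty = {key for key, count in counts.items() if count > 0}
    return non_empty == {"POINT"}
```

With the example buckets, `is_points_only` returns `False`, because lines and polygons are present alongside the points.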

  2. Either a new geometry_type query or a geometry_type parameter on the geo_shape query - (license: Gold) This one is more complicated, as we "lose" the geometry type in the Lucene index due to tessellation. We have an open Lucene PR for adding the triangle type to the ShapeField encoding, which will help by identifying lines, points, or triangles, but we need to figure out the best way to expose this as a query (if we think it's worth it). I wonder if it's worth considering a "system" index for Maps that records some of these global attributes (e.g., whether a geo_shape index contains points only). As a side note, we had a points_only mapping parameter for geo_shape types that optimized the geo_shape index for points only, but this went away in favor of strongly encouraging geo_point and having feature parity between geo_point and geo_shape fields for scenarios where users only expect point geometries.

@iverase
Copy link
Contributor

iverase commented Jun 9, 2020

Note that we won't be able to support Multi types since those are split into multi value documents with individual geometries.

That is right: we currently do not store information in the index or doc values about how a shape was defined. Note that the following shapes, each consisting of two points, are equivalent for us:

// as a multi-point
MULTIPOINT(0 0, 1 1)
// as a geometry collection
GEOMETRYCOLLECTION(POINT(0 0), POINT(1 1))
// as an array
[POINT(0 0), POINT(1 1)]

On the other hand, we do have information about the shape's topological dimensionality as part of the centroid calculation (dim=0 -> point; dim=1 -> line; dim=2 -> polygon). I think exposing this information can provide most of the required functionality.
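A minimal Python sketch shows why the three two-point forms are indistinguishable once only dimensionality is kept. It reduces a WKT string, or an array of them, to a topological dimension. This is not Elasticsearch code. The WKT handling is simplified (no EMPTY geometries, no Z/M coordinates). Taking the highest dimension of a mixed collection is also an assumption made for the sketch.

```python
from typing import List, Union

# Topological dimension per simple WKT type: 0 = point, 1 = line, 2 = polygon.
BASE_DIMENSION = {
    "POINT": 0,
    "MULTIPOINT": 0,
    "LINESTRING": 1,
    "MULTILINESTRING": 1,
    "POLYGON": 2,
    "MULTIPOLYGON": 2,
}


def split_top_level(body: str) -> List[str]:
    """Split "A(...), B(...)" on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i].strip())
            start = i + 1
    parts.append(body[start:].strip())
    return [part for part in parts if part]


def dimension(geometry: Union[str, List[str]]) -> int:
    """Highest topological dimension in a WKT string or an array of WKT strings."""
    if isinstance(geometry, list):
        # An array of geometries is treated like one multi-valued shape.
        return max(dimension(item) for item in geometry)
    name, _, rest = geometry.strip().partition("(")
    name = name.strip().upper()
    if name == "GEOMETRYCOLLECTION":
        inner = rest.rsplit(")", 1)[0]  # drop the collection's closing parenthesis
        return max(dimension(member) for member in split_top_level(inner))
    return BASE_DIMENSION[name]
```

All three two-point forms listed above reduce to dimension 0. That is the information that survives, and it is what an aggregation by dimension could expose.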

geometry_type aggregation

It would be straightforward to provide an aggregation by topological dimension. I would rename the agg accordingly.

either new geometry_type query or geometry_type parameter on geo_shape query

As above, we can provide some filtering capabilities with respect to topological dimensionality. For a standalone query, we would have to implement it on top of doc values, as the BKD index is only efficient when a spatial constraint is provided.

@jpountz
Copy link
Contributor

jpountz commented Jun 9, 2020

for UX-applications that need to present a different UX based on the type-of geometries stored in the index

I wonder if this is something we should resist doing. Getting this information would be very costly, and couldn't be cached since a polygon could be added to a field that only stored points so far at any time.

@thomasneirynck
Contributor Author

thomasneirynck commented Jun 9, 2020

and couldn't be cached

thx @jpountz. The Maps-app would not need to cache this information anywhere. It would request it when bootstrapping the UX for a layer.

To give some context for this request:

Clients of the Elasticsearch API have no efficient way of determining the types of the shapes stored in a geo_shape field (points, lines, or polygons) without actually pulling all of them.

This affects general purpose visualization tools like Kibana Maps.

Not knowing up front which geometries are actually stored in an index cascades into the UX, resulting in a "grab-bag" look and feel.

Consider:

[Screenshot: Maps layer style editor offering point, line, and polygon options]

We are showing all 3 options because we don't know the geometry types of the documents in that index. There could be millions of documents (the screenshot example has 600k building footprints), far too many for a web app to pull.

For end-users, this grab-bag UX is less than optimal. Especially since most users will store their data "thematically". e.g. rivers (lines) in one index, building footprints (polygons) in another, points-of-interest (points) in another. This implicit knowledge can be used in the UX.

e.g. Maps could simplify its UX by having knowledge of the geometry-type:

  • styling editors (see screenshot)
  • legends (e.g. show line-icons instead of fill-icons for rivers)
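Here is a sketch of how a client such as Maps might narrow the styling UI once it knows which topological dimensions are present (0 = point, 1 = line, 2 = polygon, following the dimensionality encoding mentioned earlier in the thread). The editor labels are hypothetical, not Kibana identifiers. The point and polygon choices follow the use cases at the top of the issue.

```python
from typing import Dict, List

# Hypothetical editor labels per topological dimension.
EDITORS_BY_DIMENSION = {
    0: ["icon"],             # points: icon editor
    1: ["line"],             # lines: line style editor
    2: ["fill", "outline"],  # polygons: fill/outline editor
}


def editors_to_show(counts_by_dimension: Dict[int, int]) -> List[str]:
    """Return the editors needed for the dimensions that actually occur.

    With no information (empty or all-zero counts), fall back to every editor,
    which is the grab-bag UI described above.
    """
    present = sorted(dim for dim, count in counts_by_dimension.items() if count > 0)
    if not present:
        present = sorted(EDITORS_BY_DIMENSION)
    editors: List[str] = []
    for dim in present:
        editors.extend(EDITORS_BY_DIMENSION[dim])
    return editors
```

For an index of building footprints, only the polygon editors would be shown.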

As for (3), this is more hypothetical, since Maps does not do this today (although we do want Kibana to handle the display of large datasets better, elastic/kibana#58519). Not being able to filter on geometry type makes it harder to build maps where the display of documents is scale-dependent (i.e., data gets filtered or simplified based on the zoom level of the map). Point data should be handled differently than lines and polygons. E.g. building footprints should be filtered out when zoomed out (since they are invisible at that scale), but points of interest should be retained (because points have no size).
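The scale-dependent rule above can be sketched as a zoom check. Points are kept at every zoom level, while lines and polygons are only requested once the map is zoomed in far enough. The cut-off of zoom 10 is an arbitrary placeholder, not a Maps setting.

```python
from typing import List

POINT, LINE, POLYGON = 0, 1, 2  # topological dimensions

# Hypothetical cut-off: below this zoom level, lines and polygons are too small to see.
MIN_ZOOM_FOR_SHAPES = 10


def dimensions_to_request(zoom: int, min_zoom_for_shapes: int = MIN_ZOOM_FOR_SHAPES) -> List[int]:
    """Points have no size, so they are kept at every zoom; shapes only when zoomed in."""
    if zoom >= min_zoom_for_shapes:
        return [POINT, LINE, POLYGON]
    return [POINT]
```

Turning the returned dimensions into an actual query filter is exactly the capability this issue asks for.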

@jpountz
Contributor

jpountz commented Jun 11, 2020

For end-users, this grab-bag UX is less than optimal. Especially since most users will store their data "thematically". e.g. rivers (lines) in one index, building footprints (polygons) in another, points-of-interest (points) in another. This implicit knowledge can be used in the UX.

Could we get half-way there by looking at field caps to know whether the field is mapped as a geo_point or as a geo_shape?

I'd really like to avoid making the UI block waiting for the result of an aggregation to know how it should specialize for the type of geometries that are stored in the index. This is something that would work with small amounts of data but would start giving users a bad experience as they start having non-negligible amounts of data and using our slow features (e.g. schema-on-read, searchable snapshots).

To be clear I'm not against adding this aggregation, which can be useful, I'm opposed to making UI loading depend on the result of this aggregation.

@thomasneirynck
Contributor Author

thomasneirynck commented Jun 11, 2020

Could we get half-way there by looking at field caps to know whether the field is mapped as a geo_point or as a geo_shape?

Yes, Maps is already doing this right now. The "gap" is that for geo_shape itself, a client cannot determine what exactly is being stored (without actually pulling all the documents).

To be clear I'm not against adding this aggregation, which can be useful, I'm opposed to making UI loading depend on the result of this aggregation.

I don't think Maps would "block" the UI. Rather, knowledge about geometry-types would be used to fine-tune some of the presentation in an async-operation.

The potential for Kibana to run an agg over all the data is a generic issue in Kibana (e.g. date histograms in Discover). A couple of examples in the Maps application where this can occur (absent any filter-context constraints, like a time range):

  • If a user adds a cluster layer and zooms out to the entire world, the geotile_grid agg runs
  • Retrieving the data bounds, so users can zoom to the location of their data, uses geo_bounds

I do agree that for enormous data-sets, this would result in a poor experience. But then likely Kibana is not the right tool to build a map-visualization on top of that data.

Just in general, it is really helpful for a web-app like Kibana to be able to determine relevant meta-data before actually having to query all the documents.

Also, maybe the ask was worded the wrong way. Rather than asking for "can we add an agg that gives us geometry-types", maybe the ask should be more along the lines of "How would clients get useful meta-data about geometries stored in ES, without actually pulling the entire dataset?" (e.g. the bounds of the data, the geometry-type of the shapes, the size of the shapes, ...).

@jpountz
Copy link
Contributor

jpountz commented Jun 11, 2020

I do agree that for enormous data-sets, this would result in a poor experience. But then likely Kibana is not the right tool to build a map-visualization on top of that data.

I was seeing scale as a competitive advantage, so I would be disappointed if we dropped the objective of making Maps usable with large amounts of data.

Yes, Maps is already doing this right now. The "gap" is that for geo_shape itself, a client cannot determine what exactly is being stored (without actually pulling all the documents).

So maybe we should recommend more strongly to use geo_point for point-only fields to get a better experience in Maps?

A middle ground that would be better than aggregating on the geometry type would be to enhance geo_shape to index the geometry type as a sub keyword field automatically, introduce a new query that allows filtering geo_points and geo_shapes by geometry type, and finally make Maps fire one filter per geometry type with a terminate_after equal to 1 to check whether there is any point, line and polygon in the index without needing to scan all documents.
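Here is a sketch of the probing approach proposed above. It assumes geo_shape gained an automatically indexed keyword sub-field holding the geometry type. That sub-field (`location.geometry_type` here) and its values are hypothetical. `size`, `terminate_after`, and the `term` query are existing search features. One cheap request is built per type, and a type counts as present if its probe matched at least one document.

```python
from typing import Dict, List

GEOMETRY_TYPES = ["POINT", "LINE", "POLYGON"]


def probe_requests(type_field: str) -> Dict[str, dict]:
    """One search body per geometry type; each shard stops collecting after one match."""
    return {
        geometry_type: {
            "size": 0,
            "terminate_after": 1,
            "query": {"term": {type_field: geometry_type}},
        }
        for geometry_type in GEOMETRY_TYPES
    }


def present_types(responses: Dict[str, dict]) -> List[str]:
    """Geometry types whose probe response reports at least one hit."""
    return [
        geometry_type
        for geometry_type in GEOMETRY_TYPES
        if responses[geometry_type]["hits"]["total"]["value"] > 0
    ]
```

The three probes could be sent together through the multi search (`_msearch`) API so the client pays for a single round trip.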

@thomasneirynck
Copy link
Contributor Author

I would be disappointed if we dropped the objective of making Maps usable with large amounts of data.

Me too. A lot of the focus in Maps is on working with ES data at any scale. Blended layers (merged), aggs on geo_shape (merged), and vector tiling (future) are all efforts to display ES geo data on a map at any scale (in two senses: whether there are few or many documents, and whether the user is zoomed out or zoomed in). Every once in a while, feature requests will trickle down to the ES level to help Kibana achieve that ;)

So maybe we should recommend more strongly to use geo_point for point-only fields to get a better experience in Maps?

++ can do. I also understand that there is a performance benefit for using geo_point over geo_shape.

new query that allows filtering geo_points and geo_shapes by geometry type

Filtering-by-type would be very useful. Many other geo-tools allow querying geometries by type because the type impacts the styling. So it would be really useful for end-users to be able to structure their layers based on type.

@thomasneirynck
Copy link
Contributor Author

This is the corresponding issue on the Kibana-side, which is blocked by not being able to determine the geometry-type (or dimensionality) of the shapes. elastic/kibana#92672 (comment)

@thomasneirynck
Contributor Author

@talevy
Copy link
Contributor

talevy commented Mar 2, 2021

Not exactly: this SQL function operates on the source; it is not an aggregation.

@thomasneirynck
Copy link
Contributor Author

Would it work in a GROUP BY statement? cc @imotov

@imotov
Copy link
Contributor

imotov commented Mar 3, 2021

@thomasneirynck you are correct, the function exists and it works with shapes. Unfortunately, as @talevy also correctly pointed out, it can only extract the shape type from the shape source. It is therefore available only in contexts where the source is available, which means we cannot use it for filtering (WHERE clause) or in aggregations (GROUP BY clause).

@iverase
Copy link
Contributor

iverase commented May 27, 2021

With the introduction of Painless support for geo_shape fields in #72886, this can now be achieved using runtime fields. For example:

GET /example/_search
{
  "size": 0,
  "runtime_mappings": {
    "type": {
      "script": """
         int type = doc['location'].getDimensionalType();
         if (type == 0) {
           emit('POINT');
         } else if (type == 1) {
           emit('LINE');
         } else if (type == 2) {
           emit('POLYGON');
         }
       """,
      "type": "keyword"
    }
  },
  "aggs" : {
    "type" : {
      "terms": {
        "field": "type"
      }
    }
  }
}

Would that fulfil the need?
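A Python sketch can build the same request programmatically and optionally restrict the search to one geometry type. A field defined in `runtime_mappings` can be queried like a mapped keyword field, so this also covers the filtering use case (3) from the original request. The runtime field name `geometry_type` and the helper name are illustrative. The Painless script is the one from the comment above, with the shape field name substituted in.

```python
from typing import Optional

# Painless script from the example above: emit a keyword per topological dimension.
DIMENSION_SCRIPT = """
int type = doc['%s'].getDimensionalType();
if (type == 0) {
  emit('POINT');
} else if (type == 1) {
  emit('LINE');
} else if (type == 2) {
  emit('POLYGON');
}
"""


def geometry_type_search(shape_field: str, only_type: Optional[str] = None) -> dict:
    """Terms agg over a runtime keyword field; optionally filter to one geometry type."""
    body = {
        "size": 0,
        "runtime_mappings": {
            "geometry_type": {
                "type": "keyword",
                "script": DIMENSION_SCRIPT % shape_field,
            }
        },
        "aggs": {"geometry_type": {"terms": {"field": "geometry_type"}}},
    }
    if only_type is not None:
        # The runtime field is queried just like a mapped keyword field.
        body["query"] = {"term": {"geometry_type": only_type}}
    return body
```

`geometry_type_search("location", "POINT")` would return buckets for point documents only. Because the script runs per matching document, a spatial or time filter in the same query keeps the cost down.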

@thomasneirynck
Copy link
Contributor Author

@iverase - yes, I think using the runtime field satisfies the use case. To confirm, is this function available starting in 7.14?

@iverase
Copy link
Contributor

iverase commented May 27, 2021

yes, 7.14.

@iverase iverase closed this as completed Sep 7, 2021

9 participants