[Maps] automatically transition between aggregated view and individual documents in a single layer #46711

Closed
nreese opened this issue Sep 26, 2019 · 7 comments
Labels
[Deprecated-Use Team:Presentation]Team:Geo Former Team Label for Geo Team. Now use Team:Presentation discuss

Comments

@nreese
Contributor

nreese commented Sep 26, 2019

Elastic Maps supports multiple layers. The suggested strategy for displaying large datasets is to add two layers. The first layer displays individual documents and appears when the user zooms in on the map to show smaller regions. The second layer shows aggregated data that represents many documents and appears when the user zooms out to show larger areas of the globe.

This strategy is a great start for displaying large data sets but forces the user to choose a single static point at which aggregations switch to individual documents. In reality, the zoom level at which the data density is low enough to show individual documents will vary at any given location.

It would be nice to have a single layer that displays aggregated data until the data density drops below a certain threshold, at which point individual documents are displayed.

@nreese nreese added discuss [Deprecated-Use Team:Presentation]Team:Geo Former Team Label for Geo Team. Now use Team:Presentation labels Sep 26, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-gis

@maihde
Contributor

maihde commented Sep 27, 2019

For the Kibana 5.6 series I modified the Enhanced Tile Maps plugin to achieve this functionality. I will describe what I chose to do and the downsides of that approach.

The automatic transition between aggregation views and individual points was based on a user-defined threshold (1-10,000). Once the number of points within the map area was lower than this threshold, a second query would be performed to retrieve the individual documents. I found three downsides to this approach:

  1. Two queries were performed to render the data. First an aggregation query and then a points query.
  2. Retrieving the individual documents was slower than an aggregation query.
  3. The ability to 'term' color points is a major part of our use case, but you couldn't achieve this unless the number of points was low enough to switch to documents view. This meant that a user had to spend a lot of time zooming in and out to understand the data.
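
A minimal sketch of the threshold flow described above, to make the two-query cost concrete. The function name, the callables, and the bucket shape (`{"key", "doc_count"}`) are hypothetical stand-ins for illustration, not the plugin's actual code:

```python
from typing import Callable, Dict, List


def render_with_threshold(
    run_aggregation: Callable[[], Dict],
    run_document_search: Callable[[], List[Dict]],
    threshold: int,
) -> Dict:
    """Return aggregated buckets, or documents when the visible count is under the threshold."""
    # First query: always issued, even when documents end up being shown.
    agg = run_aggregation()
    total = sum(bucket["doc_count"] for bucket in agg["buckets"])

    if total < threshold:
        # Second query: only now do we know the documents view is appropriate.
        return {"mode": "documents", "features": run_document_search()}
    return {"mode": "aggregated", "features": agg["buckets"]}
```

Whenever the documents view is chosen, both callables run, which is downside 1 in the list above.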

The other alternative I used was on the POI layers. Instead of trying to determine if it's more appropriate to show aggregations or points, I re-framed the problem into using aggregations with top-hits. This works very well because it eliminates the need for a user to pick an arbitrary threshold.

In this method, you define the maximum number of top_hits to return (e.g. 100). If the number of hits in the grid/tile is less than 100, you render each of them. If the number of hits is greater than 100, you render a placemark on the geo-centroid and can scale its size to represent the number of points contained at that centroid. You can further reinforce this by labeling the circle with the number of points contained. This comes with the additional benefit that we can also do term colors at the aggregation level by adding a sub-bucket on terms. The final benefit is that you never need to do two queries; all of the data is returned in one query.
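
As a rough sketch of this method, the request below combines a `geotile_grid` aggregation with `geo_centroid`, `top_hits`, and `terms` sub-aggregations, and a second function turns the response into render features. The aggregation names (`grid`, `centroid`, `docs`, `by_term`), the function names, and the output feature shape are assumptions made for illustration; only the aggregation types come from Elasticsearch itself:

```python
from typing import Dict, List


def build_request(geo_field: str, precision: int, max_hits: int, term_field: str) -> Dict:
    """Build a single search body that returns clusters and documents together."""
    return {
        "size": 0,  # no top-level hits; everything comes back through the aggregation
        "aggs": {
            "grid": {
                "geotile_grid": {"field": geo_field, "precision": precision},
                "aggs": {
                    "centroid": {"geo_centroid": {"field": geo_field}},
                    "docs": {"top_hits": {"size": max_hits}},
                    "by_term": {"terms": {"field": term_field}},
                },
            }
        },
    }


def features_from_response(response: Dict, max_hits: int) -> List[Dict]:
    """Emit one feature per document for sparse cells, one cluster for dense cells."""
    features = []
    for bucket in response["aggregations"]["grid"]["buckets"]:
        if bucket["doc_count"] <= max_hits:
            # Every document in the cell fits in top_hits, so draw them individually.
            for hit in bucket["docs"]["hits"]["hits"]:
                features.append({"kind": "document", "source": hit["_source"]})
        else:
            # Too many documents: draw one placemark at the centroid, sized by count.
            location = bucket["centroid"]["location"]
            terms = bucket["by_term"]["buckets"]
            features.append({
                "kind": "cluster",
                "lat": location["lat"],
                "lon": location["lon"],
                "count": bucket["doc_count"],
                # Terms buckets are ordered by doc_count, so the first is the dominant term.
                "top_term": terms[0]["key"] if terms else None,
            })
    return features
```

The sketch treats a cell holding exactly `max_hits` documents as sparse, since all of its documents are still present in the `top_hits` result.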

If I were implementing a completely new layer, I would use the aggregation/top-hits method rather than the threshold method.

On a side note, I've been playing a lot with integrating Datashader with Elasticsearch and Kibana because, for very large datasets, it provides insights that are not achievable using the grid aggregation as currently implemented in Kibana. If you are interested in how I'm implementing the prototype, let me know. Basically I'm trying to implement something similar to this http://demo.tectonix.com/, which incidentally uses ES at least for part of the backend.

Finally, I personally think it would be cool to add a new Elasticsearch aggregation to complement geotile_grid that, instead of aggregating and returning JSON (where we are generally limited to 10,000 bins), returns an image like one produced by Datashader. In other words, having a Datashader-esque pipeline within Elasticsearch proper would be groundbreaking for many of my use cases.
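
To illustrate the general idea of returning a shaded raster instead of JSON bins, here is a small pure-Python sketch that counts points per pixel and then applies log scaling. It does not use Datashader or any Elasticsearch API; the function names are hypothetical, and the linear lon/lat-to-pixel mapping is a simplification (a real tile service would use a Web Mercator projection):

```python
import math
from typing import Iterable, List, Tuple


def rasterize_counts(
    points: Iterable[Tuple[float, float]],
    bounds: Tuple[float, float, float, float],
    width: int,
    height: int,
) -> List[List[int]]:
    """Count (lon, lat) points per pixel; row 0 is the northern edge of the image."""
    min_lon, min_lat, max_lon, max_lat = bounds
    grid = [[0] * width for _ in range(height)]
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # outside the viewport
        col = min(int((lon - min_lon) / (max_lon - min_lon) * width), width - 1)
        row = min(int((max_lat - lat) / (max_lat - min_lat) * height), height - 1)
        grid[row][col] += 1
    return grid


def shade_log(grid: List[List[int]]) -> List[List[int]]:
    """Map counts to 0-255 intensities on a log scale so sparse pixels stay visible."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(count) * scale) for count in row] for row in grid]
```

The output size depends only on the image dimensions, not on the number of documents, which is the property that makes this kind of result attractive for very large datasets.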

@JacobBrandt
Contributor

@maihde I'm interested in your approach to getting enough data for something like Datashader to reach its potential and be useful. The biggest problem I find is that getting this data out of Elasticsearch is slow.

@maihde
Contributor

maihde commented Oct 3, 2019

@JacobBrandt I quickly put together this Python Notebook to demonstrate the technique I'm using.

https://github.com/spectriclabs/elastic_datashader

It works for my use case and should, to some degree, support parallelism that would allow faster execution or higher-resolution plots if you want.

I've used this same technique to build a tile-map-service (TMS) that renders tiles on the fly with Datashader. This lets me hook things directly into Kibana via the TMS layer but has the downside that the TMS layer has no ability to respond to changes in the query, filters, or time range.

This is why, IMHO, a neat feature would be to provide a new class of aggregations in Elasticsearch that return an image instead of JSON. Basically, port the concept of the Datashader pipeline into Elasticsearch itself.

@JacobBrandt
Contributor

@maihde Thanks I’ll take a closer look into this tonight.

@nreese
Contributor Author

nreese commented Oct 16, 2019

Put up a POC that adds clustering to the existing Elasticsearch documents source. The source first does a geo_grid aggregation fetch. If the total count across the cells is less than ES_SIZE_LIMIT, then the layer just behaves like the old layer, fetching hits. If ES_SIZE_LIMIT is exceeded, then a _search request is done for each cell whose count is less than 10% of ES_SIZE_LIMIT, and the hits are combined with the grids in the source's GeoJSON.

This is similar to the top_hits approach but uses _search instead of top_hits so the 100 top-hit limit can be exceeded. In the future we hope to serve vector tiles from Kibana, so each of these document requests could just be a vector tile request.
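
A minimal sketch of that decision logic, written in Python for illustration rather than taken from the POC. The function name, the cell shape (`{"key", "doc_count"}`), and the handling of the exact boundary values are assumptions:

```python
from typing import Dict, List


def plan_fetch(cells: List[Dict], size_limit: int, cell_fraction: float = 0.1) -> Dict:
    """Decide which grid cells to fetch as documents and which to keep as clusters."""
    total = sum(cell["doc_count"] for cell in cells)
    if total <= size_limit:
        # Limit not exceeded: behave like the plain documents layer.
        return {"mode": "documents", "fetch_cells": [], "cluster_cells": []}

    # Limit exceeded: sparse cells get their own _search, dense cells stay aggregated.
    cutoff = size_limit * cell_fraction
    fetch = [cell["key"] for cell in cells if cell["doc_count"] < cutoff]
    cluster = [cell["key"] for cell in cells if cell["doc_count"] >= cutoff]
    return {"mode": "mixed", "fetch_cells": fetch, "cluster_cells": cluster}
```

In the mixed case, the hits returned for `fetch_cells` would be merged with grid features for `cluster_cells` into a single GeoJSON collection.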

#48459

@nreese
Contributor Author

nreese commented Apr 6, 2020

closed by #57879

@nreese nreese closed this as completed Apr 6, 2020
4 participants