geo_shape indexing write-performance/query-accuracy - add a Geo Post Filtering step before returning the query/search results #31875

Closed
hanoch opened this issue Jul 6, 2018 · 3 comments
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement

Comments

@hanoch

hanoch commented Jul 6, 2018

This issue concerns writing and indexing geo_shape values in a performant way, and then querying them correctly with a bounding box without getting false positives.

We currently need to store and index geo_shape values using a quadtree with the default precision of 50m in order to query their documents with a bounding box without getting false positives. At the 50m quadtree precision (tree depth) we see a big performance hit when writing geo_shape values (e.g. polygons, polylines, etc.): we wait anywhere from 20 minutes to an hour or more until the write (and index) is done. That said, with 50m precision, bounding-box queries on the index return the correct results, with (almost) no false positives.
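
A minimal sketch of the kind of mapping and bounding-box query discussed here, written as Python dicts. The field name `geometry` is hypothetical, only the mapping fragment for a single field is shown, and the exact request layout is an assumption that may differ between Elasticsearch versions.

```python
import json

# Hypothetical field name used only for this illustration.
FIELD = "geometry"


def geo_shape_mapping(precision: str) -> dict:
    """Mapping fragment for a quadtree-indexed geo_shape field.

    `precision` is a distance string such as "50m" or "50km".
    """
    return {
        "properties": {
            FIELD: {
                "type": "geo_shape",
                "tree": "quadtree",
                "precision": precision,
            }
        }
    }


def bounding_box_query(min_lon: float, min_lat: float,
                       max_lon: float, max_lat: float) -> dict:
    """geo_shape query using an envelope (bounding box).

    The envelope is assumed to be given as
    [[min_lon, max_lat], [max_lon, min_lat]] (upper-left, lower-right).
    """
    return {
        "query": {
            "geo_shape": {
                FIELD: {
                    "shape": {
                        "type": "envelope",
                        "coordinates": [[min_lon, max_lat], [max_lon, min_lat]],
                    },
                    "relation": "intersects",
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(geo_shape_mapping("50m"), indent=2))
    print(json.dumps(bounding_box_query(-74.1, 40.6, -73.7, 40.9), indent=2))
```

Only the `precision` value changes between the two setups compared in this issue ("50m" versus "50km"); the query body stays the same.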

When writing the geo_shape values with a quadtree precision of 50km (rather than 50m), the write time is much improved and is usable, but bounding-box queries return too many false positives.
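
To give a rough feel for why the two precisions behave so differently, here is a back-of-the-envelope sketch. It assumes each quadtree level halves the cell width, starting from one cell spanning the Earth's circumference; this is a simplification and not the formula Elasticsearch itself uses. The 100 km boundary length is an arbitrary example value.

```python
import math

# Approximate equatorial circumference of the Earth, in meters.
EARTH_CIRCUMFERENCE_M = 40_075_017


def approx_quadtree_levels(precision_m: float) -> int:
    """Levels needed until a cell is no wider than `precision_m`.

    Assumes the cell width at level L is circumference / 2**L.
    """
    return math.ceil(math.log2(EARTH_CIRCUMFERENCE_M / precision_m))


def approx_cells_along(length_m: float, precision_m: float) -> int:
    """Rough count of leaf cells needed to follow a boundary of `length_m`."""
    return math.ceil(length_m / precision_m)


if __name__ == "__main__":
    for label, precision in (("50m", 50), ("50km", 50_000)):
        levels = approx_quadtree_levels(precision)
        cells = approx_cells_along(100_000, precision)  # 100 km boundary
        print(f"{label}: ~{levels} levels, ~{cells} leaf cells per 100 km of boundary")
```

Under these assumptions, 50m needs about twice as many tree levels as 50km and roughly a thousand times more leaf cells along the same boundary, which is consistent with the slow writes at 50m and the coarse, false-positive-prone matching at 50km.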

We discussed this issue with the support team, @nknize and other Elastic PMs through the support site/app, over phone calls, and in meetings at the last two Elastic{ON} conferences (Elastic{ON} 17 and Elastic{ON} 18). We went over several client-side workarounds, both for writing/indexing and for querying/searching, and explained why they can't work for our use cases: we need to render a portion of the index using a specific bounding box, after running an external analytic and storing its results in an Elasticsearch index. Write times of many minutes to an hour are a showstopper for our use cases.

In our last meeting with @nknize, @zuketo, and others during Elastic{ON} 18 we concluded that Elastic would address this issue in three phases:

  1. Phase One - implement Geo Post Filtering on the ES (DB) side, so that ES queries always return correct results with no false positives. This will be done using new Lucene v7.4 capabilities. It will allow us to use any quadtree precision, including 50km, which is performant enough for most of our use cases, while ensuring that bounding-box queries always return correct results. Possibly also add a post_filtering=true query parameter, with the default value (true/false) TBD.
  2. Phase Two - geo_shape BKD tree support phase 1 of 2 - implement BKD tree based geo_shape indexing. This index will still rasterize the geo_shape geometry value into multiple raster LODs, but will use the BKD tree approach, which is expected to be somewhat more performant than the existing quadtree indexing approach.
  3. Phase Three - geo_shape BKD tree support phase 2 of 2 - switch to vector-based indexing rather than rasterizing the geo_shape geometry value.

This GitHub issue is about implementing Phase One: adding the missing Geo Post Filtering on the ES (DB) side of the query/search implementation, using Lucene v7.4? capabilities, so that bounding-box queries return correct results for any quadtree precision.
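
The post-filtering idea can be illustrated with a toy sketch: a coarse grid lookup produces candidates (possibly including false positives), and an exact geometric test then discards the ones that do not really intersect the bounding box. This is not how Elasticsearch or Lucene implement it. The uniform grid stands in for the quadtree, shapes are plain polylines in planar coordinates, and every name below is hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]               # (x, y), e.g. (lon, lat)
Box = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def cells_for_box(box: Box, cell: float) -> Set[Tuple[int, int]]:
    """All grid cells of width `cell` overlapped by `box`."""
    min_x, min_y, max_x, max_y = box
    return {
        (cx, cy)
        for cx in range(math.floor(min_x / cell), math.floor(max_x / cell) + 1)
        for cy in range(math.floor(min_y / cell), math.floor(max_y / cell) + 1)
    }


def cells_for_polyline(line: List[Point], cell: float) -> Set[Tuple[int, int]]:
    """Coarse 'index' entry: cells overlapped by each segment's bounding box."""
    cells: Set[Tuple[int, int]] = set()
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg_box = (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
        cells |= cells_for_box(seg_box, cell)
    return cells


def segment_intersects_box(p0: Point, p1: Point, box: Box) -> bool:
    """Exact segment/box test using Liang-Barsky clipping."""
    min_x, min_y, max_x, max_y = box
    x0, y0 = p0
    dx, dy = p1[0] - x0, p1[1] - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - min_x), (dx, max_x - x0),
                 (-dy, y0 - min_y), (dy, max_y - y0)):
        if p == 0:
            if q < 0:
                return False          # parallel to this edge and outside it
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)       # segment enters the box here
            else:
                t1 = min(t1, t)       # segment leaves the box here
            if t0 > t1:
                return False
    return True


def coarse_candidates(shapes: Dict[str, List[Point]], box: Box, cell: float) -> List[str]:
    """Step 1: shapes that share at least one grid cell with the query box."""
    query_cells = cells_for_box(box, cell)
    return [name for name, line in shapes.items()
            if cells_for_polyline(line, cell) & query_cells]


def post_filter(shapes: Dict[str, List[Point]], candidates: List[str], box: Box) -> List[str]:
    """Step 2: keep only candidates whose geometry really intersects the box."""
    return [name for name in candidates
            if any(segment_intersects_box(a, b, box)
                   for a, b in zip(shapes[name], shapes[name][1:]))]


if __name__ == "__main__":
    shapes = {
        "near": [(0.1, 0.1), (0.2, 0.2)],   # same coarse cell, outside the box
        "hit": [(0.7, 0.7), (0.9, 0.9)],    # inside the box
        "far": [(5.1, 5.1), (5.2, 5.2)],    # different cell entirely
    }
    box = (0.6, 0.6, 0.95, 0.95)
    candidates = coarse_candidates(shapes, box, cell=1.0)
    print(candidates)                            # ['near', 'hit']
    print(post_filter(shapes, candidates, box))  # ['hit']
```

With a coarse cell size, "near" is returned by the grid lookup even though it lies outside the box; the exact test removes it. The cost of the exact test is paid only for candidates, which is why a coarse (cheap to write) index plus post-filtering could still return correct results.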

@danielmitterdorfer danielmitterdorfer added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Jul 9, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@hanoch
Author

hanoch commented Jul 10, 2018

This issue is a follow-up to the https://support.elastic.co/customers/s/case/5006100000AeQy6AAF/geoshape-polygonpolyline-quadtree-precision-performance support case, which we opened on April 19, 2017 at 6:38 PM:
[image]

@zuketo

zuketo commented Aug 1, 2018

Hi @hanoch, I'll close this issue in favor of #32039. BKD based geo shapes are a preferred solution to phase 1 within your description, which means we can move directly to phase 2 and use the linked issue #32039.

@zuketo zuketo closed this as completed Aug 1, 2018