[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default geo_shape indexing approach #35320

Merged
merged 41 commits into from Dec 17, 2018
15c6517
[Geo] Expose BKDBackedGeoShapes as new VECTOR strategy
nknize Nov 6, 2018
3171808
revert change to PercolatorFieldMapper
nknize Nov 6, 2018
f3d1736
fix ExistsQuery for geo_shape vector strategy
nknize Nov 7, 2018
ddcbdda
add deprecation logging for tree, precision, tree_levels, distance_er…
nknize Nov 7, 2018
551149d
initial update to geoshape docs, including mapping migration updates
nknize Nov 7, 2018
c81747d
initial support for GeoCollection queries
nknize Nov 7, 2018
8306a1a
fix docs and javadoc errors
nknize Nov 7, 2018
cdf1286
clean up geocollection queries
nknize Nov 8, 2018
5cc71a8
set deprecated mapping tests to NOTCONSOLE
nknize Nov 8, 2018
83e34b5
fix geo-shape mapper asciidoc mapping and test warnings
nknize Nov 9, 2018
c4bc5f6
add support for point queries using LatLonShapeBoundingBoxQuery
nknize Nov 12, 2018
2ae341c
update GeoShapeQueryBuilderTests to include POINT queries for VECTOR …
nknize Nov 13, 2018
6bb4122
add lucene geometry build testing to ShapeBuilder tests
nknize Nov 13, 2018
067e84f
remove deprecated prefix tree mapping from geo-shape.asciidoc
nknize Nov 13, 2018
b389084
refactor GeoShapeFieldMapper into LegacyGeoShapeFieldMapper and GeoSh…
nknize Nov 19, 2018
c2b5bf5
update docs to remove vector strategy
nknize Nov 19, 2018
6d471c4
Merge branch 'master' into bkdBackedShapes
nknize Nov 21, 2018
24bff63
fix GeometryCollectionBuilder#buildLucene to return the object create…
nknize Nov 21, 2018
fea0972
fix LineLength failure in GeoJsonShapeParserTests
nknize Nov 21, 2018
10d4512
Merge branch 'master' into bkdBackedShapes
nknize Dec 8, 2018
f323b43
ShapeMapper refactor changes from PR feedback
nknize Dec 10, 2018
fb389e0
fix typo in geo-shape.asciidoc
nknize Dec 10, 2018
653f6fc
ignore circle test in docs
nknize Dec 10, 2018
b129a3b
update indexing-approach ref to geoshape-indexing-approach
nknize Dec 10, 2018
819b0b8
add warnings check for LegacyGeoShapeFieldMapper to AbstractBuilderTe…
nknize Dec 10, 2018
95c04c9
fix deprecatedParameters setup
nknize Dec 10, 2018
985b30b
update indexing approach
nknize Dec 10, 2018
0b07a4f
fixing unexpected warnings failures
nknize Dec 10, 2018
b6e39d8
Merge branch 'master' into bkdBackedShapes
nknize Dec 10, 2018
78b9dc1
Merge branch 'master' into bkdBackedShapes
nknize Dec 10, 2018
d2d1eb7
Merge branch 'master' into bkdBackedShapes
nknize Dec 11, 2018
5f7061c
move orientation back to field type
nknize Dec 12, 2018
e3e9cd1
Merge branch 'master' into bkdBackedShapes
nknize Dec 12, 2018
16ad89f
remove if in LegacyGeoShapeFieldMapper#doXContent. Fix GeoShapeFieldM…
nknize Dec 17, 2018
620fe42
Merge branch 'master' into bkdBackedShapes
nknize Dec 17, 2018
cbe14bf
fix indexing-approach link in circle section of geoshape docs
nknize Dec 17, 2018
d5049d5
add strategy to deprecation warnings check
nknize Dec 17, 2018
0489810
fix test failures
nknize Dec 17, 2018
2d677cf
fix typo in QueryStringQueryBuilderTests
nknize Dec 17, 2018
5949365
fix total hits to totalHits().value
nknize Dec 17, 2018
a455768
Merge branch 'master' into bkdBackedShapes
nknize Dec 17, 2018
186 changes: 110 additions & 76 deletions docs/reference/mapping/types/geo-shape.asciidoc
@@ -21,48 +21,59 @@ type.
|=======================================================================
|Option |Description| Default

|`tree` |Name of the PrefixTree implementation to be used: `geohash` for
GeohashPrefixTree and `quadtree` for QuadPrefixTree.
| `geohash`

|`precision` |This parameter may be used instead of `tree_levels` to set
an appropriate value for the `tree_levels` parameter. The value
specifies the desired precision and Elasticsearch will calculate the
best tree_levels value to honor this precision. The value should be a
number followed by an optional distance unit. Valid distance units
include: `in`, `inch`, `yd`, `yard`, `mi`, `miles`, `km`, `kilometers`,
`m`,`meters`, `cm`,`centimeters`, `mm`, `millimeters`.
|`tree` |deprecated[6.6, PrefixTrees no longer used] Name of the PrefixTree
implementation to be used: `geohash` for GeohashPrefixTree and `quadtree`
for QuadPrefixTree. Note: This parameter is only relevant for `term` and
`recursive` strategies.
| `quadtree`

|`precision` |deprecated[6.6, PrefixTrees no longer used] This parameter may
be used instead of `tree_levels` to set an appropriate value for the
`tree_levels` parameter. The value specifies the desired precision and
Elasticsearch will calculate the best tree_levels value to honor this
precision. The value should be a number followed by an optional distance
unit. Valid distance units include: `in`, `inch`, `yd`, `yard`, `mi`,
`miles`, `km`, `kilometers`, `m`,`meters`, `cm`,`centimeters`, `mm`,
`millimeters`. Note: This parameter is only relevant for `term` and
`recursive` strategies.
| `50m`

|`tree_levels` |Maximum number of layers to be used by the PrefixTree.
This can be used to control the precision of shape representations and
therefore how many terms are indexed. Defaults to the default value of
the chosen PrefixTree implementation. Since this parameter requires a
certain level of understanding of the underlying implementation, users
may use the `precision` parameter instead. However, Elasticsearch only
uses the tree_levels parameter internally and this is what is returned
via the mapping API even if you use the precision parameter.
|`tree_levels` |deprecated[6.6, PrefixTrees no longer used] Maximum number
of layers to be used by the PrefixTree. This can be used to control the
precision of shape representations and therefore how many terms are
indexed. Defaults to the default value of the chosen PrefixTree
implementation. Since this parameter requires a certain level of
understanding of the underlying implementation, users may use the
`precision` parameter instead. However, Elasticsearch only uses the
tree_levels parameter internally and this is what is returned via the
mapping API even if you use the precision parameter. Note: This parameter
is only relevant for `term` and `recursive` strategies.
| various

|`strategy` |The strategy parameter defines the approach for how to
represent shapes at indexing and search time. It also influences the
capabilities available so it is recommended to let Elasticsearch set
this parameter automatically. There are two strategies available:
`recursive` and `term`. Term strategy supports point types only (the
`points_only` parameter will be automatically set to true) while
Recursive strategy supports all shape types. (IMPORTANT: see
<<prefix-trees, Prefix trees>> for more detailed information)
|`strategy` |deprecated[6.6, PrefixTrees no longer used] The strategy
parameter defines the approach for how to represent shapes at indexing
and search time. It also influences the capabilities available so it
is recommended to let Elasticsearch set this parameter automatically.
There are two strategies available: `recursive` and `term`.
Recursive and Term strategies are deprecated and will be removed in a
future version. While they are still available, the Term strategy
supports point types only (the `points_only` parameter will be
automatically set to true) while Recursive strategy supports all
shape types. (IMPORTANT: see <<prefix-trees, Prefix trees>> for more
detailed information about these strategies)
| `recursive`

|`distance_error_pct` |Used as a hint to the PrefixTree about how
precise it should be. Defaults to 0.025 (2.5%) with 0.5 as the maximum
supported value. PERFORMANCE NOTE: This value will default to 0 if a `precision` or
`tree_level` definition is explicitly defined. This guarantees spatial precision
at the level defined in the mapping. This can lead to significant memory usage
for high resolution shapes with low error (e.g., large shapes at 1m with < 0.001 error).
To improve indexing performance (at the cost of query accuracy) explicitly define
`tree_level` or `precision` along with a reasonable `distance_error_pct`, noting
that large shapes will have greater false positives.
|`distance_error_pct` |deprecated[6.6, PrefixTrees no longer used] Used as a
hint to the PrefixTree about how precise it should be. Defaults to 0.025 (2.5%)
with 0.5 as the maximum supported value. PERFORMANCE NOTE: This value will
default to 0 if a `precision` or `tree_levels` definition is explicitly defined.
This guarantees spatial precision at the level defined in the mapping. This can
lead to significant memory usage for high resolution shapes with low error
(e.g., large shapes at 1m with < 0.001 error). To improve indexing performance
(at the cost of query accuracy) explicitly define `tree_levels` or `precision`
along with a reasonable `distance_error_pct`, noting that large shapes will have
greater false positives. Note: This parameter is only relevant for `term` and
`recursive` strategies.
| `0.025`

|`orientation` |Optionally define how to interpret vertex order for
@@ -77,13 +88,13 @@ sets vertex order for the coordinate list of a geo_shape field but can be
overridden in each individual GeoJSON or WKT document.
| `ccw`

|`points_only` |Setting this option to `true` (defaults to `false`) configures
the `geo_shape` field type for point shapes only (NOTE: Multi-Points are not
yet supported). This optimizes index and search performance for the `geohash` and
`quadtree` when it is known that only points will be indexed. At present geo_shape
queries can not be executed on `geo_point` field types. This option bridges the gap
by improving point performance on a `geo_shape` field so that `geo_shape` queries are
optimal on a point only field.
|`points_only` |deprecated[6.6, PrefixTrees no longer used] Setting this option to
`true` (defaults to `false`) configures the `geo_shape` field type for point
shapes only (NOTE: Multi-Points are not yet supported). This optimizes index and
search performance for the `geohash` and `quadtree` when it is known that only points
will be indexed. At present geo_shape queries can not be executed on `geo_point`
field types. This option bridges the gap by improving point performance on a
`geo_shape` field so that `geo_shape` queries are optimal on a point only field.
| `false`

|`ignore_malformed` |If true, malformed GeoJSON or WKT shapes are ignored. If
@@ -100,16 +111,35 @@ and reject the whole document.

|=======================================================================


[[geoshape-indexing-approach]]
[float]
==== Indexing approach
GeoShape types are indexed by decomposing the shape into a triangular mesh and
indexing each triangle as a 7 dimension point in a BKD tree. This provides
near perfect spatial resolution (down to 1e-7 decimal degree precision) since all
spatial relations are computed using an encoded vector representation of the
original shape instead of a raster-grid representation as used by the
<<prefix-trees>> indexing approach. Performance of the tessellator primarily
depends on the number of vertices that define the polygon/multi-polyogn. While
this is the default indexing technique prefix trees can still be used by setting
the `tree` or `strategy` parameters according to the appropriate
<<geo-shape-mapping-options>>. Note that these parameters are now deprecated
and will be removed in a future version.
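
As a rough sanity check on the 1e-7 decimal degree figure above, the following sketch computes the quantization step under the assumption that each coordinate is encoded as a 32-bit integer spanning its full range (180 degrees of latitude, 360 degrees of longitude). The class name, the 32-bit assumption, and the metres-per-degree constant are illustrative and are not taken from this PR.

```java
public class ShapePrecisionSketch {
    // Assumption: each coordinate is quantized into 2^32 steps.
    static final int BITS = 32;

    // Assumption: approximate length of one degree of latitude on a spherical Earth.
    static final double METERS_PER_DEGREE = 111_320.0;

    /** Smallest representable latitude difference, in decimal degrees. */
    static double latitudeStepDegrees() {
        return 180.0 / Math.pow(2, BITS);
    }

    /** Smallest representable longitude difference, in decimal degrees. */
    static double longitudeStepDegrees() {
        return 360.0 / Math.pow(2, BITS);
    }

    public static void main(String[] args) {
        System.out.printf("latitude step:  %.3e degrees%n", latitudeStepDegrees());
        System.out.printf("longitude step: %.3e degrees%n", longitudeStepDegrees());
        // Converts the longitude step to metres at the equator.
        System.out.printf("longitude step at the equator: %.4f m%n",
                longitudeStepDegrees() * METERS_PER_DEGREE);
    }
}
```

Under these assumptions both steps come out below 1e-7 degrees, which is consistent with the precision quoted in the paragraph above.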

[[prefix-trees]]
[float]
==== Prefix trees

To efficiently represent shapes in the index, Shapes are converted into
a series of hashes representing grid squares (commonly referred to as "rasters")
using implementations of a PrefixTree. The tree notion comes from the fact that
the PrefixTree uses multiple grid layers, each with an increasing level of
precision to represent the Earth. This can be thought of as increasing the level
of detail of a map or image at higher zoom levels.
deprecated[6.6, PrefixTrees no longer used] To efficiently represent shapes in
an inverted index, Shapes are converted into a series of hashes representing
grid squares (commonly referred to as "rasters") using implementations of a
PrefixTree. The tree notion comes from the fact that the PrefixTree uses multiple
grid layers, each with an increasing level of precision to represent the Earth.
This can be thought of as increasing the level of detail of a map or image at higher
zoom levels. Since this approach causes precision issues with indexed shapes, it has
been deprecated in favor of a vector indexing approach that indexes the shapes as a
triangular mesh (see <<geoshape-indexing-approach>>).

Multiple PrefixTree implementations are provided:

@@ -131,9 +161,10 @@ number of levels for the quad trees in Elasticsearch is 29; the default is 21.
[[spatial-strategy]]
[float]
===== Spatial strategies
The PrefixTree implementations rely on a SpatialStrategy for decomposing
the provided Shape(s) into approximated grid squares. Each strategy answers
the following:
deprecated[6.6, PrefixTrees no longer used] The indexing implementation
selected relies on a SpatialStrategy for choosing how to decompose the shapes
(either as grid squares or a tessellated triangular mesh). Each strategy
answers the following:

* What type of Shapes can be indexed?
* What types of Query Operations and Shapes can be used?
@@ -146,21 +177,21 @@ are provided:
|=======================================================================
|Strategy |Supported Shapes |Supported Queries |Multiple Shapes

|`recursive` |<<input-structure, All>> |`INTERSECTS`, `DISJOINT`, `WITHIN`, `CONTAINS` |Yes
|`recursive` |<<input-structure, All>> |`INTERSECTS`, `DISJOINT`, `WITHIN`, `CONTAINS` |Yes
|`term` |<<point, Points>> |`INTERSECTS` |Yes

|=======================================================================

[float]
===== Accuracy

Geo_shape does not provide 100% accuracy and depending on how it is configured
it may return some false positives for `INTERSECTS`, `WITHIN` and `CONTAINS`
queries, and some false negatives for `DISJOINT` queries. To mitigate this, it
is important to select an appropriate value for the tree_levels parameter and
to adjust expectations accordingly. For example, a point may be near the border
of a particular grid cell and may thus not match a query that only matches the
cell right next to it -- even though the shape is very close to the point.
`Recursive` and `Term` strategies do not provide 100% accuracy and depending on
how they are configured they may return some false positives for `INTERSECTS`,
`WITHIN` and `CONTAINS` queries, and some false negatives for `DISJOINT` queries.
To mitigate this, it is important to select an appropriate value for the tree_levels
parameter and to adjust expectations accordingly. For example, a point may be near
the border of a particular grid cell and may thus not match a query that only matches
the cell right next to it -- even though the shape is very close to the point.

[float]
===== Example
@@ -173,9 +204,7 @@ PUT /example
"doc": {
"properties": {
"location": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "100m"
"type": "geo_shape"
}
}
}
@@ -185,22 +214,23 @@ PUT /example
// CONSOLE
// TESTSETUP

This mapping maps the location field to the geo_shape type using the
quad_tree implementation and a precision of 100m. Elasticsearch translates
this into a tree_levels setting of 20.
This mapping definition maps the location field to the geo_shape
type using the default vector implementation. It provides
approximately 1e-7 decimal degree precision.

[float]
===== Performance considerations
===== Performance considerations with Prefix Trees

Elasticsearch uses the paths in the prefix tree as terms in the index
and in queries. The higher the level is (and thus the precision), the
more terms are generated. Of course, calculating the terms, keeping them in
deprecated[6.6, PrefixTrees no longer used] With prefix trees,
Elasticsearch uses the paths in the tree as terms in the inverted index
and in queries. The higher the level (and thus the precision), the more
terms are generated. Of course, calculating the terms, keeping them in
memory, and storing them on disk all have a price. Especially with higher
tree levels, indices can become extremely large even with a modest
amount of data. Additionally, the size of the features also matters.
Big, complex polygons can take up a lot of space at higher tree levels.
Which setting is right depends on the use case. Generally one trades off
accuracy against index size and query performance.
tree levels, indices can become extremely large even with a modest amount
of data. Additionally, the size of the features also matters. Big, complex
polygons can take up a lot of space at higher tree levels. Which setting
is right depends on the use case. Generally one trades off accuracy against
index size and query performance.
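
To make the trade-off above concrete, here is a simplified model of a quadtree: each additional level halves the cell width and quadruples the number of cells that could exist. The equator length and the class and method names are assumptions for illustration, and this is not the calculation Elasticsearch uses to derive `tree_levels` from `precision`.

```java
public class QuadTreeGrowthSketch {
    // Assumption: approximate length of the equator in metres.
    static final double EQUATOR_METERS = 40_075_017.0;

    /** Width of one grid cell at the equator for the given tree level. */
    static double cellWidthMeters(int level) {
        return EQUATOR_METERS / Math.pow(2, level);
    }

    /** Upper bound on the number of distinct cells at the given level (valid for level <= 31). */
    static long maxCells(int level) {
        return 1L << (2 * level); // 4^level
    }

    public static void main(String[] args) {
        for (int level : new int[] {10, 15, 20, 21, 29}) {
            System.out.printf("level %2d: cell width %.2f m, up to %d cells%n",
                    level, cellWidthMeters(level), maxCells(level));
        }
    }
}
```

A shape only produces terms for the cells it touches, so the real term count is far lower than the upper bound, but the number of cells covering a fixed area still grows by roughly a factor of four per level.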

The defaults in Elasticsearch for both implementations are a compromise
between index size and a reasonable level of precision of 50m at the
@@ -598,7 +628,10 @@ POST /example/doc
===== Circle

Elasticsearch supports a `circle` type, which consists of a center
point with a radius:
point with a radius. Note that this circle representation can only
be indexed when using the `recursive` Prefix Tree strategy. For
the default <<geoshape-indexing-approach>>, circles should be approximated using
a `POLYGON`.

[source,js]
--------------------------------------------------
@@ -612,6 +645,7 @@ POST /example/doc
}
--------------------------------------------------
// CONSOLE
// TEST[skip:not supported in default]

Note: The inner `radius` field is required. If no unit is specified, the
units of the `radius` default to `METERS`.
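
One way to approximate a circle with a polygon is sketched below. It assumes a spherical Earth and places vertices at equal bearings around the center using the standard great-circle destination formula. The class name, method name, and Earth radius constant are hypothetical, and the sketch does not handle circles that cross the poles or the dateline.

```java
public class CircleToPolygonSketch {
    // Assumption: mean Earth radius in metres on a spherical model.
    static final double EARTH_RADIUS_METERS = 6_371_008.8;

    /**
     * Returns a closed ring of [lon, lat] pairs (first vertex repeated at the end),
     * ordered counterclockwise when viewed with north up.
     */
    static double[][] approximate(double centerLat, double centerLon,
                                  double radiusMeters, int sides) {
        if (sides < 3) {
            throw new IllegalArgumentException("a polygon needs at least 3 sides");
        }
        double angularDistance = radiusMeters / EARTH_RADIUS_METERS;
        double lat1 = Math.toRadians(centerLat);
        double lon1 = Math.toRadians(centerLon);
        double[][] ring = new double[sides + 1][];
        for (int i = 0; i < sides; i++) {
            // Negative bearings step north -> west -> south -> east.
            double bearing = -2.0 * Math.PI * i / sides;
            double lat2 = Math.asin(Math.sin(lat1) * Math.cos(angularDistance)
                    + Math.cos(lat1) * Math.sin(angularDistance) * Math.cos(bearing));
            double lon2 = lon1 + Math.atan2(
                    Math.sin(bearing) * Math.sin(angularDistance) * Math.cos(lat1),
                    Math.cos(angularDistance) - Math.sin(lat1) * Math.sin(lat2));
            ring[i] = new double[] {Math.toDegrees(lon2), Math.toDegrees(lat2)};
        }
        ring[sides] = ring[0].clone(); // close the ring
        return ring;
    }
}
```

The resulting ring can be used as the outer ring of a polygon. More sides give a closer approximation at the cost of more vertices to tessellate.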
16 changes: 16 additions & 0 deletions docs/reference/migration/migrate_7_0/mappings.asciidoc
@@ -52,3 +52,19 @@ as a better alternative.

An error will now be thrown when unknown configuration options are provided
to similarities. Such unknown parameters were ignored before.

[float]
==== deprecated `geo_shape` Prefix Tree indexing

`geo_shape` types now default to using a vector indexing approach based on Lucene's new
`LatLonShape` field type. This indexes shapes as a triangular mesh instead of decomposing
them into individual grid cells. To index using legacy prefix trees, the `recursive` or `term`
strategy must be explicitly defined. Note that these strategies are now deprecated and will
be removed in a future version.

[float]
==== deprecated `geo_shape` parameters

The following type parameters are deprecated for the `geo_shape` field type: `tree`,
`precision`, `tree_levels`, `distance_error_pct`, `points_only`, and `strategy`. They
will be removed in a future version.
5 changes: 3 additions & 2 deletions docs/reference/query-dsl/geo-shape-query.asciidoc
@@ -7,7 +7,7 @@ Requires the <<geo-shape,`geo_shape` Mapping>>.

The `geo_shape` query uses the same grid square representation as the
`geo_shape` mapping to find documents that have a shape that intersects
with the query shape. It will also use the same PrefixTree configuration
with the query shape. It will also use the same Prefix Tree configuration
as defined for the field mapping.

The query supports two ways of defining the query shape, either by
@@ -157,7 +157,8 @@ has nothing in common with the query geometry.
* `WITHIN` - Return all documents whose `geo_shape` field
is within the query geometry.
* `CONTAINS` - Return all documents whose `geo_shape` field
contains the query geometry.
contains the query geometry. Note: this is only supported using the
`recursive` Prefix Tree Strategy deprecated[6.6]

[float]
==== Ignore Unmapped
@@ -19,6 +19,7 @@

package org.elasticsearch.common.geo;

import org.apache.lucene.document.LatLonShape.QueryRelation;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;
@@ -62,6 +63,17 @@ public static ShapeRelation getRelationByName(String name) {
return null;
}

/** Maps ShapeRelation to Lucene's LatLonShape.QueryRelation */
public QueryRelation getLuceneRelation() {
switch (this) {
case INTERSECTS: return QueryRelation.INTERSECTS;
case DISJOINT: return QueryRelation.DISJOINT;
case WITHIN: return QueryRelation.WITHIN;
default:
throw new IllegalArgumentException("ShapeRelation [" + this + "] not supported");
}
}

public String getRelationName() {
return relationName;
}
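
The effect of the `default` branch above can be seen in isolation with the following self-contained sketch. Both enums are stand-ins declared locally so the example compiles without Lucene or Elasticsearch on the classpath. The `isSupported` helper is hypothetical and is not part of this PR.

```java
public class RelationMappingSketch {
    // Stand-in for Lucene's LatLonShape.QueryRelation.
    enum QueryRelation { INTERSECTS, WITHIN, DISJOINT }

    // Stand-in for Elasticsearch's ShapeRelation.
    enum ShapeRelation {
        INTERSECTS, DISJOINT, WITHIN, CONTAINS;

        QueryRelation getLuceneRelation() {
            switch (this) {
                case INTERSECTS: return QueryRelation.INTERSECTS;
                case DISJOINT: return QueryRelation.DISJOINT;
                case WITHIN: return QueryRelation.WITHIN;
                default:
                    // CONTAINS has no counterpart in the stand-in QueryRelation.
                    throw new IllegalArgumentException("ShapeRelation [" + this + "] not supported");
            }
        }
    }

    /** Hypothetical helper: reports whether a relation can be mapped. */
    static boolean isSupported(ShapeRelation relation) {
        try {
            relation.getLuceneRelation();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (ShapeRelation relation : ShapeRelation.values()) {
            System.out.println(relation + " supported: " + isSupported(relation));
        }
    }
}
```

This matches the documentation change in this PR, which notes that `CONTAINS` is only supported with the `recursive` Prefix Tree strategy.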
@@ -197,9 +197,6 @@ public Object buildLucene() {
}
}

if (shapes.size() == 1) {
return shapes.get(0);
}
return shapes.toArray(new Object[shapes.size()]);
}
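
With the single-element shortcut removed, the visible return statement always yields an array, even for a collection holding one shape. The sketch below illustrates that behavior with hypothetical stand-in methods. It does not reproduce the builder's actual logic, and the benefit for callers is an assumption rather than something stated in this PR.

```java
import java.util.List;

public class CollectionBuildSketch {
    /** Hypothetical stand-in for the tail of buildLucene(): always returns an array. */
    static Object buildAll(List<?> shapes) {
        return shapes.toArray(new Object[shapes.size()]);
    }

    /** A caller can cast unconditionally instead of checking for a bare shape first. */
    static int countShapes(Object built) {
        return ((Object[]) built).length;
    }
}
```

Under this sketch, a one-element collection and a many-element collection follow the same code path in the caller.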
