New GeoHexGrid aggregation (#82924)
This commit introduces a new geogrid aggregation called GeoHexGridAggregation that
is based on Uber's H3 grid. It only supports geo_point fields.
iverase committed Jan 27, 2022
1 parent 15de797 commit 0873893
Showing 22 changed files with 1,308 additions and 8 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/82924.yaml
@@ -0,0 +1,5 @@
pr: 82924
summary: New `GeoHexGrid` aggregation
area: Geo
type: feature
issues: []
2 changes: 2 additions & 0 deletions docs/reference/aggregations/bucket.asciidoc
@@ -40,6 +40,8 @@ include::bucket/geodistance-aggregation.asciidoc[]

include::bucket/geohashgrid-aggregation.asciidoc[]

include::bucket/geohexgrid-aggregation.asciidoc[]

include::bucket/geotilegrid-aggregation.asciidoc[]

include::bucket/global-aggregation.asciidoc[]
249 changes: 249 additions & 0 deletions docs/reference/aggregations/bucket/geohexgrid-aggregation.asciidoc
@@ -0,0 +1,249 @@
[role="xpack"]
[[search-aggregations-bucket-geohexgrid-aggregation]]
=== Geohex grid aggregation
++++
<titleabbrev>Geohex grid</titleabbrev>
++++

A multi-bucket aggregation that groups <<geo-point,`geo_point`>>
values into buckets that represent a grid.
The resulting grid can be sparse and only
contains cells that have matching data. Each cell corresponds to a
https://h3geo.org/docs/core-library/h3Indexing#h3-cell-index[H3 cell index] and is
labeled using the https://h3geo.org/docs/core-library/h3Indexing#h3index-representation[H3Index representation].

See https://h3geo.org/docs/core-library/restable[the table of cell areas for H3
resolutions] on how precision (zoom) correlates to size on the ground.
Precision for this aggregation can be between 0 and 15, inclusive.
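
As a rough illustration of how precision relates to cell size, the following sketch estimates the average cell area at a given resolution. It assumes the H3 cell count of `2 + 120 * 7^r` at resolution `r` and an approximate Earth surface area of 510,065,622 km²; the function names are hypothetical. The average here is taken over all cells, including the 12 pentagons, so it differs somewhat from the H3 table at coarse resolutions.

```python
EARTH_SURFACE_KM2 = 510_065_622  # approximate value, an assumption of this sketch


def h3_cell_count(resolution: int) -> int:
    """Number of H3 cells at a resolution: 2 + 120 * 7^resolution."""
    if not 0 <= resolution <= 15:
        raise ValueError("resolution must be between 0 and 15, inclusive")
    return 2 + 120 * 7 ** resolution


def average_cell_area_m2(resolution: int) -> float:
    """Earth's surface divided evenly among all cells, in square meters."""
    return EARTH_SURFACE_KM2 * 1e6 / h3_cell_count(resolution)


for r in (0, 4, 12, 15):
    print(r, h3_cell_count(r), f"{average_cell_area_m2(r):.3g} m^2")
```

Under these assumptions, the average cell at precision 15 comes out slightly below one square meter.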

WARNING: High-precision requests can be very expensive in terms of RAM and
result sizes. For example, the highest-precision geohex with a precision of 15
produces cells that cover less than one square meter. We recommend you use a
filter to limit high-precision requests to a smaller geographic area. For an example,
refer to <<geohexgrid-high-precision>>.

[[geohexgrid-low-precision]]
==== Simple low-precision request

[source,console,id=geohexgrid-aggregation-example]
--------------------------------------------------
PUT /museums
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

POST /museums/_bulk?refresh
{"index":{"_id":1}}
{"location": "52.374081,4.912350", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "52.371667,4.914722", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "51.222900,4.405200", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "48.861111,2.336389", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "48.860000,2.327000", "name": "Musée d'Orsay"}

POST /museums/_search?size=0
{
  "aggregations": {
    "large-grid": {
      "geohex_grid": {
        "field": "location",
        "precision": 4
      }
    }
  }
}
--------------------------------------------------

Response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "large-grid": {
      "buckets": [
        {
          "key": "841969dffffffff",
          "doc_count": 3
        },
        {
          "key": "841fb47ffffffff",
          "doc_count": 2
        },
        {
          "key": "841fa4dffffffff",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
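
The bucket keys above are hexadecimal H3 indexes, and the requested precision can be read back from them. The sketch below assumes the bit layout described in the H3 indexing documentation (1 reserved bit, 4 mode bits, 3 reserved bits, 4 resolution bits, 7 base cell bits, then the per-resolution digits); `decode_h3_key` is a hypothetical helper, not part of Elasticsearch or the H3 library.

```python
def decode_h3_key(key: str) -> dict:
    """Split a hexadecimal H3 index string into its header fields."""
    h = int(key, 16)
    return {
        "mode": (h >> 59) & 0xF,        # 1 denotes a cell index
        "resolution": (h >> 52) & 0xF,  # should equal the requested precision
        "base_cell": (h >> 45) & 0x7F,  # index of the resolution-0 ancestor
    }


for key in ("841969dffffffff", "841fb47ffffffff", "841fa4dffffffff"):
    print(key, decode_h3_key(key))
```

Under this layout, each key in the response decodes to resolution 4, matching the `precision` in the request.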

[[geohexgrid-high-precision]]
==== High-precision requests

When requesting detailed buckets (typically for displaying a "zoomed in" map),
a filter like <<query-dsl-geo-bounding-box-query,geo_bounding_box>> should be
applied to narrow the subject area. Otherwise, potentially millions of buckets
will be created and returned.

[source,console,id=geohexgrid-high-precision-ex]
--------------------------------------------------
POST /museums/_search?size=0
{
  "aggregations": {
    "zoomed-in": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": "52.4, 4.9",
            "bottom_right": "52.3, 5.0"
          }
        }
      },
      "aggregations": {
        "zoom1": {
          "geohex_grid": {
            "field": "location",
            "precision": 12
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

Response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "zoomed-in": {
      "doc_count": 3,
      "zoom1": {
        "buckets": [
          {
            "key": "8c1969c9b2617ff",
            "doc_count": 1
          },
          {
            "key": "8c1969526d753ff",
            "doc_count": 1
          },
          {
            "key": "8c1969526d26dff",
            "doc_count": 1
          }
        ]
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
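
To see why the filter yields a `doc_count` of 3, the following sketch applies a simplified point-in-box test to the sample museums. It is an approximation for illustration only: the helper names are hypothetical, the box edges are treated as inclusive (an assumption), and boxes that cross the antimeridian are not handled.

```python
def parse_point(text: str) -> tuple:
    """Parse a "lat,lon" string into a (lat, lon) pair of floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


def in_bounding_box(point: str, top_left: str, bottom_right: str) -> bool:
    """Simplified check; assumes the box does not cross the antimeridian."""
    lat, lon = parse_point(point)
    top, left = parse_point(top_left)
    bottom, right = parse_point(bottom_right)
    return bottom <= lat <= top and left <= lon <= right


MUSEUMS = {
    "NEMO Science Museum": "52.374081,4.912350",
    "Museum Het Rembrandthuis": "52.369219,4.901618",
    "Nederlands Scheepvaartmuseum": "52.371667,4.914722",
    "Letterenhuis": "51.222900,4.405200",
    "Musée du Louvre": "48.861111,2.336389",
    "Musée d'Orsay": "48.860000,2.327000",
}

matching = [
    name for name, location in MUSEUMS.items()
    if in_bounding_box(location, "52.4, 4.9", "52.3, 5.0")
]
print(len(matching), matching)
```

Only the three Amsterdam museums fall inside the box, which is consistent with the filter bucket in the response.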

[[geohexgrid-addtl-bounding-box-filtering]]
==== Requests with additional bounding box filtering

The `geohex_grid` aggregation supports an optional `bounds` parameter
that restricts the cells considered to those that intersect the
provided bounds. The `bounds` parameter accepts the same
<<query-dsl-geo-bounding-box-query-accepted-formats,bounding box formats>>
as the geo-bounding box query. This bounding box can be used with or
without an additional `geo_bounding_box` query for filtering the points prior to aggregating.
It is an independent bounding box that can intersect with, be equal to, or be disjoint
from any additional `geo_bounding_box` queries defined in the context of the aggregation.

[source,console,id=geohexgrid-aggregation-with-bounds]
--------------------------------------------------
POST /museums/_search?size=0
{
  "aggregations": {
    "tiles-in-bounds": {
      "geohex_grid": {
        "field": "location",
        "precision": 12,
        "bounds": {
          "top_left": "52.4, 4.9",
          "bottom_right": "52.3, 5.0"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

Response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "tiles-in-bounds": {
      "buckets": [
        {
          "key": "8c1969c9b2617ff",
          "doc_count": 1
        },
        {
          "key": "8c1969526d753ff",
          "doc_count": 1
        },
        {
          "key": "8c1969526d26dff",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
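
A client might assemble such a request body programmatically. The sketch below builds the body as a plain dictionary and checks the precision range before sending; `geohex_grid_request` is a hypothetical helper and no Elasticsearch client library is assumed.

```python
import json


def geohex_grid_request(name, field, precision=6, bounds=None):
    """Build a search body containing a single geohex_grid aggregation."""
    if not 0 <= precision <= 15:
        raise ValueError("precision must be between 0 and 15, inclusive")
    grid = {"field": field, "precision": precision}
    if bounds is not None:
        # Optional: only include bounds when the caller supplies them.
        grid["bounds"] = bounds
    return {"aggregations": {name: {"geohex_grid": grid}}}


body = geohex_grid_request(
    "tiles-in-bounds",
    "location",
    precision=12,
    bounds={"top_left": "52.4, 4.9", "bottom_right": "52.3, 5.0"},
)
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the request shown above and could be sent to `POST /museums/_search?size=0`.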

[[geohexgrid-options]]
==== Options

[horizontal]
field::
(Required, string) Field containing indexed geo-point values. Must be explicitly
mapped as a <<geo-point,`geo_point`>> field. If the field contains an array,
`geohex_grid` aggregates all array values.

precision::
(Optional, integer) Integer zoom of the key used to define cells/buckets in
the results. Defaults to `6`. Values outside of [`0`,`15`] will be rejected.

bounds::
(Optional, object) Bounding box used to filter the geo-points in each bucket.
Accepts the same bounding box formats as the
<<query-dsl-geo-bounding-box-query-accepted-formats,geo-bounding box query>>.

size::
(Optional, integer) Maximum number of buckets to return. Defaults to 10,000.
When results are trimmed, buckets are prioritized based on the volume of
documents they contain.

shard_size::
(Optional, integer) Number of buckets returned from each shard. Defaults to
`max(10,(size x number-of-shards))` to allow for a more accurate count of the
top cells in the final result.
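
The `shard_size` default can be worked through with a small calculation. This sketch simply evaluates the formula as documented above; `default_shard_size` is a hypothetical name and does not reflect how the server computes the value internally.

```python
def default_shard_size(size: int, number_of_shards: int) -> int:
    """Evaluate max(10, size x number-of-shards) as given in the docs."""
    return max(10, size * number_of_shards)


# With the default size of 10,000 on a five-shard index:
print(default_shard_size(10_000, 5))
# With a very small size, the floor of 10 applies:
print(default_shard_size(1, 3))
```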
@@ -51,7 +51,7 @@ public void writeTo(StreamOutput out) throws IOException {
         aggregations.writeTo(out);
     }

-    protected long hashAsLong() {
+    public long hashAsLong() {
         return hashAsLong;
     }

@@ -34,7 +34,7 @@ public static ObjectParser<ParsedGeoGrid, Void> createParser(
         return parser;
     }

-    protected void setName(String name) {
+    public void setName(String name) {
         super.setName(name);
     }
 }
@@ -55,16 +55,22 @@ protected int maxNumberOfBuckets() {
     @Override
     protected T createTestInstance(String name, Map<String, Object> metadata, InternalAggregations aggregations) {
         final int precision = randomPrecision();
-        int size = randomNumberOfBuckets();
-        List<InternalGeoGridBucket> buckets = new ArrayList<>(size);
+        final int size = randomNumberOfBuckets();
+        final List<InternalGeoGridBucket> buckets = new ArrayList<>(size);
+        final List<Long> seen = new ArrayList<>(size);
+        int finalSize = 0;
         for (int i = 0; i < size; i++) {
             double latitude = randomDoubleBetween(-90.0, 90.0, false);
             double longitude = randomDoubleBetween(-180.0, 180.0, false);

             long hashAsLong = longEncode(longitude, latitude, precision);
-            buckets.add(createInternalGeoGridBucket(hashAsLong, randomInt(IndexWriter.MAX_DOCS), aggregations));
+            if (seen.contains(hashAsLong) == false) { // make sure we don't add twice the same bucket
+                buckets.add(createInternalGeoGridBucket(hashAsLong, randomInt(IndexWriter.MAX_DOCS), aggregations));
+                seen.add(hashAsLong);
+                finalSize++;
+            }
         }
-        return createInternalGeoGrid(name, size, buckets, metadata);
+        return createInternalGeoGrid(name, finalSize, buckets, metadata);
     }

     @Override
@@ -39,7 +39,8 @@ private SpatialStatsAction() {
      * Items to track. Serialized by ordinals. Append only, don't remove or change order of items in this list.
      */
     public enum Item {
-        GEOLINE
+        GEOLINE,
+        GEOHEX
     }

     public static class Request extends BaseNodesRequest<Request> implements ToXContentObject {
1 change: 1 addition & 0 deletions x-pack/plugin/spatial/build.gradle
@@ -14,6 +14,7 @@ dependencies {
   compileOnly project(path: ':modules:legacy-geo')
   compileOnly project(':modules:lang-painless:spi')
   compileOnly project(path: xpackModule('core'))
+  api project(":libs:elasticsearch-h3")
   testImplementation(testArtifact(project(xpackModule('core'))))
   testImplementation project(path: xpackModule('vector-tile'))
 }
