Centroid aggregation for cartesian points and shapes (#89216)
Added Cartesian support for centroid aggregation

* First draft of cartesian-centroid docs
  However, this is largely a duplicate of the geo-centroid docs, since the two have essentially identical behaviour. We should consider merging them.
* Work on isAggregatable caused a minor logic conflict. When that work was done, Point and Shape were not aggregatable, but now they are.
craigtaverner committed Sep 28, 2022
1 parent bbe5d7f commit 4c5d246
Showing 52 changed files with 3,306 additions and 222 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/89216.yaml
@@ -0,0 +1,5 @@
pr: 89216
summary: Centroid aggregation for cartesian points and shapes
area: Geo
type: enhancement
issues: []
2 changes: 2 additions & 0 deletions docs/reference/aggregations/metrics.asciidoc
@@ -25,6 +25,8 @@ include::metrics/geocentroid-aggregation.asciidoc[]

include::metrics/geoline-aggregation.asciidoc[]

include::metrics/cartesian-centroid-aggregation.asciidoc[]

include::metrics/matrix-stats-aggregation.asciidoc[]

include::metrics/max-aggregation.asciidoc[]
@@ -0,0 +1,230 @@
[[search-aggregations-metrics-cartesian-centroid-aggregation]]
=== Cartesian-centroid aggregation

++++
<titleabbrev>Cartesian-centroid</titleabbrev>
++++

A metric aggregation that computes the weighted {wikipedia}/Centroid[centroid] from all coordinate values for point and shape fields.

Example:

[source,console]
--------------------------------------------------
PUT /museums
{
  "mappings": {
    "properties": {
      "location": {
        "type": "point"
      }
    }
  }
}
POST /museums/_bulk?refresh
{"index":{"_id":1}}
{"location": "POINT (491.2350 5237.4081)", "city": "Amsterdam", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "POINT (490.1618 5236.9219)", "city": "Amsterdam", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "POINT (491.4722 5237.1667)", "city": "Amsterdam", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "POINT (440.5200 5122.2900)", "city": "Antwerp", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "POINT (233.6389 4886.1111)", "city": "Paris", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "POINT (232.7000 4886.0000)", "city": "Paris", "name": "Musée d'Orsay"}
POST /museums/_search?size=0
{
  "aggs": {
    "centroid": {
      "cartesian_centroid": {
        "field": "location" <1>
      }
    }
  }
}
--------------------------------------------------

<1> The `cartesian_centroid` aggregation specifies the field to use for computing the centroid.
(NOTE: The field must be of type <<point>> or <<shape>>.)

The above aggregation computes the centroid of the `location` field across all museum documents.

The response for the above aggregation:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "centroid": {
      "location": {
        "x": 396.6213124593099,
        "y": 5100.982991536458
      },
      "count": 6
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
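For points, the arithmetic behind this response is an equally weighted mean of the coordinates. The Java sketch below is illustrative only: `PointCentroidSketch`, `Point` and `Centroid` are hypothetical names, not Elasticsearch classes. It recomputes the centroid of the six museums; its result differs from the response only in the trailing digits, which appears to reflect the reduced precision at which coordinates are stored (an inference from the numbers, not something this commit states).

```java
import java.util.List;

/** Hypothetical sketch, not Elasticsearch code: centroid of points as a plain mean. */
public class PointCentroidSketch {

    record Point(double x, double y) {}

    record Centroid(double x, double y, long count) {}

    /** The six museum locations from the bulk request above. */
    static final List<Point> MUSEUMS = List.of(
        new Point(491.2350, 5237.4081),  // NEMO Science Museum
        new Point(490.1618, 5236.9219),  // Museum Het Rembrandthuis
        new Point(491.4722, 5237.1667),  // Nederlands Scheepvaartmuseum
        new Point(440.5200, 5122.2900),  // Letterenhuis
        new Point(233.6389, 4886.1111),  // Louvre
        new Point(232.7000, 4886.0000)); // Orsay

    /** Every point has weight 1; an empty list yields NaN coordinates and a count of 0. */
    static Centroid centroid(List<Point> points) {
        double sumX = 0, sumY = 0;
        for (Point p : points) {
            sumX += p.x();
            sumY += p.y();
        }
        long n = points.size();
        return new Centroid(sumX / n, sumY / n, n);
    }

    public static void main(String[] args) {
        System.out.println(centroid(MUSEUMS));
    }
}
```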

The `cartesian_centroid` aggregation is more interesting when used as a sub-aggregation of other bucket aggregations.

Example:

[source,console]
--------------------------------------------------
POST /museums/_search?size=0
{
  "aggs": {
    "cities": {
      "terms": { "field": "city.keyword" },
      "aggs": {
        "centroid": {
          "cartesian_centroid": { "field": "location" }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

The above example uses `cartesian_centroid` as a sub-aggregation of a
<<search-aggregations-bucket-terms-aggregation, terms>> bucket aggregation to find the central location of the museums in each city.

The response for the above aggregation:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "cities": {
      "sum_other_doc_count": 0,
      "doc_count_error_upper_bound": 0,
      "buckets": [
        {
          "key": "Amsterdam",
          "doc_count": 3,
          "centroid": {
            "location": {
              "x": 490.9563293457031,
              "y": 5237.16552734375
            },
            "count": 3
          }
        },
        {
          "key": "Paris",
          "doc_count": 2,
          "centroid": {
            "location": {
              "x": 233.16944885253906,
              "y": 4886.0556640625
            },
            "count": 2
          }
        },
        {
          "key": "Antwerp",
          "doc_count": 1,
          "centroid": {
            "location": {
              "x": 440.5199890136719,
              "y": 5122.2900390625
            },
            "count": 1
          }
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
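Conceptually, the sub-aggregation keeps one set of running sums per `terms` bucket. The Java sketch below is hypothetical (`CityCentroidSketch`, `Museum` and `centroidsByCity` are illustrative names, not Elasticsearch code): it groups the museums by city and averages each group. Unlike the `terms` aggregation, which orders buckets by `doc_count` descending by default, it simply keeps cities in first-seen order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Elasticsearch code: one centroid per city bucket. */
public class CityCentroidSketch {

    record Museum(String city, double x, double y) {}

    /** Per-bucket accumulator: running coordinate sums and a document count. */
    static final class Acc {
        double sumX, sumY;
        long count;
    }

    static final List<Museum> MUSEUMS = List.of(
        new Museum("Amsterdam", 491.2350, 5237.4081),
        new Museum("Amsterdam", 490.1618, 5236.9219),
        new Museum("Amsterdam", 491.4722, 5237.1667),
        new Museum("Antwerp", 440.5200, 5122.2900),
        new Museum("Paris", 233.6389, 4886.1111),
        new Museum("Paris", 232.7000, 4886.0000));

    /** Returns {x, y, count} per city, in first-seen order. */
    static Map<String, double[]> centroidsByCity(List<Museum> museums) {
        Map<String, Acc> accs = new LinkedHashMap<>();
        for (Museum m : museums) {
            Acc acc = accs.computeIfAbsent(m.city(), k -> new Acc());
            acc.sumX += m.x();
            acc.sumY += m.y();
            acc.count++;
        }
        Map<String, double[]> result = new LinkedHashMap<>();
        accs.forEach((city, acc) -> result.put(city,
            new double[] {acc.sumX / acc.count, acc.sumY / acc.count, acc.count}));
        return result;
    }

    public static void main(String[] args) {
        centroidsByCity(MUSEUMS).forEach((city, c) ->
            System.out.printf("%s: x=%.4f y=%.4f count=%d%n", city, c[0], c[1], (long) c[2]));
    }
}
```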


[discrete]
[role="xpack"]
[[cartesian-centroid-aggregation-geo-shape]]
==== Cartesian Centroid Aggregation on `shape` fields

The centroid metric for shapes is more nuanced than for points.
The centroid of a specific aggregation bucket containing shapes is the centroid of the highest-dimensionality shape type in the bucket.
For example, if a bucket contains shapes consisting of polygons and lines, then the lines do not contribute to the centroid metric.
Each type of shape's centroid is calculated differently.
Envelopes and circles ingested via the <<ingest-circle-processor>> are treated as polygons.
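The "highest dimension wins" rule can be modelled as an accumulator that discards its running sums whenever a higher-dimensional shape arrives and ignores lower-dimensional ones. The Java sketch below is a hypothetical illustration, not the Elasticsearch implementation. It assumes each shape has already been reduced to its own centroid, a weight (1 for a point, length for a line, area for a polygon) and a dimension (0, 1 or 2). It also assumes that every shape is counted, which is consistent with the `count` of 2 in the `shape` example in this section even though the point there does not move the centroid.

```java
/**
 * Hypothetical sketch, not Elasticsearch code: a centroid accumulator in which
 * only the highest-dimensional shapes seen so far contribute.
 */
public class DimensionalCentroidSketch {

    static final int POINT = 0, LINE = 1, POLYGON = 2;

    private int dimension = -1;  // highest dimension seen so far (-1 = none yet)
    private double sumX, sumY, sumWeight;
    private long count;

    /** Adds one shape, already reduced to its centroid (x, y), weight and dimension. */
    void add(double x, double y, double weight, int dim) {
        count++;  // assumption: every shape is counted, even if ignored below
        if (dim < dimension) {
            return;  // lower-dimensional shapes do not move the centroid
        }
        if (dim > dimension) {
            // A higher-dimensional shape discards everything accumulated so far.
            dimension = dim;
            sumX = 0;
            sumY = 0;
            sumWeight = 0;
        }
        sumX += x * weight;
        sumY += y * weight;
        sumWeight += weight;
    }

    double x() { return sumX / sumWeight; }

    double y() { return sumY / sumWeight; }

    long count() { return count; }

    public static void main(String[] args) {
        DimensionalCentroidSketch acc = new DimensionalCentroidSketch();
        acc.add(0, 0, 1, POINT);      // discarded once a line arrives
        acc.add(10, 10, 2, LINE);     // discarded once a polygon arrives
        acc.add(5, 5, 4, POLYGON);    // only polygons contribute
        acc.add(100, 100, 1, POINT);  // ignored: lower dimension
        // prints x=5.0 y=5.0 count=4
        System.out.println("x=" + acc.x() + " y=" + acc.y() + " count=" + acc.count());
    }
}
```

Resetting on a dimension increase means the result does not depend on the order in which shapes arrive.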

|===
|Geometry Type | Centroid Calculation

|[Multi]Point
|equally weighted average of all the coordinates

|[Multi]LineString
|a weighted average of all the centroids of each segment, where the weight of each segment is its length in the same units as the coordinates

|[Multi]Polygon
|a weighted average of the centroids of all the triangles of a polygon, where each triangle is formed by two consecutive vertices and the polygon's starting point.
Holes have negative weights. Each weight is the area of the triangle, calculated in the square of the units of the coordinates.

|GeometryCollection
|The centroid of all the underlying geometries of the highest dimension. If the collection contains polygons, any lines and points are ignored.
If it contains only lines and points, the points are ignored.
|===
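The [Multi]LineString row can be sketched as follows: each segment contributes its midpoint, weighted by its Euclidean length. The Java code is a hypothetical illustration (`LineCentroidSketch` and `lineCentroid` are not Elasticsearch names). For an L-shaped line from (0, 0) to (4, 0) to (4, 2) it gives (8/3, 1/3), whereas a plain average of the three vertices would give (8/3, 2/3), because the longer horizontal segment carries more weight.

```java
/**
 * Hypothetical sketch, not Elasticsearch code: centroid of a [Multi]LineString as
 * the length-weighted average of its segment midpoints.
 */
public class LineCentroidSketch {

    /** Each line is an array of {x, y} vertices; several lines form a MultiLineString. */
    static double[] lineCentroid(double[][]... lines) {
        double sumX = 0, sumY = 0, totalLength = 0;
        for (double[][] line : lines) {
            for (int i = 0; i + 1 < line.length; i++) {
                double[] a = line[i], b = line[i + 1];
                // Weight: segment length, in the same units as the coordinates.
                double length = Math.hypot(b[0] - a[0], b[1] - a[1]);
                // A segment's centroid is its midpoint.
                sumX += length * (a[0] + b[0]) / 2.0;
                sumY += length * (a[1] + b[1]) / 2.0;
                totalLength += length;
            }
        }
        // Returns {x, y, total length}.
        return new double[] {sumX / totalLength, sumY / totalLength, totalLength};
    }

    public static void main(String[] args) {
        double[][] lShape = {{0, 0}, {4, 0}, {4, 2}};
        double[] c = lineCentroid(lShape);
        System.out.println("x=" + c[0] + " y=" + c[1] + " length=" + c[2]);
    }
}
```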

Example:

[source,console]
--------------------------------------------------
PUT /places
{
  "mappings": {
    "properties": {
      "geometry": {
        "type": "shape"
      }
    }
  }
}
POST /places/_bulk?refresh
{"index":{"_id":1}}
{"name": "NEMO Science Museum", "geometry": "POINT(491.2350 5237.4081)" }
{"index":{"_id":2}}
{"name": "Sportpark De Weeren", "geometry": { "type": "Polygon", "coordinates": [ [ [ 496.5305328369141, 5239.347642069457 ], [ 496.6979026794433, 5239.1721758934835 ], [ 496.9425201416015, 5239.238958618537 ], [ 496.7944622039794, 5239.420969150824 ], [ 496.5305328369141, 5239.347642069457 ] ] ] } }
POST /places/_search?size=0
{
  "aggs": {
    "centroid": {
      "cartesian_centroid": {
        "field": "geometry"
      }
    }
  }
}
--------------------------------------------------
// TEST

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "centroid": {
      "location": {
        "x": 496.74041748046875,
        "y": 5239.29638671875
      },
      "count": 2
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
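This response can be cross-checked against the polygon row of the table above. The hypothetical Java sketch below (`PolygonCentroidSketch`, `ringCentroid` and `polygonCentroid` are illustrative names, not Elasticsearch code) splits each ring into triangles that share the ring's starting point, weights each triangle's centroid by its area, and subtracts holes. Applied to the Sportpark De Weeren polygon it agrees with the response to within about 0.001; the `POINT` document is lower-dimensional and is left out of the centroid, although the response still counts it.

```java
/**
 * Hypothetical sketch, not Elasticsearch code: polygon centroid from a fan of
 * triangles sharing each ring's first vertex, with holes weighted negatively.
 */
public class PolygonCentroidSketch {

    /** Returns {x, y, area} for a closed ring whose last vertex repeats the first. */
    static double[] ringCentroid(double[][] ring) {
        double[] p0 = ring[0];
        double sumX = 0, sumY = 0, sumArea = 0;
        // Triangles (p0, ring[i], ring[i + 1]); the closing vertex is skipped.
        for (int i = 1; i + 2 < ring.length; i++) {
            double[] a = ring[i], b = ring[i + 1];
            // Signed triangle area: half the cross product of (a - p0) and (b - p0).
            double area = ((a[0] - p0[0]) * (b[1] - p0[1])
                    - (b[0] - p0[0]) * (a[1] - p0[1])) / 2.0;
            sumX += area * (p0[0] + a[0] + b[0]) / 3.0;
            sumY += area * (p0[1] + a[1] + b[1]) / 3.0;
            sumArea += area;
        }
        // Dividing by the signed total makes the result independent of winding order.
        return new double[] {sumX / sumArea, sumY / sumArea, Math.abs(sumArea)};
    }

    /** Shell minus holes, each ring weighted by its unsigned area. */
    static double[] polygonCentroid(double[][] shell, double[][]... holes) {
        double[] s = ringCentroid(shell);
        double sumX = s[0] * s[2], sumY = s[1] * s[2], sumArea = s[2];
        for (double[][] hole : holes) {
            double[] h = ringCentroid(hole);
            // Holes contribute with negative weight.
            sumX -= h[0] * h[2];
            sumY -= h[1] * h[2];
            sumArea -= h[2];
        }
        return new double[] {sumX / sumArea, sumY / sumArea};
    }

    /** The "Sportpark De Weeren" polygon from the bulk request above. */
    static final double[][] SPORTPARK = {
        {496.5305328369141, 5239.347642069457},
        {496.6979026794433, 5239.1721758934835},
        {496.9425201416015, 5239.238958618537},
        {496.7944622039794, 5239.420969150824},
        {496.5305328369141, 5239.347642069457}
    };

    public static void main(String[] args) {
        double[] c = polygonCentroid(SPORTPARK);
        System.out.println("x=" + c[0] + " y=" + c[1]);
    }
}
```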
@@ -1,5 +1,6 @@
[[search-aggregations-metrics-geocentroid-aggregation]]
=== Geo-centroid aggregation

++++
<titleabbrev>Geo-centroid</titleabbrev>
++++
@@ -49,7 +50,7 @@ POST /museums/_search?size=0

<1> The `geo_centroid` aggregation specifies the field to use for computing the centroid. (NOTE: field must be a <<geo-point>> type)

-The above aggregation demonstrates how one would compute the centroid of the location field for all documents with a crime type of burglary.
+The above aggregation demonstrates how one would compute the centroid of the location field for all museums' documents.

The response for the above aggregation:

19 changes: 10 additions & 9 deletions docs/reference/rest-api/common-parms.asciidoc
@@ -313,7 +313,7 @@ end::http-format[]

tag::frequency[]
The interval between checks for changes in the source indices when the
{transform} is running continuously. The minimum value is `1s` and the maximum
is `1h`. The default value is `1m`.
end::frequency[]

@@ -701,6 +701,7 @@ currently supported:
* <<search-aggregations-metrics-geobounds-aggregation,Geo bounds>>
* <<search-aggregations-metrics-geocentroid-aggregation,Geo centroid>>
* <<search-aggregations-metrics-geo-line,Geo line>>
* <<search-aggregations-metrics-cartesian-centroid-aggregation,Cartesian centroid>>
* <<search-aggregations-metrics-max-aggregation,Max>>
* <<search-aggregations-metrics-median-absolute-deviation-aggregation,Median absolute deviation>>
* <<search-aggregations-metrics-min-aggregation,Min>>
@@ -729,8 +730,8 @@ The following groupings are currently supported:
* <<_histogram,Histogram>>
* <<_terms,Terms>>

The grouping properties can optionally have a `missing_bucket` property. If
it's `true`, documents without a value in the respective `group_by` field are
included. Defaults to `false`.
--
end::pivot-group-by[]
@@ -1006,13 +1007,13 @@ criteria is deleted from the destination index.
end::transform-retention[]

tag::transform-retention-time[]
Specifies that the {transform} uses a time field to set the retention policy.
Data is deleted if `time.field` for the retention policy exists and contains
data older than `max.age`.
end::transform-retention-time[]

tag::transform-retention-time-field[]
The date field that is used to calculate the age of the document. Set
`time.field` to an existing date field.
end::transform-retention-time-field[]

@@ -1066,9 +1067,9 @@ The default value is the cluster-level setting `num_transform_failure_retries`.
end::transform-settings-num-failure-retries[]

tag::transform-settings-unattended[]
If `true`, the {transform} runs in unattended mode. In unattended mode, the
{transform} retries indefinitely in case of an error which means the {transform}
never fails. Setting the number of retries other than infinite fails in
validation. Defaults to `false`.
end::transform-settings-unattended[]

@@ -21,7 +21,7 @@
import org.elasticsearch.search.sort.BucketedSort;
import org.elasticsearch.search.sort.SortOrder;

-abstract class AbstractPointIndexFieldData<T extends MultiPointValues<? extends SpatialPoint>> implements IndexPointFieldData<T> {
+public abstract class AbstractPointIndexFieldData<T extends MultiPointValues<? extends SpatialPoint>> implements IndexPointFieldData<T> {

protected final String fieldName;
protected final ValuesSourceType valuesSourceType;
@@ -153,9 +153,9 @@ public Object getProperty(List<String> path) {
}
}

-protected static class Fields {
-static final ParseField CENTROID = new ParseField("location");
-static final ParseField COUNT = new ParseField("count");
+public static class Fields {
+public static final ParseField CENTROID = new ParseField("location");
+public static final ParseField COUNT = new ParseField("count");
}

@Override
@@ -117,7 +117,7 @@ public String toString() {
};
}

-static SortedNumericDocValues replaceMissing(final SortedNumericDocValues values, final long missing) {
+public static SortedNumericDocValues replaceMissing(final SortedNumericDocValues values, final long missing) {
return new AbstractSortedNumericDocValues() {

private int count;
@@ -40,7 +40,8 @@ private SpatialStatsAction() {
*/
public enum Item {
GEOLINE,
-GEOHEX
+GEOHEX,
+CARTESIANCENTROID
}

public static class Request extends BaseNodesRequest<Request> implements ToXContentObject {
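The diff adds `CARTESIANCENTROID` to `SpatialStatsAction.Item`, which, judging by its name, lists spatial features whose usage is reported; how the counters are stored is not visible in this diff. As a hypothetical illustration only, per-item counters can be kept in an `EnumMap` initialised from `Item.values()`, so a new enum constant gets a counter slot without extra wiring. `UsageCounterSketch` and its nested `Item` enum are stand-ins, not the real classes.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for per-feature usage counters; not Elasticsearch code. */
public class UsageCounterSketch {

    /** Mirrors the constants shown in the diff. */
    enum Item { GEOLINE, GEOHEX, CARTESIANCENTROID }

    private final Map<Item, LongAdder> counters = new EnumMap<>(Item.class);

    UsageCounterSketch() {
        // One counter per constant, so a new constant needs no extra wiring here.
        for (Item item : Item.values()) {
            counters.put(item, new LongAdder());
        }
    }

    /** Records one use of a feature; LongAdder tolerates concurrent increments. */
    void increment(Item item) {
        counters.get(item).increment();
    }

    /** Snapshot of the current counts, in enum declaration order. */
    Map<Item, Long> snapshot() {
        Map<Item, Long> result = new EnumMap<>(Item.class);
        counters.forEach((item, adder) -> result.put(item, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        UsageCounterSketch usage = new UsageCounterSketch();
        usage.increment(Item.CARTESIANCENTROID);
        usage.increment(Item.CARTESIANCENTROID);
        usage.increment(Item.GEOLINE);
        // prints {GEOLINE=1, GEOHEX=0, CARTESIANCENTROID=2}
        System.out.println(usage.snapshot());
    }
}
```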
