Add painless script support for Geoshape field #72886

Merged
merged 10 commits into from
May 26, 2021

iverase
Contributor

@iverase iverase commented May 10, 2021

This PR proposes extending the Painless API to support access to selected information stored in geo fields. The new methods are based on two immediate needs:

  1. We have requests to query geo shapes by geometric characteristics, e.g. height or width (Query by width/height or area of a geo-shape #54218). This information is not available in the Lucene index, but it is available in the geo_shape doc values. Therefore, by making that information accessible from Painless, we can support such queries using runtime fields. For example, we can sort our geo shapes by width using the following syntax:
GET /example/_search
{
  "runtime_mappings": {
    "width": {
      "type": "double",
      "script": {
        "source": "emit(doc['location'].width())"
      }
    }
  },
  "sort": [
    {
      "width": {
        "order": "desc"
      }
    }
  ]
}
  2. Accessing the centroid of a geo shape. This allows an application to use that information to place a label at the centroid of a geo shape. The syntax to access that information would look like:
GET /example/_search
{
  "runtime_mappings": {
    "centroidLat": {
      "type": "double",
      "script": {
        "source": "emit(doc['location'].getCentroidLat())"
      }
    },
    "centroidLon": {
      "type": "double",
      "script": {
        "source": "emit(doc['location'].getCentroidLon())"
      }
    }
  },
  "fields": ["centroidLon", "centroidLat"]
}

The new methods will ideally be supported by both geo points and geo shapes.

This PR adds a new abstract class, ScriptDocValues.Geometry, which is the base class for ScriptDocValues.GeoPoints and GeoShapeScriptValues. It exposes the following new methods:

        /** Centroid latitude */
        public abstract double getCentroidLat();
        /** Centroid longitude */
        public abstract double getCentroidLon();
        /** width of the geometry in degrees */
        public abstract double width();
        /** height of the geometry in degrees */
        public abstract double height();
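For illustration, here is a standalone sketch of the contract above. It assumes a geometry backed by a plain list of (lat, lon) points. `Geometry` and `PointsGeometry` are hypothetical stand-ins rather than the real `ScriptDocValues` classes. The centroid is a simple mean of the points, as for geo points, and the dateline is ignored, so the values may differ from what geo_shape doc values compute.

```java
import java.util.List;

/** Hypothetical mirror of the proposed ScriptDocValues.Geometry methods. */
abstract class Geometry {
    /** Centroid latitude */
    public abstract double getCentroidLat();
    /** Centroid longitude */
    public abstract double getCentroidLon();
    /** Width of the bounding box, in degrees of longitude */
    public abstract double width();
    /** Height of the bounding box, in degrees of latitude */
    public abstract double height();
}

/** Sketch backed by raw (lat, lon) pairs: mean centroid, dateline ignored. */
final class PointsGeometry extends Geometry {
    private final double centroidLat;
    private final double centroidLon;
    private final double minLat, maxLat, minLon, maxLon;

    PointsGeometry(List<double[]> latLons) {
        if (latLons.isEmpty()) {
            throw new IllegalArgumentException("at least one point is required");
        }
        double sumLat = 0, sumLon = 0;
        double loLat = Double.POSITIVE_INFINITY, hiLat = Double.NEGATIVE_INFINITY;
        double loLon = Double.POSITIVE_INFINITY, hiLon = Double.NEGATIVE_INFINITY;
        for (double[] p : latLons) {
            double lat = p[0];
            double lon = p[1];
            sumLat += lat;
            sumLon += lon;
            loLat = Math.min(loLat, lat);
            hiLat = Math.max(hiLat, lat);
            loLon = Math.min(loLon, lon);
            hiLon = Math.max(hiLon, lon);
        }
        this.centroidLat = sumLat / latLons.size();
        this.centroidLon = sumLon / latLons.size();
        this.minLat = loLat;
        this.maxLat = hiLat;
        this.minLon = loLon;
        this.maxLon = hiLon;
    }

    @Override public double getCentroidLat() { return centroidLat; }
    @Override public double getCentroidLon() { return centroidLon; }
    @Override public double width() { return maxLon - minLon; }
    @Override public double height() { return maxLat - minLat; }
}
```

With the points (0, 0), (10, 20) and (20, 40), this yields a centroid of (10, 20), a width of 40 and a height of 20, all in degrees.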

Let me know what you think.

@iverase iverase added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache v8.0.0 labels May 10, 2021
@elasticmachine elasticmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Core/Infra Meta label for core/infra team labels May 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@iverase iverase marked this pull request as draft May 10, 2021 13:23
@iverase iverase requested review from imotov and rjernst May 10, 2021 13:24
@jdconrad
Contributor

This looks good to me. I'd like to see some tests in Painless (I'm guessing YAML tests make the most sense) just to see this working correctly. Also, is there ever a chance that a geo shape has no points?

Contributor

@imotov imotov left a comment

LGTM in general. I wonder if we can make the interface a bit more generic.

double getCentroidLat();
double getCentroidLon();
double width();
double height();
Contributor

I think these names might be confusing, since they are not the width and height of the shape but rather the width and height of its bounding box. This makes me think we should return a point and a bounding box here instead. The bounding box would also give access to the min/max lat/lon, which we have to calculate anyway.
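A sketch of the bounding-box half of that suggestion, assuming simple immutable value types. One pass over the vertices yields the min/max lat/lon, and `width()`/`height()` then clearly describe the box rather than the shape. `Point`, `BoundingBox` and `Extent.of` are hypothetical names, and the dateline is ignored.

```java
import java.util.List;

/** Hypothetical immutable point, in degrees. */
final class Point {
    final double lat;
    final double lon;

    Point(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }
}

/** Hypothetical box; width/height are explicitly properties of the box. */
final class BoundingBox {
    final double minLat, maxLat, minLon, maxLon;

    BoundingBox(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLon = minLon;
        this.maxLon = maxLon;
    }

    double width() { return maxLon - minLon; }   // degrees of longitude
    double height() { return maxLat - minLat; }  // degrees of latitude
}

final class Extent {
    private Extent() {}

    /** Single pass over the vertices; ignores the dateline. */
    static BoundingBox of(List<Point> vertices) {
        if (vertices.isEmpty()) {
            throw new IllegalArgumentException("no vertices");
        }
        double minLat = Double.POSITIVE_INFINITY, maxLat = Double.NEGATIVE_INFINITY;
        double minLon = Double.POSITIVE_INFINITY, maxLon = Double.NEGATIVE_INFINITY;
        for (Point p : vertices) {
            minLat = Math.min(minLat, p.lat);
            maxLat = Math.max(maxLat, p.lat);
            minLon = Math.min(minLon, p.lon);
            maxLon = Math.max(maxLon, p.lon);
        }
        return new BoundingBox(minLat, maxLat, minLon, maxLon);
    }
}
```

A diagonal line from (0, 0) to (10, 10) has no area, yet its bounding box is 10 by 10 degrees, which is why naming these methods after the shape itself would be misleading.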

@iverase
Contributor Author

iverase commented May 11, 2021

Thanks for the quick feedback.

@imotov: Yes, I think a more generic interface is a must. Height and width are also confusing because they suggest distances, but they are expressed in degrees; we normally represent distances in meters in geo objects. I have changed the API to the following:

class org.elasticsearch.index.fielddata.ScriptDocValues$Geometry {
  org.elasticsearch.common.geo.GeoPoint getCentroid()
  org.elasticsearch.common.geo.GeoBoundingBox getBoundingBox()
}

class org.elasticsearch.common.geo.GeoBoundingBox {
  double top()
  double bottom()
  double left()
  double right()
}

We could use two GeoPoints for GeoBoundingBox, but because the current order (top left and bottom right) is so unnatural, I decided to use individual accessors. I am open to changing this.

@jdconrad I added a YAML test and left some questions. I don't think we accept empty geo shapes, so we should be fine there.

script:
  source: "doc['geo_shape'].getBoundingBox()"
- match: { hits.hits.0.fields.bbox.0.top_left.lat: 59.942749994806945 }
- match: { hits.hits.0.fields.bbox.0.top_left.lon: 24.045249950140715 }
Contributor Author

If you use doc['geo_shape'].get(0) in the source, you get a horrible error. What should we do in that case?

Contributor

What is the error?

Contributor Author

@iverase iverase May 11, 2021

This is the error, which is expected:

1>             "type" : "illegal_argument_exception",
1>             "reason" : "cannot write xcontent for unknown value of type class org.elasticsearch.xpack.spatial.index.fielddata.GeoShapeValues$GeoShapeValue",
1>             "stack_trace" : "org.elasticsearch.ElasticsearchException$1: cannot write xcontent for unknown value of type class org.elasticsearch.xpack.spatial.index.fielddata.GeoShapeValues$GeoShapeValue

GeoShapeValue is just a binary doc value that does not have an XContent representation.

Contributor

OH! I thought this referred to a Painless error. Now I see why GeoBoundingBox was changed. I'll defer to @imotov for this :)

Contributor Author

Yes, Painless is doing the right thing. Is there any guideline for when the actual doc value has no XContent representation, only methods to do stuff?

Contributor Author

I understand, this is a bit of an edge case. geo_shape doc values are an abstract representation of the geometry: you can get some information from them, like the centroid or bounding box, or perform some operations, like checking spatial relationships with other geometries.

I guess the question is whether we should error out if someone tries to get the serialized representation, or whether we should choose some representation, e.g. just returning the dimensional type in this case?

Member

@rjernst rjernst May 11, 2021

The field script used in the test returns Object. Whatever concrete type is returned needs to be representable in JSON. The error is caused by GeoShapeValue not being writable to XContent. You'll need to add an XContent writer (see, for example, XContentElasticsearchExtension.java, which registers writers for many date classes that may be returned from a script).
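To illustrate the mechanism described above, here is a standalone sketch of a lookup from a value's runtime class to a writer. It fails with the same kind of message seen in the error when no writer is registered. `WriterRegistry`, its methods and the string output are hypothetical; the real registration goes through `XContentElasticsearchExtension`, whose API is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry mapping a concrete class to a function that renders it. */
final class WriterRegistry {
    private final Map<Class<?>, Function<Object, String>> writers = new HashMap<>();

    <T> void register(Class<T> type, Function<T, String> writer) {
        writers.put(type, value -> writer.apply(type.cast(value)));
    }

    /** Looks up the exact runtime class; no superclass or interface fallback. */
    String write(Object value) {
        Function<Object, String> writer = writers.get(value.getClass());
        if (writer == null) {
            throw new IllegalArgumentException(
                "cannot write xcontent for unknown value of type " + value.getClass());
        }
        return writer.apply(value);
    }
}
```

Registering a writer for a date type such as `java.time.LocalDate` lets a script return it, while any unregistered type (like `GeoShapeValue` in the error above) fails at serialization time rather than inside Painless.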

Contributor Author

I understand. The problem is that geo_shape doc values are an interval tree of the tessellated shape, designed to perform very fast spatial operations, but they currently lose the ability to reconstruct the original shape. Therefore we cannot serialize them into anything meaningful to the user.

GeoShapeValue now implements ToXContentFragment and throws an error if toXContent is called. I have some ideas about what we can do here, but they require more discussion.
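A minimal sketch of that approach: a value that exposes derived data but refuses to serialize itself. `ToJson` stands in for `ToXContentFragment` and `ShapeDocValue` for `GeoShapeValue`; both names, as well as the exception type and message, are assumptions made for illustration.

```java
/** Hypothetical stand-in for an XContent-style serialization contract. */
interface ToJson {
    String toJson();
}

/** Sketch of a doc value that exposes derived data but refuses to serialize itself. */
final class ShapeDocValue implements ToJson {
    private final double centroidLat;
    private final double centroidLon;

    ShapeDocValue(double centroidLat, double centroidLon) {
        this.centroidLat = centroidLat;
        this.centroidLon = centroidLon;
    }

    double centroidLat() { return centroidLat; }
    double centroidLon() { return centroidLon; }

    @Override
    public String toJson() {
        // The stored tessellation cannot be turned back into the original shape,
        // so fail loudly rather than emit a placeholder callers might depend on.
        throw new UnsupportedOperationException("geo_shape doc values cannot be serialized");
    }
}
```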

Contributor

It would be awesome to serialize it into GeoJSON of the original shape, but in its current storage format it is indeed pretty useless for that. So throwing an error or serializing it into a string containing a placeholder like "<geo_shape>" is probably the best we can do at the moment.

Contributor Author

My vote goes to throwing an error, so users are not tempted to rely on whatever value is used as a placeholder.

}
centroid.reset(centroidLat / count, centroidLon / count);
boundingBox.topLeft().reset(maxLat, minLon);
boundingBox.bottomRight().reset(minLat, maxLon);
Contributor Author

One of my worries here is that this makes constructing a GeoPoint script value expensive.

On the other hand, there seems to be a small bug here: we were building new GeoPoint objects for each new doc value, which is wasteful.
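A sketch of the allocation-free pattern visible in the diff context above, assuming each document's points arrive as parallel lat/lon arrays. The same mutable points are `reset` for every document instead of being reallocated. `MutablePoint`, `GeoPointsSummary` and `setNextDoc` are hypothetical names.

```java
/** Hypothetical mutable point that can be reset instead of reallocated. */
final class MutablePoint {
    private double lat;
    private double lon;

    MutablePoint reset(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
        return this;
    }

    double lat() { return lat; }
    double lon() { return lon; }
}

/** Recomputes per-document summaries into the same three objects. */
final class GeoPointsSummary {
    final MutablePoint centroid = new MutablePoint();
    final MutablePoint topLeft = new MutablePoint();
    final MutablePoint bottomRight = new MutablePoint();

    /** Called once per document with that document's first {@code count} points. */
    void setNextDoc(double[] lats, double[] lons, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("document has no points");
        }
        double sumLat = 0, sumLon = 0;
        double minLat = Double.POSITIVE_INFINITY, maxLat = Double.NEGATIVE_INFINITY;
        double minLon = Double.POSITIVE_INFINITY, maxLon = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < count; i++) {
            sumLat += lats[i];
            sumLon += lons[i];
            minLat = Math.min(minLat, lats[i]);
            maxLat = Math.max(maxLat, lats[i]);
            minLon = Math.min(minLon, lons[i]);
            maxLon = Math.max(maxLon, lons[i]);
        }
        // Same arithmetic as the diff context: mean centroid, top-left and bottom-right corners.
        centroid.reset(sumLat / count, sumLon / count);
        topLeft.reset(maxLat, minLon);
        bottomRight.reset(minLat, maxLon);
    }
}
```

The per-document cost is a single pass over the points with no allocation, which addresses the wasteful `new GeoPoint` per doc value, although the pass itself still runs even if a script never asks for the centroid or bounding box.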

Contributor

Given this is an internal detail, we could always try to improve it later.

@@ -29,7 +29,7 @@
  * A class representing a Geo-Bounding-Box for use by Geo queries and aggregations
  * that deal with extents/rectangles representing rectangular areas of interest.
  */
-public class GeoBoundingBox implements ToXContentObject, Writeable {
+public class GeoBoundingBox implements ToXContentFragment, Writeable {
Contributor Author

I had to make GeoBoundingBox behave more like a GeoPoint (implementing ToXContentFragment instead of ToXContentObject).

@imotov
Contributor

imotov commented May 11, 2021

If width and height are important, we can add them to GeoBoundingBox as well. On that class they make perfect sense and are not misleading.
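A sketch of what `width()`/`height()` on a box could look like, with the box defined by `top`, `bottom`, `left` and `right` in degrees. It assumes that `left > right` means the box crosses the antimeridian. `Box` is a hypothetical name, and that convention is an assumption rather than something settled in this PR.

```java
/** Hypothetical box in degrees; left > right is taken to mean it crosses the antimeridian. */
final class Box {
    final double top, bottom, left, right;

    Box(double top, double bottom, double left, double right) {
        this.top = top;
        this.bottom = bottom;
        this.left = left;
        this.right = right;
    }

    /** Height in degrees of latitude. */
    double height() {
        return top - bottom;
    }

    /** Width in degrees of longitude, wrapping across the antimeridian when needed. */
    double width() {
        double w = right - left;
        return w >= 0 ? w : w + 360.0;
    }
}
```

Under this assumption, a box spanning longitude 170 to -170 is 20 degrees wide rather than -340. The result stays in degrees, consistent with the box's own coordinates.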

@iverase
Contributor Author

iverase commented May 11, 2021

After speaking with @imotov and thinking about it more, the final API should look like this:

class org.elasticsearch.index.fielddata.ScriptDocValues$Geometry {
  int getDimensionalType()
  org.elasticsearch.common.geo.GeoPoint getCentroid()
  org.elasticsearch.common.geo.GeoBoundingBox getBoundingBox()
}

class org.elasticsearch.common.geo.GeoBoundingBox {
  org.elasticsearch.common.geo.GeoPoint topLeft()
  org.elasticsearch.common.geo.GeoPoint bottomRight()
}

@iverase
Contributor Author

iverase commented May 12, 2021

I added more tests, in particular for when the geometry is null. In that case getCentroid() and getBoundingBox() return null, and getDimensionalType() returns -1.
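A sketch of the final API's null semantics described above. It uses hypothetical `Pt`, `GeoBox` and `GeometryValues` types in place of `GeoPoint`, `GeoBoundingBox` and `ScriptDocValues.Geometry`. The numeric dimensional-type codes are an assumption; only the -1 for a missing value comes from this PR.

```java
/** Hypothetical stand-in for GeoPoint. */
final class Pt {
    final double lat;
    final double lon;

    Pt(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }
}

/** Hypothetical stand-in for GeoBoundingBox with the final two accessors. */
final class GeoBox {
    private final Pt topLeft;
    private final Pt bottomRight;

    GeoBox(Pt topLeft, Pt bottomRight) {
        this.topLeft = topLeft;
        this.bottomRight = bottomRight;
    }

    Pt topLeft() { return topLeft; }
    Pt bottomRight() { return bottomRight; }
}

/** Hypothetical per-document view mirroring the final Geometry whitelist. */
final class GeometryValues {
    private final int dimensionalType; // assumed codes, e.g. 0 = point, 1 = line, 2 = polygon
    private final Pt centroid;
    private final GeoBox boundingBox;

    GeometryValues(int dimensionalType, Pt centroid, GeoBox boundingBox) {
        this.dimensionalType = dimensionalType;
        this.centroid = centroid;
        this.boundingBox = boundingBox;
    }

    /** A document without a value: -1 and nulls, as described above. */
    static GeometryValues missing() {
        return new GeometryValues(-1, null, null);
    }

    int getDimensionalType() { return dimensionalType; }
    Pt getCentroid() { return centroid; }
    GeoBox getBoundingBox() { return boundingBox; }
}
```

Scripts consuming these values would therefore need to null-check `getCentroid()` and `getBoundingBox()` before dereferencing them.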

@iverase iverase marked this pull request as ready for review May 12, 2021 07:43
@iverase
Contributor Author

iverase commented May 26, 2021

@jdconrad is there anything to do in order to move this forward?

Contributor

@jdconrad jdconrad left a comment

LGTM! Thanks for adding this.

Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Core/Infra Meta label for core/infra team v7.14.0 v8.0.0-alpha1

6 participants