-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add painless script support for Geoshape field #72886
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Pinging @elastic/es-core-infra (Team:Core/Infra) |
This looks good to me. I'd like to see some tests in Painless (guessing yaml makes the most sense) just to see this working correctly. Also is there ever a chance a geoshape has no points? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general. I wonder if we can make the interface a bit more generic.
double getCentroidLat(); | ||
double getCentroidLon(); | ||
double width(); | ||
double height(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these name might be confusing since these are not width and height of the shape but rather width and height of the bounding box. Which makes me think maybe we should return a point and a bounding box here. In case of the bounding box we will also give access to min/max lat/lon which we have to calculate anyway.
Thanks for the quick feedback. @imotov : Yes, I think a more generic interface is a must and height and width are weird as the mean distances but they are expressed in degrees. We normally represent distances in meters in geo objects. I have changed to the following:
We can use two geopoints for @jdconrad I added yaml test and I left some questions. I don't think we accept empty geoshapes so we should be good there. |
script: | ||
source: "doc['geo_shape'].getBoundingBox()" | ||
- match: { hits.hits.0.fields.bbox.0.top_left.lat: 59.942749994806945 } | ||
- match: { hits.hits.0.fields.bbox.0.top_left.lon: 24.045249950140715 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use in source doc['geo_shape'].get(0), you get a horrible error. What should we do in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the error which is expected:
1> "type" : "illegal_argument_exception",
1> "reason" : "cannot write xcontent for unknown value of type class org.elasticsearch.xpack.spatial.index.fielddata.GeoShapeValues$GeoShapeValue",
1> "stack_trace" : "org.elasticsearch.ElasticsearchException$1: cannot write xcontent for unknown value of type class org.elasticsearch.xpack.spatial.index.fielddata.GeoShapeValues$GeoShapeValue
GeoShapeValue is just a binary doc value that do not have XContent representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OH! I thought this referenced a Painless error. I see why the change to GeoBoundingBox now. I'll defer to @imotov for this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, painless is doing the right thing. Is there any guideline when the actual doc value has no XContent value but only methods to do stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, this is a bit of an edge case. geo_shape doc values are an abstract representation of the geometry where you can get some information like the centroid or bounding box or some actions like checking spatial relationships with other geometries.
I guess the question is if we should error out if someone try to get the serialise representation or we should consider some representation, e.g in that case we just return the dimensional type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field script that is used in the test returns Object. Whatever concrete type is concerned needs to be representable in JSON. The error is due to GeoShapeValue not being writable to xcontent. You'll need to add an xcontent writer (see for example XContentElasticsearchExtension.java which registers the writer for many date classes that may be returned from a script).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. The problem is the geo_shape doc values is an interval tree of the tessellated shape design to perform very fast spatial operations but it currently looses the ability to construct the original shape. Therefore we cannot serialise into anything meaningful to the user.
GeoShapeValue now implements ToXContentFragment and throws an error if toXContent is called. I have some ideas of what we can do here but it requires more discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be awesome to serialize it into GeoJSON of the original shape, but in the way it is currently stored it is indeed pretty useless. So throwing an error message or serializing it into a string containing a placeholder like "<geo_shape>"
or something like this is probably the best we can do at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My vote goes to throw an error so users do not have the temptation to rely in whatever value is used as placeholder?
} | ||
centroid.reset(centroidLat / count, centroidLon / count); | ||
boundingBox.topLeft().reset(maxLat, minLon); | ||
boundingBox.bottomRight().reset(minLat, maxLon); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of my worries here is to make the construction of a GeoPoint script value expensive.
On the other hand it seems there is a small bug here as we were building new GeoPoint objects for each new doc value which seems wasteful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is an internal detail we could always attempt to improve this at a later time.
@@ -29,7 +29,7 @@ | |||
* A class representing a Geo-Bounding-Box for use by Geo queries and aggregations | |||
* that deal with extents/rectangles representing rectangular areas of interest. | |||
*/ | |||
public class GeoBoundingBox implements ToXContentObject, Writeable { | |||
public class GeoBoundingBox implements ToXContentFragment, Writeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to make GeoBoundingBox to behave more like a GeoPoint
If width and height are important, we can add them to GeoBoundingBox as well. On this class it will make perfect sense and will not be misleading. |
After speaking with @imotov and more thinking, the final API should look lie:
|
I added more test, in particular when the geometry is null. In the case getCentroid() and getBoundingBox() returns null and getDimensionalType() returns -1. |
@jdconrad is there anything to do in order to move this forward? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for adding this.
This PR proposes the extension of Painless API to support accessing selected information stored in geo fields. The new methods are based in two immediate needs:
The new methods will ideally be supported by both geo points and geo shapes.
This PR added a new abstract class
ScriptDocValues.Geometry
that is the base class forScriptDocValues.GeoPoints
andGeoShapeScriptValues
. This class exposes the following new methods:Let me know what you think.