-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support GeoSpatial functions in Presto #8878
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , thank you for extracting spatial functions into a separate PR and thanks for addressing comments. The code looks good. All functions seem to have tests for happy paths. It would be helpful to add some negative test cases to the test suite as well. I have a question about handling nulls returned by deserialize function below.
|
||
public final class GeoFunctions | ||
{ | ||
private static final String LINE = "LineString"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GeometryUtils.LINE_STRING to avoid copy-pasting hard-coded strings
Ditto other constants.
private static final String MULTIPOINT = "MultiPoint"; | ||
private static final String POLYGON = "Polygon"; | ||
private static final String POINT = "Point"; | ||
private static final String GEOMETRY = "Geometry"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GeometryType.NAME to avoid copy-pasting hard-coded strings
return getOGCType(s); | ||
} | ||
|
||
public static int getWkid(Slice slice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't seem to be used. Did you mean to make it private and call it from deserialize line 71?
return result; | ||
} | ||
|
||
public static OGCType getGeomType(Slice slice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't seem to be used.
} | ||
} | ||
|
||
public static final OGCType getOGCType(int value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is used only by getGeomType which itself it not used.
public void testSTLineString() | ||
throws Exception | ||
{ | ||
assertQuery("select st_as_text(st_line_string('linestring(1 1, 2 2, 1 3)'))", "select 'LINESTRING (1 1, 2 2, 1 3)'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, test 'linestring empty' as well
public void testSTCentroid() | ||
throws Exception | ||
{ | ||
assertQuery("select st_as_text(st_centroid(st_geometry_from_text('polygon ((1 1, 1 4, 4 4, 4 1))')))", "select 'POINT (2.5 2.5)'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider testing st_centroid on a non-polygon shape
ditto other methods which apply to a restricted set of shape types
public void testSTIsClosed() | ||
throws Exception | ||
{ | ||
assertQuery("select st_is_closed(st_geometry_from_text('linestring(1 1, 2 2, 1 3, 1 1)'))", "select true"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider testing linestring which is not closed
public void testSTIsEmpty() | ||
throws Exception | ||
{ | ||
assertQuery("select st_is_empty(st_geometry_from_text('POINT (1.5 2.5)'))", "select false"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider testing empty shape
throws Exception | ||
{ | ||
assertQuery("select st_max_x(st_geometry_from_text('linestring(8 4, 4 8)'))", "select 8.0"); | ||
assertQuery("select st_max_y(st_geometry_from_text('linestring(8 4, 4 8)'))", "select 8.0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider testing using a linestring with different max_x and max_y; otherwise a typo in the implementation which makes max_x and max_y always return same value will go unnoticed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments
@SqlType(GEOMETRY) | ||
public static Slice stLineString(@SqlType(StandardTypes.VARCHAR) Slice input) | ||
{ | ||
OGCGeometry geometry = getGeometry(OGCGeometry.fromText(input.toStringUtf8()), "st_line_string", LINE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there well-known constants for things like "st_line_string"?
@Description("Construct Point Geometry") | ||
@ScalarFunction | ||
@SqlType(GEOMETRY) | ||
public static Slice stPoint(@SqlType(StandardTypes.DOUBLE) double longitude, @SqlType(StandardTypes.DOUBLE) double latitude) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "latitude" and "longitude" vs just "x" and "y"? This is mixing concepts of euclidean geometry with geospatial systems, which include reference systems and geodetic datums.
This distinction is important when performing computations like calculating areas and lengths. E.g. the area of a "polygon" whose vertices are defined by latitude/longitude pairs (which is not really a polygon, but a slice of curved surface) is not the same as the area of a 2-d polygon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will fix it
} | ||
|
||
@Description("Returns Geometry of input WKT text") | ||
@ScalarFunction("st_geometry_from_text") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this from_wkt
/ to_wkt
(similar to from_iso8601
/to_iso8601
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martint, ST_GeometryFromText and ST_AsText are standard names for conversion functions to/from WKT. The way I think about it is "text" in the context of "spatial" (provided by ST_ prefix) implies Well-Known-Text. Similarly, ST_GeometryFromBinary and ST_AsBinary names are used for conversions to/from Well-Known-Binary format.
PostGIS: http://www.postgis.org/docs/ST_AsText.html
ESRI-for-Hadoop: https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Constructors
Speaking of function names, I noticed that standard names don't match Presto's convention: ST_GeometryFromText vs. st_geometry_from_text. Do you think it is better to keep standard names or change them to match Presto's style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case we should keep the name. We should stay as close to the standard as possible.
BTW, are all the functions in this PR defined in the standard? Have we looked at what's their expected behavior and whether our implementation matches? Can you comment on the question I had above whether the library is meant to operate on euclidean geometry or with respect to a reference system and datum (WGS84, etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , @martint , re: euclidean geometry - it is my understanding is that all computations in this PR are performed on a 2D plane. @zhenxiao , would you confirm?
The new Geometry type uses ESRI shape file format for serialization. That format allows to specify spatial reference, but also accepts null. Constructor functions ST_Point, ST_Polygon, ST_GeometryFromText accept WKT which doesn't include SRID, hence, all the instances of Geometry type have null spatial reference. Computations accept Geometry arguments, but strip spatial references via calls to OGCGeometry.getEsriGeometry() before invoking underlying library functions to perform the computation. For example,
public static double stArea(@SqlType(NAME) Slice input)
{
// Convert ESRI shape file to OGCGeometry
OGCGeometry geometry = getGeometry(deserialize(input), "ST_Area", POLYGON);
// Strip spatial reference and call calculateArea2D
return geometry.getEsriGeometry().calculateArea2D();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martint I confirm all the implementations are 2D and euclidean geometry.
Yes, as @mbasmanova said, this PR is using DSRI shape for serialization/deserialization, and calling underlying library function calls to perform the computation
} | ||
|
||
@SqlNullable | ||
@Description("Returns true if and only if the line is closed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean the line is closed? Does it mean that if it's made of multiple segments all segments are connected? Or that it's a completely closed loop (i.e., the last segment connects to the first)? If the latter, this implementation seems incorrect.
boolean result = true; | ||
for (int i = 0; result && i < pathCount; i++) { | ||
Point start = lines.getPoint(lines.getPathStart(i)); | ||
Point end = lines.getPoint(lines.getPathEnd(i) - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about the getPoint/getPathEnd APIs, but shouldn't this be getPathEnd(i - 1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getPathEnd: Returns the index immediately following the last index of the path. So use MultiPath.getPathEnd(i) -1 to get the last index of the path
return polygon.getPointCount() + polygon.getPathCount(); | ||
} | ||
else if (geometry.geometryType().equals(POINT)) { | ||
return geometry.getEsriGeometry().isEmpty() ? 0 : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems strange. If I have an st_point, shouldn't count always be 1? What does it mean for the point to be "empty"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will fix it
} | ||
|
||
@SqlNullable | ||
@Description("Returns true if and only if the line is closed and simple") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does closed and simple mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simple means:
Each figure of the instance must not intersect itself, except at its endpoints.
No two figures of the instance can intersect each other at a point that is not in both of their boundaries.
return line.isClosed() && line.isSimple(); | ||
} | ||
|
||
@Description("Returns the start point of an line") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of a line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update
OGCGeometry geometry = getGeometry(deserialize(input), "st_start_point", LINE); | ||
MultiPath lines = (MultiPath) geometry.getEsriGeometry(); | ||
SpatialReference reference = geometry.getEsriSpatialReference(); | ||
return serialize(OGCGeometry.createFromEsriGeometry(lines.getPoint(0), reference)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the SpatialReference? shouldn't this just return lines.getPoint(0)
? (See my comment above regarding euclidean geometry vs geospatial)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to serialize an OGCGeometry, Point is not OGCGeometry
OGCGeometry geometry = getGeometry(deserialize(input), "st_end_point", LINE); | ||
MultiPath lines = (MultiPath) geometry.getEsriGeometry(); | ||
SpatialReference reference = geometry.getEsriSpatialReference(); | ||
return serialize(OGCGeometry.createFromEsriGeometry(lines.getPoint(lines.getPointCount() - 1), reference)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here
thank you @mbasmanova @martint |
|
||
Returns the Y coordinate of point. | ||
|
||
.. function:: st_point_number(geometry) -> bigint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was renamed to st_point_count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@electrum , FWIW, PostGIS has this function under ST_NumPoints name: http://postgis.org/docs/ST_NumPoints.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenGIS standard also mentions ST_NumPoints and ST_NumInteriorRings
@Description("Returns the number of interior rings in the polygon") | ||
@ScalarFunction | ||
@SqlType(StandardTypes.BIGINT) | ||
public static long stInteriorRingNumber(@SqlType(NAME) Slice input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be count instead of number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, PostGIS version is named ST_NumInteriorRings: http://postgis.org/docs/ST_NumInteriorRings.html
I didn't realize until now that SQL/MM is ISO standard 13249-3. I could only find a draft from 2004, and it seems the latest edition was published in 2016, so we should buy a copy for review. Which of these functions are covered by the standard? Does the behavior and naming match? |
@electrum , I found that PostGIS documentation conveniently marks functions which conform to SQL/MM, or OpenGIS, or both. Here is an example of ST_IsSimple from PostGIS: https://postgis.net/docs/ST_IsSimple.html Also, I found an overview of SQL/MM which I suspect is easier to digest than the actual standard document, but it is not as detailed: http://doesen0.informatik.uni-leipzig.de/proceedings/paper/68.pdf |
Thank you @electrum @mbasmanova @martint |
@martint @mbasmanova @electrum updated with a review of all functions matches name and logic in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , @martint , I've been walking through the functions defined in GeoFunctions.java and checking the naming and implementation with the standard document. I've checked functions up to ST_IsClosed and have some comments. I'll check the rest of the functions and post further comments in a separate review.
{ | ||
private GeoFunctions() {} | ||
|
||
@Description("Return a geometry type Line object from Well-Known Text representation (WKT)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the standard ST_LineFromText function is defined as a constructor for ST_LineString which is a more generic type then line. Would you replace "Line" with "LineString" in this documentation?
In addition, ST_LineFromText doesn't allow constructing multi-line strings and therefore should not allow MULTILINESTRING WKT input. Current implementation does. Would you restrict the implementation to just LINESTRING ? (The standard defines ST_MLineFromText as a constructor for multi-line-strings.)
@SqlType(NAME) | ||
public static Slice stPoint(@SqlType(StandardTypes.DOUBLE) double x, @SqlType(StandardTypes.DOUBLE) double y) | ||
{ | ||
OGCGeometry geometry = getGeometry(OGCGeometry.createFromEsriGeometry(new Point(x, y), null), "ST_Point", POINT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getGeometry call here is redundant, because OGCGeometry.createFromEsriGeometry(new Point(x, y), null) can't return anything other than point. Perhaps, remove it.
return serialize(geometry); | ||
} | ||
|
||
@Description("Returns a polygon built from Well-Known Text representation (WKT)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inconsistent documentation: "a polygon built" vs. "a geometry type Line object"
return serialize(geometry); | ||
} | ||
|
||
@Description("Returns the area of the surface if it is a polygon") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference to "surface" here might be confusing because it may suggest the measurement around the sphere or ellipsoid as opposed to simple Euclidean measurement on a 2D plane. @zhenxiao , it might be useful to clarify how exactly the measurement is taken.
return geometry.getEsriGeometry().calculateArea2D(); | ||
} | ||
|
||
@Description("Return a Geometry from Well-Known Text representation (WKT)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency sake, consider replacing "Return a Geometry" with "Returns a geometry type object"
} | ||
|
||
@SqlNullable | ||
@Description("Return the Well-Known Text (WKT) representation of the geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Return -> Returns
return Slices.utf8Slice(deserialize(input).asText()); | ||
} | ||
|
||
@Description("Returns the geometric center of a geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, borrow the documentation from the standard and clarify that this method applies to polygons only?
"Returns the Point value that is the mathematical centroid of the Polygon"
However, the implementation doesn't seem to match the definition of centroid [1]: "the arithmetic mean ("average") position of all the points in the shape". Instead, the implementation calculates the center of the bounding box. Is this intended? The standard calls for "mathematical centroid" and PostGIS implements it that way [2]. I also noticed that ESRI-for-Hadoop implemented ST_Centroid as in this PR [3]: "ST_Centroid(polygon) returns the point that is the center of the polygon's envelope"
[1] https://en.wikipedia.org/wiki/Centroid
[2] https://postgis.net/docs/ST_Centroid.html
[3] https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Accessors#st_centroid
|
||
@Description("Return the coordinate dimension of the Geometry") | ||
@ScalarFunction("ST_CoordDim") | ||
@SqlType(StandardTypes.BIGINT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since dimension values are limited to 2, 3 or 4 would it make sense to use TINYINT instead of BIGINT?
|
||
@Description("Returns the inherent dimension of this Geometry object, which must be less than or equal to the coordinate dimension") | ||
@ScalarFunction("ST_Dimension") | ||
@SqlType(StandardTypes.BIGINT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since dimension values are limited to 2, 3 or 4 would it make sense to use TINYINT instead of BIGINT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some further comments on functions up to but not including ST_NPoints.
for (int i = 0; result && i < pathCount; i++) { | ||
Point start = lines.getPoint(lines.getPathStart(i)); | ||
Point end = lines.getPoint(lines.getPathEnd(i) - 1); | ||
result = result && end.equals(start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like this method can return false early if it detects a non-closed line
} | ||
|
||
@SqlNullable | ||
@Description("Returns TRUE if the LINESTRING's start and end points are coincident") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency sake, consider replacing LINESTRING with LineString and also mention Multi-LineString as the implementation allows both types.
} | ||
|
||
@SqlNullable | ||
@Description("Returns true if this Geometry is an empty geometrycollection, polygon, point etc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, borrow the documentation from the standard
"Returns TRUE if a geometry value corresponds to the empty set"
It would be helpful to decide how to refer to TRUE value in the docs. Some comments use all-upper-case TRUE while others use lower-case true.
return deserialize(input).isEmpty(); | ||
} | ||
|
||
@Description("Returns the 2d length of the geometry if it is a linestring or multilinestring") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps, like for ST_Area, clarify what kind of computation is provided? Planar or geodesic?
} | ||
|
||
@Description("Returns X maxima of a bounding box 2d or 3d or a geometry") | ||
@ScalarFunction("ST_XMax") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this function in the SQL/MM standard, but I do see it available in PostGIS. What's the use case for this function? The documentation refers to 2d and 3d box objects, but I don't think this PR supports them. Ditto ST_YMax, ST_YMin, ST_XMin.
} | ||
|
||
@Description("Return the number of interior rings of the first polygon in the geometry") | ||
@ScalarFunction("ST_NumInteriorRings") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the standard name doesn't have 's' at the end: ST_NumInteriorRing
return envelope.getYMin(); | ||
} | ||
|
||
@Description("Return the number of interior rings of the first polygon in the geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, borrow the documentation from the standard and clarify that this function only applies to polygons
"Returns the cardinality of the collection of interior rings of a polygon"
thank you @mbasmanova |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , I made a full pass other the functions in GeoFunctions.java . Here are remaining comments.
} | ||
|
||
@Description("Returns the number of points in the geometry") | ||
@ScalarFunction("ST_NPoints") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL/MM only defines ST_NumPoints which "returns the cardinality of the ST_Point collection in the ST_LineString value". ST_NPoints function comes from PostGIS and is non-standard extension of ST_NumPoints which applies to any geometry type.
Would it make sense to rename this function to ST_NumPoints and document is as a natural extension of the standard one?
else if (geometry.geometryType().equals(POINT)) { | ||
return 1; | ||
} | ||
else if (geometry.geometryType().equals(POLYGON)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polygon, MultiPoint and MultiPath have common base class MultiVertexGeometry which has a getPointCount method. Therefore, all three types can be handled with a single cast to MultiVertexGeometry.
return ((MultiVertexGeometry) geometry.getEsriGeometry()).getPointCount();
{ | ||
OGCGeometry geometry = deserialize(input); | ||
OGCGeometry bound = geometry.boundary(); | ||
if (bound.geometryType().equals(MULTI_LINE_STRING) && ((OGCMultiLineString) bound).numGeometries() == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic seems unnecessary. I noticed that serialize function outputs 3 pieces of data: (1) SRID, (2) geometry type, (3) ESRI Shape File. However, deserialize function uses only (1) and (3) and implicitly derives (2) from (3). serialize-deserialize cycle applied to a MultiLineString of one LineString produces the same result as converting MultiLineString to a LineString first. Hence, two suggestions:
(1) remove this logic
(2) update serialize to not output shape type
return serialize(bound); | ||
} | ||
|
||
@Description("Returns a geometry representing the double precision (float8) bounding box of the supplied geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, borrow from the standard to clarify that return value is a polygon
"Returns the bounding rectangular polygon of a geometry"
return serialize(OGCGeometry.createFromEsriGeometry(envelope, reference)); | ||
} | ||
|
||
@Description("Returns a geometry that represents that part of geometry A that does not intersect with geometry B") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: references to A and B don't match parameter names left and right
perhaps, copy from the SQL/MM: "Returns the Geometry value that represents the point set difference of two geometries"
OGCGeometry leftGeometry = deserialize(left); | ||
OGCGeometry rightGeometry = deserialize(right); | ||
OperatorSimpleRelation containsOperator = OperatorContains.local(); | ||
return containsOperator.execute(leftGeometry.getEsriGeometry(), rightGeometry.getEsriGeometry(), leftGeometry.getEsriSpatialReference(), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not leftGeometry.contains(rightGeometry) ?
{ | ||
OGCGeometry leftGeometry = deserialize(left); | ||
OGCGeometry rightGeometry = deserialize(right); | ||
OperatorSimpleRelation crossesOperator = OperatorCrosses.local(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not leftGeometry.crosses(rightGeometry) ?
} | ||
|
||
@SqlNullable | ||
@Description("Returns TRUE if the supplied geometries have some, but not all, interior points in common") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definition seems correct. However, when I called this function with a line and a point on the line, I got FALSE. I believe this function should return TRUE in such a case. Does it work like this for you?
This is what I tried:
assertQuery("select st_crosses(st_line('linestring(0 0, 1 1)'), st_point(0, 0))", "select true");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mbasmanova I think:
st_crosses(st_line('linestring(0 0, 1 1)'), st_point(0, 0))
should return FALSE
as Point(0 0) is one point in linestring(0 0, 1 1), it is ALL in the LineString. ST_Crosses requires Not All in.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , I think you are right. Thanks for clarifying.
{ | ||
OGCGeometry leftGeometry = deserialize(left); | ||
OGCGeometry rightGeometry = deserialize(right); | ||
OperatorSimpleRelation disjointOperator = OperatorDisjoint.local(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not leftGeometry.disjoint(rightGeometry) ? ditto ST_Intersects, ST_Overlaps, ST_Within, etc.
{ | ||
OGCGeometry leftGeometry = deserialize(left); | ||
OGCGeometry rightGeometry = deserialize(right); | ||
OperatorSimpleRelation equalsOperator = OperatorEquals.local(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not call OGCGeometry.equals method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , thanks for addressing my comments. I'm wondering why I can't see the changes you made in isolation. The only guess I have is that you are using force-push to update the PR (git push -f origin geoSpatial). If that's the case, could you omit '-f' in the next pushes? This will generate a commit per change and allow me to see changes between revisions. The PR will automatically combine all commits together and allow the final version to be merged as a single commit.
xCoordinate += polygon.getPoint(i).getX(); | ||
yCoordinate += polygon.getPoint(i).getY(); | ||
} | ||
Point centroid = new Point(xCoordinate / pointCount, yCoordinate / pointCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://en.wikipedia.org/wiki/Centroid this implementation calculates centroid of a set of points and not centroid of a polygon. The formula for the latter is more complex and includes an area of the polygon.
Also, for an polygon "POLYGON EMPTY", will pointCount be 0 and will it cause a division-by-zero error?
@martint , @electrum , re: "are all the functions in this PR defined in the standard?" - Yes, with the exception of ST_XMin, ST_XMax, ST_YMin, ST_YMax. These functions are missing from SQL/MM, but are available in PostGIS. They return the coordinates of the bounding box. How would you like to proceed with these? Would it be sufficient to document that these are non-standard? re: "Have we looked at what's their expected behavior and whether our implementation matches?" - I looked at all of the functions in GeoFunctions.java and cross-checked their names and implementation with SQL/MM. With a couple exceptions, things are looking good. We need to work through the implementation of ST_Centroid and I have a question about ST_Crosses. Overall, I'm wondering what are your thoughts on documentation for the functions. Given that SQL/MM standard is not freely available, should we provide more comprehensive explanation of the functions and include examples? Or should we refer folks to PostGIS documentation? Personally, I found PostGIS docs very clear. Should we also mention somewhere that while these functions follow SQL/MM standard they do not implement it in full? |
thank you @mbasmanova |
return Slices.utf8Slice(deserialize(input).asText()); | ||
} | ||
|
||
@Description("Returns the Point value that is the mathematical centroid of the set of points in Polygon") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL/MM defines ST_Centroid as "returns the ST_Point value that is the mathematical centroid of the ST_Surface value" which is different from the implementation of this function. PostGIS implements ST_Centroid according to the standard [1].
Why do you want a different implementation for this function?
} | ||
|
||
@SqlNullable | ||
@Description("Returns TRUE if the supplied geometries have some, but not all, interior points in common") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , I think you are right. Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static Slice stBoundary(@SqlType(NAME) Slice input) | ||
{ | ||
OGCGeometry geometry = deserialize(input); | ||
OGCGeometry bound = geometry.boundary(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this variable seems not needed; if you like to keep it, consider renaming to 'boundary'
{ | ||
OGCGeometry leftGeometry = deserialize(left); | ||
OGCGeometry rightGeometry = deserialize(right); | ||
checkArgument(leftGeometry.getEsriSpatialReference().equals(rightGeometry.getEsriSpatialReference()), "Input Geometries Spatial Reference do not match"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this check is copy-pasted in multiple places; consider moving it into a helper method?
verifySameSR(leftGeometry, rightGeometry);
int wkid = input.SRID(); | ||
Geometry geometry = input.getEsriGeometry(); | ||
|
||
if (geometry == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When input is an instance of GeometryCollection, the geometry will be null. Therefore serializing GeometryCollection will always produce an empty slice. Is this intentional?
--- from com.esri.core.geometry.ogc.OGCConcreteGeometryCollection
public Geometry getEsriGeometry() {
return null;
}
public GeometryCursor getEsriGeometryCursor() {
return new OGCConcreteGeometryCollection.GeometryCursorOGC(this.geometries);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I did not find a good way to serialize GeometryCollection. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , at first I thought that it might be OK to not support GeometryCollection by disallowing GeometryCollection in GeometryFromText. However, the problem is that GeometryCollection may be created as a result of some of the operations. For example, ST_Intersection may return a collection:
ST_Intersection(ST_GeometryFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))'), ST_GeometryFromText('LINESTRING (-1 1, 1 -1, 1 2)'))
returns a point and a linestring: GEOMETRYCOLLECTION (POINT (0 0), LINESTRING (1 0, 1 1))
GeometryCollection can be serialized as an array of geometries. ArrayBlockBuilder may be handy. One option is for serialize to always produce an array of geometries and for deserialize to create GEOMETRYCOLLECTION when array size > 1 and GEOMETRYCOLLECTION EMPTY when array is empty. What do you think?
} | ||
byte[] shape = GeometryEngine.geometryToEsriShape(geometry); | ||
|
||
if (shape == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible? Under what circumstances GeometryEngine.geometryToEsriShape(geometry) returns null for non-null input? Is this the case for empty geometries? Wouldn't it be clearer to check that before calling geometryToEsriShape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shape will never be null, will update
public static OGCGeometry deserialize(Slice shape) | ||
{ | ||
requireNonNull(shape, "shape is null"); | ||
int wkid = shape.getInt(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this throw an exception If shape is an EMPTY_SLICE?
It might be helpful to add tests cases for serialize and deserialize functions to make sure all the cases work properly.
requireNonNull(shape, "shape is null"); | ||
int wkid = shape.getInt(0); | ||
ByteBuffer shapeBuffer = getShapeBuffer(shape); | ||
if (shapeBuffer.limit() < SIZE_WKID || shapeBuffer.getInt(0) == Geometry.Type.Unknown.value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'shape' slice has 2 fileds: (1) wkid, (2) shape buffer. It seems strange to check the size of the shape buffer against the size of the WKID field. Would you clarify?
@mbasmanova get comments addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , thanks for updating the implementation of ST_Centroid. I think it is now correctly computing centroid for a point, a set of points, a line and a set of lines. The computation for polygons doesn't seem correct though. See details below.
double ySum = 0; | ||
for (int i = 0; i < pointCount; i++) { | ||
Point point = null; | ||
if (type.equals(MULTI_POINT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic can be simplified by capturing the result of (MultiVertexGeometry) geometry.getEsriGeometry() in a variable in the beginning of the method and using MultiVertexGeometry.getPoint(i) method.
weightSum += weight; | ||
double polygonXSum = 0; | ||
double polygonYSum = 0; | ||
for (int j = 0; j < polygon.getPointCount(); j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you compute the centroid of a polygon as a mean of its vertexes? Based on [1] and [2] the formula is different.
[1] http://www.seas.upenn.edu/~sys502/extra_materials/Polygon%20Area%20and%20Centroid.pdf
[2] https://en.wikipedia.org/wiki/Centroid#Properties
@mbasmanova thanks for your review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , thanks for updating the implementation of ST_Centroid. It looks correct to me now.
for (int i = 0; i < polygon.getPointCount(); i++) { | ||
Point current = polygon.getPoint(i); | ||
Point next = (i + 1 == polygon.getPointCount()) ? polygon.getPoint(0) : polygon.getPoint(i + 1); | ||
xSum += (current.getX() + next.getX()) * (current.getX() * next.getY() - next.getX() * current.getY()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: (current.getX() * next.getY() - next.getX() * current.getY()) computation repeats 3 times; consider saving the results in a variable for efficiently and readability
// Points centroid is arithmetic mean of the input points | ||
private static Point getPointsCentroid(OGCGeometry geometry) | ||
{ | ||
String type = geometry.geometryType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'type' variable doesn't seem to be used
// Polygon centroid: | ||
// c[x] = (Sigma(x[i] + x[i + 1]) * (x[i] * y[i + 1] - x[i + 1] * y[i]), for i = 0 to N - 1) / (6 * signedArea) | ||
// c[y] = (Sigma(y[i] + y[i + 1]) * (x[i] * y[i + 1] - x[i + 1] * y[i]), for i = 0 to N - 1) / (6 * signedArea) | ||
private static Point getPolygonCentroid(OGCGeometry geometry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , I just realized that this formula applies only to polygons without holes. The general case of polygon with one or more holes requires a bit more computation. I haven't found an official formula yet, but the following Stack Overflow answer may be correct. They compute the centroid as a weighted diff between outer inner polygons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that is more computation for it. How about we add a comment, and explicitly not support polygons with holes?
double signedArea = 0; | ||
for (int i = 0; i < polygon.getPointCount(); i++) { | ||
Point current = polygon.getPoint(i); | ||
Point next = (i + 1 == polygon.getPointCount()) ? polygon.getPoint(0) : polygon.getPoint(i + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: WKT format for polygon requires that first and last points be the same. Hence, it may be sufficient to loop from 0 to polygon.getPointCount() - 1.
with @mbasmanova 's help, added serialization/deserialization for GeometryCollection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , thanks for updating the tests.
public void testSTLineFromText() | ||
throws Exception | ||
{ | ||
assertQuery("select ST_AsText(ST_LineFromText('LINESTRING EMPTY'))", "select 'MULTILINESTRING EMPTY'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like ST_AsText(ST_Point(input)) doesn't preserve the type of the input geometry when geometry is empty. Here LineString got converted to MultiLineString. Further down in this file I also see that Polygon gets converted to MultiPolygon. Is this expected? Perhaps, worth documenting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is expected. Since when ST_AsText trying to deserialize, it only knows it is a polyline, not clear whether it is a line or multiline. I will document it.
{ | ||
assertQuery("select ST_AsText(ST_LineFromText('LINESTRING EMPTY'))", "select 'MULTILINESTRING EMPTY'"); | ||
assertQuery("select ST_AsText(ST_LineFromText('LineString(1 1, 2 2, 1 3)'))", "select 'LINESTRING (1 1, 2 2, 1 3)'"); | ||
assertQueryFails("select ST_AsText(ST_LineFromText('MULTILineString EMPTY'))", "ST_LineFromText only applies to LineString. Input type is: MultiLineString"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: throughout this test suite, the WKT strings are not consistent. For example, there are LineString, MULTILineString, MULTILINESTRING, etc. Would you take a pass and make them all align. Perhaps, uppercase all the WKT type prefixes?
'POLYGON...', 'POINT...', 'LINESTRING...'
{ | ||
assertQuery("select ST_Area(ST_GeometryFromText('Polygon ((2 2, 2 6, 6 6, 6 2))'))", "select 16.0"); | ||
assertQueryFails("select ST_Area(ST_GeometryFromText('Point (1 4)'))", "ST_Area only applies to Polygon. Input type is: Point"); | ||
assertQueryFails("select ST_Area(ST_GeometryFromText('POLYGON EMPTY'))", "ST_Area only applies to Polygon. Input type is: MultiPolygon"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ST_Area for an empty polygon should not fail. According to SQL/MM spec it should return null. Also, I noticed that ST_Length for empty line-string returns 0. Let's make these two method consistent: both return 0 or null.
8.1.2 ST_Area Methods ...If SELF is an empty set, then return the null value.
BTW, why does ST_Area not allow multi-polygons?
assertQuery("select ST_NumInteriorRing(ST_GeometryFromText('POLYGON ((0 0, 0 5, 5 5, 5 0, 0 0))'))", "select 0"); | ||
assertQuery("select ST_NumInteriorRing(ST_GeometryFromText('POLYGON ((0 0, 8 0, 0 8, 0 0), (1 1, 1 5, 5 1, 1 1))'))", "select 1"); | ||
assertQueryFails("select ST_NumInteriorRing(ST_GeometryFromText('LineString(8 4, 5 7)'))", "ST_NumInteriorRing only applies to Polygon. Input type is: LineString"); | ||
assertQueryFails("select ST_NumInteriorRing(ST_GeometryFromText('POLYGON EMPTY'))", "ST_NumInteriorRing only applies to Polygon. Input type is: MultiPolygon"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the spec, ST_NumInteriorRing of an empty Polygon should return null.
assertQuery("select ST_AsText(ST_EndPoint(ST_GeometryFromText('LineString(8 4, 4 8, 5 6)')))", "select 'POINT (5 6)'"); | ||
assertQueryFails("select ST_AsText(ST_StartPoint(ST_GeometryFromText('Polygon ((2 0, 2 1, 3 1))')))", "ST_StartPoint only applies to LineString. Input type is: Polygon"); | ||
assertQueryFails("select ST_AsText(ST_EndPoint(ST_GeometryFromText('Polygon ((2 0, 2 1, 3 1))')))", "ST_EndPoint only applies to LineString. Input type is: Polygon"); | ||
assertQueryFails("select ST_AsText(ST_StartPoint(ST_GeometryFromText('LINESTRING EMPTY')))", "ST_StartPoint only applies to LineString. Input type is: MultiLineString"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ST_StartPoint of a empty line-string should return null
assertQuery("select ST_X(ST_GeometryFromText('Point (1 2)'))", "select 1.0"); | ||
assertQuery("select ST_Y(ST_GeometryFromText('Point (1 2)'))", "select 2.0"); | ||
assertQueryFails("select ST_Y(ST_GeometryFromText('Polygon ((2 0, 2 1, 3 1))'))", "ST_Y only applies to Point. Input type is: Polygon"); | ||
assertQueryFails("select ST_Y(ST_GeometryFromText('POINT EMPTY'))", "This operation should not be performed on an empty geometry."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the spec, ST_Y of empty point should return null
assertQuery("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('POLYGON ((1 1, 1 4, 4 1))')))", "select 'LINESTRING (1 1, 4 1, 1 4, 1 1)'"); | ||
assertQuery("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('POLYGON ((0 0, 0 5, 5 5, 5 0, 0 0), (1 1, 1 2, 2 2, 2 1, 1 1))')))", "select 'LINESTRING (0 0, 5 0, 5 5, 0 5, 0 0)'"); | ||
assertQueryFails("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('LineString(1 1, 2 2, 1 3)')))", "ST_ExteriorRing only applies to Polygon. Input type is: LineString"); | ||
assertQueryFails("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('POLYGON EMPTY')))", "ST_ExteriorRing only applies to Polygon. Input type is: MultiPolygon"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the spec, ST_ExteriorRing of empty polygon should return null
@mbasmanova get comments addressed |
|
||
Presto GeoSpatial functions are compliant with the Open Geospatial Consortium’s (OGC) OpenGIS Specifications. As such, many Presto GeoSpatial functions require, or more accurately, assume that geometries that are operated on are both simple and valid. For example, it does not make sense to calculate the area of a polygon that has a hole defined outside of the polygon, or to construct a polygon from a non-simple boundary line. | ||
|
||
Presto GeoSpatial fucntions support the Well-Known Text (WKT) form of spatial objects. Here are the examples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: fucntions
@Description("Returns a Geometry type LineString object from Well-Known Text representation (WKT)") | ||
@ScalarFunction("ST_LineFromText") | ||
@SqlType(NAME) | ||
public static Slice stLineString(@SqlType(StandardTypes.VARCHAR) Slice input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this method parseLine
.
import io.airlift.slice.Slices; | ||
import org.apache.commons.lang.StringUtils; | ||
|
||
import static com.facebook.presto.geospatial.GeometryType.NAME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAME
is too generic to import it statically.
|
||
@Description("Returns the area of a polygon using Euclidean measurement on a 2D plane (based on spatial ref) in projected units") | ||
@ScalarFunction("ST_Area") | ||
@SqlType(StandardTypes.DOUBLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these functions defined as returning double
in the spec? (vs Decimal, for instance)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is defined as float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return leftGeometry.within(rightGeometry); | ||
} | ||
|
||
private static OGCGeometry getGeometry(OGCGeometry geometry, String function, String...validTypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this method is misleading. It's just validating that the type is one of validTypes
.
BTW, the cost of this function could be non-trivial (due to having to create an array for the varargs argument and the loop/equality check against multiple types). A better approach might be to create an enum of well-known types and rewrite this as:
private static void validateType(String function, OGCGeometry geometry, Set<GeometryTypeName> validTypes)
{
GeometryTypeName actual = GeometryTypeName.valueOf(geometry.getType());
if (!validTypes.contains(actual)) {
throw ...
}
}
and call it like:
validateType("ST_xyz", geometry, EnumSet.of(ST_POINT, ST_LINE));
(also, note the slight change in order of arguments. I think that makes a little more logical sense)
ySum += (current.getY() + next.getY()) * ladder; | ||
signedArea += ladder; | ||
} | ||
return new Point(xSum / (signedArea * 3), ySum / (signedArea * 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments before the method say "6 * signedArea", but this multiplies by 3. Which one is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is 6. sum of ladder actually is 2 * signedArea, I merged computation there. Will update
return null; | ||
} | ||
|
||
return format("<%s>", getTypeSignature()).getBytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While there isn't a strict contract for what getObjectValue
should return, it's a good idea to return a json-serializable object that represents the value. This is currently used by the client protocol and can be handy for tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very clear about it, is it good to return typeSignature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to return the WKT for the value, if possible. This is what's going to be rendered by the CLI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update
} | ||
final BasicSliceInput input = shape.getInput(); | ||
|
||
int wkid = input.readInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From looking at the rest of the code, this is the ID of the coordinate system (or spatial reference), so I'd call it just that (i.e., spatialReferenceId, coordinateSystemId, etc). wkid
is not descriptive enough.
List<OGCGeometry> geometries = new ArrayList<>(); | ||
while (input.available() > 0) { | ||
int length = input.readInt(); | ||
byte[] geometryBytes = new byte[length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just do:
ByteBuffer buffer = input.readSlice(length).toByteBuffer()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could keep track of the position you're reading at and just call shape.toByteBuffer(offset, length)
to avoid having to create the intermediate geometryBytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get it. Will fix it
@@ -86,6 +86,12 @@ | |||
</artifact> | |||
</artifactSet> | |||
|
|||
<artifactSet to="plugin/geo"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
geospatial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhenxiao , thanks for addressing the comments. This PR is looking good to me.
{ | ||
OGCGeometry geometry = getGeometry(deserialize(input), "ST_NumInteriorRing", POLYGON); | ||
return ((OGCPolygon) geometry).numInteriorRing(); | ||
OGCGeometry geometry = getGeometry(deserialize(input), "ST_NumInteriorRing", POLYGON, MULTI_POLYGON); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be a type. ST_NumInteriorRing applies only to polygons. It doesn't apply to multi-polygon. Also, the implementation casts geometry to OGCPolygon. This cast will fail if geometry is multi-polygon. Right?
@Description("Returns the first point of a LINESTRING geometry as a Point") | ||
@ScalarFunction("ST_StartPoint") | ||
@SqlType(NAME) | ||
public static Slice stStartPoint(@SqlType(NAME) Slice input) | ||
{ | ||
OGCGeometry geometry = getGeometry(deserialize(input), "ST_StartPoint", LINE_STRING); | ||
OGCGeometry geometry = getGeometry(deserialize(input), "ST_StartPoint", LINE_STRING, MULTI_LINE_STRING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reading of the SQL/MM suggests that ST_StartPoint and ST_EndPoint are defined only for line strings. These are not defined for multi-line strings.
@mbasmanova @martint thanks for your comments |
4da8a5f
to
501e33c
Compare
Thanks @zhenxiao. It's getting very close. You seem to have missed this comment in your latest batch of changes: 501e33c#r139566928 |
thank you, @martint just addressed missing comments |
import static com.facebook.presto.tpch.TpchMetadata.TINY_SCHEMA_NAME; | ||
|
||
public class TestGeoQueries | ||
extends AbstractTestQueryFramework |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason these test run full queries instead of using AbstractTestFunctions (see, for instance, TestJsonFunctions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
Thank you, @martint I get comments addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but there seem to be some build failures. Can you take a look?
@martint seems when running presto-product-tests, it does not build main test module, which I added getTypeRegistry() method. |
@zhenxiao probably it's a testbed issue; I have restarted |
nice of you, @highker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the last minute comments everyone. @zhenxiao please also squash the commits before we merge this.
GeoSpatial Functions | ||
===================== | ||
|
||
Presto GeoSpatial functions support SQL/MM specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first sentence is somewhat isolated from the rest of the doc.
btw the SQL/MM specification.
|
||
.. function:: ST_Contains(Geometry, Geometry) -> boolean | ||
|
||
Returns TRUE if and only if no points of right lie in the exterior of left, and at least one point of the interior of left lies in the interior of right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left, right meaning the first and second argument I guess?
|
||
.. function:: ST_Intersects(Geometry, Geometry) -> boolean | ||
|
||
Returns TRUE if the Geometries spatially intersect in 2D - (share any portion of space) and FALSE if they don't (they are Disjoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disjoint
-> disjoint
.
|
||
.. function:: ST_Disjoint(Geometry, Geometry) -> boolean | ||
|
||
Returns TRUE if the Geometries do not "spatially intersect" - if they do not share any space together. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given geometries
(as you used in the function below, also doesn't make sense to capitalize Geometries
).
|
||
.. function:: ST_Intersects(Geometry, Geometry) -> boolean | ||
|
||
Returns TRUE if the Geometries spatially intersect in 2D - (share any portion of space) and FALSE if they don't (they are Disjoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given geometries
|
||
.. function:: ST_Distance(Geometry, Geometry) -> double | ||
|
||
Returns the 2-dimensional cartesian minimum distance (based on spatial ref) between two Geometries in projected units. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you use 2D above but 2-dimensional here, intentional?
|
||
Returns the last point of a LineString Geometry as a Point. | ||
|
||
.. function:: ST_X(point) -> double |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why you don't capitalize as Point
as you do for other functions?
presto-geospatial/pom.xml
Outdated
<version>2.4</version> | ||
</dependency> | ||
|
||
<!-- Presto SPI --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary comment
@SqlType(GEOMETRY_TYPE_NAME) | ||
public static Slice stPoint(@SqlType(StandardTypes.DOUBLE) double x, @SqlType(StandardTypes.DOUBLE) double y) | ||
{ | ||
OGCGeometry geometry = OGCGeometry.createFromEsriGeometry(new Point(x, y), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static import createFromEsriGeometry
public static Slice serialize(OGCGeometry input) | ||
{ | ||
int spatialReferenceId = input.SRID(); | ||
final DynamicSliceOutput sliceOutput = new DynamicSliceOutput(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't usually declare local variables as final
.
there is a compilation problem @zhenxiao
|
Thank you @nezihyigitbasi I will work on it |
6ff8b89
to
6d1492d
Compare
@zhenxiao why did you add |
6d1492d
to
dd1dc72
Compare
@nezihyigitbasi getting complain about using undeclared dependencies without adding test-jar for presto-main |
please update the |
dd1dc72
to
1aa3e84
Compare
@nezihyigitbasi get your comments addressed |
aha, product tests running good now |
The tests look good (except a known flaky one) so I merged the PR. Thanks @zhenxiao! |
Extract Geometry Type and functions in this commit from:
#5435
@mbasmanova @martint @dain