Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GeoSpatial functions in Presto #8878

Closed
wants to merge 1 commit into from

Conversation

zhenxiao
Copy link
Collaborator

Extract Geometry Type and functions in this commit from:
#5435

@mbasmanova @martint @dain

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , thank you for extracting spatial functions into a separate PR and thanks for addressing comments. The code looks good. All functions seem to have tests for happy paths. It would be helpful to add some negative test cases to the test suite as well. I have a question about handling nulls returned by deserialize function below.


public final class GeoFunctions
{
private static final String LINE = "LineString";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GeometryUtils.LINE_STRING to avoid copy-pasting hard-coded strings

Ditto other constants.

private static final String MULTIPOINT = "MultiPoint";
private static final String POLYGON = "Polygon";
private static final String POINT = "Point";
private static final String GEOMETRY = "Geometry";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GeometryType.NAME to avoid copy-pasting hard-coded strings

return getOGCType(s);
}

public static int getWkid(Slice slice)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't seem to be used. Did you mean to make it private and call it from deserialize line 71?

return result;
}

public static OGCType getGeomType(Slice slice)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't seem to be used.

}
}

public static final OGCType getOGCType(int value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used only by getGeomType which itself it not used.

public void testSTLineString()
throws Exception
{
assertQuery("select st_as_text(st_line_string('linestring(1 1, 2 2, 1 3)'))", "select 'LINESTRING (1 1, 2 2, 1 3)'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, test 'linestring empty' as well

public void testSTCentroid()
throws Exception
{
assertQuery("select st_as_text(st_centroid(st_geometry_from_text('polygon ((1 1, 1 4, 4 4, 4 1))')))", "select 'POINT (2.5 2.5)'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider testing st_centroid on a non-polygon shape

ditto other methods which apply to a restricted set of shape types

public void testSTIsClosed()
throws Exception
{
assertQuery("select st_is_closed(st_geometry_from_text('linestring(1 1, 2 2, 1 3, 1 1)'))", "select true");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider testing linestring which is not closed

public void testSTIsEmpty()
throws Exception
{
assertQuery("select st_is_empty(st_geometry_from_text('POINT (1.5 2.5)'))", "select false");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider testing empty shape

throws Exception
{
assertQuery("select st_max_x(st_geometry_from_text('linestring(8 4, 4 8)'))", "select 8.0");
assertQuery("select st_max_y(st_geometry_from_text('linestring(8 4, 4 8)'))", "select 8.0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider testing using a linestring with different max_x and max_y; otherwise a typo in the implementation which makes max_x and max_y always return same value will go unnoticed

Copy link
Contributor

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments

@SqlType(GEOMETRY)
public static Slice stLineString(@SqlType(StandardTypes.VARCHAR) Slice input)
{
OGCGeometry geometry = getGeometry(OGCGeometry.fromText(input.toStringUtf8()), "st_line_string", LINE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there well-known constants for things like "st_line_string"?

@Description("Construct Point Geometry")
@ScalarFunction
@SqlType(GEOMETRY)
public static Slice stPoint(@SqlType(StandardTypes.DOUBLE) double longitude, @SqlType(StandardTypes.DOUBLE) double latitude)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "latitude" and "longitude" vs just "x" and "y"? This is mixing concepts of euclidean geometry with geospatial systems, which include reference systems and geodetic datums.

This distinction is important when performing computations like calculating areas and lengths. E.g. the area of a "polygon" whose vertices are defined by latitude/longitude pairs (which is not really a polygon, but a slice of curved surface) is not the same as the area of a 2-d polygon.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will fix it

}

@Description("Returns Geometry of input WKT text")
@ScalarFunction("st_geometry_from_text")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this from_wkt / to_wkt (similar to from_iso8601/to_iso8601)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix it

Copy link
Contributor

@mbasmanova mbasmanova Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint, ST_GeometryFromText and ST_AsText are standard names for conversion functions to/from WKT. The way I think about it is "text" in the context of "spatial" (provided by ST_ prefix) implies Well-Known-Text. Similarly, ST_GeometryFromBinary and ST_AsBinary names are used for conversions to/from Well-Known-Binary format.

PostGIS: http://www.postgis.org/docs/ST_AsText.html
ESRI-for-Hadoop: https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Constructors

Speaking of function names, I noticed that standard names don't match Presto's convention: ST_GeometryFromText vs. st_geometry_from_text. Do you think it is better to keep standard names or change them to match Presto's style?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we should keep the name. We should stay as close to the standard as possible.

BTW, are all the functions in this PR defined in the standard? Have we looked at what's their expected behavior and whether our implementation matches? Can you comment on the question I had above whether the library is meant to operate on euclidean geometry or with respect to a reference system and datum (WGS84, etc)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , @martint , re: euclidean geometry - it is my understanding is that all computations in this PR are performed on a 2D plane. @zhenxiao , would you confirm?

The new Geometry type uses ESRI shape file format for serialization. That format allows to specify spatial reference, but also accepts null. Constructor functions ST_Point, ST_Polygon, ST_GeometryFromText accept WKT which doesn't include SRID, hence, all the instances of Geometry type have null spatial reference. Computations accept Geometry arguments, but strip spatial references via calls to OGCGeometry.getEsriGeometry() before invoking underlying library functions to perform the computation. For example,

public static double stArea(@SqlType(NAME) Slice input)
{
// Convert ESRI shape file to OGCGeometry
OGCGeometry geometry = getGeometry(deserialize(input), "ST_Area", POLYGON);
// Strip spatial reference and call calculateArea2D
return geometry.getEsriGeometry().calculateArea2D();
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint I confirm all the implementations are 2D and euclidean geometry.
Yes, as @mbasmanova said, this PR is using DSRI shape for serialization/deserialization, and calling underlying library function calls to perform the computation

}

@SqlNullable
@Description("Returns true if and only if the line is closed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean the line is closed? Does it mean that if it's made of multiple segments all segments are connected? Or that it's a completely closed loop (i.e., the last segment connects to the first)? If the latter, this implementation seems incorrect.

boolean result = true;
for (int i = 0; result && i < pathCount; i++) {
Point start = lines.getPoint(lines.getPathStart(i));
Point end = lines.getPoint(lines.getPathEnd(i) - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about the getPoint/getPathEnd APIs, but shouldn't this be getPathEnd(i - 1) ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPathEnd: Returns the index immediately following the last index of the path. So use MultiPath.getPathEnd(i) -1 to get the last index of the path

return polygon.getPointCount() + polygon.getPathCount();
}
else if (geometry.geometryType().equals(POINT)) {
return geometry.getEsriGeometry().isEmpty() ? 0 : 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange. If I have an st_point, shouldn't count always be 1? What does it mean for the point to be "empty"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will fix it

}

@SqlNullable
@Description("Returns true if and only if the line is closed and simple")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does closed and simple mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simple means:
Each figure of the instance must not intersect itself, except at its endpoints.
No two figures of the instance can intersect each other at a point that is not in both of their boundaries.

return line.isClosed() && line.isSimple();
}

@Description("Returns the start point of an line")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of a line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will update

OGCGeometry geometry = getGeometry(deserialize(input), "st_start_point", LINE);
MultiPath lines = (MultiPath) geometry.getEsriGeometry();
SpatialReference reference = geometry.getEsriSpatialReference();
return serialize(OGCGeometry.createFromEsriGeometry(lines.getPoint(0), reference));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the SpatialReference? shouldn't this just return lines.getPoint(0)? (See my comment above regarding euclidean geometry vs geospatial)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to serialize an OGCGeometry, Point is not OGCGeometry

OGCGeometry geometry = getGeometry(deserialize(input), "st_end_point", LINE);
MultiPath lines = (MultiPath) geometry.getEsriGeometry();
SpatialReference reference = geometry.getEsriSpatialReference();
return serialize(OGCGeometry.createFromEsriGeometry(lines.getPoint(lines.getPointCount() - 1), reference));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here

@zhenxiao
Copy link
Collaborator Author

thank you @mbasmanova @martint
Get your comments addressed with updated PR


Returns the Y coordinate of point.

.. function:: st_point_number(geometry) -> bigint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was renamed to st_point_count

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum , FWIW, PostGIS has this function under ST_NumPoints name: http://postgis.org/docs/ST_NumPoints.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenGIS standard also mentions ST_NumPoints and ST_NumInteriorRings

@Description("Returns the number of interior rings in the polygon")
@ScalarFunction
@SqlType(StandardTypes.BIGINT)
public static long stInteriorRingNumber(@SqlType(NAME) Slice input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be count instead of number?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, PostGIS version is named ST_NumInteriorRings: http://postgis.org/docs/ST_NumInteriorRings.html

@electrum
Copy link
Contributor

I didn't realize until now that SQL/MM is ISO standard 13249-3. I could only find a draft from 2004, and it seems the latest edition was published in 2016, so we should buy a copy for review.

Which of these functions are covered by the standard? Does the behavior and naming match?

@mbasmanova
Copy link
Contributor

@electrum , I found that PostGIS documentation conveniently marks functions which conform to SQL/MM, or OpenGIS, or both.

Here is an example of ST_IsSimple from PostGIS: https://postgis.net/docs/ST_IsSimple.html
And here is a list of all ST_* functions supported by PostGIS: http://postgis.org/docs/reference.html I've been referring to this documentation for checking whether functions in this PR are conforming to the standard or not. (One function which seems non-standard is st_envelope_intersect. It can be replaced with a composition of standard functions: st_intersect(st_envelope(g1), st_envelope(g2)).)

Also, I found an overview of SQL/MM which I suspect is easier to digest than the actual standard document, but it is not as detailed: http://doesen0.informatik.uni-leipzig.de/proceedings/paper/68.pdf

@zhenxiao
Copy link
Collaborator Author

Thank you @electrum @mbasmanova @martint
I will review postGIS and updated any function names that do not match

@zhenxiao
Copy link
Collaborator Author

@martint @mbasmanova @electrum updated with a review of all functions matches name and logic in:
http://doesen0.informatik.uni-leipzig.de/proceedings/paper/68.pdf

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , @martint , I've been walking through the functions defined in GeoFunctions.java and checking the naming and implementation with the standard document. I've checked functions up to ST_IsClosed and have some comments. I'll check the rest of the functions and post further comments in a separate review.

{
private GeoFunctions() {}

@Description("Return a geometry type Line object from Well-Known Text representation (WKT)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the standard ST_LineFromText function is defined as a constructor for ST_LineString which is a more generic type then line. Would you replace "Line" with "LineString" in this documentation?

In addition, ST_LineFromText doesn't allow constructing multi-line strings and therefore should not allow MULTILINESTRING WKT input. Current implementation does. Would you restrict the implementation to just LINESTRING ? (The standard defines ST_MLineFromText as a constructor for multi-line-strings.)

@SqlType(NAME)
public static Slice stPoint(@SqlType(StandardTypes.DOUBLE) double x, @SqlType(StandardTypes.DOUBLE) double y)
{
OGCGeometry geometry = getGeometry(OGCGeometry.createFromEsriGeometry(new Point(x, y), null), "ST_Point", POINT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getGeometry call here is redundant, because OGCGeometry.createFromEsriGeometry(new Point(x, y), null) can't return anything other than point. Perhaps, remove it.

return serialize(geometry);
}

@Description("Returns a polygon built from Well-Known Text representation (WKT)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inconsistent documentation: "a polygon built" vs. "a geometry type Line object"

return serialize(geometry);
}

@Description("Returns the area of the surface if it is a polygon")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference to "surface" here might be confusing because it may suggest the measurement around the sphere or ellipsoid as opposed to simple Euclidean measurement on a 2D plane. @zhenxiao , it might be useful to clarify how exactly the measurement is taken.

return geometry.getEsriGeometry().calculateArea2D();
}

@Description("Return a Geometry from Well-Known Text representation (WKT)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for consistency sake, consider replacing "Return a Geometry" with "Returns a geometry type object"

}

@SqlNullable
@Description("Return the Well-Known Text (WKT) representation of the geometry")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Return -> Returns

return Slices.utf8Slice(deserialize(input).asText());
}

@Description("Returns the geometric center of a geometry")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, borrow the documentation from the standard and clarify that this method applies to polygons only?

"Returns the Point value that is the mathematical centroid of the Polygon"

However, the implementation doesn't seem to match the definition of centroid [1]: "the arithmetic mean ("average") position of all the points in the shape". Instead, the implementation calculates the center of the bounding box. Is this intended? The standard calls for "mathematical centroid" and PostGIS implements it that way [2]. I also noticed that ESRI-for-Hadoop implemented ST_Centroid as in this PR [3]: "ST_Centroid(polygon) returns the point that is the center of the polygon's envelope"

[1] https://en.wikipedia.org/wiki/Centroid
[2] https://postgis.net/docs/ST_Centroid.html
[3] https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Accessors#st_centroid


@Description("Return the coordinate dimension of the Geometry")
@ScalarFunction("ST_CoordDim")
@SqlType(StandardTypes.BIGINT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since dimension values are limited to 2, 3 or 4 would it make sense to use TINYINT instead of BIGINT?


@Description("Returns the inherent dimension of this Geometry object, which must be less than or equal to the coordinate dimension")
@ScalarFunction("ST_Dimension")
@SqlType(StandardTypes.BIGINT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since dimension values are limited to 2, 3 or 4 would it make sense to use TINYINT instead of BIGINT?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some further comments on functions up to but not including ST_NPoints.

for (int i = 0; result && i < pathCount; i++) {
Point start = lines.getPoint(lines.getPathStart(i));
Point end = lines.getPoint(lines.getPathEnd(i) - 1);
result = result && end.equals(start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks like this method can return false early if it detects a non-closed line

}

@SqlNullable
@Description("Returns TRUE if the LINESTRING's start and end points are coincident")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for consistency sake, consider replacing LINESTRING with LineString and also mention Multi-LineString as the implementation allows both types.

}

@SqlNullable
@Description("Returns true if this Geometry is an empty geometrycollection, polygon, point etc")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, borrow the documentation from the standard

"Returns TRUE if a geometry value corresponds to the empty set"

It would be helpful to decide how to refer to TRUE value in the docs. Some comments use all-upper-case TRUE while others use lower-case true.

return deserialize(input).isEmpty();
}

@Description("Returns the 2d length of the geometry if it is a linestring or multilinestring")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps, like for ST_Area, clarify what kind of computation is provided? Planar or geodesic?

}

@Description("Returns X maxima of a bounding box 2d or 3d or a geometry")
@ScalarFunction("ST_XMax")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this function in the SQL/MM standard, but I do see it available in PostGIS. What's the use case for this function? The documentation refers to 2d and 3d box objects, but I don't think this PR supports them. Ditto ST_YMax, ST_YMin, ST_XMin.

}

@Description("Return the number of interior rings of the first polygon in the geometry")
@ScalarFunction("ST_NumInteriorRings")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the standard name doesn't have 's' at the end: ST_NumInteriorRing

return envelope.getYMin();
}

@Description("Return the number of interior rings of the first polygon in the geometry")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, borrow the documentation from the standard and clarify that this function only applies to polygons

"Returns the cardinality of the collection of interior rings of a polygon"

@zhenxiao
Copy link
Collaborator Author

zhenxiao commented Sep 5, 2017

thank you @mbasmanova
I get your comments addressed.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , I made a full pass other the functions in GeoFunctions.java . Here are remaining comments.

}

@Description("Returns the number of points in the geometry")
@ScalarFunction("ST_NPoints")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL/MM only defines ST_NumPoints which "returns the cardinality of the ST_Point collection in the ST_LineString value". ST_NPoints function comes from PostGIS and is non-standard extension of ST_NumPoints which applies to any geometry type.

Would it make sense to rename this function to ST_NumPoints and document is as a natural extension of the standard one?

else if (geometry.geometryType().equals(POINT)) {
return 1;
}
else if (geometry.geometryType().equals(POLYGON)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polygon, MultiPoint and MultiPath have common base class MultiVertexGeometry which has a getPointCount method. Therefore, all three types can be handled with a single cast to MultiVertexGeometry.

return ((MultiVertexGeometry) geometry.getEsriGeometry()).getPointCount();

{
OGCGeometry geometry = deserialize(input);
OGCGeometry bound = geometry.boundary();
if (bound.geometryType().equals(MULTI_LINE_STRING) && ((OGCMultiLineString) bound).numGeometries() == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems unnecessary. I noticed that serialize function outputs 3 pieces of data: (1) SRID, (2) geometry type, (3) ESRI Shape File. However, deserialize function uses only (1) and (3) and implicitly derives (2) from (3). serialize-deserialize cycle applied to a MultiLineString of one LineString produces the same result as converting MultiLineString to a LineString first. Hence, two suggestions:

(1) remove this logic
(2) update serialize to not output shape type

return serialize(bound);
}

@Description("Returns a geometry representing the double precision (float8) bounding box of the supplied geometry")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, borrow from the standard to clarify that return value is a polygon

"Returns the bounding rectangular polygon of a geometry"

return serialize(OGCGeometry.createFromEsriGeometry(envelope, reference));
}

@Description("Returns a geometry that represents that part of geometry A that does not intersect with geometry B")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: references to A and B don't match parameter names left and right

perhaps, copy from the SQL/MM: "Returns the Geometry value that represents the point set difference of two geometries"

OGCGeometry leftGeometry = deserialize(left);
OGCGeometry rightGeometry = deserialize(right);
OperatorSimpleRelation containsOperator = OperatorContains.local();
return containsOperator.execute(leftGeometry.getEsriGeometry(), rightGeometry.getEsriGeometry(), leftGeometry.getEsriSpatialReference(), null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not leftGeometry.contains(rightGeometry) ?

{
OGCGeometry leftGeometry = deserialize(left);
OGCGeometry rightGeometry = deserialize(right);
OperatorSimpleRelation crossesOperator = OperatorCrosses.local();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not leftGeometry.crosses(rightGeometry) ?

}

@SqlNullable
@Description("Returns TRUE if the supplied geometries have some, but not all, interior points in common")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition seems correct. However, when I called this function with a line and a point on the line, I got FALSE. I believe this function should return TRUE in such a case. Does it work like this for you?

This is what I tried:

assertQuery("select st_crosses(st_line('linestring(0 0, 1 1)'), st_point(0, 0))", "select true");

https://postgis.net/docs/ST_Crosses.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mbasmanova I think:
st_crosses(st_line('linestring(0 0, 1 1)'), st_point(0, 0))

should return FALSE

as Point(0 0) is one point in linestring(0 0, 1 1), it is ALL in the LineString. ST_Crosses requires Not All in.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , I think you are right. Thanks for clarifying.

{
OGCGeometry leftGeometry = deserialize(left);
OGCGeometry rightGeometry = deserialize(right);
OperatorSimpleRelation disjointOperator = OperatorDisjoint.local();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not leftGeometry.disjoint(rightGeometry) ? ditto ST_Intersects, ST_Overlaps, ST_Within, etc.

{
OGCGeometry leftGeometry = deserialize(left);
OGCGeometry rightGeometry = deserialize(right);
OperatorSimpleRelation equalsOperator = OperatorEquals.local();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not call OGCGeometry.equals method?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , thanks for addressing my comments. I'm wondering why I can't see the changes you made in isolation. The only guess I have is that you are using force-push to update the PR (git push -f origin geoSpatial). If that's the case, could you omit '-f' in the next pushes? This will generate a commit per change and allow me to see changes between revisions. The PR will automatically combine all commits together and allow the final version to be merged as a single commit.

xCoordinate += polygon.getPoint(i).getX();
yCoordinate += polygon.getPoint(i).getY();
}
Point centroid = new Point(xCoordinate / pointCount, yCoordinate / pointCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://en.wikipedia.org/wiki/Centroid this implementation calculates centroid of a set of points and not centroid of a polygon. The formula for the latter is more complex and includes an area of the polygon.

Also, for an polygon "POLYGON EMPTY", will pointCount be 0 and will it cause a division-by-zero error?

@mbasmanova
Copy link
Contributor

@martint , @electrum , re: "are all the functions in this PR defined in the standard?" - Yes, with the exception of ST_XMin, ST_XMax, ST_YMin, ST_YMax. These functions are missing from SQL/MM, but are available in PostGIS. They return the coordinates of the bounding box. How would you like to proceed with these? Would it be sufficient to document that these are non-standard?

re: "Have we looked at what's their expected behavior and whether our implementation matches?" - I looked at all of the functions in GeoFunctions.java and cross-checked their names and implementation with SQL/MM. With a couple exceptions, things are looking good. We need to work through the implementation of ST_Centroid and I have a question about ST_Crosses.

Overall, I'm wondering what are your thoughts on documentation for the functions. Given that SQL/MM standard is not freely available, should we provide more comprehensive explanation of the functions and include examples? Or should we refer folks to PostGIS documentation? Personally, I found PostGIS docs very clear. Should we also mention somewhere that while these functions follow SQL/MM standard they do not implement it in full?

@zhenxiao
Copy link
Collaborator Author

zhenxiao commented Sep 6, 2017

thank you @mbasmanova
get your comments addressed. I also put a comment for ST_Crosses
Added additional commits to previous ones

return Slices.utf8Slice(deserialize(input).asText());
}

@Description("Returns the Point value that is the mathematical centroid of the set of points in Polygon")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL/MM defines ST_Centroid as "returns the ST_Point value that is the mathematical centroid of the ST_Surface value" which is different from the implementation of this function. PostGIS implements ST_Centroid according to the standard [1].

Why do you want a different implementation for this function?

https://postgis.net/docs/ST_Centroid.html

}

@SqlNullable
@Description("Returns TRUE if the supplied geometries have some, but not all, interior points in common")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , I think you are right. Thanks for clarifying.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , @martint , the functions look good to me, except for ST_Centroid (see comments below).

public static Slice stBoundary(@SqlType(NAME) Slice input)
{
OGCGeometry geometry = deserialize(input);
OGCGeometry bound = geometry.boundary();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this variable seems not needed; if you like to keep it, consider renaming to 'boundary'

{
OGCGeometry leftGeometry = deserialize(left);
OGCGeometry rightGeometry = deserialize(right);
checkArgument(leftGeometry.getEsriSpatialReference().equals(rightGeometry.getEsriSpatialReference()), "Input Geometries Spatial Reference do not match");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this check is copy-pasted in multiple places; consider moving it into a helper method?

verifySameSR(leftGeometry, rightGeometry);

int wkid = input.SRID();
Geometry geometry = input.getEsriGeometry();

if (geometry == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When input is an instance of GeometryCollection, the geometry will be null. Therefore serializing GeometryCollection will always produce an empty slice. Is this intentional?

--- from com.esri.core.geometry.ogc.OGCConcreteGeometryCollection

public Geometry getEsriGeometry() {
return null;
}

public GeometryCursor getEsriGeometryCursor() {
    return new OGCConcreteGeometryCollection.GeometryCursorOGC(this.geometries);
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I did not find a good way to serialize GeometryCollection. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , at first I thought that it might be OK to not support GeometryCollection by disallowing GeometryCollection in GeometryFromText. However, the problem is that GeometryCollection may be created as a result of some of the operations. For example, ST_Intersection may return a collection:

ST_Intersection(ST_GeometryFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))'), ST_GeometryFromText('LINESTRING (-1 1, 1 -1, 1 2)'))

returns a point and a linestring: GEOMETRYCOLLECTION (POINT (0 0), LINESTRING (1 0, 1 1))

GeometryCollection can be serialized as an array of geometries. ArrayBlockBuilder may be handy. One option is for serialize to always produce an array of geometries and for deserialize to create GEOMETRYCOLLECTION when array size > 1 and GEOMETRYCOLLECTION EMPTY when array is empty. What do you think?

}
byte[] shape = GeometryEngine.geometryToEsriShape(geometry);

if (shape == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible? Under what circumstances GeometryEngine.geometryToEsriShape(geometry) returns null for non-null input? Is this the case for empty geometries? Wouldn't it be clearer to check that before calling geometryToEsriShape?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shape will never be null, will update

public static OGCGeometry deserialize(Slice shape)
{
requireNonNull(shape, "shape is null");
int wkid = shape.getInt(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this throw an exception If shape is an EMPTY_SLICE?

It might be helpful to add tests cases for serialize and deserialize functions to make sure all the cases work properly.

requireNonNull(shape, "shape is null");
int wkid = shape.getInt(0);
ByteBuffer shapeBuffer = getShapeBuffer(shape);
if (shapeBuffer.limit() < SIZE_WKID || shapeBuffer.getInt(0) == Geometry.Type.Unknown.value()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'shape' slice has 2 fileds: (1) wkid, (2) shape buffer. It seems strange to check the size of the shape buffer against the size of the WKID field. Would you clarify?

@zhenxiao
Copy link
Collaborator Author

zhenxiao commented Sep 7, 2017

@mbasmanova get comments addressed
Added implementation for ST_Centroid

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , thanks for updating the implementation of ST_Centroid. I think it is now correctly computing centroid for a point, a set of points, a line and a set of lines. The computation for polygons doesn't seem correct though. See details below.

double ySum = 0;
for (int i = 0; i < pointCount; i++) {
Point point = null;
if (type.equals(MULTI_POINT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic can be simplified by capturing the result of (MultiVertexGeometry) geometry.getEsriGeometry() in a variable in the beginning of the method and using MultiVertexGeometry.getPoint(i) method.

weightSum += weight;
double polygonXSum = 0;
double polygonYSum = 0;
for (int j = 0; j < polygon.getPointCount(); j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you compute the centroid of a polygon as a mean of its vertexes? Based on [1] and [2] the formula is different.

[1] http://www.seas.upenn.edu/~sys502/extra_materials/Polygon%20Area%20and%20Centroid.pdf
[2] https://en.wikipedia.org/wiki/Centroid#Properties

@zhenxiao
Copy link
Collaborator Author

zhenxiao commented Sep 7, 2017

@mbasmanova thanks for your review
Get ST_Centroid fixed for Polygon
let me think about GeometryCollection

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , thanks for updating the implementation of ST_Centroid. It looks correct to me now.

for (int i = 0; i < polygon.getPointCount(); i++) {
Point current = polygon.getPoint(i);
Point next = (i + 1 == polygon.getPointCount()) ? polygon.getPoint(0) : polygon.getPoint(i + 1);
xSum += (current.getX() + next.getX()) * (current.getX() * next.getY() - next.getX() * current.getY());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (current.getX() * next.getY() - next.getX() * current.getY()) computation repeats 3 times; consider saving the results in a variable for efficiently and readability

// Points centroid is arithmetic mean of the input points
private static Point getPointsCentroid(OGCGeometry geometry)
{
String type = geometry.geometryType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'type' variable doesn't seem to be used

// Polygon centroid:
// c[x] = (Sigma(x[i] + x[i + 1]) * (x[i] * y[i + 1] - x[i + 1] * y[i]), for i = 0 to N - 1) / (6 * signedArea)
// c[y] = (Sigma(y[i] + y[i + 1]) * (x[i] * y[i + 1] - x[i + 1] * y[i]), for i = 0 to N - 1) / (6 * signedArea)
private static Point getPolygonCentroid(OGCGeometry geometry)
Copy link
Contributor

@mbasmanova mbasmanova Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , I just realized that this formula applies only to polygons without holes. The general case of polygon with one or more holes requires a bit more computation. I haven't found an official formula yet, but the following Stack Overflow answer may be correct. They compute the centroid as a weighted diff between outer inner polygons.

https://math.stackexchange.com/questions/623841/finding-centroid-of-polygon-with-holes-polygons/623849#623849

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that is more computation for it. How about we add a comment, and explicitly not support polygons with holes?

double signedArea = 0;
for (int i = 0; i < polygon.getPointCount(); i++) {
Point current = polygon.getPoint(i);
Point next = (i + 1 == polygon.getPointCount()) ? polygon.getPoint(0) : polygon.getPoint(i + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: WKT format for polygon requires that first and last points be the same. Hence, it may be sufficient to loop from 0 to polygon.getPointCount() - 1.

@zhenxiao
Copy link
Collaborator Author

zhenxiao commented Sep 8, 2017

with @mbasmanova 's help, added serialization/deserialization for GeometryCollection
comments addressed

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , thanks for updating the tests.

public void testSTLineFromText()
throws Exception
{
assertQuery("select ST_AsText(ST_LineFromText('LINESTRING EMPTY'))", "select 'MULTILINESTRING EMPTY'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like ST_AsText(ST_Point(input)) doesn't preserve the type of the input geometry when geometry is empty. Here LineString got converted to MultiLineString. Further down in this file I also see that Polygon gets converted to MultiPolygon. Is this expected? Perhaps, worth documenting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is expected. Since when ST_AsText trying to deserialize, it only knows it is a polyline, not clear whether it is a line or multiline. I will document it.

{
assertQuery("select ST_AsText(ST_LineFromText('LINESTRING EMPTY'))", "select 'MULTILINESTRING EMPTY'");
assertQuery("select ST_AsText(ST_LineFromText('LineString(1 1, 2 2, 1 3)'))", "select 'LINESTRING (1 1, 2 2, 1 3)'");
assertQueryFails("select ST_AsText(ST_LineFromText('MULTILineString EMPTY'))", "ST_LineFromText only applies to LineString. Input type is: MultiLineString");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: throughout this test suite, the WKT strings are not consistent. For example, there are LineString, MULTILineString, MULTILINESTRING, etc. Would you take a pass and make them all align. Perhaps, uppercase all the WKT type prefixes?

'POLYGON...', 'POINT...', 'LINESTRING...'

{
assertQuery("select ST_Area(ST_GeometryFromText('Polygon ((2 2, 2 6, 6 6, 6 2))'))", "select 16.0");
assertQueryFails("select ST_Area(ST_GeometryFromText('Point (1 4)'))", "ST_Area only applies to Polygon. Input type is: Point");
assertQueryFails("select ST_Area(ST_GeometryFromText('POLYGON EMPTY'))", "ST_Area only applies to Polygon. Input type is: MultiPolygon");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ST_Area for an empty polygon should not fail. According to SQL/MM spec it should return null. Also, I noticed that ST_Length for empty line-string returns 0. Let's make these two method consistent: both return 0 or null.

8.1.2 ST_Area Methods ...If SELF is an empty set, then return the null value.

BTW, why does ST_Area not allow multi-polygons?

assertQuery("select ST_NumInteriorRing(ST_GeometryFromText('POLYGON ((0 0, 0 5, 5 5, 5 0, 0 0))'))", "select 0");
assertQuery("select ST_NumInteriorRing(ST_GeometryFromText('POLYGON ((0 0, 8 0, 0 8, 0 0), (1 1, 1 5, 5 1, 1 1))'))", "select 1");
assertQueryFails("select ST_NumInteriorRing(ST_GeometryFromText('LineString(8 4, 5 7)'))", "ST_NumInteriorRing only applies to Polygon. Input type is: LineString");
assertQueryFails("select ST_NumInteriorRing(ST_GeometryFromText('POLYGON EMPTY'))", "ST_NumInteriorRing only applies to Polygon. Input type is: MultiPolygon");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the spec, ST_NumInteriorRing of an empty Polygon should return null.

assertQuery("select ST_AsText(ST_EndPoint(ST_GeometryFromText('LineString(8 4, 4 8, 5 6)')))", "select 'POINT (5 6)'");
assertQueryFails("select ST_AsText(ST_StartPoint(ST_GeometryFromText('Polygon ((2 0, 2 1, 3 1))')))", "ST_StartPoint only applies to LineString. Input type is: Polygon");
assertQueryFails("select ST_AsText(ST_EndPoint(ST_GeometryFromText('Polygon ((2 0, 2 1, 3 1))')))", "ST_EndPoint only applies to LineString. Input type is: Polygon");
assertQueryFails("select ST_AsText(ST_StartPoint(ST_GeometryFromText('LINESTRING EMPTY')))", "ST_StartPoint only applies to LineString. Input type is: MultiLineString");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ST_StartPoint of a empty line-string should return null

assertQuery("select ST_X(ST_GeometryFromText('Point (1 2)'))", "select 1.0");
assertQuery("select ST_Y(ST_GeometryFromText('Point (1 2)'))", "select 2.0");
assertQueryFails("select ST_Y(ST_GeometryFromText('Polygon ((2 0, 2 1, 3 1))'))", "ST_Y only applies to Point. Input type is: Polygon");
assertQueryFails("select ST_Y(ST_GeometryFromText('POINT EMPTY'))", "This operation should not be performed on an empty geometry.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to the spec, ST_Y of empty point should return null

assertQuery("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('POLYGON ((1 1, 1 4, 4 1))')))", "select 'LINESTRING (1 1, 4 1, 1 4, 1 1)'");
assertQuery("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('POLYGON ((0 0, 0 5, 5 5, 5 0, 0 0), (1 1, 1 2, 2 2, 2 1, 1 1))')))", "select 'LINESTRING (0 0, 5 0, 5 5, 0 5, 0 0)'");
assertQueryFails("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('LineString(1 1, 2 2, 1 3)')))", "ST_ExteriorRing only applies to Polygon. Input type is: LineString");
assertQueryFails("select ST_AsText(ST_ExteriorRing(ST_GeometryFromText('POLYGON EMPTY')))", "ST_ExteriorRing only applies to Polygon. Input type is: MultiPolygon");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to the spec, ST_ExteriorRing of empty polygon should return null

@zhenxiao
Copy link
Collaborator Author

@mbasmanova get comments addressed


Presto GeoSpatial functions are compliant with the Open Geospatial Consortium’s (OGC) OpenGIS Specifications. As such, many Presto GeoSpatial functions require, or more accurately, assume that geometries that are operated on are both simple and valid. For example, it does not make sense to calculate the area of a polygon that has a hole defined outside of the polygon, or to construct a polygon from a non-simple boundary line.

Presto GeoSpatial fucntions support the Well-Known Text (WKT) form of spatial objects. Here are the examples:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: fucntions

@Description("Returns a Geometry type LineString object from Well-Known Text representation (WKT)")
@ScalarFunction("ST_LineFromText")
@SqlType(NAME)
public static Slice stLineString(@SqlType(StandardTypes.VARCHAR) Slice input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this method parseLine.

import io.airlift.slice.Slices;
import org.apache.commons.lang.StringUtils;

import static com.facebook.presto.geospatial.GeometryType.NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAME is too generic to import it statically.


@Description("Returns the area of a polygon using Euclidean measurement on a 2D plane (based on spatial ref) in projected units")
@ScalarFunction("ST_Area")
@SqlType(StandardTypes.DOUBLE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these functions defined as returning double in the spec? (vs Decimal, for instance)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is defined as float.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint , @zhenxiao , here is a snippet from the SQL/MM

METHOD ST_Area()
RETURNS DOUBLE PRECISION
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
RETURNS NULL ON NULL INPUT,

return leftGeometry.within(rightGeometry);
}

private static OGCGeometry getGeometry(OGCGeometry geometry, String function, String...validTypes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this method is misleading. It's just validating that the type is one of validTypes.

BTW, the cost of this function could be non-trivial (due to having to create an array for the varargs argument and the loop/equality check against multiple types). A better approach might be to create an enum of well-known types and rewrite this as:

private static void validateType(String function, OGCGeometry geometry, Set<GeometryTypeName> validTypes)
{
    GeometryTypeName actual = GeometryTypeName.valueOf(geometry.getType());
    if (!validTypes.contains(actual)) {
        throw ...
    }
}

and call it like:

validateType("ST_xyz", geometry, EnumSet.of(ST_POINT, ST_LINE));

(also, note the slight change in order of arguments. I think that makes a little more logical sense)

ySum += (current.getY() + next.getY()) * ladder;
signedArea += ladder;
}
return new Point(xSum / (signedArea * 3), ySum / (signedArea * 3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments before the method say "6 * signedArea", but this multiplies by 3. Which one is it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is 6. sum of ladder actually is 2 * signedArea, I merged computation there. Will update

return null;
}

return format("<%s>", getTypeSignature()).getBytes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While there isn't a strict contract for what getObjectValue should return, it's a good idea to return a json-serializable object that represents the value. This is currently used by the client protocol and can be handy for tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very clear about it, is it good to return typeSignature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to return the WKT for the value, if possible. This is what's going to be rendered by the CLI

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will update

}
final BasicSliceInput input = shape.getInput();

int wkid = input.readInt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the rest of the code, this is the ID of the coordinate system (or spatial reference), so I'd call it just that (i.e., spatialReferenceId, coordinateSystemId, etc). wkid is not descriptive enough.

List<OGCGeometry> geometries = new ArrayList<>();
while (input.available() > 0) {
int length = input.readInt();
byte[] geometryBytes = new byte[length];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just do:

ByteBuffer buffer = input.readSlice(length).toByteBuffer()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could keep track of the position you're reading at and just call shape.toByteBuffer(offset, length) to avoid having to create the intermediate geometryBytes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it. Will fix it

@@ -86,6 +86,12 @@
</artifact>
</artifactSet>

<artifactSet to="plugin/geo">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geospatial?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhenxiao , thanks for addressing the comments. This PR is looking good to me.

{
OGCGeometry geometry = getGeometry(deserialize(input), "ST_NumInteriorRing", POLYGON);
return ((OGCPolygon) geometry).numInteriorRing();
OGCGeometry geometry = getGeometry(deserialize(input), "ST_NumInteriorRing", POLYGON, MULTI_POLYGON);
Copy link
Contributor

@mbasmanova mbasmanova Sep 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be a type. ST_NumInteriorRing applies only to polygons. It doesn't apply to multi-polygon. Also, the implementation casts geometry to OGCPolygon. This cast will fail if geometry is multi-polygon. Right?

@Description("Returns the first point of a LINESTRING geometry as a Point")
@ScalarFunction("ST_StartPoint")
@SqlType(NAME)
public static Slice stStartPoint(@SqlType(NAME) Slice input)
{
OGCGeometry geometry = getGeometry(deserialize(input), "ST_StartPoint", LINE_STRING);
OGCGeometry geometry = getGeometry(deserialize(input), "ST_StartPoint", LINE_STRING, MULTI_LINE_STRING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of the SQL/MM suggests that ST_StartPoint and ST_EndPoint are defined only for line strings. These are not defined for multi-line strings.

@zhenxiao
Copy link
Collaborator Author

@mbasmanova @martint thanks for your comments
get comments addressed

@martint
Copy link
Contributor

martint commented Sep 19, 2017

Thanks @zhenxiao. It's getting very close.

You seem to have missed this comment in your latest batch of changes: 501e33c#r139566928

@zhenxiao
Copy link
Collaborator Author

thank you, @martint just addressed missing comments

import static com.facebook.presto.tpch.TpchMetadata.TINY_SCHEMA_NAME;

public class TestGeoQueries
extends AbstractTestQueryFramework
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason these test run full queries instead of using AbstractTestFunctions (see, for instance, TestJsonFunctions)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

@zhenxiao
Copy link
Collaborator Author

Thank you, @martint I get comments addressed

Copy link
Contributor

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but there seem to be some build failures. Can you take a look?

@zhenxiao
Copy link
Collaborator Author

@martint seems when running presto-product-tests, it does not build main test module, which I added getTypeRegistry() method.
Do you know how to run product-tests locally? I think if it first build main test module, should be OK

@highker
Copy link
Contributor

highker commented Sep 22, 2017

@zhenxiao probably it's a testbed issue; I have restarted

@zhenxiao
Copy link
Collaborator Author

nice of you, @highker

Copy link
Contributor

@nezihyigitbasi nezihyigitbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the last minute comments everyone. @zhenxiao please also squash the commits before we merge this.

GeoSpatial Functions
=====================

Presto GeoSpatial functions support SQL/MM specification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first sentence is somewhat isolated from the rest of the doc.

btw the SQL/MM specification.


.. function:: ST_Contains(Geometry, Geometry) -> boolean

Returns TRUE if and only if no points of right lie in the exterior of left, and at least one point of the interior of left lies in the interior of right.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left, right meaning the first and second argument I guess?


.. function:: ST_Intersects(Geometry, Geometry) -> boolean

Returns TRUE if the Geometries spatially intersect in 2D - (share any portion of space) and FALSE if they don't (they are Disjoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disjoint -> disjoint.


.. function:: ST_Disjoint(Geometry, Geometry) -> boolean

Returns TRUE if the Geometries do not "spatially intersect" - if they do not share any space together.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given geometries (as you used in the function below, also doesn't make sense to capitalize Geometries).


.. function:: ST_Intersects(Geometry, Geometry) -> boolean

Returns TRUE if the Geometries spatially intersect in 2D - (share any portion of space) and FALSE if they don't (they are Disjoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given geometries


.. function:: ST_Distance(Geometry, Geometry) -> double

Returns the 2-dimensional cartesian minimum distance (based on spatial ref) between two Geometries in projected units.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you use 2D above but 2-dimensional here, intentional?


Returns the last point of a LineString Geometry as a Point.

.. function:: ST_X(point) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why you don't capitalize as Point as you do for other functions?

<version>2.4</version>
</dependency>

<!-- Presto SPI -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary comment

@SqlType(GEOMETRY_TYPE_NAME)
public static Slice stPoint(@SqlType(StandardTypes.DOUBLE) double x, @SqlType(StandardTypes.DOUBLE) double y)
{
OGCGeometry geometry = OGCGeometry.createFromEsriGeometry(new Point(x, y), null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import createFromEsriGeometry

public static Slice serialize(OGCGeometry input)
{
int spatialReferenceId = input.SRID();
final DynamicSliceOutput sliceOutput = new DynamicSliceOutput(100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't usually declare local variables as final.

@nezihyigitbasi
Copy link
Contributor

there is a compilation problem @zhenxiao

[ERROR] /home/travis/build/prestodb/presto/presto-geospatial/src/test/java/com/facebook/presto/geospatial/TestGeoQueries.java:[36,31] cannot find symbol
  symbol:   method getTypeRegistry()
  location: variable functionAssertions of type com.facebook.presto.operator.scalar.FunctionAssertions
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.0:testCompile (default-testCompile) on project presto-geospatial: Compilation failure
[ERROR] /home/travis/build/prestodb/presto/presto-geospatial/src/test/java/com/facebook/presto/geospatial/TestGeoQueries.java:[36,31] cannot find symbol
[ERROR]   symbol:   method getTypeRegistry()
[ERROR]   location: variable functionAssertions of type com.facebook.presto.operator.scalar.FunctionAssertions
[ERROR] -> [Help 1]

@zhenxiao
Copy link
Collaborator Author

Thank you @nezihyigitbasi I will work on it
@highker do you have any hints how to run the product tests locally? I am trying to reproduce it
Seems my main-test updates did not get compiled for this test

@nezihyigitbasi
Copy link
Contributor

@zhenxiao why did you add <type>test-jar</type> for the presto-main module to the presto-geospatial/pom.xml file?

@zhenxiao
Copy link
Collaborator Author

@nezihyigitbasi getting complain about using undeclared dependencies without adding test-jar for presto-main

@nezihyigitbasi
Copy link
Contributor

please update the version in presto-geospatial/pom.xml to 0.186-SNAPSHOT. we have just released a new version.

@zhenxiao
Copy link
Collaborator Author

@nezihyigitbasi get your comments addressed
Updated to the latest version
not sure about the product tests

@zhenxiao
Copy link
Collaborator Author

aha, product tests running good now
The only failure is a timeout, not related to this PR
@nezihyigitbasi may be let it run one more time?

@nezihyigitbasi
Copy link
Contributor

The tests look good (except a known flaky one) so I merged the PR. Thanks @zhenxiao!

@zhenxiao zhenxiao deleted the geoSpatial branch December 9, 2017 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants