ReQL proposal: Geospatial support #2571
Comments
The meaning of a line between two points, of a circle and of the shortest distance is ambiguous. What projection is used? Should we pretend that the earth is a sphere? Can lines and polygons cross the 180° longitude line? How do other APIs solve these problems? |
@AtnNn Everything here assumes geometry on the surface of a sphere (or maybe an ellipsoid; I think S2 supports correcting for the shape of the earth based on the WGS84 system). We would implement this using S2 https://code.google.com/p/s2-geometry-library/ (MongoDB does the same, but also supports Euclidean geometry, which we can add later and switch on via opt args). The shortest distance is not ambiguous (apparently the definition of distance we would use is actually called great-circle distance http://en.wikipedia.org/wiki/Great-circle_distance). Lines between two points are ambiguous if the points are on opposite sides of the sphere. We could avoid the problem though by defining one line as the canonical one (add an implicit third point that the line must go through). Crossing the 180 degree line should be possible. |
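To make the great-circle definition concrete, here is a minimal sketch using the haversine formula. It assumes a perfectly spherical earth, a mean radius constant, and points given as [longitude, latitude] in degrees. The function and constant names are made up for illustration and are not part of S2 or ReQL.

```javascript
// Mean earth radius in metres. Treating the earth as a sphere is an
// assumption of this sketch, not something the proposal has settled.
const EARTH_RADIUS_M = 6371008.8;

const toRadians = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two [longitude, latitude] points (degrees),
// computed with the haversine formula.
function greatCircleDistance([lon1, lat1], [lon2, lat2], radius = EARTH_RADIUS_M) {
  const phi1 = toRadians(lat1);
  const phi2 = toRadians(lat2);
  const dPhi = phi2 - phi1;
  const dLambda = toRadians(lon2 - lon1);
  const a =
    Math.sin(dPhi / 2) ** 2 +
    Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) ** 2;
  // Clamp to 1 to guard against rounding error for near-antipodal points.
  return 2 * radius * Math.asin(Math.min(1, Math.sqrt(a)));
}
```

Because the formula only depends on the sine of half the longitude difference, two points on either side of the 180° line (for example longitudes 179 and -179) come out two degrees apart rather than 358.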
I really dislike this syntax. It seems very un-ReQLish -- you're introducing two new terms which can only be used inside another term. We do this in some places ( I would prefer one of:
|
I think we should do the simplest thing possible for the first release, so leaving off compound geo-indexing sounds fine to me. |
|
It's sort of unfortunate that this proposal is only for geo data. Since we're doing all the work anyway, how hard would it be to have generic 2-dimensional indexes? They're useful for a lot of other things, too. (e.g. "Get me the experimental subjects in this age range and this weight range.") |
The problem with these is that they cannot be combined for composite indexes. If we ignore that, I would strongly prefer the first variant with new functions defined on tables. The meaning of |
I am wary of S2, it seems unmaintained. The last update is from 2011. |
@mlucy We would have to look for a different library or implement a lot of things ourselves if we want to support other geometry. I'm not sure how much overlap there would actually be in the implementation, though we could certainly use the same interface for such data. |
The second one can be: (I'm having some trouble reasoning about this because I'm not 100% clear how composite indexes where one element is a polygon will work -- how will that be implemented? Will it always be efficient, even for an index like
It may be because I thought of the syntax, but this reads pretty clearly to me:
# Get everything in the circle
table.get_all(r.geoCircle([x, y], radius), index:'location')
# Get everything in the circle, ignoring polygons that only intersect it
table.get_all(r.geoCircle([x, y], radius), index:'location', include_intersecting:'false') |
@AtnNn It's very solid though. It's used by a lot of projects including MongoDB, and it provides extremely convenient features for building indexes over geospatial data. I have also used it before for processing geodesic coordinates. We should still look for some alternatives before deciding on it. |
(Also, can you have a compound index with two polygons? How will that work internally? What does it even mean, and what sort of queries can you do on it?) |
@mlucy: How would you do "distance less than ... from ..." with that It also wouldn't work if you had multiple geometric predicates in a composite index, though I guess we would disallow that anyway. |
How is "distance less than ... from ..." different from "contained within this circle"? |
Generally querying by a compound index [p1, p2, p3] with predicates p1, p2, p3 has the semantics of filtering for documents that match all predicates at the corresponding entries of the compound index. We could do it, but I agree that it wouldn't be a good idea. |
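The "match all predicates at the corresponding entries" semantics can be sketched in plain JavaScript as a filter. This only illustrates the meaning, not how an index would evaluate it efficiently. The helpers `equals`, `between` and `matchCompound` are hypothetical names, not ReQL terms.

```javascript
// Hypothetical predicate helpers; none of these names come from ReQL.
const equals = (expected) => (value) => value === expected;
const between = (left, right) => (value) => value >= left && value < right;

// keyOf(doc) returns the compound index key as an array, e.g. [type, price].
// A document matches when every predicate accepts the key entry at its position.
function matchCompound(docs, keyOf, predicates) {
  return docs.filter((doc) => {
    const key = keyOf(doc);
    return predicates.every((pred, i) => pred(key[i]));
  });
}

// Usage: documents indexed on [type, price]
const docs = [
  { id: 1, type: "gas", price: 3 },
  { id: 2, type: "gas", price: 7 },
  { id: 3, type: "cafe", price: 3 },
];
const cheapGas = matchCompound(docs, (d) => [d.type, d.price], [
  equals("gas"),
  between(0, 5),
]);
```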
Also, what does a |
Sorry, I wasn't thinking properly there. Generally my assumption was that there are a couple of very different criteria that you can apply to geometric data which can be optimized by an index, and that they should be combinable for composite indexes. Introducing such predicates seemed like the reasonable way to represent this. The "r.table().intersects()" etc. syntax doesn't have that problem and is nicely explicit, which is why I prefer it. |
We shouldn't allow that. The |
No, I meant, if you have a compound index |
I can see that. I guess in my mind "abcde" is a single key in a 1-D index, while a geometric shape is a set of keys in a 2-D index. If we had a data type representing a set of keys in a 1-D index (like |
@mlucy: We would not support |
If you think of the shape as a set of keys, "intersection" is just "any entry which matches one of the keys in the set", which seems intuitive to me. But I could see how it would be non-intuitive too. |
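The set-of-keys reading can be sketched with plain arrays, in the spirit of the array variants of intersects() and isContained() from the proposal. The sketch assumes keys are primitive values (for example strings naming index cells); how a shape would actually be turned into keys is not covered here.

```javascript
// Tests whether at least one element of set1 is also found in set2.
function intersects(set1, set2) {
  const keys = new Set(set2);
  return set1.some((key) => keys.has(key));
}

// Tests whether all elements of set1 are found in set2.
function isContained(set1, set2) {
  const keys = new Set(set2);
  return set1.every((key) => keys.has(key));
}

// Usage with made-up cell identifiers
const shapeKeys = ["cell-a", "cell-b", "cell-c"];
const entryKeys = ["cell-c", "cell-d"];
const overlaps = intersects(entryKeys, shapeKeys);
const inside = isContained(entryKeys, shapeKeys);
```

`Set` compares by value only for primitives, so arrays or objects used as keys would need to be serialized first.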
Yeah, interpreting polygons as ranges all the time works well as long as you can only store points in a table. But you can also store polygons in a table, and then you might want to filter by equality. |
I'm sort of getting the sense that compound indexes + geo data don't work well together. Compound indexes are very general and can be used for either point gets or range scans, whereas it sounds like compound indexes with geo data inside of them would basically only be used to let you do a geo-indexed query and only retain elements matching a particular set of tags. It might be better to support that some other way, rather than adding a bunch of very specific rules to compound indexes to produce that behavior. |
Well, a typical query someone might want to do is "find all gas stations within a certain distance". So you would build a compound index |
We could also represent such compound queries through the Edit: Removed the word non-geospatial. X and Y would be interpreted as equality constraints, no matter what their type is. |
What would It's still not too clear to me what it even means to have a compound index where one of the elements is 1-dimensional and another is 2-dimensional. Would we use a projection of the geoJSON as a 1-dimensional stand-in for things like |
(Sorry to be dense on this, but I think I'm still not getting it.) |
@coffeemug What about the alternative of having The latter is also a bit confusing because if you specify a geo index, the intersection test is typically performed over a certain field (the field the index was built over). If you don't specify the |
Regarding naming: If we call |
I agree that the command should be polymorphic on streams, however we don't have precedent for commands that run on objects, streams, and accept an index optarg only when run on streams (that would be pretty weird). I'm not quite sure what to do about that, other than having two functions |
@coffeemug Wait, unless I'm confused about which proposal we agreed on, we already have My preferred solution would be to extend the predicate term I would also like to hear @mlucy 's opinion on this. |
Ah, sorry, I misunderstood. Yes, I think it's a good idea to extend |
Another thing I'd like to revisit is the The way it works now is that it creates a rectangle on the Mercator projection (http://en.wikipedia.org/wiki/Mercator_projection) of the geometry. It can be useful because many maps are drawn that way, and you can use it for example to conveniently query for items visible in a given rectangular piece of the map (like when visualizing a map for a website). I'm undecided whether we should keep it in as it is, remove it, or rename it (to |
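For the map-viewport use case, an axis-aligned rectangle on a Mercator map is bounded by two meridians and two parallels, so containment of a point reduces to a longitude/latitude box test. The sketch below assumes [longitude, latitude] points in degrees. It also assumes that a rectangle whose left edge has a larger longitude than its right edge is meant to wrap across the 180° line. The function name is hypothetical.

```javascript
// Rectangle given by its bottom-left and upper-right [longitude, latitude]
// corners, as it would appear on a Mercator map.
function inMapRectangle([lon, lat], [left, bottom], [right, top]) {
  const latOk = lat >= bottom && lat <= top;
  const lonOk =
    left <= right
      ? lon >= left && lon <= right // ordinary rectangle
      : lon >= left || lon <= right; // assumed to wrap across the 180° line
  return latOk && lonOk;
}
```

This only covers points. Lines and polygons stored in the table would need an intersection test against the rectangle's edges, which depends on how those edges are interpreted.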
@danielmewes Do lines and polygons also follow the Mercator projection? |
@AtnNn I'm not sure. If you connect two points that differ only either in their longitude or latitude but not both you will get a straight line in the Mercator projection. That's what The semantics of all polygons and lines (including those constructed by One problem with |
Also to clarify: |
@danielmewes You say that "we connect vertices by the shortest path along the earth's surface" and that, for |
@AtnNn Both. Those are the same lines when you connect two points that differ in only either latitude or longitude (and don't wrap over the poles). |
@danielmewes The only horizontal lines in the Mercator projection that represent the shortest distance between their end points on a sphere are those along the equator. |
Oh wait you are right. What I said is only true for latitudinal lines. Yeah, so |
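The point about horizontal Mercator lines can be checked numerically. The sketch below works on a unit sphere and compares the length of the path that stays on a parallel with the great-circle distance between the same two end points. It assumes the two longitudes differ by at most 180°. The function names are made up for illustration.

```javascript
const toRadians = (deg) => (deg * Math.PI) / 180;

// Length of the path that stays on the parallel at latDeg (a horizontal
// line on a Mercator map), on a unit sphere.
function alongParallel(latDeg, lon1Deg, lon2Deg) {
  return Math.cos(toRadians(latDeg)) * Math.abs(toRadians(lon2Deg - lon1Deg));
}

// Great-circle distance between two points on the same parallel, unit sphere.
// With equal latitudes the haversine term reduces to cos^2(lat) * sin^2(dLon / 2).
function greatCircleSameLatitude(latDeg, lon1Deg, lon2Deg) {
  const cosLat = Math.cos(toRadians(latDeg));
  const halfDLon = toRadians(lon2Deg - lon1Deg) / 2;
  return 2 * Math.asin(Math.min(1, Math.abs(cosLat * Math.sin(halfDLon))));
}

// Compare both path lengths for a 90° longitude span at several latitudes.
for (const lat of [0, 30, 60]) {
  const parallel = alongParallel(lat, 0, 90);
  const geodesic = greatCircleSameLatitude(lat, 0, 90);
  console.log(lat, parallel.toFixed(4), geodesic.toFixed(4));
}
```

At latitude 0 the two lengths agree, and away from the equator the great-circle path is strictly shorter. For two points on the same meridian, the meridian itself is a great circle, so no such gap arises.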
I removed |
That sounds good to me. |
The first part of the implementation is up in CR 1812 by @mlucy.
|
Geospatial support with the limitations mentioned in the previous post has been merged into |
Just a heads up.. since this has been resolved the secondary indexes documentation should be updated: http://cl.ly/YcKV |
Thanks for the heads up @barkerja . We had fixed this in rethinkdb/docs#568 . Trying to find out why it's not online yet... |
Edit: Note the updated proposal below #2571 (comment)
In contrast to #1158, I would like to use this issue solely to track the ReQL API side of things.
This proposal is limited to two-dimensional geodesic geometry (geometry on the earth's surface). We can add support for Euclidean geometry later if necessary.
Geospatial data representation
r.geoJSON(object) : object -> geometry
converts from the GeoJSON object given as object to the geometry pseudo type
r.geoJSON(string) : string -> geometry
(optional): equivalent to r.geoJSON(r.json(string))
geometry.toGeoJSON() : geometry -> object
does the opposite of r.geoJSON(object)
r.geoPoint(x, y) : float, float -> geometry
r.geoLine(p1, p2, ...) : geometry, geometry, ... -> geometry
where the input arguments must be points
r.geoLine([x1, y1], [x2, y2], ...) : [float, float], [float, float], ... -> geometry
r.geoPolygon(p1, p2, ...) : geometry, geometry, ... -> geometry
where the input arguments must be points
r.geoPolygon([x1, y1], [x2, y2], ...) : [float, float], [float, float], ... -> geometry
polygon1.sub(polygon2) : geometry, geometry -> geometry
subtracts polygon2 from polygon1. For now, we should make the following requirement: polygon2 must be completely inside of polygon1. This allows constructing polygons with holes in them.
r.geoCircle(center, radius) : geometry, float -> geometry
r.geoRectangle(bottomLeft, upperRight) : geometry, geometry -> geometry
create a line describing the corresponding shape. Can be combined with fill() (see Misc) to get filled circles / filled rectangles.
Creating a geospatial index
a of a table, some documents in the table can have geometry in a and others can have strings, numbers or whatever.
indexCreate() when we want to support different types of geometry.
Misc
p1.distance(p2) : geometry, geometry -> float
computes the minimal geodesic distance between points p1 and p2. Let's ignore distances to/between polygons and lines for now. Other geometries can be supported through opt args later.
l.fill() : geometry -> geometry
Takes a line, makes it the outline of a polygon. The line has to be closed (and possibly must not intersect with itself, not sure about that yet).
geometry.isContained(polygon) : geometry, geometry -> bool
tests whether geometry is completely contained in polygon
geometry1.intersects(geometry2) : geometry, geometry -> bool
tests whether geometry1 and geometry2 intersect
set1.isContained(set2) : array, array -> bool
isContained() for two arrays. Tests whether all elements of set1 are found in set2.
set1.intersects(set2) : array, array -> bool
intersects() for two arrays.
Querying
table.getAll() such as
table.getAll(function (x) { return x('position').isContained(polygon); } )
or
table.getAll(function (x) { return x('position').distance(center).le(5.0); } )
and have an optimizer automatically make use of an index. However, to avoid having to analyze the function, I propose introducing simplified predicates for getAll that can use secondary indexes (comparable to r.desc() and r.asc() for orderBy()):
table.getAll(geometry, {index: ...})
table.getAll(r.intersects(polygon), {index: ...})
table.getAll(r.isContained(polygon), {index: ...})
table.getAll(r.withinDistance(center, radius), {index: ...}) (sugar for r.intersects(r.geoCircle(center, radius).fill()), except that we might want to restrict it to points)
table.getAll(r.isBetween(left, right)) as an alternative to r.between()
table.getAll([pred1, pred2, ...]). Note that such a query wouldn't always be efficient, and might have to rely heavily on post-filtering (or alternatively trigger a lot of smaller index lookups). Consider the example of table.getAll([r.isBetween(-inf, +inf), "foo"]) to see why this is the case. Not sure if we want to support this for the first version.
table.getAll(["foo", r.isBetween(-inf, +inf)]) would be legal, while the previously mentioned table.getAll([r.isBetween(-inf, +inf), "foo"]) would not be allowed. Such queries would always be efficient, and much easier to implement.
r.orderBy(r.distance(p), {index: ...}) (note that this is a single-argument p.distance() variant)
Open questions:
table.getAll(r.contains(geometry))?
geo? E.g. should we call intersects() geoIntersects() instead? This has the advantage that we can provide a reversed variant of isContained() called geoContains() without clashing with the existing object.contains(fieldNames) term.