Spatial Graph Operations Language (SGOL)
Spatial Graph Operations Language (aka SGOL) is a high-level language for describing graph operations, with a focus on spatial data.
This graph language follows many of the design conventions of the common SQL. The syntax will be familiar, but the actual reserved words are different.
This language describes a chain of graph operations that are executed as a stream. A primary set of objects (usually Graph
elements) are passed from clause to clause. You can save elements to secondary sets while executing the operations by using
FETCH. This is somewhat similiar to how jquery, gulp, and more are designed for
Why not just Gremlin?
Gremlin is the graph traversal language of Apache TinkerPop and provides a common language to interface with multiple graph databases. The key difference between SGOL and Gremlin is that an SGOL chain is a plain string easily built by client applications, with no dependencies. As a plain string, it can be easily sent from web application front-ends, desktop applications, or manually input by users. As with CQL, it be used directly in QGIS and OpenLayers vector source urls.
What about Cypher?
Cypher is the SQL-inspired language of Neo4j. Where as Cypher is specific to the Neo4j application, SGOL is built to be abstract and extensible so that it can be used by multiple applications. Additional SGOL focuses on "operations", which may be more complex than searching or adding nodes to graph. For example, with SGOL you can directly call operations advertised by your system that could transfer data from a NoSQL database into a graph database, at scale. Or you could call operations to send response via another channel (email, etc.) rather than directly in HTTP request-response cycle.
This work is distributed under the MIT License. See LICENSE file.
ADD is used for adding new elements to the graph. It is typically used as the terminal operation in a chain, since it's return value is
VOID. You can use it to commit elements that are created in-memory, but not commited.
... RELATE ADD
DISCARD is an operation for discarding the
primary set of elements. You can use it to essentially start a new chain of clauses. Importantly, you can still reference the secondary sets mentioned in
FETCH, etc. It is useful if you want to sequentially
upsert data to the graph first and then navigate the graph with the new data.
... RELATE ADD DISCARD SELECT $PointOfInterest
FETCH is used for replacing the primary set of elements in the stream with those from a secondary set. It is frequently used as a penultimate operation after a recursive search.
FETCH <id of secondary set>
HAS is the operation for filtering the
current set of entities to only those connected to another literal set of entities. It is frequently used within a chain of statements.
... HAS INPUT $HasType pointofinteresttype_cafe OUTPUT entities
INIT initializes a secondary set of objects, typically elements. It is not strictly necessary if the
SGOL-compiler automatically prepends
INIT clauses as needed before executing a chain of operations. See Compiler Steps below for more information.
INIT <id of secondary set>
LIMIT is the operation for limiting the size of the
current set of entities to the given value. It is frequently used as a penultimate operation after a search, after FETCH and before OUTPUT.
... HAS INPUT $HasType pointofinteresttype_cafe LIMIT 10 OUTPUT entities
NAV is the operation for
hopping around the graph from entity to entity following certain edges. The simplest clause is
NAV A Edge B, but can be expanded in a variety of ways.
You can select an entity set by it's id, select all the entities in a list of groups, or use
INPUT to use the primary set of elements (from the previous operation). If selecting by group, then use a comma-separated list with each group prepended by a
NAV will return adjacent edges rather than continuing through to the adjacent entities. You can optionally, also save the output to a secondary set by id by using
NAV [ INPUT | <entity id> | $ | $<group>[,$<group> ...] ] \ [ $ | $<group>[,$<group> ...] ] \ [ _ | - | <entity id> | $ | $<group>[,$<group> ...] ] \ [UPDATE <id of secondary set> [WITH $<group>[,$<group> ...] ]
NAV $PointOfInterest $HasType pointofinteresttype_cafe SELECT $PointOfInterest NAV INPUT $ _ RUN CountGroups OUTPUT json
Output is used to specify the type of objects returned by the chain of operations. This is used for encoding the response to SGOL clients. void can be used to signal that there is no response (like a SQL update). If the output type is missing or invalid for the chain, the SGOL server may throw an error.
OUTPUT [ bbox | edges | elements | entities | geojson | json | text | void ]
RELATE is used for calculating the geospatial relationship between two sets of entities, the first set being the primary set and the second being a secondary set referenced by id. The total number of calculations is in
O(mn) time, so your SGOL-server may cache results to increase performance.
... RELATE <id of secondary set> ADD OUTPUT void
SELECT location_test UPDATE location DISCARD \ SELECT $PointOfInterest FILTER bbox(-180,-90,180,90) \ RELATE location ADD \ OUTPUT void
RUN is used for directly executing graph operations known to the server. It is frequently used at the end of a stream for transforming the output.
RUN GetConvexPolygon() RUN GetConvexHull() RUN GetExtent() RUN CollectGeometries() RUN UpsertElements()
RUN can also be used for calling custom operations.
RUN AddElementsFromGeoMesa(PointOfInterest, catalog, poi, "-180,-90,180,90")
SELECT is a basic operation, analagous to a SQL
SELECT. It simply filters all the elements in the graph by a group and optionaly other filters. You can select an entity by it's id or select all the entities in a list of groups. If selecting by group, then use a comma-separated list with each group prepended by a
$. You can optionally, also save the output to a secondary set by id.
SELECT [ <entity id> | $<group>[,$<group> ...] ] [UPDATE <id of secondary set> [WITH $<group>[,$<group> ...] ]
SELECT $DATASET SELECT $PointOfIntereset,$PointOfInterestType RUN GetElementsWithinSet() OUTPUT elements
Filters can be used in
NAV operations for filtering a set of elements (primary or secondary)
dwithin(geom_wkt, "Logan Circle, DC", 1000, meters) icontains(attributes, name, "El Centro") collectioncontains(aliases, "bar")
SGOL-compilers can take a few basic steps to pre-process operation chains to decrease burden on users and client applications, including initializing secondary sets, injecting discards, replacing server variables, and appending output.
Initializing Secondary Sets
Compilers can scan for references to secondary sets (e.g.,
... FETCH b ...) and prepend an INIT to the beginning of the chain if missing (e.g.,
Certain clauses, by semantic definition, never take any input, such as SELECT and FETCH. Compilers can inject DISCARD clauses into a chain when they can be inferred. For example:
SELECT A UPDATE c FETCH b... becomes
SELECT A UPDATE c DISCARD FETCH b...
Replacing Server Variables
An SGOL statement may include "variables" that are replaced by the server right before intialization. A simple way to represent the variables is with an enclosing
[...], such as
[limit] to represent the global API limit. Other variables could be context-specific, such as
[units]. Simple text replace may be all that is needed; however, the method used is left to the implementation.
... LIMIT [limit] OUTPUT entities
If the final clause in a chain is not OUTPUT, then a compiler can attempt to infer the final output type. For example,
SELECT $PointOfInterest becomes
SELECT $PointOfInterest OUTPUT entities. However, the compiler may not be aware of the output type for RUN commands, e.g.,
RUN XYZ(a, b, c). If there is no final OUTPUT, then the response is up to the SGOL-comptaible service.
These are some full real-world examples.
Select all datasets where an alias for the dataset is "bank" and return as a geojson.
SELECT $Dataset FILTER collectioncontains(aliases, "bank") OUTPUT geojson
Find all medical facilities and then wrap a convex polygon around the set and export as geojson.
SELECT $PointOfInterest FILTER collectioncontains(keywords, medical) RUN GetConvexPolygon(geom_wkt, geojson) OUTPUT geojson
Finding Points Of Interest
NAV $PointOfInterest $HasType pointofinteresttype_cafe UPDATE collection WITH $PointOfInterest FILTER dwithin(geom_wkt, "Logan Circle, DC", 1000, meters)