Authors: Petra Selmer <petra.selmer@neotechnology.com>, Stefan Plantikow <stefan.plantikow@neotechnology.com>
This CIP proposes the incorporation of nested subqueries into Cypher.
Subqueries - i.e. queries within queries - are a powerful and expressive feature allowing for:
-
Increased query expressivity
-
Better query construction and readability
-
Easier query composition and reuse
-
Post-processing results from multiple queries as a single unit
-
Performing a sequence of multiple write commands for each record
This CIP may be viewed in light of the EXISTS CIP, the Scalar Subqueries and List Subqueries CIP, and the Map Projection CIP, all of which propose variants of subqueries. In contrast, this CIP focusses on subqueries operating at a clause level while the EXISTS CIP and Map Projection CIP propose subqueries operating at an expression level.
Nested subqueries are self-contained Cypher queries that are usually run within the scope of an outer Cypher query.
This proposal suggests the introduction of new nested subquery constructs to Cypher.
-
Read-only nested simple subqueries of the form
{ … RETURN … } -
Read-only nested chained subqueries of the form
THEN { … RETURN … } -
Read-only nested optional subqueries of the form
OPTIONAL { … RETURN … } -
Read-only nested mandatory subqueries of the form
MANDATORY { … RETURN … } -
Read/Write nested simple updating subqueries of the form
DO { … }(inner query not ending withRETURN) -
Read/Write nested conditionally-updating subqueries of the form
DO [WHEN cond THEN { … }]+ [ELSE { … }] END(inner queries not ending withRETURN)
A nested simple subquery consists of an inner query in curly braces.
All other nested subquery constructs are introduced with a keyword in conjunction with an inner query in curly braces.
Nested subqueries may be correlated - i.e. the inner query may use variables from the outer query - or uncorrelated.
Nested subqueries can be contained within other nested subqueries at an arbitrary (but finite) depth.
Read/Write nested subqueries cannot be contained within other read-only nested subqueries.
Finally, this CIP proposes new shorthand syntax for starting a query with WHERE, along with the ability to specify that no fields are to be returned through the introduction of WITH -, RETURN -, and YIELD -.
1. Read-only nested simple subqueries
We propose the addition of read-only nested simple subqueries as a new form of read-only Cypher query.
A nested read-only simple subquery is denoted using the following syntax: { <inner-query> }.
The inner query can be any complete read-only Cypher query.
A nested read-only simple subquery may only be used as a primary clause, i.e. as a
-
top-level Cypher query,
-
inner query of another nested subquery,
-
inner query of another expression-level subquery (such as a pattern comprehension, or an
EXISTSsubquery), -
argument query to
UNIONand similar clause-level binary operators
A nested read-only simple subquery may not be used as a secondary clause after a preceding primary clause. (However, a nested read-only chained subquery may be used in this case.)
2. Read-only nested chained subqueries
We propose the addition of read-only nested chained subqueries for using nested subqueries in a similar position as a secondary clause. This is called subquery chaining.
After a chain of clauses that together form a query, a new nested chained subquery may be introduced as a secondary clause using the THEN keyword followed by an inner query in curly braces, i.e. it is denoted using the following syntax: … THEN { <inner-query> }.
THEN is a query combinator and more details may be found in the Query Combinator CIP.
3. Read-only nested optional subqueries
We propose the addition of a new OPTIONAL clause for expressing read-only nested optional subqueries.
A read-only nested optional subquery is denoted by the following syntax: OPTIONAL { <inner-query> }.
4. Read-only nested mandatory subqueries
We propose the addition of a new MANDATORY clause for expressing read-only nested mandatory subqueries.
A read-only nested mandatory subquery is denoted by the following syntax: MANDATORY { <inner-query> }.
4. Read/Write nested simple updating subqueries
We propose the addition of a new DO clause for expressing read/write nested simple updating subqueries that do not return any data.
A read/write nested simple updating subquery is denoted by the following syntax: DO { <inner-update-query> }.
Any updating Cypher query from which the trailing final RETURN clause has been omitted may be used as an inner update query.
We additionally propose removing the FOREACH clause from the current language as it is rendered obsolete by the introduction of DO.
5. Read/Write nested conditionally-updating subqueries
We propose the addition of a second form of the DO clause for expressing read/write nested conditionally-updating subqueries that do not return any data.
A read/write nested conditionally-updating subquery is denoted by the following syntax:
DO
[WHEN <cond> THEN <inner-update-query>]+
[ELSE <inner-update-query>]
ENDEvaluation proceeds as follows:
-
Semantically, the
WHENconditions are tested in the order given, and the inner updating query is executed for only the first condition that evaluates totrue. -
If no given
WHENcondition evaluates totrueand anELSEbranch is provided, the inner updating query of theELSEbranch is executed. -
If no given
WHENcondition evaluates totrueand noELSEbranch is provided, no updates will be executed.
6. Shorthand syntax
We propose the addition of a new clause WHERE <cond> <subclauses> as a shorthand syntax for WITH * WHERE <cond> THEN { <subclauses> }.
The idea is for this to be used exclusively as a primary clause; for example, as the first clause of a nested subquery.
We propose the addition of a new projection clauses of the form WITH - and RETURN -, which will retain the input cardinality but project no result fields.
This allows for only checking the cardinality in a read-only nested mandatory subquery.
We propose the addition of a new subclause to CALL of the form YIELD -, which will retain the output cardinality of a call but project no result fields.
This allows for only checking the cardinality in an EXISTS subquery.
1. Read-only nested subqueries
Conceptually, a nested subquery is evaluated for each incoming record and may produce an arbitrary number of result records.
The rules regarding variable scoping are detailed as follows:
-
All incoming variables remain in scope throughout the whole subquery.
-
When evaluating the subquery, any new variable bindings introduced by the final
RETURNclause will augment the variable bindings of the initial record. -
It is valid (though redundant) if incoming variables from the outer scope are passed on explicitly by any projection clause of the subquery (including the final
RETURN). -
Nested subqueries therefore cannot shadow variables present in the outer scope, and thus behave in the same way as
UNWINDandCALLwith regard to the introduction of new variable bindings. -
Any other variable bindings that are introduced temporarily in the subquery will not be visible to the outer scope.
Subqueries interact with write clauses in the same way as MATCH does.
2. Read/Write subqueries
Execution of a DO subquery does not change the cardinality; i.e. the inner update query is run for each incoming record.
Any input record is always passed on to the clause succeeding the DO subquery, irrespective of whether it was eligible for processing by any inner update query.
A DO clause that uses WHEN sub-clause is called a conditional DO.
A query may end with a DO subquery in the same way that a query can currently end with any update clause.
1. Read-only nested simple and chained subqueries
Post-UNION processing:
{
// authored tweets
MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
(user)<-[:AUTHORED]-(tweet:Tweet)
RETURN tweet, tweet.time AS time, user.country AS country
UNION
// favorited tweets
MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
(user)<-[:HAS_FAVOURITE]-(favorite:Favorite)-[:TARGETS]->(tweet:Tweet)
RETURN tweet, favourite.time AS time, user.country AS country
}
WHERE country = 'se'
RETURN DISTINCT tweet
ORDER BY time DESC
LIMIT 10Uncorrelated nested subquery:
MATCH (f:Farm {id: $farmId})
THEN {
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(p:Lawnmower)
RETURN b.name AS name, p.code AS code
UNION
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(v:Vehicle),
(v)<-[:IS_A]-(:Category {name: 'Tractor'})
RETURN b.name AS name, p.code AS code
}
RETURN f, name, codeCorrelated nested subquery:
MATCH (f:Farm {id: $farmId})-[:IS_IN]->(country:Country)
THEN {
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(p:Lawnmower)
RETURN b.name AS name, p.code AS code
UNION
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(v:Vehicle),
(v)<-[:IS_A]-(:Category {name: 'Tractor'})
WHERE v.leftHandDrive = country.leftHandDrive
RETURN b.name AS name, p.code AS code
}
RETURN f, name, codeFiltered and correlated nested subquery:
MATCH (f:Farm)-[:IS_IN]->(country:Country)
WHERE country.name IN $countryNames
THEN {
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(p:Lawnmower)
RETURN b AS brand, p.code AS code
UNION
MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
(b)-[:PRODUCES]->(v:Vehicle),
(v)<-[:IS_A]-(:Category {name: 'Tractor'})
WHERE v.leftHandDrive = country.leftHandDrive
RETURN b AS brand, p.code AS code
}
WHERE f.type = 'organic'
AND b.certified
RETURN f, brand.name AS name, codeDoubly-nested subquery:
MATCH (f:Farm {id: $farmId})
THEN {
MATCH (c:Customer)-[:BUYS_FOOD_AT]->(f)
THEN {
MATCH (c)-[:RETWEETS]->(t:Tweet)<-[:TWEETED_BY]-(f)
RETURN c, count(*) AS count
UNION
MATCH (c)-[:LIKES]->(p:Posting)<-[:POSTED_BY]-(f)
RETURN c, count(*) AS count
}
RETURN c, 'customer' AS type, sum(count) AS endorsement
UNION
MATCH (s:Shop)-[:BUYS_FOOD_AT]->(f)
MATCH (s)-[:PLACES]->(a:Advertisement)-[:ABOUT]->(f)
RETURN s, 'shop' AS type, count(a) * 100 AS endorsement
}
RETURN f.name AS name, type, sum(endorsement) AS endorsement2. Read-only nested optional match and mandatory subqueries
This proposal also provides nested subquery forms of OPTIONAL MATCH and MANDATORY MATCH:
MANDATORY MATCH (p:Person {name: 'Petra'})
MANDATORY MATCH (conf:Conference {name: $conf})
MANDATORY {
WHERE conf.impact > 5
MATCH (p)-[:ATTENDS]->(conf)
RETURN conf
UNION
MATCH (p)-[:LIVES_IN]->(:City)<-[:IN]-(conf)
RETURN conf
}
OPTIONAL {
MATCH (p)-[:KNOWS]->(a:Attendee)-[:PUBLISHED_AT]->(conf)
RETURN a.name AS name
UNION
MATCH (p)-[:KNOWS]->(a:Attendee)-[:PRESENTED_AT]->(conf)
RETURN a.name AS name
}
RETURN name3. Read/Write nested simple and conditionally-updating subqueries
We illustrate these by means of an 'old' version of the query, in which FOREACH is used, followed by the 'new' version, using DO.
Using a single subquery - old version using FOREACH:
MATCH (r:Root)
FOREACH(x IN range(1, 10) |
MERGE (c:Child {id: x})
MERGE (r)-[:PARENT]->(c)
)Using a single subquery - new version using DO:
MATCH (r:Root)
UNWIND range(1, 10) AS x
DO {
MERGE (c:Child {id: x})
MERGE (r)-[:PARENT]->(c)
}Note how FOREACH is addressing two semantic concerns simultaneously; namely looping, and performing updates without affecting the cardinality of the outer query.
In the new version of the query shown above, these orthogonal concerns have been separated.
Looping is already handled by UNWIND, while DO suppresses the increased cardinality from the inner query.
DO also hides all new variable bindings introduced by the inner query from the outer query.
If DO is omitted from the new version of the query shown above, the variable c would become visible to the remainder of the query.
Doubly-nested subquery - old version using FOREACH:
MATCH (r:Root)
FOREACH (x IN range(1, 10) |
CREATE (r)-[:PARENT]->(c:Child {id: x})
MERGE (r)-[:PUBLISHES]->(t:Topic {id: r.id + x})
FOREACH (y IN range(1, 10) |
CREATE (c)-[p:PARENT]->(:Child {id: c.id * 10 + y})
SET p.id = c.id * 5 + y
)
)Doubly-nested subquery - new version using DO:
MATCH (r:Root)
UNWIND range(1, 10) AS x AS x
DO {
CREATE (r)-[:PARENT]->(c:Child {id: x})
MERGE (r)-[:PUBLISHES]->(t:Topic {id: r.id + x})
UNWIND range(1, 10) AS y
DO {
CREATE (c)-[p:PARENT]->(:Child {id: c.id * 10 + y})
SET p.id = c.id * 5 + y
}
}Conditional DO
MATCH (r:Root)
UNWIND range(1, 10) AS x
DO WHEN x % 2 = 1 THEN {
MERGE (c:Odd:Child {id: x})
MERGE (r)-[:PARENT]->(c)
}
ELSE {
MERGE (c:Even:Child {id: x})
MERGE (r)-[:PARENT]->(c)
}
ENDApart from the suggested deprecation of the FOREACH clause, nested read-only, write-only and read-write subqueries do not interact directly with any existing features.
The following types of subqueries are supported in SQL:
Scalar:
SELECT orderID
FROM Orders
WHERE orderID =
(SELECT max(orderID) FROM Orders)Multi-valued:
SELECT customerID
FROM Customers
WHERE customerID IN
(SELECT customerID FROM Orders)Correlated:
SELECT orderID, customerID
FROM Orders AS O1
WHERE orderID =
(SELECT max(O2.orderID) FROM Orders AS O2
WHERE O2.customerID = O1.customerID)Table-valued/table expression:
SELECT orderYear
FROM
(SELECT YEAR(orderDate) AS orderYear
FROM Orders) AS DScalar and list subqueries are addressed in the Scalar Subqueries and List Subqueries CIP.
SPARQL supports uncorrelated subqueries in the standard, exemplified by:
SELECT ?y ?minName
WHERE {
:alice :knows ?y .
{
SELECT ?y (MIN(?name) AS ?minName)
WHERE {
?y :name ?name .
} GROUP BY ?y
}
}Owing to the bottom-up nature of SPARQL query evaluation, the supported forms of subqueries are evaluated logically first, and the results are projected up to the outer query. Variables projected out of the subquery will be visible, or in scope, to the outer query.
-
Increasing the expressivity of the language.
-
Allowing unified post-processing on results from multiple (sub)queries; this is exemplified by the request for post-UNION processing.
-
Facilitating query readability, construction and maintainability.
-
Providing a feature familiar to users of SQL.