Skip to content

Latest commit

 

History

History
470 lines (355 loc) · 15.6 KB

File metadata and controls

470 lines (355 loc) · 15.6 KB

CIP2016-06-22 - Nested subqueries

1. Motivation

Subqueries - i.e. queries within queries - are a powerful and expressive feature allowing for:

  • Increased query expressivity

  • Better query construction and readability

  • Easier query composition and reuse

  • Post-processing results from multiple queries as a single unit

  • Performing a sequence of multiple write commands for each record

2. Background

This CIP may be viewed in light of the EXISTS CIP, the Scalar Subqueries and List Subqueries CIP, and the Map Projection CIP, all of which propose variants of subqueries. In contrast, this CIP focusses on subqueries operating at a clause level while the EXISTS CIP and Map Projection CIP propose subqueries operating at an expression level.

3. Proposal

Nested subqueries are self-contained Cypher queries that are usually run within the scope of an outer Cypher query.

This proposal suggests the introduction of new nested subquery constructs to Cypher.

  • Read-only nested simple subqueries of the form { …​ RETURN …​ }

  • Read-only nested chained subqueries of the form THEN { …​ RETURN …​ }

  • Read-only nested optional subqueries of the form OPTIONAL { …​ RETURN …​ }

  • Read-only nested mandatory subqueries of the form MANDATORY { …​ RETURN …​ }

  • Read/Write nested simple updating subqueries of the form DO { …​ } (inner query not ending with RETURN)

  • Read/Write nested conditionally-updating subqueries of the form DO [WHEN cond THEN { …​ }]+ [ELSE { …​ }] END (inner queries not ending with RETURN)

A nested simple subquery consists of an inner query in curly braces.

All other nested subquery constructs are introduced with a keyword in conjunction with an inner query in curly braces.

Nested subqueries may be correlated - i.e. the inner query may use variables from the outer query - or uncorrelated.

Nested subqueries can be contained within other nested subqueries at an arbitrary (but finite) depth.

Read/Write nested subqueries cannot be contained within other read-only nested subqueries.

Finally, this CIP proposes new shorthand syntax for starting a query with WHERE, along with the ability to specify that no fields are to be returned through the introduction of WITH -, RETURN -, and YIELD -.

1. Read-only nested simple subqueries

We propose the addition of read-only nested simple subqueries as a new form of read-only Cypher query.

A nested read-only simple subquery is denoted using the following syntax: { <inner-query> }.

The inner query can be any complete read-only Cypher query.

A nested read-only simple subquery may only be used as a primary clause, i.e. as a

  • top-level Cypher query,

  • inner query of another nested subquery,

  • inner query of another expression-level subquery (such as a pattern comprehension, or an EXISTS subquery),

  • argument query to UNION and similar clause-level binary operators

A nested read-only simple subquery may not be used as a secondary clause after a preceding primary clause. (However, a nested read-only chained subquery may be used in this case.)

2. Read-only nested chained subqueries

We propose the addition of read-only nested chained subqueries for using nested subqueries in a similar position as a secondary clause. This is called subquery chaining.

After a chain of clauses that together form a query, a new nested chained subquery may be introduced as a secondary clause using the THEN keyword followed by an inner query in curly braces, i.e. it is denoted using the following syntax: …​ THEN { <inner-query> }. THEN is a query combinator and more details may be found in the Query Combinator CIP.

3. Read-only nested optional subqueries

We propose the addition of a new OPTIONAL clause for expressing read-only nested optional subqueries.

A read-only nested optional subquery is denoted by the following syntax: OPTIONAL { <inner-query> }.

4. Read-only nested mandatory subqueries

We propose the addition of a new MANDATORY clause for expressing read-only nested mandatory subqueries.

A read-only nested mandatory subquery is denoted by the following syntax: MANDATORY { <inner-query> }.

4. Read/Write nested simple updating subqueries

We propose the addition of a new DO clause for expressing read/write nested simple updating subqueries that do not return any data.

A read/write nested simple updating subquery is denoted by the following syntax: DO { <inner-update-query> }.

Any updating Cypher query from which the trailing final RETURN clause has been omitted may be used as an inner update query.

We additionally propose removing the FOREACH clause from the current language as it is rendered obsolete by the introduction of DO.

5. Read/Write nested conditionally-updating subqueries

We propose the addition of a second form of the DO clause for expressing read/write nested conditionally-updating subqueries that do not return any data.

A read/write nested conditionally-updating subquery is denoted by the following syntax:

DO
  [WHEN <cond> THEN <inner-update-query>]+
  [ELSE <inner-update-query>]
END

Evaluation proceeds as follows:

  • Semantically, the WHEN conditions are tested in the order given, and the inner updating query is executed for only the first condition that evaluates to true.

  • If no given WHEN condition evaluates to true and an ELSE branch is provided, the inner updating query of the ELSE branch is executed.

  • If no given WHEN condition evaluates to true and no ELSE branch is provided, no updates will be executed.

6. Shorthand syntax

We propose the addition of a new clause WHERE <cond> <subclauses> as a shorthand syntax for WITH * WHERE <cond> THEN { <subclauses> }. The idea is for this to be used exclusively as a primary clause; for example, as the first clause of a nested subquery.

We propose the addition of a new projection clauses of the form WITH - and RETURN -, which will retain the input cardinality but project no result fields. This allows for only checking the cardinality in a read-only nested mandatory subquery.

We propose the addition of a new subclause to CALL of the form YIELD -, which will retain the output cardinality of a call but project no result fields. This allows for only checking the cardinality in an EXISTS subquery.

3.1. Semantic clarification

1. Read-only nested subqueries

Conceptually, a nested subquery is evaluated for each incoming record and may produce an arbitrary number of result records.

The rules regarding variable scoping are detailed as follows:

  • All incoming variables remain in scope throughout the whole subquery.

  • When evaluating the subquery, any new variable bindings introduced by the final RETURN clause will augment the variable bindings of the initial record.

  • It is valid (though redundant) if incoming variables from the outer scope are passed on explicitly by any projection clause of the subquery (including the final RETURN).

  • Nested subqueries therefore cannot shadow variables present in the outer scope, and thus behave in the same way as UNWIND and CALL with regard to the introduction of new variable bindings.

  • Any other variable bindings that are introduced temporarily in the subquery will not be visible to the outer scope.

Subqueries interact with write clauses in the same way as MATCH does.

2. Read/Write subqueries

Execution of a DO subquery does not change the cardinality; i.e. the inner update query is run for each incoming record.

Any input record is always passed on to the clause succeeding the DO subquery, irrespective of whether it was eligible for processing by any inner update query.

A DO clause that uses WHEN sub-clause is called a conditional DO.

A query may end with a DO subquery in the same way that a query can currently end with any update clause.

3.2. Examples

1. Read-only nested simple and chained subqueries

Post-UNION processing:

{
  // authored tweets
  MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
        (user)<-[:AUTHORED]-(tweet:Tweet)
  RETURN tweet, tweet.time AS time, user.country AS country
  UNION
  // favorited tweets
  MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
        (user)<-[:HAS_FAVOURITE]-(favorite:Favorite)-[:TARGETS]->(tweet:Tweet)
  RETURN tweet, favourite.time AS time, user.country AS country
}
WHERE country = 'se'
RETURN DISTINCT tweet
ORDER BY time DESC
LIMIT 10

Uncorrelated nested subquery:

MATCH (f:Farm {id: $farmId})
THEN {
  MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(p:Lawnmower)
  RETURN b.name AS name, p.code AS code
  UNION
  MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(v:Vehicle),
        (v)<-[:IS_A]-(:Category {name: 'Tractor'})
  RETURN b.name AS name, p.code AS code
}
RETURN f, name, code

Correlated nested subquery:

MATCH (f:Farm {id: $farmId})-[:IS_IN]->(country:Country)
THEN {
  MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(p:Lawnmower)
  RETURN b.name AS name, p.code AS code
  UNION
  MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(v:Vehicle),
        (v)<-[:IS_A]-(:Category {name: 'Tractor'})
  WHERE v.leftHandDrive = country.leftHandDrive
  RETURN b.name AS name, p.code AS code
}
RETURN f, name, code

Filtered and correlated nested subquery:

MATCH (f:Farm)-[:IS_IN]->(country:Country)
WHERE country.name IN $countryNames
THEN {
  MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(p:Lawnmower)
  RETURN b AS brand, p.code AS code
  UNION
  MATCH (u:User {id: $userId})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(v:Vehicle),
        (v)<-[:IS_A]-(:Category {name: 'Tractor'})
  WHERE v.leftHandDrive = country.leftHandDrive
  RETURN b AS brand, p.code AS code
}
WHERE f.type = 'organic'
  AND b.certified
RETURN f, brand.name AS name, code

Doubly-nested subquery:

MATCH (f:Farm {id: $farmId})
THEN {
  MATCH (c:Customer)-[:BUYS_FOOD_AT]->(f)
  THEN {
     MATCH (c)-[:RETWEETS]->(t:Tweet)<-[:TWEETED_BY]-(f)
     RETURN c, count(*) AS count
     UNION
     MATCH (c)-[:LIKES]->(p:Posting)<-[:POSTED_BY]-(f)
     RETURN c, count(*) AS count
  }
  RETURN c, 'customer' AS type, sum(count) AS endorsement
  UNION
  MATCH (s:Shop)-[:BUYS_FOOD_AT]->(f)
  MATCH (s)-[:PLACES]->(a:Advertisement)-[:ABOUT]->(f)
  RETURN s, 'shop' AS type, count(a) * 100 AS endorsement
}
RETURN f.name AS name, type, sum(endorsement) AS endorsement

2. Read-only nested optional match and mandatory subqueries

This proposal also provides nested subquery forms of OPTIONAL MATCH and MANDATORY MATCH:

MANDATORY MATCH (p:Person {name: 'Petra'})
MANDATORY MATCH (conf:Conference {name: $conf})
MANDATORY {
    WHERE conf.impact > 5
    MATCH (p)-[:ATTENDS]->(conf)
    RETURN conf
    UNION
    MATCH (p)-[:LIVES_IN]->(:City)<-[:IN]-(conf)
    RETURN conf
}
OPTIONAL {
    MATCH (p)-[:KNOWS]->(a:Attendee)-[:PUBLISHED_AT]->(conf)
    RETURN a.name AS name
    UNION
    MATCH (p)-[:KNOWS]->(a:Attendee)-[:PRESENTED_AT]->(conf)
    RETURN a.name AS name
}
RETURN name

3. Read/Write nested simple and conditionally-updating subqueries

We illustrate these by means of an 'old' version of the query, in which FOREACH is used, followed by the 'new' version, using DO.

Using a single subquery - old version using FOREACH:

MATCH (r:Root)
FOREACH(x IN range(1, 10) |
  MERGE (c:Child {id: x})
  MERGE (r)-[:PARENT]->(c)
)

Using a single subquery - new version using DO:

MATCH (r:Root)
UNWIND range(1, 10) AS x
DO {
  MERGE (c:Child {id: x})
  MERGE (r)-[:PARENT]->(c)
}

Note how FOREACH is addressing two semantic concerns simultaneously; namely looping, and performing updates without affecting the cardinality of the outer query. In the new version of the query shown above, these orthogonal concerns have been separated. Looping is already handled by UNWIND, while DO suppresses the increased cardinality from the inner query.

DO also hides all new variable bindings introduced by the inner query from the outer query. If DO is omitted from the new version of the query shown above, the variable c would become visible to the remainder of the query.

Doubly-nested subquery - old version using FOREACH:

MATCH (r:Root)
FOREACH (x IN range(1, 10) |
  CREATE (r)-[:PARENT]->(c:Child {id: x})
  MERGE (r)-[:PUBLISHES]->(t:Topic {id: r.id + x})
  FOREACH (y IN range(1, 10) |
    CREATE (c)-[p:PARENT]->(:Child {id: c.id * 10 + y})
    SET p.id = c.id * 5 + y
  )
)

Doubly-nested subquery - new version using DO:

MATCH (r:Root)
UNWIND range(1, 10) AS x AS x
DO {
  CREATE (r)-[:PARENT]->(c:Child {id: x})
  MERGE (r)-[:PUBLISHES]->(t:Topic {id: r.id + x})
  UNWIND range(1, 10) AS y
  DO {
    CREATE (c)-[p:PARENT]->(:Child {id: c.id * 10 + y})
    SET p.id = c.id * 5 + y
  }
}

Conditional DO

MATCH (r:Root)
UNWIND range(1, 10) AS x
DO WHEN x % 2 = 1 THEN {
      MERGE (c:Odd:Child {id: x})
      MERGE (r)-[:PARENT]->(c)
  }
  ELSE {
      MERGE (c:Even:Child {id: x})
      MERGE (r)-[:PARENT]->(c)
  }
END

3.3. Interaction with existing features

Apart from the suggested deprecation of the FOREACH clause, nested read-only, write-only and read-write subqueries do not interact directly with any existing features.

3.4. Alternatives

Alternative syntax has been considered during the production of this document:

  • Using round braces; i.e. MATCH (…​)

  • Using alternative keywords:

    • SUBQUERY

    • QUERY

4. What others do

4.1. SQL

The following types of subqueries are supported in SQL:

Scalar:

SELECT orderID
FROM Orders
WHERE orderID =
  (SELECT max(orderID) FROM Orders)

Multi-valued:

SELECT customerID
FROM Customers
WHERE customerID IN
  (SELECT customerID FROM Orders)

Correlated:

SELECT orderID, customerID
FROM Orders AS O1
WHERE orderID =
  (SELECT max(O2.orderID) FROM Orders AS O2
   WHERE O2.customerID = O1.customerID)

Table-valued/table expression:

SELECT orderYear
FROM
  (SELECT YEAR(orderDate) AS orderYear
  FROM Orders) AS D

Scalar and list subqueries are addressed in the Scalar Subqueries and List Subqueries CIP.

4.2. SPARQL

SPARQL supports uncorrelated subqueries in the standard, exemplified by:

SELECT ?y ?minName
WHERE {
  :alice :knows ?y .
 {
    SELECT ?y (MIN(?name) AS ?minName)
    WHERE {
      ?y :name ?name .
    } GROUP BY ?y
  }
}

Owing to the bottom-up nature of SPARQL query evaluation, the supported forms of subqueries are evaluated logically first, and the results are projected up to the outer query. Variables projected out of the subquery will be visible, or in scope, to the outer query.

5. Benefits to this proposal

  • Increasing the expressivity of the language.

  • Allowing unified post-processing on results from multiple (sub)queries; this is exemplified by the request for post-UNION processing.

  • Facilitating query readability, construction and maintainability.

  • Providing a feature familiar to users of SQL.

6. Caveats to this proposal

At the current time, we are not aware of any caveats.