CIP2015-05-13 Existential subqueries
Authors: Andrés Taylor <andres@neotechnology.com>
This CIP introduces
EXISTS
, a function and a keyword for checking the existence of properties, simple patterns and full subqueries.
1. Background
In a way, Cypher already has existential subqueries - pattern predicates checking for the existence of subgraphs. This form is helpful, but not exhaustive in the types of subqueries that users want to be able to express.
2. Proposal
To make this feature more powerful, this CIP suggests the addition of a new function exists()
, and a keyword EXISTS {}
, allowing for two different predicates:
-
Property existence checking
-
Subquery existence checking
2.1. Syntax
expression = <current definition of expression> | property exists | subquery exists | simple subquery exists ; property exists = "exists", "(", expression ")" ; subquery exists = "EXISTS", "{", read only clause, { read only clause }, "}" ; simple subquery exists = "EXISTS", "{", simple match, "}" ; simple match = pattern, { ",", pattern }, [ "WHERE", predicate ] ; read only clause = match | unwind | with ;
2.2. Semantics
All forms of EXISTS {}
accomplish the same task: checking whether a particular pattern exists in the graph.
They are expressions, and as such must be side-effect free; that is, the subqueries in EXISTS {}
must not be updating queries.
All forms of EXISTS {}
are scalar expressions - they return a single boolean value, and this value is never null
.
2.2.1. Property exists
In the property checking form, any expression is accepted as the input.
If the input expression evaluates to null
, exists()
will evaluate to false
. For any other value, it will evaluate to true
.
This form of exists()
is equivalent to checking for null
using IS NOT NULL
: exists(expr)
is equivalent to expr IS NOT NULL
.
This form is there to make property existence checking idiomatic in Cypher - we do not want the user thinking of missing properties as checking for null
values.
2.2.2. Subquery exists
When checking for the existence of subqueries, variables and parameters from the outer query are available in the subquery. Variables introduced in the subquery are not available on the outside.
For each matching subgraph evaluated with EXISTS {}
, the result value must be true
if the subquery finds at least one matching row.
If no matches are found, false
should be returned.
The subquery used in EXISTS {}
has to follow the following rules:
-
It has to be a read-only query. Updating the graph as part of a predicate is not allowed.
-
It cannot end in a
RETURN
clause, nor may it containUNION
. -
When
WITH
ends the subquery, it is able to change the cardinality of the subquery. In other words, since theWITH
clause may includeWHERE
,SKIP
, and/orLIMIT
, the clause can with these sub-clauses turn a query that produces rows into one that does not. -
It may use nested
EXISTS {}
predicates.
2.2.3. Simple subquery exists
When the subquery can be described with a single MATCH-WHERE
clause, the MATCH
keyword can be omitted, as in example 2A and B.
The difference between this form and a simple pattern predicate, which is already available, is that this form allows for introducing new variables inside the EXISTS {}
scope.
2.3. Examples
Example 1A:
Return all nodes that have a property named slogan
.
MATCH (actor)
WHERE exists(actor.slogan)
RETURN actor
Example 2A:
Find all actors who have acted together with another actor with the same name.
MATCH (actor:Actor)
WHERE EXISTS {
(other:Actor)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(actor)
WHERE other.name = actor.name
}
RETURN actor
Example 2B:
Find all actors who have acted together with another actor with the same name on at least two movies.
MATCH (actor:Actor)
WHERE EXISTS {
MATCH (other:Actor)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(actor)
WHERE other.name = actor.name
WITH other, count(*) as c
WHERE c > 1
}
RETURN actor
2.4. Interaction with existing features
The EXISTS {}
subquery clause renders obsolete the current pattern predicate syntax.
This allows the pattern predicates to be deprecated and/or removed in favour of EXISTS {}
.
3. What others do
This is very similar to what SQL does with its EXISTS
functionality.
This is also very similar in syntax to what SPARQL does with its EXISTS
functionality; the rules regarding variables are identical, and the inner query also takes a subquery as input.
4. Benefits to this proposal
The existing pattern predicate functionality is very useful, but does not cover all cases. Pattern predicates do not allow for introducing variables, which makes some queries - such as the one below - difficult to express succinctly:
MATCH (person:Person)
WHERE EXISTS {
(person)-[:HAS_DOG]->(dog:Dog)
WHERE person.name = dog.name
}
RETURN person
This proposal also allows for powerful subqueries, for example using aggregation inside the EXISTS {}
query.
MATCH (team:Team)
WHERE EXISTS {
MATCH (team)-[:HAS_MEMBER]->(member:Person)
WHERE EXISTS {
(member)-[:WORKED_ON]->(p:Project) WHERE p.successful
}
WITH team, count(*) AS numAPlayers
WHERE numAPlayers > 2
}
RETURN team
5. Caveats to this proposal
Subqueries are powerful constructs. As such they can be difficult to understand, and difficult for a query planner to get right.