Permalink
Fetching contributors…
Cannot retrieve contributors at this time
177 lines (127 sloc) 5.7 KB

CIP2015-05-13 Existential subqueries

Authors: Andrés Taylor <andres@neotechnology.com>

Abstract

This CIP introduces EXISTS, a function and a keyword for checking the existence of properties, simple patterns and full subqueries.

1. Background

In a way, Cypher already has existential subqueries - pattern predicates checking for the existence of subgraphs. This form is helpful, but not exhaustive in the types of subqueries that users want to be able to express.

2. Proposal

To make this feature more powerful, this CIP suggests the addition of a new function exists(), and a keyword EXISTS {}, allowing for two different predicates:

  • Property existence checking

  • Subquery existence checking

2.1. Syntax

expression = <current definition of expression>
           | property exists
           | subquery exists
           | simple subquery exists
           ;

property exists = "exists", "(", expression ")"
                ;

subquery exists = "EXISTS", "{", read only clause, { read only clause }, "}" ;

simple subquery exists = "EXISTS", "{", simple match, "}" ;

simple match = pattern, { ",", pattern }, [ "WHERE", predicate ] ;

read only clause = match
                 | unwind
                 | with
                 ;

2.2. Semantics

All forms of EXISTS {} accomplish the same task: checking whether a particular pattern exists in the graph. They are expressions, and as such must be side-effect free; that is, the subqueries in EXISTS {} must not be updating queries. All forms of EXISTS {} are scalar expressions - they return a single boolean value, and this value is never null.

2.2.1. Property exists

In the property checking form, any expression is accepted as the input. If the input expression evaluates to null, exists() will evaluate to false. For any other value, it will evaluate to true. This form of exists() is equivalent to checking for null using IS NOT NULL: exists(expr) is equivalent to expr IS NOT NULL.

This form is there to make property existence checking idiomatic in Cypher - we do not want the user thinking of missing properties as checking for null values.

2.2.2. Subquery exists

When checking for the existence of subqueries, variables and parameters from the outer query are available in the subquery. Variables introduced in the subquery are not available on the outside.

For each matching subgraph evaluated with EXISTS {}, the result value must be true if the subquery finds at least one matching row. If no matches are found, false should be returned.

The subquery used in EXISTS {} has to follow the following rules:

  • It has to be a read-only query. Updating the graph as part of a predicate is not allowed.

  • It cannot end in a RETURN clause, nor may it contain UNION.

  • When WITH ends the subquery, it is able to change the cardinality of the subquery. In other words, since the WITH clause may include WHERE, SKIP, and/or LIMIT, the clause can with these sub-clauses turn a query that produces rows into one that does not.

  • It may use nested EXISTS {} predicates.

2.2.3. Simple subquery exists

When the subquery can be described with a single MATCH-WHERE clause, the MATCH keyword can be omitted, as in example 2A and B. The difference between this form and a simple pattern predicate, which is already available, is that this form allows for introducing new variables inside the EXISTS {} scope.

2.3. Examples

Example 1A:

Return all nodes that have a property named slogan.

MATCH (actor)
WHERE exists(actor.slogan)
RETURN actor

Example 2A:

Find all actors who have acted together with another actor with the same name.

MATCH (actor:Actor)
WHERE EXISTS {
  (other:Actor)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(actor)
  WHERE other.name = actor.name
}
RETURN actor

Example 2B:

Find all actors who have acted together with another actor with the same name on at least two movies.

MATCH (actor:Actor)
WHERE EXISTS {
  MATCH (other:Actor)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(actor)
  WHERE other.name = actor.name
  WITH other, count(*) as c
  WHERE c > 1
}
RETURN actor

2.4. Interaction with existing features

The EXISTS {} subquery clause renders obsolete the current pattern predicate syntax. This allows the pattern predicates to be deprecated and/or removed in favour of EXISTS {}.

3. What others do

This is very similar to what SQL does with its EXISTS functionality.

This is also very similar in syntax to what SPARQL does with its EXISTS functionality; the rules regarding variables are identical, and the inner query also takes a subquery as input.

4. Benefits to this proposal

The existing pattern predicate functionality is very useful, but does not cover all cases. Pattern predicates do not allow for introducing variables, which makes some queries - such as the one below - difficult to express succinctly:

MATCH (person:Person)
WHERE EXISTS {
  (person)-[:HAS_DOG]->(dog:Dog)
  WHERE person.name = dog.name
}
RETURN person

This proposal also allows for powerful subqueries, for example using aggregation inside the EXISTS {} query.

Find all teams that have at least two members who have worked on successful projects.
MATCH (team:Team)
WHERE EXISTS {
  MATCH (team)-[:HAS_MEMBER]->(member:Person)
  WHERE EXISTS {
	(member)-[:WORKED_ON]->(p:Project) WHERE p.successful
  }
  WITH team, count(*) AS numAPlayers
  WHERE numAPlayers > 2
}
RETURN team

5. Caveats to this proposal

Subqueries are powerful constructs. As such they can be difficult to understand, and difficult for a query planner to get right.