Path Pattern Queries #187

Open
wants to merge 8 commits into
base: master
from

4 participants
@thobe
Contributor

thobe commented Feb 6, 2017

This is a first draft of a proposal for adding Path Patterns to Cypher. There is still work to be done here before this is finalised.

CIP2017-02-06

@thobe thobe added the CIP label Feb 6, 2017

@thobe

Contributor

thobe commented Feb 6, 2017

This proposal should eventually be able to fulfil the requirements outlined in #179.

The direction of each relationship is governed by the overall direction of the Regular Path Pattern.
It is however possible to explicitly define the direction for a particular part of the pattern.
This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.

@Mats-SX

Mats-SX Feb 7, 2017

Member

We could use 'outgoing' instead of 'left-to-right', and 'incoming' instead of 'right-to-left'.

@thobe

thobe Feb 7, 2017

Contributor

That would imply a particular traversal order, which I think is saying too much. With left-to-right and right-to-left we are only talking about the direction with regard to how the pattern is written.

In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant.
In general the direction of a Defined Path Predicate is quite important, and used for mapping the pattern in the predicate into the Regular Path Patterns that reference it.
The only case in which the direction of a Defined Path Predicate may be omitted is when the defined predicate is reflexive.

@Mats-SX

Mats-SX Feb 7, 2017

Member

Will it be the job of the query compiler to determine reflexiveness of the defined predicate, and issue an error if non-reflexive predicates are defined (or used) without direction?

@thobe

thobe Feb 7, 2017

Contributor

I'd say that either the query compiler issues an error, or the behaviour of the path predicate is undefined (i.e. the behaviour in such a case is outside of the scope of the specification).

I'd prefer for an implementation to issue an error, but I don't know how hard that analysis is to perform, and I thus don't want to mandate that at this point (it needs further analysis).

----
In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant.
In general the direction of a Defined Path Predicate is quite important, and used for mapping the pattern in the predicate into the Regular Path Patterns that reference it.

@Mats-SX

Mats-SX Feb 7, 2017

Member

This applies to the definition of a predicate, right? I imagine that it will be perfectly possible to reverse the direction in the Regular Path Pattern, like so:

MATCH (a)<-/pred/-(b)
PATH (a)-/pred/->(b) IS
     (a)-[:KNOWS]->(b)

How about it being used in an undirected Regular Path Pattern?

MATCH (a)-/pred/-(b)
PATH (a)-/pred/->(b) IS
     (a)-[:KNOWS]->(b)

@thobe

thobe Feb 7, 2017

Contributor

Both of those are perfectly valid. The undirected use of a directed defined path predicate is the same as an OR/UNION between the two directions.

I.e. (a)-/pred/-(b) is the same as (a)-/<pred | pred>/-(b) (which is the same as (a)-/pred> | <pred/-(b) or (a)-/<pred>/-(b) or (a)<-/pred/->(b)).
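
To make the equivalence above concrete, here is a minimal Python sketch (not Cypher, and not part of the proposal). It assumes a toy graph of directed `KNOWS` edges and models `pred` as the set of `(a, b)` pairs it matches. The names `KNOWS`, `pred_forward`, `pred_backward` and `pred_undirected` are hypothetical and exist only for this illustration.

```python
# Toy directed graph: each pair (x, y) stands for (x)-[:KNOWS]->(y).
# The data is made up for illustration.
KNOWS = {("alice", "bob"), ("bob", "carol")}


def pred_forward(edges):
    """Pairs (a, b) matched by (a)-/pred/->(b), where pred is (a)-[:KNOWS]->(b)."""
    return set(edges)


def pred_backward(edges):
    """Pairs (a, b) matched by (a)<-/pred/-(b): the same predicate, reversed."""
    return {(b, a) for (a, b) in edges}


def pred_undirected(edges):
    """Pairs (a, b) matched by (a)-/pred/-(b): the union of both directions."""
    return pred_forward(edges) | pred_backward(edges)


matches = pred_undirected(KNOWS)
```

In this model the undirected use simply returns every pair from the directed use plus its mirror image, which is the OR/UNION reading described above.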

== Regular Path Patterns
Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths.
This functionality is called Regular Path Patterns.

@petraselmer

petraselmer Mar 27, 2017

Contributor

I think it would be a good idea to mention somewhere that these are essentially RPQs, since RPQs are the standard term for these sorts of queries.

Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths.
This functionality is called Regular Path Patterns.
A Regular Path Pattern is defined as:

@petraselmer

petraselmer Mar 27, 2017

Contributor

For 1 - 4 below, it would be great if these could be supplemented with regex notation

@thobe

thobe Mar 27, 2017

Contributor

As in an example of what the syntax looks like?

Contrary to Relationship Patterns, Regular Path Patterns do _not_ allow binding a relationship to a variable.
In order to bind the matching path to a variable, a Path Assignment should be used, by preceding the path with an identifier and an equals sign (`=`).
This avoids a problem that existed in the past with repetition of relationships (a syntax that was deprecated with the introduction of Regular Path Patterns), where a relationship variable would bind to a list, making it hard to express predicates over the actual relationships.

@petraselmer

petraselmer Mar 27, 2017

Contributor

Is it not meant to be "a syntax that is deprecated" (as RPPs are only being introduced now)?

@thobe

thobe Mar 27, 2017

Contributor

I've tried writing the text as if Regular Path Patterns are already in the language, since when this text is merged, it will be in the language. Although "is" works in that case as well.

@petraselmer

petraselmer Mar 27, 2017

Contributor

Aha - ok, that makes sense

The direction of each relationship is governed by the overall direction of the Regular Path Pattern.
It is however possible to explicitly define the direction for a particular part of the pattern.
This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.
It is possible to both prefix the part with `<` and suffixing it with `>`, giving that part the interpretation of being undirected.

@petraselmer

petraselmer Mar 27, 2017

Contributor

"suffixing" -> "suffix"

@petraselmer

petraselmer Mar 27, 2017

Contributor

The "giving that part the interpretation of being undirected" section is a bit unwieldy - maybe something along the lines of "indicating that that part of the pattern is undirected"

@thobe

Contributor

thobe commented Mar 31, 2017

I've updated the document to unify the Path Pattern syntax with normal Pattern syntax. I have not yet updated the grammar to reflect that; I'll do that imminently.

@thobe

Contributor

thobe commented Mar 31, 2017

I think it is important to note which things that are currently valid Cypher will have changed semantics under this proposal:

  • Binding a variable under repetition will no longer be possible, i.e. this pattern is now invalid: (a)-[rels*]->(b). The way to achieve such a binding is to bind the pattern to a path variable (p=(a)-[-*]->(b)) and use the relationships function to access the relationships of the path (WITH *, relationships(p) AS rels).
  • Property predicates will have a different scope if colons are used when separating alternative relationship types. The pattern ()-[:FOO|:BAR{baz=17}]-() used to mean any relationship of type FOO or type BAR where the property baz has the value 17. Under this proposal it means any relationship of type FOO, or any relationship of type BAR where the property baz has the value 17. In other words, under the old semantics the property predicate applied to both relationship types, but under the new semantics it applies only to the BAR relationships. The old semantics still apply if the colon is omitted in the list of relationship types, as in ()-[:FOO|BAR{baz=17}]-().
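
The scoping change in the second bullet can be sketched in Python on a handful of made-up relationships. This is only an illustration of the two readings, not how any implementation evaluates patterns; `RELS`, `old_semantics` and `new_semantics` are hypothetical names.

```python
# Toy relationships as (type, properties) pairs. The data is made up.
RELS = [
    ("FOO", {"baz": 17}),
    ("FOO", {"baz": 1}),
    ("BAR", {"baz": 17}),
    ("BAR", {"baz": 1}),
]


def old_semantics(rels):
    """Reading of [:FOO|BAR{baz=17}]: the property predicate covers both types."""
    return [
        (rel_type, props)
        for rel_type, props in rels
        if rel_type in ("FOO", "BAR") and props.get("baz") == 17
    ]


def new_semantics(rels):
    """Reading of [:FOO|:BAR{baz=17}] under the proposal: predicate on BAR only."""
    return [
        (rel_type, props)
        for rel_type, props in rels
        if rel_type == "FOO" or (rel_type == "BAR" and props.get("baz") == 17)
    ]
```

On this data the old reading keeps two relationships (one FOO, one BAR, both with baz equal to 17), while the new reading keeps three, because every FOO relationship now matches regardless of baz.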

@thobe thobe changed the title from Regular Path Queries to Path Pattern Queries May 10, 2017

@thobe

Contributor

thobe commented Jun 19, 2017

Notable example queries that cannot be expressed using this syntax include:

  • Paths between two nodes that do not match a given pattern.
    Cypher can still express such a cartesian product, but not as a path.
  • Paths where the value of a given property in all nodes of the path differs from the value of that property at the first node of the path.
    This is one of the canonical examples of Regular Expressions with Memory (REMs).
  • Paths where a certain property has a different value in all nodes.

The direction of each relationship is governed by the overall direction of the Path Pattern.
It is however possible to explicitly define the direction for a particular part of the pattern.
This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.
It is possible to both prefix the part with `<` and suffix it with `>`, indicating that this part of the pattern matches in any direction.

@Mats-SX

Mats-SX Jun 20, 2017

Member

This section is mostly repetition of the section Directions above. Perhaps replace this with a reference to that section? Something like

Using the arrowhead syntax introduced in [[directions]], consider the following query

[source, cypher]
.Find chains of co-authorship
----
PATH PATTERN co_author = (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b)

@Mats-SX

Mats-SX Jun 20, 2017

Member

The example DPP does define direction of the relationships, which perhaps it should not, in order to be an example of when leaving the definition out is okay (reflexivity)?

@thobe

thobe Jun 21, 2017

Contributor

But it is an example of when the named path pattern itself is undirected - and the direction is left out on the next row.

@Mats-SX

Mats-SX Jun 21, 2017

Member

Ah, so the meaning is that it is allowed to omit the direction of a path pattern that references a DPP, but only when the DPP is reflexive. Got it.
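
The point settled above, that `co_author` matches the same pairs in either direction and so its direction can be left out, can be checked on toy data. The following Python sketch is an illustration only: `AUTHORED`, `co_author` and `reverse` are hypothetical names, and requiring the two authors to differ is an assumed stand-in for Cypher's rule that one relationship is not matched twice within a pattern.

```python
# Toy data: (author, book) pairs standing for (author)-[:AUTHORED]->(book).
AUTHORED = {("ann", "b1"), ("ben", "b1"), ("ben", "b2"), ("cat", "b2")}


def co_author(authored):
    """Pairs (a, b) matched by (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b).

    Assumption: a != b approximates relationship uniqueness in this toy model.
    """
    return {
        (a, b)
        for (a, book_a) in authored
        for (b, book_b) in authored
        if book_a == book_b and a != b
    }


def reverse(pairs):
    """Swap each pair, i.e. read the pattern in the opposite direction."""
    return {(b, a) for (a, b) in pairs}


pairs = co_author(AUTHORED)
same_in_both_directions = pairs == reverse(pairs)
```

Here `same_in_both_directions` is true, whereas a predicate built from a single directed relationship such as `(a)-[:KNOWS]->(b)` would generally fail the same check.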

@petraselmer

Contributor

petraselmer commented Jun 25, 2017

Two things:

  1. Please can some more examples be added to further exemplify the use of this feature?

For example, drawn from life sciences, it would be good to see these queries:
Query 1) 'Return all pairs of directly-connected nodes (a,b) where every second node in the path must have label A.'
Query 2) 'Return all pairs of directly-connected nodes (a,b) where there are at least 2 instances of a node labelled with X linked to a node labelled with Y in the path.'

  2. Your comment beginning 'Notable example queries that cannot be expressed using this syntax includes....' is very pertinent. Can this be added to the CIP itself for the purposes of rigour and completeness?
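
As a reference point for Query 1, the condition it asks for can be stated as a check over the labels of an already-found path. This Python sketch is not a path pattern and makes one assumption that the request leaves open: that "every second node" means the 2nd, 4th, 6th, ... node counted from the start of the path. The function name is hypothetical.

```python
def every_second_node_has_label(path_labels, label="A"):
    """True if the 2nd, 4th, 6th, ... node on the path carries `label`.

    `path_labels` holds one set of labels per node, in path order.
    Counting from the second node is an assumption; the request does not say.
    """
    return all(label in labels for labels in path_labels[1::2])
```

For instance, a path whose nodes carry the labels `{"X"}`, `{"A"}`, `{"Y"}`, `{"A", "B"}` satisfies the check, while `{"A"}`, `{"X"}` does not.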

thobe added some commits Jun 29, 2017

Use 'Named Path Predicates'
Instead of 'Defined Path Predicates'
Add more examples
Including examples of what cannot be expressed, or is hard to express.
==== Differing property values along a path
While it is possible to express that a certain property should have the same value for all nodes in a path (by saying that each pair of nodes should have the same property value), it is not possible to express that all nodes should have a _different_ property value.
It has been shown that computing such paths would not be tractable in the general case, so perhaps it is a good thing to not be able to express this.
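
A small Python sketch may help show why the two conditions differ in kind. It says nothing about tractability; it only illustrates that "all the same" follows from comparing each node with its neighbour, while "all different" does not. The function names are hypothetical, and `values` stands for the property values of the nodes along one path.

```python
def all_same(values):
    """All values equal: it is enough to compare each value with the next one."""
    return all(x == y for x, y in zip(values, values[1:]))


def adjacent_differ(values):
    """Each value differs from the next one (a purely pairwise check)."""
    return all(x != y for x, y in zip(values, values[1:]))


def all_different(values):
    """All values distinct: requires remembering every value seen so far."""
    seen = set()
    for value in values:
        if value in seen:
            return False
        seen.add(value)
    return True
```

The sequence `[1, 2, 1]` passes `adjacent_differ` but fails `all_different`, so a constraint stated over neighbouring pairs cannot stand in for the "all different" requirement.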

@Mats-SX

Mats-SX Jul 6, 2017

Member

Perhaps we should add a reference to this claim.

@Mats-SX Mats-SX added the oCIG label Jul 26, 2017

@thobe

Contributor

thobe commented Oct 11, 2017

I wrote up some slides that give an overview of the history of this proposal, what it has been influenced by and what other things have been influenced by it. This information should provide some insight into some of the design choices made in this CIP.

The slides are available on the opencypher.org/references page

@boggle boggle added the cypher10 label May 14, 2018
