Path Pattern Queries #187

Open
wants to merge 8 commits into
base: master

@thobe (Contributor) commented Feb 6, 2017

This is a first draft of a proposal for adding Path Patterns to Cypher. There is still work to be done here before this is finalised.

CIP2017-02-06

@thobe thobe added the CIP label Feb 6, 2017
@thobe (Contributor Author) commented Feb 6, 2017

This proposal should eventually be able to fulfil the requirements outlined in #179.

@Mats-SX (Member) left a comment

Nice CIP!


The direction of each relationship is governed by the overall direction of the Regular Path Pattern.
It is however possible to explicitly define the direction for a particular part of the pattern.
This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.
Member

We could use 'outgoing' instead of 'left-to-right', and 'incoming' instead of 'right-to-left'.

Contributor Author

That would imply a particular traversal order, which I think is saying too much. With left-to-right and right-to-left we are only talking about the direction with regard to how the pattern is written.


In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant.
In general the direction of a Defined Path Predicate is quite important, and used for mapping the pattern in the predicate into the Regular Path Patterns that reference it.
The only cases where it is allowed to omit the direction of a Defined Path Predicate is when the defined predicate is reflexive.
Member

Will it be the job of the query compiler to determine reflexiveness of the defined predicate, and issue an error if non-reflexive predicates are defined (or used) without direction?

Contributor Author

I'd say that either the query compiler issues an error, or the behaviour of the path predicate is undefined (i.e. the behaviour in such a case is outside of the scope of the specification).

I'd prefer for an implementation to issue an error, but I don't know how hard that analysis is to perform, and I thus don't want to mandate that at this point (it needs further analysis).

----

In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant.
In general the direction of a Defined Path Predicate is quite important, and used for mapping the pattern in the predicate into the Regular Path Patterns that reference it.
Member

This applies to the definition of a predicate, right? I imagine that it will be perfectly possible to reverse the direction in the Regular Path Pattern, like so:

MATCH (a)<-/pred/-(b)
PATH (a)-/pred/->(b) IS
     (a)-[:KNOWS]->(b)

How about it being used in an undirected Regular Path Pattern?

MATCH (a)-/pred/-(b)
PATH (a)-/pred/->(b) IS
     (a)-[:KNOWS]->(b)

Contributor Author

Both of those are perfectly valid. The undirected use of a directed defined path predicate is the same as an OR/UNION between the two directions.

I.e. (a)-/pred/-(b) is the same as (a)-/<pred | pred>/-(b) (which is the same as (a)-/pred> | <pred/-(b) or (a)-/<pred>/-(b) or (a)<-/pred/->(b)).
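
The equivalence stated here can be illustrated outside Cypher with plain sets of node pairs. The following is only a sketch under stated assumptions: the toy `KNOWS` edge set and the three helper functions are hypothetical and are not part of the proposal or of any Cypher implementation. It shows the undirected use of a directed predicate as the union of its left-to-right and right-to-left readings.

```python
# Hypothetical toy graph: directed KNOWS relationships as (source, target) pairs.
KNOWS = {("alice", "bob"), ("bob", "carol")}


def pred(edges):
    """Pairs (a, b) matched by the directed definition (a)-[:KNOWS]->(b)."""
    return set(edges)


def pred_reversed(edges):
    """Pairs (a, b) matched when the predicate is used right-to-left."""
    return {(b, a) for (a, b) in edges}


def pred_undirected(edges):
    """Pairs (a, b) matched by the undirected use: union of both directions."""
    return pred(edges) | pred_reversed(edges)
```

With this toy data, `pred_undirected(KNOWS)` contains each `KNOWS` pair in both orientations, which is what an OR/UNION between the two directions would yield.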

== Regular Path Patterns

Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths.
This functionality is called Regular Path Patterns.

I think it would be a good idea to mention somewhere that these are essentially RPQs, since RPQs are the standard term for these sorts of queries.

Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths.
This functionality is called Regular Path Patterns.

A Regular Path Pattern is defined as:

For 1 - 4 below, it would be great if these could be supplemented with regex notation

Contributor Author

As in an example of what the syntax looks like?


Contrary to Relationship Patterns, Regular Path Patterns do _not_ allow binding a relationship to a variable.
In order to bind the matching path to a variable, a Path Assignment should be used, by preceding the path with an identifier and an equals sign (`=`).
This avoids a problem that existed in the past with repetition of relationships (a syntax that was deprecated with the introduction of Regular Path Patterns), where a relationship variable would bind to a list, making it hard to express predicates over the actual relationships.

Is it not meant to be "a syntax that is deprecated" (as RPPs are only being introduced now)?

Contributor Author

I've tried writing the text as if Regular Path Patterns are already in the language, since when this text is merged, it will be in the language. Although "is" works in that case as well.

Aha - ok, that makes sense

The direction of each relationship is governed by the overall direction of the Regular Path Pattern.
It is however possible to explicitly define the direction for a particular part of the pattern.
This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.
It is possible to both prefix the part with `<` and suffixing it with `>`, giving that part the interpretation of being undirected.

"suffixing" -> "suffix"

The "giving that part the interpretation of being undirected" section is a bit unwieldy - maybe something along the lines of "indicating that that part of the pattern is undirected"

@thobe thobe force-pushed the rpq branch 3 times, most recently from 5b64198 to b317753 Compare March 31, 2017 14:47
@thobe (Contributor Author) commented Mar 31, 2017

I've updated the document to unify the Path Pattern syntax with normal Pattern syntax. I have not yet updated the grammar to reflect that; I'll do that imminently.

@thobe (Contributor Author) commented Mar 31, 2017

I think it is important to note which constructs that are currently valid Cypher will have changed semantics under this proposal:

  • Binding a variable under repetition will no longer be possible, i.e. the pattern (a)-[rels*]->(b) is now invalid. The way to achieve such a binding is to bind the pattern to a path variable (p=(a)-[-*]->(b)) and use the relationships function to access the relationships of the path (WITH *, relationships(p) AS rels).
  • Property predicates will have a different scope if colons are used when separating alternatives of relationship types. The pattern ()-[:FOO|:BAR{baz: 17}]-() used to mean any relationship of type FOO or type BAR where the property baz has the value 17. Under this proposal it would mean any relationship of type FOO, or any relationship of type BAR where the property baz has the value 17. That is, under the old semantics the property predicate applied to both relationship types, but under the new semantics it applies only to the BAR relationships. The old semantics are still used if the colon is omitted in the listing of relationship types, as in ()-[:FOO|BAR{baz: 17}]-().
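
To make the second point concrete, here is a small sketch of the two readings of the FOO/BAR pattern as predicates over relationships. It assumes nothing about how an implementation evaluates patterns: the `Rel` type and the `matches_old` / `matches_new` functions are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Rel(NamedTuple):
    """Hypothetical stand-in for a relationship: a type name plus properties."""
    rel_type: str
    props: dict


def matches_old(rel: Rel) -> bool:
    """Old reading: the baz = 17 predicate applies to both FOO and BAR."""
    return rel.rel_type in {"FOO", "BAR"} and rel.props.get("baz") == 17


def matches_new(rel: Rel) -> bool:
    """Proposed reading: any FOO matches; baz = 17 constrains only BAR."""
    return rel.rel_type == "FOO" or (
        rel.rel_type == "BAR" and rel.props.get("baz") == 17
    )
```

The two readings differ only for FOO relationships that lack `baz = 17`: `matches_old(Rel("FOO", {}))` is false, while `matches_new(Rel("FOO", {}))` is true.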

thobe added 2 commits May 10, 2017 09:40
Also update examples to fit updated syntax.
@thobe thobe changed the title from Regular Path Queries to Path Pattern Queries May 10, 2017
@thobe (Contributor Author) commented Jun 19, 2017

Notable example queries that cannot be expressed using this syntax include:

  • Paths between two nodes that do not match a given pattern.
    Cypher can still express such a cartesian product, but not as a path.
  • Paths where the value of a given property in all nodes of the path differs from the value of that property at the first node of the path.
    This is one of the canonical examples of Regular Expressions with Memory (REMs).
  • Paths where a certain property has a different value in all nodes.

The direction of each relationship is governed by the overall direction of the Path Pattern.
It is however possible to explicitly define the direction for a particular part of the pattern.
This is done by either prefixing that part with `<` for a right-to-left direction or suffix it with `>` for a left-to-right direction.
It is possible to both prefix the part with `<` and suffix it with `>`, indicating that this part of the pattern matches in any direction.
Member

This section is mostly repetition of the section Directions above. Perhaps replace this with a reference to that section? Something like

Using the arrowhead syntax introduced in [[directions]], consider the following query

[source, cypher]
.Find chains of co-authorship
----
PATH PATTERN co_author = (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b)
Member

The example DPP does define direction of the relationships, which perhaps it should not, in order to be an example of when leaving the definition out is okay (reflexivity)?

Contributor Author

But it is an example of when the named path pattern itself is undirected - and the direction is left out on the next row.

Member

Ah, so the meaning is that it is allowed to omit the direction of a path pattern that references a DPP, but only when the DPP is reflexive. Got it.

@petraselmer

Two things:

  1. Please can some more examples be added to further exemplify the use of this feature?

For example, drawn from life sciences, it would be good to see these queries:
Query 1) 'Return all pairs of directly-connected nodes (a,b) where every second node in the path must have label A.'
Query 2) 'Return all pairs of directly-connected nodes (a,b) where there are at least 2 instances of a node labelled with X linked to a node labelled with Y in the path.'

  2. Your comment beginning 'Notable example queries that cannot be expressed using this syntax includes....' is very pertinent. Can this be added to the CIP itself for the purposes of rigour and completeness?

Instead of 'Defined Path Predicates'
Including examples of what cannot be expressed, or is hard to express.
==== Differing property values along a path

While it is possible to express that a certain property should have the same value for all nodes in a path (by saying that each pair of nodes should have the same property value), it is not possible to express that all nodes should have a _different_ property value.
It has been shown that computing such paths would not be tractable in the general case, so perhaps it is a good thing to not be able to express this.
Member

Perhaps we should add a reference to this claim.
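
As an aside on the "Differing property values along a path" paragraph: the asymmetry it describes comes from equality being transitive while inequality is not. The sketch below is a hypothetical illustration in Python, not part of the CIP; the function names are invented here. It shows that a check over adjacent pairs is enough to establish "all the same" but not "all different".

```python
def adjacent_all_equal(values):
    """True if every adjacent pair is equal; by transitivity, all values are equal."""
    return all(x == y for x, y in zip(values, values[1:]))


def adjacent_all_differ(values):
    """True if every adjacent pair differs; this does NOT imply all values differ."""
    return all(x != y for x, y in zip(values, values[1:]))


def all_distinct(values):
    """True only if no value occurs twice anywhere in the sequence."""
    return len(set(values)) == len(values)
```

For the property values `[1, 2, 1]` along a path, every adjacent pair differs, yet the first and last nodes share a value, so a pairwise-adjacent constraint cannot stand in for the global "all different" requirement.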

@Mats-SX Mats-SX added the oCIG label Jul 26, 2017
@thobe (Contributor Author) commented Oct 11, 2017

I wrote up some slides that give an overview of the history of this proposal, what it has been influenced by, and what other things have been influenced by it. This information should provide some insight into some of the design choices made in this CIP.

The slides are available on the opencypher.org/references page.


[source, ebnf]
----
NamedPathPredicate = 'PATH', 'PATTERN', NamedPathName, '=', PathPattern, [Where] ;
It looks like the examples do not match the grammar for NamedPathPredicate. In examples like

PATH PATTERN unreciprocated_love = (a)-[:LOVES]->(b)

we see that what follows = in NamedPathPredicate should be something like PatternPart (maybe without the [Variable, '='] part) instead of PathPattern.

So I think the correct rule should be

NamedPathPredicate   = 'PATH', 'PATTERN', NamedPathName, '=', NodePattern, {(EdgePattern | PathPattern), NodePattern}, [Where] ;

Contributor Author

Yes, you are right, thank you!

Labels
CIP cypher10 This work targets Cypher 10 oCIG
5 participants