Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying Cypher constraints #166

Open
wants to merge 30 commits into
base: master
from
Open

Conversation

@Mats-SX
Copy link
Member

@Mats-SX Mats-SX commented Dec 15, 2016

Specifies syntax for constraints and specifies three concrete constraints: node property uniqueness, node property existence, and relationship property existence.

CIP

@Mats-SX Mats-SX force-pushed the Mats-SX:constraints-cip branch from 6a73206 to 73f9909 Jan 11, 2017
@Mats-SX Mats-SX force-pushed the Mats-SX:constraints-cip branch from 73f9909 to 1fd0cbf Jan 20, 2017
@Mats-SX Mats-SX force-pushed the Mats-SX:constraints-cip branch from 1fd0cbf to 942570a Jan 30, 2017
Copy link
Contributor

@petraselmer petraselmer left a comment

Looks good! Some comments...

cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
@thobe
Copy link
Contributor

@thobe thobe commented Feb 28, 2017

I thought the idea was that we were going to specify a syntax for constraints without specifying which particular constraints an implementation should support. The syntax definition here explicitly talks only about uniqueness-constraint and existence-constraint.

@Mats-SX
Copy link
Member Author

@Mats-SX Mats-SX commented Mar 1, 2017

CIP has now been reworked to try and fit the model discussed.

cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc Outdated Show resolved Hide resolved
* `UNIQUE` - Ensures that each row for a column must have a unique value.
* `PRIMARY KEY` - A combination of a `NOT NULL` and `UNIQUE`. Ensures that a column (or a combination of two or more columns) has a unique identity, reducing the resources required to locate a specific record in a table.
* `FOREIGN KEY` - Ensures the referential integrity of the data in one table matches values in another table.
* `CHECK` - Ensures that the value in a column meets a specific condition

This comment has been minimized.

@boggle

boggle Mar 1, 2017
Contributor

Maybe we should have a leading keyword like that for non-UNIQUE constraints as well?

This comment has been minimized.

@thobe

thobe Mar 7, 2017
Contributor

if so, I would propose the word THAT, since it reads nicely...

CREATE CONSTRAINT FOR (x:Foo) REQUIRE THAT ...
@Mats-SX Mats-SX force-pushed the Mats-SX:constraints-cip branch 3 times, most recently from 4121971 to 22a7f30 Mar 6, 2017
@thobe
Copy link
Contributor

@thobe thobe commented Mar 7, 2017

PRIMARY KEY is not a helpful name for the concept it is used for describing.

On the high level there are two reasons for this:

  1. The word "primary" does not mean anything that is helpful in this context.
  2. The concept of primary keys carries with it a lot of associations from relational databases, many of which do not apply to the property graph model.
    • it should be noted that for the relational database case the word "primary" does have relevant meaning.

Diving further into these reasons, starting with the word "primary":

  • It implies that there is such a thing as a "secondary" key as well. In a relational database a secondary key is any index on a table other than the primary key.
  • It implies that this key has higher importance than any other key. While this might be true in many actual domain models, it is not always the case - in some cases there are other keys of equal importance.
  • It carries with it the association that there can be only one primary key. In many implementations - Neo4j being one of them - there is absolutely no need to enforce such a constraint on the ability to model your data.
  • It implies that the key needs to be defined first - before any data is inserted. In many implementation - Neo4j being one of them - this is not the case.

As for the aspects of primary keys in relational databases that do not apply to the property graph model:

  • The notion of there being a primary key implies that there might also be a foreign key - the idea of having foreign keys in a graph is quite silly, since we have direct relationships.
  • Coming from relational databases, I would expect preferential treatment for primary keys over any other (secondary) keys. I would expect lookup based on the primary key to be faster than any other key, since I would expect the data in that table to be structured by the primary key. In essence I would expect that leaf nodes in the index for the primary key being the actual row, with all of its data. Whereas for a secondary key the leaf node would only point to the actual row in the primary structure - there would be indirection for accessing the full data by a secondary key, thus penalizing access by secondary key.
    • Again this would not be true in for example Neo4j, where every key is actually secondary.
  • Relational databases equate the primary key with identity, the property graph (at least in some implementations - for example Neo4j) has a separate notion of identity, and while the type of key we are proposing to add to the model would allow you to uniquely identify an entity, it would not necessarily identify that same entity forever - the same entity might change some of the values of the key and thus be identified by a different key, but still have the same identity.

The only reason I can think of for introducing the concept of a primary key in Cypher is for being able to map Cypher onto a relational database model. If that is the case I would much rather see this proposed from a vendor working on such a mapping, since they would have the insight into what needs to be modeled.

I do think that the notion of a unique indexed key of mandatory properties is helpful, and I see the benefit of elevating such a concept to the status of receiving its own syntax, but I don't think PRIMARY KEY is a good name for it.

==== Mutability

Once a constraint has been created, it may not be amended.
Should a user wish to change its definition, it has to be dropped and recreated with an updated structure.

This comment has been minimized.

@thobe

thobe Mar 7, 2017
Contributor

Should we have a note here that transactional implementations could do both the dropping and recreation in the same transaction so that the constraint is atomically mutated? This would of course allow leaving the old constraint in place should the creation of the new constraint fail.

@thobe
Copy link
Contributor

@thobe thobe commented Mar 7, 2017

I wonder if the notion of unique key (that at the moment are erroneously called primary key) should really be a constraint, or if it should have its own syntax, something like:

CREATE KEY FOR (p:Person) AS p.name, p.address

The reason for this being that it actually implies multiple constraints, and typically also an index. Since it is a composed concept like that, perhaps it would be sensible to elevate it to being a syntactical concept of its own.

In the syntax for this, if accepted, we should allow an optional name for the key as well, just like we do for constraints.

@Mats-SX
Copy link
Member Author

@Mats-SX Mats-SX commented Mar 7, 2017

The CIP now uses ADD, NODE KEY and details a return record. I also took several review comments into account (thanks!).

@thobe
thobe approved these changes Mar 30, 2017
@Mats-SX Mats-SX force-pushed the Mats-SX:constraints-cip branch from 32d1322 to 16c2843 May 4, 2017
@Mats-SX Mats-SX force-pushed the Mats-SX:constraints-cip branch from 88c0f11 to e07f1f5 Jul 19, 2019
- General outline
- Describe the node uniqueness constraint
Mats-SX and others added 22 commits Dec 15, 2016
- Use hex integers for rgb examples
- Specify general constraint language
- Specify `UNIQUE` operator
- Clearly define semantics for domain and expressions
- List all concrete constraints in example section
- Add several more examples
They are more appropriate under the Semantics and Syntax sections, respectively.
Move Errors section
- Remove TODO
- Add example using larger pattern
- Add example using multiple `exists()`
- Rename `constrait-expr` to `constraint-predicate`
- Limit scope of `UNIQUE` to single properties only
- Update examples to reflect `PRIMARY KEY`
- Remove erroneous example for composing `NODE KEY` with `UNIQUE` and `exists()`
- Rephrase example section to describe `NODE KEY` more accurately.
- Add missing case for when an error should be raised
Add test for DROP
@Mats-SX Mats-SX force-pushed the Mats-SX:constraints-cip branch from 60c584f to 4ef7b32 Jul 26, 2019
Copy link
Contributor

@hvub hvub left a comment

  1. I personally think risking nonalignment between the constraint CIP and a to-be-written property graph schema CIP seems very undesirable. Hence, acceptance of the constraint CIP should depend on prior acceptance of the property graph schema CIP. Alternatively, we could accept the constraint CIP under the noted understanding that the constraint CIP may be adjusted later for alignment with a property graph schema CIP. It is also not unlike that the property graph schema CIP may even inline the constraint CIP. In any case, we should discuss and decide the strategy around these two related CIPs before making any isolated acceptance decision about the constraint CIP.

  2. The grammar syntax used seems to make inconsistent use of semicolon at the end of a grammar rule. In any case, the strategy how proceed with these two CIP should be discussed and agreed before making any acceptance decision on the constraint CIP alone.

  3. Instead of the note in Sec 2.3, I would add to the syntax definition a syntax rule stating that syntax restrictions to the pattern depending on the constraint predicates as well as and syntax restrictions to the check expression are implementation-specified. I think, it is part of the specification and it should not be buried in the examples section, which is supposed to be purely informative.

  4. An important restriction to the "expression" of a CHECK constraint, which I would also add, is that the evaluation of the expression on any given binding tuple must be deterministic and side effect-free.

  5. In Property Graph Schema we foresee plural versions of all DDL commands. Following that here, I would change add-constraint to "CREATE", constraint-keyword, constraint-declaration, { ",", constraint-declaration } ; and add constraint-keyword = "CONSTRAINT" | "CONSTRAINTS" and constraint-declaration = [ constraint-name ], "FOR", pattern, "REQUIRE", constraint-predicate, { "REQUIRE", constraint-predicate } ;. Additionally, the command would return one return record for each constraint-declaration (cf. Sec. 2.2.6).

  6. UNIQUE and KEY name effectively the type of the constraint predicate. Likewise, I would like the third type of constraint predicate to be named, too, e.g. as CHECK, i.e. I would change constraint-predicate to check | unique | node-key ; and add check = "CHECK", expression ;. Further, I would drop the keyword REQUIRE or make it optional at least.

  7. Rule 2 and Rule 3: It is not precisely clear what "constraint expression" is in all the cases. "constraint expression" does not appear in the grammar neither is it defined anywhere.

  8. Rule 3: It seems more appropriate to me to remove Rule 3 and, instead, add rules of similar effect to the semantics of the individual constraint predicate types where it actually applies. I think, rules with special case exceptions should be avoided. If a rule is not general enough to hold for all cases of a concept, it is not a rule for that concept but apparently a rule of a number of more specialized concepts and should be specified for these specialized concepts.

  9. On

    The new operator UNIQUE is only valid as part of a constraint predicate.

    Doesn't that follow from the grammar already? If the statement adds anything to what the grammar already says, what it is? If statement does not add something, I would remove the statement.

  10. On

    The new operator NODE KEY is only valid as part of a constraint predicate.

    What is that suppose to mean? Doesn't that follow from the grammar already? If the statement adds anything to what the grammar already says, what it is? If statement does not add something, I would remove the statement.

  11. Why is UNIQUE limited to a single property-expression? I think, it can be easily generalized to multiple property-expression. Specifically, if any of them evaluates to null for a given binding tuple, that tuple is not part of the constraint domain. (KEY p.a, p.b would be equivalent to UNIQUE p.a, p.b CHECK exists(p.a) AND exists(p.b))

  12. Why NODE KEY? Why not just KEY? Property Graph Schema foresee keys for relationships as well (and the further outlook shows the possibility of generalized key constraints based on path expressions). Hence, I would change the grammar rule for node-key into key = "KEY", property-expression, { ",", property-expression } ;.

  13. On

    2.2.5. Compositionality

    What does this add to Rule 2 and the grammar? It does not seem to add anything although it is more precisely phrased than Rule 2. I would remove this section and make the phrasing of Rule 2 more precise.

  14. In the return record, are the details implementation-dependent or implementation-specified?

  15. Cardinality, endpoint requirements, label coexistence, property existence, property typing, etc. will be covered by Property Graph Schema language. I would not forbid them here, but also not necessarily advocate them.

@Mats-SX
Copy link
Member Author

@Mats-SX Mats-SX commented Aug 16, 2019

It is a little difficult to manage the different comments in the review if they are all in the same message, instead of comments made on/near the lines that triggered them. That would help discussions to be kept disentangled, and suggestions such as the one in item 5. would be easier to adopt directly. In the absence of this, I will try to address comments in a master-comment following the same numbering scheme:

  1. I know nothing of that CIP and can not comment on it specifically. Non-alignment is of course undesirable, but I would favour progress over perfection (to steal a quote) and accept this CIP until there's a conflict identified, and deal with it at that point.

  2. Fixed.

  3. It is not a syntax rule, it is a disclaimer which intends to note that the Examples are purely informative and only pose examples on possible constraints. If you think that is self explanatory by the fact that these are examples we can scratch the note entirely. I'm not sure adding a note in the syntax rules saying that an implementation may omit support for certain patterns or predicates or combinations thereof would improve this specification; surely any implementer may impose any implementation-specific limitations they would see fit anyhow?

  4. Added a 4th point under Semantics.

  5. I've added a section Plurality, extended the syntax definition and added an example. However, I am not sure comma would be the appropriate symbol to use as a separator; it is not significant enough to stick out between declarations.

  6. I'm not so keen on CHECK as I rather like REQUIRE. In a way one could say that this is the name of the 'standard' constraints. Could make it optional for UNIQUE and KEY although I think it inhibits readability to omit it.

  7. I added an explanatory paragraph in the Syntax section to clarify.


At this point we discussed this CIP in the CLG, mostly focusing on points 1. and 6. above, and decided to pause the work on this CIP awaiting maturity on the schema topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.