v5 validation: Clearly document validation principles #55
Comments
Can you phrase this in terms of what's written on The wiki or website or other literature can serve as the basis for the design of JSON Schema, idk how much we have to write into the I-D itself. (After all, HTTP doesn't re-define REST.)
I'll take a look and see if I can come up with something. I mostly just wanted to get this filed before I lost track of the idea. I think it should be fine to include a few brief lines of explanation in the standard just to make it clear that certain things are on purpose. I'm thinking almost bullet points. I agree that digging into the full implications of something like "context-free validation" should just live on the website.
Importing @epoberezkin's list of proposed principles from issue #77:
I've again contemplated adding a "Principles" section, maybe as an informational, non-normative appendix, as maybe implementors would benefit from discussion of the design decisions. But idk. To paraphrase my previous comment, HTTP's best literature isn't found in the RFCs. I did end up explaining some of these principles in the draft, check out the release I made: https://github.com/json-schema-org/json-schema-spec/releases/tag/20161011 (this is probably going to be the contents of the imminent draft, unless there's typos or better language someone wants to suggest) For each of the principles, I want to be sure we're getting a benefit out of it. Independence (what I've also called linearity) doesn't apply to "additionalProperties" because it makes authoring easier. "applicability" is a trait that reduces the need for "anyOf" - other than this, a lot of people get confused why they need "statelessness" I would also call "functional". And a property of functional code is you can parallelize, re-order, cache, and optimize it very easily and completely transparently (well, exempting things like timing attacks). Backward compatibility is just good design, period. But I would also emphasize forward compatibility: how easy is it to introduce new features in the future without breaking existing clients?
Another principle (discovered through issue #88) is that Hyper-Schema expects correct use of protocols (notably HTTP) and media types, and will not add features that exist only to facilitate incorrect use. We may want to consider features that close gaps or resolve ambiguities in existing standards, but I don't have anything in mind for that right now.
Applicability also helps a great deal with "not":
{
    "type": "object",
    "not": {
        "properties": {"foo": {}}
    }
}
If "properties" implied "object", then this would be an impossible schema because both the outer and inner schema would require the instance to be an object. This is more compelling if you consider a more complex inner schema that is being $ref'd to the "not". This example is so simple that you could avoid the problem by pushing the "not" into the individual property schema, but for a more complex situation that would not be possible.
This addresses issue json-schema-org#55 plus concerns raised in the comments of issue json-schema-org#101. I replaced "linearity" with "independence" as I think it is more general and intuitive. The general considerations section has been reorganized to start with the behavior of the empty schema, then explain keyword independence, and finally cover container vs child and type applicability, both of which flow directly from keyword independence. In draft 04, the wording obscured the connection between keyword independence and container/child independence. When we rewrote the array and object keywords to explicitly classify each keyword as either validating the container or the child, keyword independence became sufficient to explain container/child independence. The list of non-independent keywords has been updated, and exceptions to the independence of parent and child schemas have been documented. Finally, I added a comprehensive example of the frequently-confusing lack of connection between type and other keywords.
PR #143 covers the parts of this that I think are really key. If it goes through I will probably close this (anyone will be welcome to file any additional points separately).
This partially addresses issue json-schema-org#55 plus concerns raised in the comments of issue json-schema-org#101. I replaced "linearity" with "independence" as I think it is more general and intuitive. The general considerations section has been reorganized to start with the behavior of the empty schema, then explain keyword independence, and finally cover type applicability. In draft 04, the wording obscured the connection between keyword independence and container/child independence. I thought I needed this primitive type vs child validation section even with the rewritten keywords, but going over it now based on feedback, I agree that it is superfluous. The list of non-independent keywords has been updated to include minimum/maximum and their "exclusive" booleans.
This partially addresses issue json-schema-org#55 plus concerns raised in the comments of issue json-schema-org#101. I replaced "linearity" with "independence" as I think it is more general and intuitive. The general considerations section has been reorganized to start with the behavior of the empty schema, then explain keyword independence, and finally cover type applicability. In draft 04, the wording obscured the connection between keyword independence and container/child independence. I thought I needed this primitive type vs child validation section even with the rewritten keywords, but going over it now based on feedback, I agree that it is superfluous.
@awwright @Relequestual this should be in "draft-6 (next draft)" as I have made all of the requested changes to PR #143 and it is just awaiting final approval. Merging that will resolve this issue; the other points raised have been dealt with in other ways or, after discussion, have been determined not to need action.
@handrews Perhaps I should pick a new naming scheme for the milestones, I'd like some way to indicate which features are desirable for a new meta-schema publication; so that's not to say this can't make it in very shortly.
@awwright I don't know what this means or what it has to do with this issue. Could you please elaborate?
These are the leftover bits of Issue json-schema-org#55 and some clarifications requested in a comment on issue json-schema-org#101 that have not already been added in some other PR for some other issue. These specific changes were previously approved in json-schema-org#143, but so many other things have changed since json-schema-org#143 that most of it was no longer relevant, so I closed it and started these changes over. In particular, explaining {} and {"not": {}} is no longer needed as they are covered while introducing "true" and "false" schemas in the core specification, so that is no longer repeated in this change. Likewise, the parent/child validation descriptions have been modified in several PRs and no longer have the problems that were previously a concern.
Resolved by merging #195
NOTE: This is a request for clarification in v5, and is not a proposal for changed behavior.
The Problem
There are several underlying principles to validation which are currently poorly articulated, or even just implied. Some of the more contentious arguments over feature proposals are due to unclear understanding of these principles. Plainly stating these in the specification will help keep the evolution of JSON Schema focused and reduce feature debate noise.
Terminology: indexing into a schema
You can index into JSON data by a property name or an array index. This can be written in JavaScript access form, e.g. A["foo"], A.foo, or A[0].
Indexing into a schema by a property name or array index number will, within this issue, mean finding the schema that would validate a similarly indexed instance. So if schema X validates instance A, then:
X.foo is the schema that is used to validate A.foo in the course of validating A with X.
X[5] is similarly the schema used to validate A[5]
Note that X.foo will in truth be one of:
X.properties.foo
X.patternProperties.patternThatMatchesFoo
X.additionalProperties # if neither of the above and additionalProperties is a schema
{} # the blank schema, if none of the above and additionalProperties is true
Similarly, X[5] will in truth be one of:
X.items[5] # if items is an array with at least six members
X.additionalItems # if items is an array with fewer than six members and additionalItems is a schema
X.items # if items is a schema rather than an array
{} # if none of the above and additionalItems is true
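The two lookup orders above can be sketched as a pair of small functions. This is only an illustration of the indexing convention used in this issue, not anything from the specification: the function names are made up, Python's `re.search` stands in for the regular expression dialect that "patternProperties" actually uses, and returning {"not": {}} when "additionalProperties" or "additionalItems" is false is an assumption, since the lists above do not cover that case. It also follows the simplification made above that exactly one schema applies to each child.

```python
import re

EMPTY_SCHEMA = {}            # the blank schema: accepts any instance
FALSE_SCHEMA = {"not": {}}   # accepts nothing (assumed stand-in for `false`)


def _additional(value):
    """Map an additionalProperties/additionalItems value to a schema."""
    if isinstance(value, dict):
        return value
    # Absent or true means "anything goes"; false means "nothing allowed".
    return FALSE_SCHEMA if value is False else EMPTY_SCHEMA


def index_property(schema, name):
    """Return the subschema that would validate instance[name]."""
    properties = schema.get("properties", {})
    if name in properties:
        return properties[name]
    for pattern, subschema in schema.get("patternProperties", {}).items():
        if re.search(pattern, name):
            return subschema
    return _additional(schema.get("additionalProperties", True))


def index_item(schema, position):
    """Return the subschema that would validate instance[position]."""
    items = schema.get("items", EMPTY_SCHEMA)
    if isinstance(items, dict):
        return items              # "items" is a single schema for every element
    if position < len(items):
        return items[position]    # "items" is an array and covers this position
    return _additional(schema.get("additionalItems", True))
```

For example, `index_property({"properties": {"foo": {"type": "string"}}}, "foo")` gives back `{"type": "string"}`, which plays the role of X.foo in the notation above.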
"allOf"/"anyOf"/"oneOf"/"not" involve special considerations, which we will revisit within the principles below. Here are the basics of how indexing applies to them:
If X is an "allOf" with two branches X1 and X2, then:
X.foo is {"allOf": [X1.foo, X2.foo]}
If X is an "anyOf" or "oneOf" with two branches X1 and X2, then X.foo must only take into account the schema(s) that validated A. In the case of "anyOf" that may be both or just one, while in the case of "oneOf" it will always be just one of the branches.
If X2 is the branch of "oneOf" that validates A, then X.foo is X2.foo
If both X1 and X2 validate A in an "anyOf", then X.foo is {"anyOf": [X1.foo, X2.foo]}
If X is a "not" schema {"not": Y}, then there is no meaningful index into X. Depending on the rest of how Y is defined, Y.foo may or may not validate against A.foo, even though Y as a whole is guaranteed to fail validation with A due to the "not".
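The combinator rules above can be written down as one function. This is a rough sketch under several assumptions: X consists of a single combinator keyword, A is already known to validate against X, and the caller passes in two hypothetical helpers, `validates(schema, instance)` and `index(schema, key)`, neither of which comes from any library.

```python
def index_combined(schema, key, instance, validates, index):
    """Index into a schema built from a single allOf/anyOf/oneOf keyword.

    `validates` and `index` are caller-supplied, hypothetical helpers.
    The instance is assumed to validate against `schema` as a whole.
    """
    if "not" in schema:
        raise ValueError('no meaningful index into a "not" schema')
    if "allOf" in schema:
        # Every branch applies, so every branch contributes.
        return {"allOf": [index(branch, key) for branch in schema["allOf"]]}
    for keyword in ("anyOf", "oneOf"):
        if keyword in schema:
            # Only branches that actually validated the instance count.
            matched = [b for b in schema[keyword] if validates(b, instance)]
            indexed = [index(b, key) for b in matched]
            # A single matching branch stands on its own; several are recombined.
            return indexed[0] if len(indexed) == 1 else {keyword: indexed}
    return index(schema, key)
```

Raising an error for "not" is one possible reading of "no meaningful index"; returning the blank schema would be another, and nothing above settles which is better.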
Known or Suspected Principles
I am totally making these up off the top of my head. They are a starting point: some are missing, and some are probably wrong. Some are defined, and others are more of a request for someone to explain the principle involved.
Context-free validation
Validation of a schema should succeed or fail independent of whether or where it appears within another schema.
A corollary of this is that if instance A validates against schema X, then indexing into both will produce a sub-instance that validates against the sub-schema. Since A.foo validates against X.foo in the context of A and X, it must also validate when pulled out to stand alone.
Notably, if X is {"not": Y}, the impact of this principle is unclear because there is no meaningful X.foo. The overall context of the "not" must be taken into account in order to say anything.
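To make the corollary concrete, here is a toy validator that handles only "type" and "properties". It is an illustration, not a conforming implementation, and the names `validate` and `TYPE_CHECKS` are invented for this sketch. The point is that the recursive call receives nothing except the subschema and the sub-instance, so the result cannot depend on where the subschema sits.

```python
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, but a JSON boolean is not a number.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate(schema, instance):
    """Validate using only the schema and instance passed in; no parent state."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            # The recursive call knows nothing about the enclosing schema.
            if name in instance and not validate(subschema, instance[name]):
                return False
    return True


X = {"type": "object", "properties": {"foo": {"type": "string"}}}
A = {"foo": "bar"}

assert validate(X, A)
# Pulled out to stand alone, the indexed pair still validates.
assert validate(X["properties"]["foo"], A["foo"])
```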
Schemas that cannot possibly validate any instance are considered valid
That this is an underlying principle is clear from reading the spec. However, I have not seen any explanation as to the benefit. Is it intended to facilitate extensibility somehow? Is it to avoid burdening validator implementors with expensive and difficult checks? If it is the latter, is having the validation succeed the only possible solution to this requirement?
One generalized example is section 4.1 of draft 04, which says: "Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed."
Why should a schema of {"type": "string", "maximum": 10}, which is clearly nonsensical, validate cleanly against the string "foo"?
Furthermore, why should a default value, or enum values, that fail validation be allowed?
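The behavior being questioned in the {"type": "string", "maximum": 10} example can be reproduced in a few lines. This is a minimal sketch that covers only "type" and "maximum" and uses invented helper names; it follows the quoted section 4.1 rule that a keyword succeeds when it does not apply to the instance's primitive type.

```python
def is_number(value):
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_type(expected, instance):
    checks = {"string": lambda v: isinstance(v, str), "number": is_number}
    return checks[expected](instance)


def check_maximum(limit, instance):
    # "maximum" only constrains numbers; for any other type it succeeds.
    return instance <= limit if is_number(instance) else True


def validate(schema, instance):
    """Each keyword is evaluated on its own; all present ones must pass."""
    results = []
    if "type" in schema:
        results.append(check_type(schema["type"], instance))
    if "maximum" in schema:
        results.append(check_maximum(schema["maximum"], instance))
    return all(results)


schema = {"type": "string", "maximum": 10}
print(validate(schema, "foo"))        # True: "maximum" does not apply to strings
print(validate(schema, 5))            # False: "type" rejects the number
print(validate({"maximum": 10}, 11))  # False: "maximum" applies and fails
```

Because neither keyword looks at the other, the validator never notices that the combination is odd, which is exactly the situation the question above is about.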
A minimally conforming validator need only validate syntactical/structural constraints
It may ignore all annotation fields, all hypermedia fields, and all semantic validation fields (currently "format" is the only semantic field).
This is important for answering the objection that a new annotation field (for instance) places a burden on validator implementors. Since any minimal validator must already ignore any unrecognized fields in a schema, there is no validator burden for non-validation schema fields.
This principle can be inferred from what is marked required or optional and how each field behaves, but, judging from other issue discussions, articulating it clearly will avoid some arguments.
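As an illustration of why ignored fields cost a validator nothing, consider a toy validator driven by a table of the keywords it knows. Everything here is hypothetical: the table holds only two keywords, and "x-ui-widget" is a made-up name standing in for some future annotation field.

```python
# Hypothetical keyword table for a toy validator; real validators cover far more.
KEYWORDS = {
    "minLength": lambda limit, v: not isinstance(v, str) or len(v) >= limit,
    "required": lambda names, v: not isinstance(v, dict) or all(n in v for n in names),
}


def validate(schema, instance):
    """Apply known validation keywords and skip everything else."""
    for keyword, value in schema.items():
        check = KEYWORDS.get(keyword)
        if check is None:
            continue  # annotation, hypermedia, "format", or unknown: ignored
        if not check(value, instance):
            return False
    return True


schema = {
    "title": "Short name",       # annotation field
    "format": "email",           # semantic validation, optional to support
    "x-ui-widget": "textbox",    # made-up future keyword
    "minLength": 3,
}

assert validate(schema, "abc")
assert not validate(schema, "ab")
```

Adding another annotation field to the schema changes nothing in the validator, since the lookup simply misses and the loop moves on.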