v5 validation: Clearly document validation principles #55

handrews · 2016-09-17T03:04:33Z

NOTE: This is a request for clarification in v5, and is not a proposal for changed behavior.

The Problem

There are several underlying principles to validation which are currently poorly articulated, or even just implied. Some of the more contentious arguments over feature proposals are due to unclear understanding of these principles. Plainly stating these in the specification will help keep the evolution of JSON Schema focused and reduce feature debate noise.

Terminology: indexing into a schema

You can index into JSON data by a property name or an array index. This can be written in JavaScript access form, e.g. A["foo"], A.foo, or A[0].

Indexing into a schema by a property name or array index number will, within this issue, mean finding the schema that would validate a similarly indexed instance. So if schema X validates instance A, then:

X.foo is the schema that is used to validate A.foo in the course of validating A with X.
X[5] is similarly the schema used to validate A[5]

Note that X.foo will in truth be one of:
X.properties.foo
X.patternProperties.patternThatMatchesFoo
X.additionalProperties # if neither of the above and additionalProperties is a schema
{} # the blank schema, if none of the above and additionalProperties is true

Similarly, X[5] will in truth be one of:
X.items[5] # if items is an array with at least six members
X.additionalItems # if items is an array with fewer than six members and additionalItems is a schema
X.items # if items is a schema rather than an array
{} # if none of the above and additionalItems is true
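As a rough illustration of the two lookup orders listed above, here is a minimal Python sketch. The helper name `index_schema` is made up for this issue. It ignores "allOf"/"anyOf"/"oneOf"/"not" and returns only the first match, whereas a real draft-04 validator applies every matching "properties" and "patternProperties" schema. Returning `None` when "additionalProperties" or "additionalItems" is false is my own convention, not something from the spec.

```python
import re


def index_schema(schema, key):
    """Return the subschema that would validate instance[key].

    Hypothetical helper: `key` is a property name (str) or an array index (int).
    Returns None when the schema forbids that member entirely.
    """
    if isinstance(key, str):
        if key in schema.get("properties", {}):
            return schema["properties"][key]
        for pattern, subschema in schema.get("patternProperties", {}).items():
            # Python's `re` only approximates the ECMA 262 dialect used by JSON Schema.
            if re.search(pattern, key):
                return subschema
        extra = schema.get("additionalProperties", True)
    else:
        items = schema.get("items", {})
        if isinstance(items, dict):
            return items  # "items" is a single schema applying to every index
        if key < len(items):
            return items[key]  # "items" is an array with enough members
        extra = schema.get("additionalItems", True)

    if extra is True:
        return {}  # the blank schema
    if extra is False:
        return None  # no instance may have this member
    return extra  # "additionalProperties"/"additionalItems" is itself a schema
```

For example, with `X = {"properties": {"foo": {"type": "string"}}}`, `index_schema(X, "foo")` gives the "foo" subschema and `index_schema(X, "bar")` gives the blank schema.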

"allOf"/"anyOf"/"oneOf"/"not" involve special considerations, which we will revisit within the principles below. Here are the basics of how indexing applies to them:

If X is an "allOf" with two branches X1 and X2, then:
X.foo is {"allOf": [X1.foo, X2.foo]}

If X is an "anyOf" or "oneOf" with two branches X1 and X2, then X.foo must take into account only the schema(s) that validated A. For "anyOf" that may be one branch or both, while for "oneOf" it will always be exactly one branch.

If X2 is the branch of "oneOf" that validates A, then X.foo is X2.foo
If both X1 and X2 validate A in an "anyOf", then X.foo is {"anyOf": [X1.foo, X2.foo]}

If X is a "not" schema {"not": Y}, then there is no meaningful index into X. Depending on how the rest of Y is defined, A.foo may or may not validate against Y.foo, even though A as a whole is guaranteed to fail validation against Y because of the "not".
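The combinator rules above could be sketched roughly as follows. This is only an illustration: `index_combinator` is a made-up name, and both `validates(schema, instance)` and `index(schema, key)` are assumed to be supplied by the caller (any validator and any plain-schema lookup would do).

```python
def index_combinator(schema, key, instance, validates, index):
    """Index into a schema whose top-level keyword is a combinator.

    Hypothetical helper. `validates(schema, instance) -> bool` and
    `index(schema, key) -> schema` are caller-supplied assumptions.
    """
    if "allOf" in schema:
        # Every branch applies, so every branch contributes its subschema.
        return {"allOf": [index(branch, key) for branch in schema["allOf"]]}

    for keyword in ("anyOf", "oneOf"):
        if keyword in schema:
            # Only branches that validated the whole instance contribute.
            passing = [b for b in schema[keyword] if validates(b, instance)]
            if not passing or (keyword == "oneOf" and len(passing) > 1):
                raise ValueError(f"instance does not validate against this {keyword}")
            subschemas = [index(b, key) for b in passing]
            return subschemas[0] if len(subschemas) == 1 else {keyword: subschemas}

    if "not" in schema:
        return None  # no meaningful index into a "not" schema

    return index(schema, key)
```

Note that the "anyOf"/"oneOf" result depends on the instance, which is exactly why indexing into those schemas cannot be done from the schema alone.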

Known or Suspected Principles

I am totally making these up off the top of my head. They are a starting point: some are missing, and some are probably wrong. Some are defined, and others are more of a request for someone to explain the principle involved.

Context-free validation

Validation of an instance against a schema should succeed or fail independent of whether or where that schema appears within another schema.

A corollary of this is that if instance A validates against schema X, then indexing into both will produce a sub-instance that validates against the sub-schema. Since A.foo validates against X.foo in the context of A and X, it must also validate when pulled out to stand alone.

Notably, if X is {"not": Y}, the impact of this principle is unclear because there is no meaningful X.foo. The overall context of the "not" must be taken into account in order to say anything.

Schemas that cannot possibly validate any instance are considered valid

That this is an underlying principle is clear from reading the spec. However, I have not seen any explanation as to the benefit. Is it intended to facilitate extensibility somehow? Is it to avoid burdening validator implementors with expensive and difficult checks? If it is the latter, is having the validation succeed the only possible solution to this requirement?

One generalized example is section 4.1 of draft 04, which says: "Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed."

Why should a schema of {"type": "string", "maximum": 10}, which is clearly nonsensical, validate cleanly against the string "foo"?

Furthermore, why should a schema be allowed to specify a default, or enum values, that fail validation against that same schema?
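To make the draft-04 section 4.1 behavior quoted above concrete, here is a toy Python checker. It is a sketch that only understands "type" (for "string" and "number") and "maximum", and the names `is_number` and `check` are invented for illustration. It shows why the nonsensical schema still accepts "foo": "maximum" simply does not apply to strings.

```python
def is_number(value):
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check(schema, instance):
    """Toy validator covering only "type" (string/number) and "maximum"."""
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if schema.get("type") == "number" and not is_number(instance):
        return False
    # "maximum" only constrains numbers; for any other type it succeeds.
    if "maximum" in schema and is_number(instance) and instance > schema["maximum"]:
        return False
    return True


nonsense = {"type": "string", "maximum": 10}
print(check(nonsense, "foo"))  # "maximum" is skipped for a string instance
print(check(nonsense, 11))     # rejected by "type" before "maximum" matters
```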

A minimally conforming validator need only validate syntactical/structural constraints

It may ignore all annotation fields, all hypermedia fields, and all semantic validation fields (currently "format" is the only semantic field).

This is important for answering the objection that a new annotation field (for instance) places a burden on validator implementors. Since any minimal validator must already ignore any unrecognized fields in a schema, there is no validator burden for non-validation schema fields.
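A minimal sketch of what "ignoring unrecognized fields" looks like in practice, assuming a validator built around a table of keyword handlers. The table contents and the name `minimal_validate` are hypothetical and cover only two keywords.

```python
# Hypothetical handler table: keyword -> function(keyword_value, instance) -> bool.
KEYWORD_CHECKS = {
    "minLength": lambda limit, inst: not isinstance(inst, str) or len(inst) >= limit,
    "required": lambda names, inst: not isinstance(inst, dict) or all(n in inst for n in names),
}


def minimal_validate(schema, instance):
    """Apply only the keywords this validator knows; skip everything else."""
    for keyword, value in schema.items():
        check = KEYWORD_CHECKS.get(keyword)
        if check is None:
            continue  # annotation, hypermedia, or unknown keyword: no effect
        if not check(value, instance):
            return False
    return True


schema = {"minLength": 2, "title": "Name", "format": "email", "links": []}
print(minimal_validate(schema, "ab"))  # only "minLength" is consulted
```

Adding a new annotation keyword to the schema changes nothing here, since it never appears in the handler table.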

This principle can be inferred from what is marked required or optional and how each field behaves, but clearly articulating it will avoid some arguments based on observations of other issue discussions.

awwright · 2016-09-17T07:43:43Z

Can you phrase this in terms of what's written on master, or even proposed in #50? Because, yeah, there's a lot of problems in draft-04, but a lot of them have also been fixed.

The wiki or website or other literature can serve as the basis for the design of JSON Schema, idk how much we have to write into the I-D itself. (After all, HTTP doesn't re-define REST.)

handrews · 2016-09-17T09:19:49Z

I'll take a look and see if I can come up with something. I mostly just wanted to get this filed before I lost track of the idea. I think it should be fine to include a few brief lines of explanation in the standard just to make it clear that certain things are on purpose. I'm thinking almost bullet points. I agree that digging into the full implications of something like "context-free validation" should just live on the website.

handrews · 2016-10-11T16:54:09Z

Importing @epoberezkin's list of proposed principles from issue #77:

"independence" of keywords from anything but sibling keywords
"orthogonality" - avoiding overlap in keywords' purposes (patternGroups kind of violates it...)
"applicability" - keywords apply only to existing data of a certain type (or to all types in the case of logical/compound keywords) and can't apply to multiple types (in this way I would restrict "format" to strings only).
"statelessness" - independent of previous validation results ("switch" violates it; I'd rather redefine it to depend on some data value, like a JS switch, than on the process).
"backward compatibility" - avoiding semantic changes without sufficiently strong reasons.

awwright · 2016-10-11T17:07:58Z

I've again contemplated adding a "Principles" section, maybe as an informational, non-normative appendix, as maybe implementors would benefit from discussion of the design decisions. But idk. To paraphrase my previous comment, HTTP's best literature isn't found in the RFCs.

I did end up explaining some of these principles in the draft, check out the release I made: https://github.com/json-schema-org/json-schema-spec/releases/tag/20161011 (this is probably going to be the contents of the imminent draft, unless there's typos or better language someone wants to suggest)

For each of the principles, I want to be sure we're getting a benefit out of it. Independence (what I've also called linearity) doesn't apply to "additionalProperties" because it makes authoring easier.

"applicability" is a trait that reduces the need for "anyOf" - other than this, a lot of people get confused why they need "type": "string" and "minLength": 1. Explaining applicability helps them understand why. (Maybe there's a better name for this?)

"statelessness" I would also call "functional". And a property of functional code is you can parallelize, re-order, cache, and optimize it very easily and completely transparently (well, exempting things like timing attacks).

backward compatibility is just good design, period. But I would also emphasize forward compatibility: how easy is it to introduce new features in the future without breaking existing clients?

handrews · 2016-10-12T21:15:43Z

Another principle (discovered through issue #88) is that Hyper-Schema expects correct use of protocols (notably HTTP) and media types, and will not add features that exist only to facilitate incorrect use. We may want to consider features that close gaps or resolve ambiguities in existing standards, but I don't have anything in mind for that right now.

handrews · 2016-10-12T21:22:21Z

Applicability also helps a great deal with "not":

{
    "type": "object",
    "not": {
        "properties": {"foo": {}}
    }
}

If "properties" implied "object", then this would be an impossible schema because both the outer and inner schema would require the instance to be an object.

This is more compelling if you consider a more complex inner schema that is being $ref'd to the "not". This example is so simple that you could avoid the problem by pushing the "not" into the individual property schema, but for a more complex situation that would not be possible.

This addresses issue json-schema-org#55 plus concerns raised in the comments of issue json-schema-org#101. I replaced "linearity" with "independence" as I think it is more general and intuitive. The general considerations section has been reorganized to start with the behavior of the empty schema, then explain keyword independence, and finally cover container vs child and type applicability, both of which flow directly from keyword independence. In draft 04, the wording obscured the connection between keyword independence and container/child independence. When we rewrote the array and object keywords to explicitly classify each keyword as either validating the container or the child, keyword independence became sufficient to explain container/child independence. The list of non-independent keywords has been updated, and exceptions to the independence of parent and child schemas have been documented. Finally, I added a comprehensive example of the frequently-confusing lack of connection between type and other keywords.

handrews · 2016-11-16T19:46:01Z

PR #143 covers the parts of this that I think are really key. If it goes through I will probably close this (anyone will be welcome to file any additional points separately).

This partially addresses issue json-schema-org#55 plus concerns raised in the comments of issue json-schema-org#101. I replaced "linearity" with "independence" as I think it is more general and intuitive. The general considerations section has been reorganized to start with the behavior of the empty schema, then explain keyword independence, and finally cover type applicability. In draft 04, the wording obscured the connection between keyword independence and container/child independence. I thought I needed this primitive type vs child validation section even with the rewritten keywords, but going over it now based on feedback, I agree that it is superfluous. The list of non-independent keywords has been updated to include minimum/maximum and their "exclusive" booleans.

handrews · 2016-12-01T21:50:26Z

@awwright @Relequestual this should be in "draft-6 (next draft)" as I have made all of the requested changes to PR #143 and it is just awaiting final approval. Merging that will resolve this issue; the other points raised have either been dealt with in other ways or, after discussion, been determined not to need action.

awwright · 2016-12-03T19:04:32Z

@handrews Perhaps I should pick a new naming scheme for the milestones, I'd like some way to indicate which features are desirable for a new meta-schema publication; so that's not to say this can't make it in very shortly.

handrews · 2016-12-03T19:16:53Z

> which features are desirable for a new meta-schema publication

@awwright I don't know what this means or what it has to do with this issue. Could you please elaborate?

These are the leftover bits of Issue json-schema-org#55 and some clarifications requested in a comment on issue json-schema-org#101 that have not already been added in some other PR for some other issue. These specific changes were previously approved in json-schema-org#143, but so many other things have changed since json-schema-org#143 that most of it was no longer relevant, so I closed it and started these changes over. In particular, explaining {} and {"not": {}} is no longer needed as they are covered while introducing "true" and "false" schemas in the core specification, so that is no longer repeated in this change. Likewise, the parent/child validation descriptions have been modified in several PRs and no longer have the problems that were previously a concern.

handrews · 2016-12-27T23:28:23Z

Resolved by merging #195

handrews mentioned this issue Oct 10, 2016

Keywords for exclusive minimum/maximum #77

Closed

awwright added the Type: Enhancement label Oct 11, 2016

awwright added this to the draft-6 milestone Oct 11, 2016

handrews mentioned this issue Oct 12, 2016

Should hyper-schema support non-RESTful HTTP APIs? #88

Closed

handrews mentioned this issue Oct 19, 2016

additionalProperties behavior for non-object instances #103

Closed

handrews mentioned this issue Nov 15, 2016

Allow true and false for all schemas (except maybe the root schema) #101

Closed

handrews mentioned this issue Nov 15, 2016

Add general validation principles and examples. #143

Closed

handrews mentioned this issue Nov 30, 2016

Finish and publish Draft 06 #170

Closed

awwright modified the milestones: draft-future, draft-next Nov 30, 2016

handrews mentioned this issue Dec 13, 2016

Wording clarifications in general considerations. #195

Merged

handrews closed this as completed Dec 27, 2016
