v5 validation: Clearly document validation principles #55

Closed
handrews opened this issue Sep 17, 2016 · 11 comments

@handrews
Contributor

NOTE: This is a request for clarification in v5, and is not a proposal for changed behavior.

The Problem

There are several underlying principles to validation which are currently poorly articulated, or even just implied. Some of the more contentious arguments over feature proposals are due to unclear understanding of these principles. Plainly stating these in the specification will help keep the evolution of JSON Schema focused and reduce feature debate noise.

Terminology: indexing into a schema

You can index into JSON data by a property name or an array index. This can be written in JavaScript access form, e.g. A["foo"], A.foo, or A[0].

Indexing into a schema by a property name or array index number will, within this issue, mean finding the schema that would validate a similarly indexed instance. So if schema X validates instance A, then:

X.foo is the schema that is used to validate A.foo in the course of validating A with X.
X[5] is similarly the schema used to validate A[5]

Note that X.foo will in truth be one of:
X.properties.foo
X.patternProperties.patternThatMatchesFoo
X.additionalProperties # if neither of the above and additionalProperties is a schema
{} # the blank schema, if none of the above and additionalProperties is true

Similarly, X[5] will in truth be one of:
X.items[5] # if items is an array with at least six members
X.additionalItems # if items is an array with fewer than six members and additionalItems is a schema
X.items # if items is a schema rather than an array
{} # if none of the above and additionalItems is true
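
The lookup rules above can be sketched in Python. This is only an illustration under stated assumptions: `index_schema` and `_additional` are hypothetical names, only the draft-04 keywords listed above are handled, and returning `None` when `additionalProperties` or `additionalItems` is `false` is a choice made here, since the lists above do not cover that case. Following the simplification above, the sketch returns a single schema, even though a property that matches both "properties" and "patternProperties" is validated against both.

```python
import re


def _additional(value):
    """Resolve additionalProperties/additionalItems to a schema, or None if forbidden."""
    if isinstance(value, dict):
        return value
    # true (or absent) means the blank schema; false means no schema applies.
    return {} if value else None


def index_schema(schema, key):
    """Return the subschema that would validate instance[key]."""
    if isinstance(key, str):
        properties = schema.get("properties", {})
        if key in properties:
            return properties[key]
        # Simplification: return the first matching pattern only.
        for pattern, subschema in schema.get("patternProperties", {}).items():
            if re.search(pattern, key):
                return subschema
        return _additional(schema.get("additionalProperties", True))
    items = schema.get("items", {})
    if isinstance(items, dict):
        # "items" as a single schema applies to every array position.
        return items
    if key < len(items):
        return items[key]
    return _additional(schema.get("additionalItems", True))


X = {
    "properties": {"foo": {"type": "string"}},
    "patternProperties": {"^x-": {"type": "number"}},
    "additionalProperties": {"type": "null"},
}
print(index_schema(X, "foo"))     # {'type': 'string'}
print(index_schema(X, "x-rate"))  # {'type': 'number'}
print(index_schema(X, "bar"))     # {'type': 'null'}
```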

"allOf"/"anyOf"/"oneOf"/"not" involve special considerations, which we will revisit within the principles below. Here are the basics of how indexing applies to them:

If X is an "allOf" with two branches X1 and X2, then:
X.foo is {"allOf": [X1.foo, X2.foo]}

If X is an "anyOf" or "oneOf" with two branches X1 and X2, then X.foo must take into account only the schema(s) that validated A. With "anyOf" that may be both branches or just one, while with "oneOf" it will always be exactly one of the branches.

If X2 is the branch of "oneOf" that validates A, then X.foo is X2.foo
If both X1 and X2 validate A in an "anyOf", then X.foo is {"anyOf": [X1.foo, X2.foo]}

If X is a "not" schema {"not": Y}, then there is no meaningful index into X. Depending on how the rest of Y is defined, A.foo may or may not validate against Y.foo, even though A as a whole is guaranteed to fail validation against Y because of the "not".
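
As a rough sketch of the combinator rules above, assuming a schema that consists of a single "allOf", "anyOf", "oneOf", or "not" keyword: `index_combined` is a hypothetical helper, and `toy_validates` and `toy_index` are deliberately tiny stand-ins for a real validator and for property indexing. Returning the branch's schema directly when only one "anyOf" branch passes is an interpretation of "both or just one", not something the text above spells out.

```python
def index_combined(schema, instance, key, validates, index):
    """Index into a schema made of one allOf/anyOf/oneOf/not keyword."""
    if "allOf" in schema:
        # Every branch applies, so every branch contributes.
        return {"allOf": [index(branch, key) for branch in schema["allOf"]]}
    for keyword in ("anyOf", "oneOf"):
        if keyword in schema:
            # Only branches that validated the whole instance count.
            passing = [b for b in schema[keyword] if validates(b, instance)]
            if not passing or (keyword == "oneOf" and len(passing) != 1):
                raise ValueError("instance does not validate against the schema")
            indexed = [index(b, key) for b in passing]
            return indexed[0] if len(indexed) == 1 else {keyword: indexed}
    if "not" in schema:
        return None  # no meaningful index into a "not" schema
    return index(schema, key)


def toy_validates(schema, instance):
    # Toy stand-in for a validator: understands only "required".
    return all(name in instance for name in schema.get("required", []))


def toy_index(schema, key):
    # Toy stand-in for indexing: understands only "properties".
    return schema.get("properties", {}).get(key, {})


X1 = {"required": ["foo", "a"], "properties": {"foo": {"type": "string"}}}
X2 = {"required": ["foo", "b"], "properties": {"foo": {"minLength": 1}}}

only_x2 = index_combined({"oneOf": [X1, X2]}, {"foo": "hi", "b": 1},
                         "foo", toy_validates, toy_index)
both = index_combined({"anyOf": [X1, X2]}, {"foo": "hi", "a": 1, "b": 1},
                      "foo", toy_validates, toy_index)
print(only_x2)  # {'minLength': 1}
print(both)     # {'anyOf': [{'type': 'string'}, {'minLength': 1}]}
```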

Known or Suspected Principles

I am totally making these up off the top of my head. They are a starting point: some are missing, and some are probably wrong. Some are defined, and others are more of a request for someone to explain the principle involved.

Context-free validation

Validation of a schema should succeed or fail independent of whether or where it appears within another schema.

A corollary of this is that if instance A validates against schema X, then indexing into both will produce a sub-instance that validates against the sub-schema. Since A.foo validates against X.foo in the context of A and X, it must also validate when pulled out to stand alone.

Notably, if X is {"not": Y}, the impact of this principle is unclear because there is no meaningful X.foo. The overall context of the "not" must be taken into account in order to say anything.

Schemas that cannot possibly validate any instance are considered valid

That this is an underlying principle is clear from reading the spec. However, I have not seen any explanation as to the benefit. Is it intended to facilitate extensibility somehow? Is it to avoid burdening validator implementors with expensive and difficult checks? If it is the latter, is having the validation succeed the only possible solution to this requirement?

One generalized example is section 4.1 of draft 04, which says: "Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed."

Why should a schema of {"type": "string", "maximum": 10}, which is clearly nonsensical, validate cleanly against the string "foo"?
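
To make the behavior in question concrete, here is a toy validator that handles only "type" (limited to "string" and "number") and "maximum". It is a sketch of the rule quoted from section 4.1, not a real implementation; `validate` and `is_number` are names invented for this illustration.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(schema, instance):
    """Toy validator covering only "type" (string/number) and "maximum"."""
    if "type" in schema:
        checks = {"string": lambda v: isinstance(v, str), "number": is_number}
        if not checks[schema["type"]](instance):
            return False
    if "maximum" in schema and is_number(instance):
        # "maximum" constrains numbers only; any other type passes this keyword.
        if instance > schema["maximum"]:
            return False
    return True


schema = {"type": "string", "maximum": 10}
print(validate(schema, "foo"))           # True: "maximum" does not apply to strings
print(validate(schema, 5))               # False: fails "type"
print(validate({"maximum": 10}, 11))     # False: "maximum" applies to numbers
print(validate({"maximum": 10}, "foo"))  # True
```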

Furthermore, why should a default, or enum values, be allowed that fail validation?

A minimally conforming validator need only validate syntactical/structural constraints

It may ignore all annotation fields, all hypermedia fields, and all semantic validation fields (currently "format" is the only semantic field).

This is important for answering the objection that a new annotation field (for instance) places a burden on validator implementors. Since any minimal validator must already ignore any unrecognized fields in a schema, there is no validator burden for non-validation schema fields.
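
One way to picture this is a validator built around a table of known validation keywords, where anything not in the table is skipped. The sketch below is an assumption-laden illustration: the `KEYWORDS` table and `validate` function are hypothetical, and only two keywords are implemented.

```python
# Known validation keywords; each check passes when the keyword does not
# apply to the instance's type.
KEYWORDS = {
    "minLength": lambda limit, value: not isinstance(value, str) or len(value) >= limit,
    "required": lambda names, value: not isinstance(value, dict)
    or all(name in value for name in names),
}


def validate(schema, instance):
    for keyword, argument in schema.items():
        check = KEYWORDS.get(keyword)
        if check is None:
            # Annotation, hypermedia, "format", or unknown keyword:
            # it has no effect on the validation result.
            continue
        if not check(argument, instance):
            return False
    return True


schema = {
    "title": "Username",
    "description": "Login name",
    "format": "email",
    "links": [],
    "minLength": 3,
}
print(validate(schema, "abc"))  # True
print(validate(schema, "ab"))   # False
```

Adding a new annotation keyword to the schema changes nothing in this validator, which is the point made above.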

This principle can be inferred from what is marked required or optional and how each field behaves, but articulating it clearly would head off some of the arguments I have observed in other issue discussions.

@awwright
Member

Can you phrase this in terms of what's written on master, or even proposed in #50? Because, yeah, there's a lot of problems in draft-04, but a lot of them have also been fixed.

The wiki or website or other literature can serve as the basis for the design of JSON Schema, idk how much we have to write into the I-D itself. (After all, HTTP doesn't re-define REST.)

@handrews
Contributor Author

I'll take a look and see if I can come up with something. I mostly just wanted to get this filed before I lost track of the idea. I think it should be fine to include a few brief lines of explanation in the standard just to make it clear that certain things are on purpose. I'm thinking almost bullet points. I agree that digging into the full implications of something like "context-free validation" should just live on the website.

@handrews
Contributor Author

Importing @epoberezkin 's list of proposed principles from issue #77 :

  • "independence" of keywords from anything but sibling keywords
  • "orthogonality" - avoiding overlap in keywords purpose (patternGroups kind of violates it...)
  • "applicability" - keywords only apply to the existing data of a certain type (or all types in case of logical/compound keywords), can't apply to multiple types (in this way I would restrict "format" to strings only).
  • "statelessness" - independent of previous validation results ("switch" violates it, I'd rather redefine it to depend on some data value, like JS switch, than on the process).
  • "backward compatibility" - avoiding semantic changes without sufficiently strong reasons.

@awwright
Member

awwright commented Oct 11, 2016

I've again contemplated adding a "Principles" section, maybe as an informational, non-normative appendix, as maybe implementors would benefit from discussion of the design decisions. But idk. To paraphrase my previous comment, HTTP's best literature isn't found in the RFCs.

I did end up explaining some of these principles in the draft, check out the release I made: https://github.com/json-schema-org/json-schema-spec/releases/tag/20161011 (this is probably going to be the contents of the imminent draft, unless there's typos or better language someone wants to suggest)

For each of the principles, I want to be sure we're getting a benefit out of it. Independence (what I've also called linearity) doesn't apply to "additionalProperties" because it makes authoring easier.

"applicability" is a trait that reduces the need for "anyOf" - other than this, a lot of people get confused why they need "type": "string" and "minLength": 1. Explaining applicability helps them understand why. (Maybe there's a better name for this?)

"statelessness" I would also call "functional". And a property of functional code is you can parallelize, re-order, cache, and optimize it very easily and completely transparently (well, exempting things like timing attacks).

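A small illustration of the re-ordering claim, under the assumption that each keyword check is a pure function of its argument and the instance: the names `CHECKS`, `validate_in_order`, and `results_over_all_orders` are hypothetical, and only three keywords are modeled.

```python
from itertools import permutations

# Each check depends only on the keyword's argument and the instance,
# never on the result of another keyword.
CHECKS = {
    "minLength": lambda limit, v: not isinstance(v, str) or len(v) >= limit,
    "maxLength": lambda limit, v: not isinstance(v, str) or len(v) <= limit,
    "enum": lambda allowed, v: v in allowed,
}


def validate_in_order(schema, instance, order):
    return all(CHECKS[keyword](schema[keyword], instance) for keyword in order)


def results_over_all_orders(schema, instance):
    """Collect the distinct results seen across every keyword ordering."""
    return {
        validate_in_order(schema, instance, order)
        for order in permutations(schema)
    }


schema = {"minLength": 2, "maxLength": 5, "enum": ["foo", "quux", "toolong"]}
print(results_over_all_orders(schema, "foo"))      # {True}
print(results_over_all_orders(schema, "toolong"))  # {False}
```

Because every ordering yields the same single result, an implementation is free to evaluate the keywords in whatever order is cheapest.
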
backward compatibility is just good design, period. But I would also emphasize forward compatibility: how easy is it to introduce new features in the future without breaking existing clients?

@handrews
Contributor Author

Another principle (discovered through issue #88 ) is that Hyper-Schema expects correct use of protocols (notably HTTP) and media types, and will not add features that exist only to facilitate incorrect use. We may want to consider features that close gaps or resolve ambiguities in existing standards, but I don't have anything in mind for that right now.

@handrews
Contributor Author

handrews commented Oct 12, 2016

Applicability also helps a great deal with "not":

{
    "type": "object",
    "not": {
        "properties": {"foo": {}}
    }
}

If "properties" implied "object", then this would be an impossible schema because both the outer and inner schema would require the instance to be an object.

This is more compelling if you consider a more complex inner schema that is being $ref'd to the "not". This example is so simple that you could avoid the problem by pushing the "not" into the individual property schema, but for a more complex situation that would not be possible.

handrews added a commit to handrews/json-schema-spec that referenced this issue Nov 15, 2016
This addresses issue json-schema-org#55 plus concerns raised in the comments of
issue json-schema-org#101.

I replaced "linearity" with "independence" as I think it is
more general and intuitive.

The general considerations section has been reorganized
to start with the behavior of the empty schema, then explain
keyword independence, and finally cover container vs child
and type applicability, both of which flow directly from
keyword independence.

In draft 04, the wording obscured the connection between
keyword independence and container/child independence.
When we rewrote the array and object keywords to explicitly
classify each keyword as either validating the container
or the child, keyword independence became sufficient to
explain container/child independence.

The list of non-independent keywords has been updated, and
exceptions to the independence of parent and child schemas
have been documented.  Finally, I added a comprehensive example
of the frequently-confusing lack of connection between
type and other keywords.
handrews added a commit to handrews/json-schema-spec that referenced this issue Nov 16, 2016
@handrews
Contributor Author

PR #143 covers the parts of this that I think are really key. If it goes through I will probably close this (anyone will be welcome to file any additional points separately).

handrews added a commit to handrews/json-schema-spec that referenced this issue Nov 17, 2016
This partially addresses issue json-schema-org#55 plus concerns raised in
the comments of issue json-schema-org#101.

I replaced "linearity" with "independence" as I think it is
more general and intuitive.

The general considerations section has been reorganized
to start with the behavior of the empty schema, then explain
keyword independence, and finally cover type applicability.

In draft 04, the wording obscured the connection between
keyword independence and container/child independence.
I thought I needed this primitive type vs child validation
section even with the rewritten keywords, but going over
it now based on feedback, I agree that it is superfluous.

The list of non-independent keywords has been updated to
include minimum/maximum and their "exclusive" booleans.
handrews added a commit to handrews/json-schema-spec that referenced this issue Nov 17, 2016
handrews added a commit to handrews/json-schema-spec that referenced this issue Nov 21, 2016
This partially addresses issue json-schema-org#55 plus concerns raised in
the comments of issue json-schema-org#101.

I replaced "linearity" with "independence" as I think it is
more general and intuitive.

The general considerations section has been reorganized
to start with the behavior of the empty schema, then explain
keyword independence, and finally cover type applicability.

In draft 04, the wording obscured the connection between
keyword independence and container/child independence.
I thought I needed this primitive type vs child validation
section even with the rewritten keywords, but going over
it now based on feedback, I agree that it is superfluous.
awwright modified the milestones: draft-future, draft-next Nov 30, 2016
@handrews
Contributor Author

handrews commented Dec 1, 2016

@awwright @Relequestual this should be in "draft-6 (next draft)" as I have made all of the requested changes to PR #143 and it is just awaiting final approval. Merging that will resolve this issue; the other points raised have either been dealt with in other ways or, after discussion, been determined not to need action.

@awwright
Member

awwright commented Dec 3, 2016

@handrews Perhaps I should pick a new naming scheme for the milestones, I'd like some way to indicate which features are desirable for a new meta-schema publication; so that's not to say this can't make it in very shortly.

@handrews
Contributor Author

handrews commented Dec 3, 2016

> which features are desirable for a new meta-schema publication

@awwright I don't know what this means or what it has to do with this issue. Could you please elaborate?

handrews added a commit to handrews/json-schema-spec that referenced this issue Dec 5, 2016
handrews added a commit to handrews/json-schema-spec that referenced this issue Dec 12, 2016
handrews added a commit to handrews/json-schema-spec that referenced this issue Dec 13, 2016
These are the leftover bits of Issue json-schema-org#55 and some clarifications
requested in a comment on issue json-schema-org#101 that have not already been
added in some other PR for some other issue.

These specific changes were previously approved in json-schema-org#143, but so
many other things have changed since json-schema-org#143 that most of it was
no longer relevant, so I closed it and started these changes over.

In particular, explaining {} and {"not": {}} is no longer needed
as they are covered while introducing "true" and "false" schemas
in the core specification, so that is no longer repeated in this
change.

Likewise, the parent/child validation descriptions have been
modified in several PRs and no longer have the problems that were
previously a concern.
handrews added a commit to handrews/json-schema-spec that referenced this issue Dec 27, 2016
@handrews
Contributor Author

Resolved by merging #195
