Add some paragraphs to Overview to put some context to "vocabulary" #1244

Closed
wants to merge 1 commit into from

Conversation

awwright
Member

The Overview is the first section of the document that describes how JSON Schema accomplishes the goals laid out in the abstract and the introduction.

Currently, it jumps to talking about vocabularies and keywords before the definition can be meaningful to readers. This PR adds some context to that, and provides a good primer for the essential mechanics of how schemas are read, at a more "architectural" level.

@karenetheridge
Member

Note: before we merge anything to draft-next we need to sync it up with the latest changes to main/master.

@awwright awwright force-pushed the expand-overview branch 4 times, most recently from c98afa8 to b797a57 Compare June 14, 2022 23:41
@jdesrosiers jdesrosiers changed the base branch from draft-next to main July 8, 2022 15:28
@jdesrosiers
Member

The draft-next branch has been merged and is now closed. The merge target for this PR has been changed to main. Here are the recommended steps to get your branch rebased properly.

  1. Make sure your remote for the json-schema-org/json-schema-spec repo is up-to-date. (Example: git fetch upstream).
  2. Rebase your commits onto main. (Example: git rebase --onto upstream/main abcd123~1 (replace abcd123 with the commit hash of the first commit in your PR)).
  3. Force push the rebased branch to your fork. (Example: git push --force origin my-branch).

Member

@Julian Julian left a comment

Some initial comments in case they're helpful, I only read the first half or so carefully. Overall seems like a decent idea not to jump straight into vocabularies, but personally I'm not 100% sure we should go into quite as much detail as what's here, and I think there are a few things in here that don't necessarily fit the current model.

jsonschema-core.xml (outdated)
Given a schema and an instance, the instance is "valid" if all of the keywords in
the schema are valid against the instance.
Validation keywords may be used to describe a set of JSON documents.
Without any validation keywords, the set of all valid instances is the set of all valid JSON documents.
Member

Do people or specifications use the term "JSON document" to refer to non-object typed JSON? If not this is a bit imprecise.

Member Author

Maybe a different term should be used because JSON doesn't use the term "JSON document."

But "document" has an Internet meaning, so "JSON document" should imply "document with media type application/json"

Member Author

Also, older versions of JSON required that documents be an array or an object... this restriction has since been lifted, a document can be simply true or null or a number.

Member

Right, as I say I just don't know that people use the word document to refer to those. Values maybe. But I could be wrong.

Validation keywords may be used to describe a set of JSON documents.
Without any validation keywords, the set of all valid instances is the set of all valid JSON documents.
Typically, adding validation keywords creates a subset.
Keywords are typically defined to not be redundant with other keywords:
Member

I don't know if redundancy is the right word for this (or whether putting it this early is needed at all). I'd probably not mention this level of detail at this point, it's more philosophical than really an overview, and it's also not immediately clear to me just how true it is, there's lots of keywords that "shrink" the set of valid instances across different types at once, like const or enum, or keywords added in a not schema, or ...

Member Author

The phrasing can be played with, but I think it's important to point out that minLength doesn't impact numbers and there's a good, predictable reason for that.

Member

To me it's again not very important to point out at this point, but if we do so I don't have too much of an issue with the way we've described the concept so far, namely that keywords may apply to a specific type, and if so, generally will then ignore (consider valid) instances of other types.
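
To make the point about type-specific keywords concrete, here is a minimal sketch in Python. It is a hand-rolled evaluator written for this illustration (the function names and the `KEYWORDS` table are assumptions, not any library's API) and it covers only `minLength` and a two-type subset of `type`.

```python
# Illustrative sketch only: a hand-rolled evaluator for two keywords,
# not the behaviour of any particular JSON Schema library.

def check_min_length(value, instance):
    # "minLength" only constrains strings; any other type is considered valid.
    if not isinstance(instance, str):
        return True
    return len(instance) >= value


def check_type(value, instance):
    # "type" is the keyword that actually restricts which JSON types are allowed.
    # Only the two type names needed for this example are handled.
    if value == "string":
        return isinstance(instance, str)
    if value == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(instance, (int, float)) and not isinstance(instance, bool)
    raise ValueError(f"type {value!r} is not covered by this sketch")


KEYWORDS = {"minLength": check_min_length, "type": check_type}


def is_valid(schema, instance):
    # An instance is valid when every keyword in the schema accepts it.
    return all(KEYWORDS[name](value, instance) for name, value in schema.items())


number_passes_min_length = is_valid({"minLength": 3}, 42)
short_string_fails = not is_valid({"minLength": 3}, "ab")
number_fails_with_type = not is_valid({"type": "string", "minLength": 3}, 42)
```

In this sketch `minLength` alone lets the number 42 through, and only the addition of `type` narrows the valid instances to strings.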

types—this is what the "type" keyword does.
</t>
<t>
When an instance validates against a schema, "annotation keywords" may provide "annotations":
Member

Regardless of the above (whether validation is more important), do we call the process of applying annotations "validation"? I think it's just an independent process one can do even without performing validation, no?

Member Author

The language emphasizes that annotations are a byproduct of validation (an invalid instance doesn't produce annotations). Mostly, I haven't introduced the term "applying" yet, and I don't think "validation" is wrong.

Member

I am saying I don't think that that's how we talk about annotations previously. One can "process" an instance, not validate anything, and collect some annotations as far as I know, and we support and encourage that use case.

Member

The production of annotations requires validation because no annotations are returned if validation fails.

Conversely many validation keywords require annotation output from other keywords.

They are inextricably linked.

Member

So I can't define a keyword with annotation result without saying it participates in some validation process? I guess I could have answered that by looking at what we say for title or description or default. I stand corrected then, especially since handrews in another comment also said we do indeed call the general process validation.

Contributor

If your schema has no assertions (and you don't use applicators that convert a passing subschema to a validation failure, e.g. not and oneOf) it can't fail validation, therefore validation is only involved to a trivial degree. And as I noted before, yes, I have used JSON Schema that way (sadly the project is not public).

Member

@Julian Julian Jul 12, 2022

Looking back again I'm honestly confused now by the bottom of this :D -- @handrews (I know you're not fully responsive so feel free to delay here obviously) -- you +1'ed my initial comment, but then Greg contradicted it pointing out it was wrong, and then your last comment seems ambiguous :D so I again am confused as to whether I'm wrong or not, so you'll forgive me for asking it again:

What do we call the process itself of applying the schema {"title" : "Cool Stuff", "default": 12} to an instance "foo" in an implementation with no support whatsoever for the validation vocabulary, i.e. some completely different use of JSON Schema, as it sounds like the project you're referencing may have been? (In the previous sentence the word "applying" is not being used in any technical sense, just the English connotation).

Do we still call that process "validation", just now in a vacuous sense?

Member

I think "validation" is currently used, but I like the more generic "evaluation." "Apply" or "application" may be construed as related to "applicator" keywords specifically.

Contributor

I'd go with "evaluation." I think "apply" is also fine whether the application is because of an applicator or because of a top-level invocation, but to avoid confusion, "evaluate" is probably best (and I think we settled on evaluationPath in the revised output format?)

Whether or not you can have annotation without validation depends on how you conceptualize it:

  1. "Annotations happen unless validation fails"
  2. "Annotations only happen if validation succeeds"

A schema that does not contain any keywords that can produce a validation failure by definition cannot fail validation. If you think of that as performing validation that succeeds, and only then having annotations, then technically you always have validation.

If you think of validation as a process that only occurs when it's possible to fail (meaning that something like title wouldn't really be performing validation because it can't cause a failure), then a schema with no keywords that can fail isn't doing validation but it does produce annotations.

I conceptualize it as option 1: the schema {"title": "foo"} produces an annotation. Since there are no assertions, and no applicators that can fail validation in the absence of assertions, then there is no validation going on that would drop those annotations.

But you can think of it the other way around and it works just as well.

And as @gregsdennis said, you have some assertions that require** annotations to function correctly, so you could also say that annotation is just as fundamental. Or you could say that if none of your keywords produce annotations, then no annotation is happening, just validation.

Which I think proves that neither should be framed as the primary behavior.

All of this is addressed in a lot more detail in the presentation that I should work on rather than answering all of this stuff, but answering short comments is easier right now.

**there's some wording to allow people to do something different in terms of actual code, but conceptually they require annotations
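
The two framings discussed in this thread can be sketched with a toy evaluator. Everything below is an assumption made for illustration (the `evaluate` function, the keyword tables, and the choice to handle only flat schema objects); it is not taken from the specification or from any implementation.

```python
# Hypothetical mini-evaluator: the function and keyword tables below are
# assumptions made for illustration, not part of any specification text.

ANNOTATION_KEYWORDS = {"title", "description", "default"}


def assert_min_length(value, instance):
    # Non-strings are ignored (considered valid) by this assertion.
    return not isinstance(instance, str) or len(instance) >= value


ASSERTION_KEYWORDS = {"minLength": assert_min_length}


def evaluate(schema, instance):
    """Return (valid, annotations) for a flat schema object."""
    annotations = {}
    valid = True
    for name, value in schema.items():
        if name in ANNOTATION_KEYWORDS:
            annotations[name] = value
        elif name in ASSERTION_KEYWORDS:
            valid = ASSERTION_KEYWORDS[name](value, instance) and valid
        # Unknown keywords are ignored in this sketch.
    # Annotations are dropped when any assertion fails.
    return valid, (annotations if valid else {})


# No assertions present: nothing can fail, so the annotations survive.
only_annotations = evaluate({"title": "Cool Stuff", "default": 12}, "foo")

# One failing assertion: the annotations are discarded.
with_failure = evaluate({"title": "foo", "minLength": 5}, "foo")
```

Whether the first call counts as "validation that trivially succeeds" or as "no validation at all" is the conceptual choice described above; the observable result is the same either way.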

@awwright
Member Author

awwright commented Jul 8, 2022

That's exactly what I'm looking for. In most cases I'm not sure what language would improve on the issues you identified, maybe you can suggest something?

@awwright awwright force-pushed the expand-overview branch 3 times, most recently from 2f8cf26 to 92f3074 Compare July 8, 2022 21:02
@Julian
Member

Julian commented Jul 12, 2022

> That's exactly what I'm looking for. In most cases I'm not sure what language would improve on the issues you identified, maybe you can suggest something?

Sorry, just saw this one too, and I assume it's in response to one of my comments above, will try to suggest something!

The name of the keyword is used as the property name,
and any arguments to the keyword are provided as the value.
Keywords can be classified by their functions.
The most important classes are assertion, annotation, and applicator keywords.
Member

It may be important to recognize that not all keywords provide any of these functions (e.g. $id) and some provide more than one (e.g. properties).

types&#x2014;this is what the "type" keyword does.
</t>
<t>
When an instance validates against a schema, "annotation keywords" may provide "annotations":
Member

I think "validation" is currently used, but I like the more generic "evaluation." "Apply" or "application" may be construed as related to "applicator" keywords specifically.

They may also validate a schema against values within the instance,
for example, a specific property within the object (the "properties" keyword).
The "$ref" keyword is a special applicator keyword that takes a reference to a schema,
instead of a literal schema. This allows schemas to be defined recursively.
Member

It also allows schemas to be broken into manageable bits and promotes code reuse.
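
Both uses of "$ref" mentioned here (recursion and reuse) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: only same-document references of the form `#` or `#/$defs/<name>` are resolved, only `type`, `properties`, and `items` are handled, and the names `resolve`, `is_valid`, and `tree_schema` are hypothetical.

```python
# Sketch under simplifying assumptions: only same-document references of the
# form "#" or "#/$defs/<name>" are resolved, and only a few keywords are handled.

def resolve(ref, root):
    if ref == "#":
        return root
    prefix = "#/$defs/"
    if ref.startswith(prefix):
        return root["$defs"][ref[len(prefix):]]
    raise ValueError(f"unsupported reference in this sketch: {ref}")


def is_valid(schema, instance, root):
    # "$ref" applies the referenced schema to the same instance.
    if "$ref" in schema and not is_valid(resolve(schema["$ref"], root), instance, root):
        return False
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(subschema, instance[name], root):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(is_valid(schema["items"], item, root) for item in instance)
    return True


tree_schema = {
    "$defs": {
        # A reusable piece, referenced by name instead of being repeated.
        "name": {"type": "string"},
    },
    "type": "object",
    "properties": {
        "label": {"$ref": "#/$defs/name"},
        # Each child is validated against the whole schema again: recursion.
        "children": {"items": {"$ref": "#"}},
    },
}

good_tree = {"label": "root", "children": [{"label": "leaf", "children": []}]}
bad_tree = {"label": "root", "children": [{"label": 7}]}
```

Here `is_valid(tree_schema, good_tree, tree_schema)` accepts the nested tree, while `bad_tree` is rejected because the nested label is not a string.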

Comment on lines 188 to 189
Schemas may also be a boolean value:
The booleans true/false represent a schema that's always valid/invalid, respectively.
Member

Does this statement deserve its own paragraph? It seems unrelated to this paragraph.

<t>
JSON Schema documents can themselves be described by a schema.
In this context, this "schema of the schema" is called the meta-schema.
In addition to a core vocabulary and a default validation vocabulary,
Member

Suggested change
- In addition to a core vocabulary and a default validation vocabulary,
+ In addition to a core vocabulary and a set of default vocabularies,

assertions and annotations to more complex JSON data structures, or based on
some sort of condition.
A JSON Schema document is comprised of a set of keywords,
which are specified as properties in a JSON object.
Contributor

technically not true for boolean schemas - it's not common, but they are valid as the entire schema document.

the previous version also kind of omits boolean schemas, though the wording doesn't quite exclude them as this does.
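
As a small illustration of why boolean schemas matter for the wording, the following Python sketch treats `true` and `false` as complete schemas, both at the top level and as subschemas. The `is_valid` function is a hypothetical helper for this example and handles only `properties`.

```python
def is_valid(schema, instance):
    # A boolean schema ignores the instance: true accepts everything,
    # false rejects everything.
    if isinstance(schema, bool):
        return schema
    # Object schemas: only "properties" is handled in this sketch.
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(subschema, instance[name]):
                return False
    return True


# A boolean as the entire schema document.
accepts_anything = is_valid(True, {"any": "value"})
rejects_anything = not is_valid(False, {"any": "value"})

# A boolean as a subschema: forbid a property named "legacy".
no_legacy = {"properties": {"legacy": False}}
```

With `no_legacy`, an object containing a "legacy" property is rejected regardless of its value, and any other object is accepted.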

A JSON Schema document is comprised of a set of keywords,
which are specified as properties in a JSON object.
The name of the keyword is used as the property name,
and any arguments to the keyword are provided as the value.
Contributor

I don't think the spec uses the term "arguments" like this anywhere else. I'm not sure it's the best word for keyword values - it does make sense to me, but wording that's more consistent with the rest of the spec would be better.

Member Author

I don't believe there's a formal term for these otherwise I would put that here... If we did have a term, I think it'd still be helpful to say it's an argument, the same way that functions have arguments. Do you have an idea for a better term?
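
The function analogy can be sketched as follows, purely as an illustration of the terminology question. Each keyword name maps to a hypothetical factory that receives the keyword's value (the "argument") and returns a check over instances; none of these names come from the specification.

```python
def minimum(limit):
    # The keyword value is bound here, much like a function argument.
    def check(instance):
        # Non-numbers (and booleans) are ignored by this keyword.
        if isinstance(instance, bool) or not isinstance(instance, (int, float)):
            return True
        return instance >= limit
    return check


def max_length(limit):
    def check(instance):
        return not isinstance(instance, str) or len(instance) <= limit
    return check


# Property name in the schema object -> keyword implementation.
KEYWORD_FACTORIES = {"minimum": minimum, "maxLength": max_length}


def compile_schema(schema):
    # Each (property name, property value) pair becomes keyword(value).
    checks = [KEYWORD_FACTORIES[name](value) for name, value in schema.items()]
    return lambda instance: all(check(instance) for check in checks)


accepts = compile_schema({"minimum": 0, "maxLength": 5})
```

In this reading, `{"minimum": 0}` plays the role of the call `minimum(0)`, and the resulting check is then applied to each instance.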

@awwright
Member Author

I trimmed this down a bit. This meshes nicely with #1365 and should go in first, and it sets up for some changes that will likely be made in that issue.

Comment on lines +148 to +151
Schemas may also be used to build a set of JSON documents:
The "valid set" of a schema consists of all instances that the schema accepts,
and the "invalid set" of a schema consists of all instances that the schema rejects.
In a schema without any assertion keywords, the set of all instances is the set of all JSON documents.
Member

How necessary is it to bifurcate JSON documents into sets (explicitly)? It seems like this is an understood consequence of a schema passing or failing an instance.

Member Author

Would it be better to take out "invalid set"? It might not warrant explaining, at least not in this section.

Member

I'm wondering why the sets are mentioned at all. This statement doesn't seem to add anything beyond a definition.

Member Author

I think the overview is a natural place to show how JSON Schema relates formal language theory to JSON. I can try to rework it so it doesn't sound like a definition, but I'd like to emphasize this is a logical consequence of how formal languages work, not a use case that we're meeting.

Member

I agree that the intro is the right place, but I question its necessity.

Schemas defining sets of JSON instances isn't a concept that's discussed anywhere else in the document. As such, presenting this concept at all seems confusing.

If it were referenced somewhere else or provided a framework to explain some other idea, I could understand its inclusion, but not on its own.

Member Author

You have a fair point, I'm trying to figure out if there's a better way to convey what this is intending to convey.

I'm not defining a concept that's going to be used by the rest of the document; rather it's describing a feature from formal languages that JSON Schema provides.

I have two alternate ideas I want to try...

Assertions add constraints that instances must conform to.
Given a schema and an instance, the schema "accepts" an instance whenever all the assertions are met,
and the schema "rejects" when any of the assertions fail.
Schemas may also be used to build a set of JSON documents:
Member

"build a set" sounds like you're entering data generation territory, which is not something we've discussed anywhere. (Yeah, I have a library that does that, but it's my design, not really based on any conversation here.)

Member Author

I intend to mean "build" as in "set-builder notation" that JSON Schema closely resembles, e.g. {x | x is an even positive integer}. Is there a better phrasing? I think it's worth it to point out that describing requirements also can be used as a description of a set of members meeting those requirements.
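
The set-builder reading can be sketched in Python, where a comprehension plays the role of {x | x is an even positive integer}. The schema below is one plausible way to express that set and is an assumption of this example, as are the `accepts` helper and the list of candidates; only the three keywords shown are handled.

```python
# One possible schema for "even positive integer" (an assumption of this sketch).
SCHEMA = {"type": "integer", "exclusiveMinimum": 0, "multipleOf": 2}


def accepts(schema, instance):
    # Booleans are not numbers in JSON, so exclude them explicitly.
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if schema.get("type") == "integer":
        # A number with a zero fractional part counts as an integer here.
        if not (is_number and float(instance).is_integer()):
            return False
    if is_number and "exclusiveMinimum" in schema:
        if not instance > schema["exclusiveMinimum"]:
            return False
    if is_number and "multipleOf" in schema:
        if instance % schema["multipleOf"] != 0:
            return False
    return True


candidates = [-2, 0, 1, 2, 3, 4, "4", 6.0, None]

# Set-builder style: the members of `candidates` that the schema accepts.
valid_set = [x for x in candidates if accepts(SCHEMA, x)]
```

Nothing is generated here: the schema only denotes which of the given candidates belong to the set, which is the sense of "define" or "denote" rather than "build".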

Member

I think maybe "define" or "denote" instead of "build" might be better?

Member

@jdesrosiers jdesrosiers left a comment

Looks good to me other than a couple editorial suggestions.

Comment on lines +154 to +155
Schemas may also provide "annotations" to instances:
metadata that describes the instance.
Member

I find this sentence awkward and I'm not sure this is a correct use of :. It's clear what you're saying, but I had to read it twice and slowly to follow it. I suggest breaking it into two sentences to make it easier for readers.

Suggested change
- Schemas may also provide "annotations" to instances:
- metadata that describes the instance.
+ Schemas may also provide "annotations" to instances. Annotations are
+ metadata that describe the instance.

Assertions add constraints that instances must conform to.
Given a schema and an instance, the schema "accepts" an instance whenever all the assertions are met,
and the schema "rejects" when any of the assertions fail.
Schemas may also be used to build a set of JSON documents:
Member

I think this should end with a . rather than a :.

Suggested change
- Schemas may also be used to build a set of JSON documents:
+ Schemas may also be used to build a set of JSON documents.

7 participants