jsonschema: dialect identification #20

ioggstream · 2022-02-17T10:50:40Z

I expect:

to establish whether dialect identification should be addressed in this document ✔️ or in the JSON Schema spec, with only a reference added here ❌
in the first case, to define a procedure for dialect identification
to reference definitions instead of copying them

jdesrosiers · 2022-02-23T01:28:50Z

I think it's very important for dialect identification to be defined here. In fact, I don't think we can reference any specific part of the JSON Schema spec because it's a moving target. The JSON Schema spec isn't one thing. Releasing a new dialect doesn't replace or obsolete the previous one. The old dialects are still valid and have implementations and users. If we point to the JSON Schema spec to define dialect identification, which version of the spec do we point to? Will the next release take precedence? I can't imagine that the IETF would accept the definition of a media type that can change without notice some time in the future.

My goal is to standardize a media type that defines the bare minimum necessary to identify which dialect the schema uses and then delegate semantics of the schema to wherever the dialect is defined. Dialects can't decide how dialects are identified because if they did, they might have different rules and it would be unclear which rules to apply. There needs to be one authority and that's where this I-D comes in.

Having the media type and dialect identification officially registered and stable means future JSON Schema releases can just be dialects without redefining the media type with every release. It also means third-party dialects don't need to be updated whenever a new JSON Schema dialect is released because they point to this stable I-D, not the previous JSON Schema release.

I hope that made sense.

ioggstream · 2022-02-23T09:56:53Z

I think it's very important for dialect identification to be defined here
Dialects can't decide how dialects are identified because if they did, they might have different rules

Since dialect identification relies on the media type parameter, it is correct to define dialect identification in this I-D.
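
For illustration, reading the dialect URI from the media type parameter could look like the minimal Python sketch below. The media type name `application/schema+json`, the parameter name `schema`, and the helper `schema_parameter` are assumptions made for this example; the thread itself only refers to "the schema media type parameter".

```python
from email.message import Message
from typing import Optional


def schema_parameter(content_type: str) -> Optional[str]:
    """Return the value of the (assumed) `schema` media type parameter, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param() unquotes the value and returns None when the parameter is absent.
    value = msg.get_param("schema")
    # RFC 2231 encoded values come back as a tuple; this sketch ignores them.
    return value if isinstance(value, str) else None


header = 'application/schema+json; schema="https://json-schema.org/draft/2020-12/schema"'
dialect_uri = schema_parameter(header)
```

How this value interacts with a `$schema` keyword inside the document is exactly what the I-D would have to define; the sketch only extracts the parameter.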

I don't think we can reference any specific part of the JSON Schema spec because it's a moving target

We then need to cite in future JSON Schema releases using the $schema keyword, that it's defined
according to the normative parts contained in this I-D.

I'll add this issue to the slide so we can get some editorial feedback regarding the best strategy for doing this.

jdesrosiers · 2022-02-24T01:30:19Z

We then need to cite in future JSON Schema releases using the $schema keyword, that it's defined
according to the normative parts contained in this I-D.

Yep. That's the plan.

handrews · 2022-06-06T22:59:47Z

Jumping back here from the discussion in PR #32, I want to follow up on what I think is a debate between:

@jdesrosiers wanting to specify dialect identification in the sense of where to look ($schema, the schema media type parameter, the enclosing context) and what the URI value means
@awwright wanting to reference the JSON Schema specification, which would involve filling in any gaps currently caused by the specification not explaining how to identify or process previous drafts.

Please let me know if I am mischaracterizing your positions!

Furthermore, @awwright observed:

I know we sometimes say there's "versions" of JSON Schema, but in this context that may be misleading: There's been many publications of JSON Schema over time, but newer publications replace older ones in their entirety (this is specified in the first few paragraphs).

in response to @jdesrosiers's concern that the multiple versions of the JSON Schema Core specification mean that it is not possible to use any JSON Schema Core spec as a stable reference.

I outlined what I thought a stable base specification could look like, but failed to make clear that I do not think that this media type registration needs to wait for all of that stable base to be finalized.

Problems with meta-schema URIs and dialect identification

$schema and analogues such as the schema media type parameter have always been problematic as indicators of which JSON Schema draft a.k.a. version is in use. Since draft-04, $schema was intended to be customized. This has meant that there are two dimensions that can be indicated in the same opaque URI:

which draft/version, which determines the processing rules
which customizations, which determines the keywords being processed

If you are only looking at standardized meta-schema URIs from the JSON Schema specification documents, then you are choosing among version dialects. If you know which processing rules are involved across all possible URIs and are looking at non-standardized URIs, then you are choosing among non-version dialects in the context of a known version.

The problem is that, without a known version, a custom URI obliterates the draft/version information, which is what signifies the processing rules. Assuming vocabularies are used as expected, we should expect a proliferation of custom $schema URIs. The intention was that $vocabulary URIs would be fairly stable, with the core vocabulary URI indicating the processing rules, while $schema URIs would be created as needed for different vocabulary combinations.

This was how I hoped to separate the processing rules from the keyword syntax and semantics. Otherwise I don't see how meta-schema URIs are viable for determining processing rules. Each implementation would have to know each URI in advance and know the processing rules associated with it, which defeats the purpose of being able to assemble a custom ~~vocabulary~~ [EDIT: I meant dialect] for any application.

JSON Schema "fragmentation"

Over in issue #32, in a conversation with @dret and @awwright, @jdesrosiers lamented that "it's unfortunate that JSON Schema has become fragmented". I've been trying to figure out what you meant, and I noticed that @awwright started https://github.com/orgs/json-schema-org/discussions/169 in response with "a more positive outlook."

@jdesrosiers, if you are using the proliferation of meta-schema URIs as a metric for fragmentation, I understand your concern! As noted above, the vocabulary system was intentionally designed with the expectation that meta-schema URIs (and the dialects they represent) would proliferate more-or-less uncontrollably, on a relatively stable base of vocabulary URIs.

I did this because I did not see any way to salvage $schema as it was, although something along the lines of your ideas in json-schema-org/json-schema-spec#918 (basically inlining vocabulary URIs or even keyword declarations under $schema as an object or array) could be viable in a schema document. But not, I think, as a media type parameter.

The upshot of this is I don't think it's sufficient to explain $schema and call that "dialect identification." I mean, it literally is dialect identification, but it doesn't identify the processing model in any reliably useful way. This isn't fragmentation, it's how the system is supposed to work. It's only the core processing that needs to remain unified.

An approach for handling both past and future.

@awwright has advocated for including instructions for processing past drafts in the next iteration of the spec. You could alternatively do at least some of that in this document. What would that look like? Here's a quick proposal that might be missing some things, but you get the idea:

draft-00 through draft-02 do not include any way to identify the processing rules, and I don't think I've ever seen them in the wild.
draft-03 and draft-04 work with $schema and id, and a $ref that replaces the object that contains it (I've rarely seen draft-03 in the wild, and not for several years, but I think it works the same as draft-04 which is common)
there is no draft-05 😜
draft-06 and draft-07 work with $schema, $id (with the same capabilities as id), and a replacing $ref

Each of the above could be enumerated, with their standardized meta-schemas. If you try to use them with custom meta-schemas, unless the implementation specifically recognizes them, you're out of luck.
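
As a rough sketch of what such an enumeration might look like in an implementation, the Python below maps the standardized meta-schema URIs for draft-03 through draft-07 to the rules listed above. The names `ProcessingRules`, `STANDARD_META_SCHEMAS`, and `rules_for` are hypothetical, and the exact-match lookup is an assumption of this example, not something the thread prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessingRules:
    draft: str
    id_keyword: str    # keyword that identifies the schema: "id" or "$id"
    ref_behavior: str  # "replacing": $ref replaces the object that contains it


# Standardized meta-schema URIs only; draft-00 through draft-02 are omitted
# because they offer no way to identify the processing rules.
STANDARD_META_SCHEMAS = {
    "http://json-schema.org/draft-03/schema#": ProcessingRules("draft-03", "id", "replacing"),
    "http://json-schema.org/draft-04/schema#": ProcessingRules("draft-04", "id", "replacing"),
    "http://json-schema.org/draft-06/schema#": ProcessingRules("draft-06", "$id", "replacing"),
    "http://json-schema.org/draft-07/schema#": ProcessingRules("draft-07", "$id", "replacing"),
}


def rules_for(schema: dict) -> Optional[ProcessingRules]:
    """Look up processing rules by $schema; None means the URI is not recognized."""
    uri = schema.get("$schema")
    if not isinstance(uri, str):
        return None
    return STANDARD_META_SCHEMAS.get(uri)
```

A custom meta-schema URI falls through to `None` here, which is the "out of luck" case: nothing in the URI itself says which draft's rules apply.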

From that point on, $schema and $vocabulary plus $id (with $anchor split out) and a delegating $ref are the rules. As noted elsewhere, it's quite plausible to consider all of that except the details of $vocabulary finalized. This is what I was trying to get at with the base specification stuff, but I kind of went overboard and obscured the point.

If $vocabulary remains stable enough that it's always possible to identify the core vocabulary URI, then that's sufficient to figure out how to process anything else (such as $dynamicRef, which is necessary for meta-schemas but almost certain to change). This is plausible because further $vocabulary functionality could be offloaded to a vocabulary description file identified by the vocabulary URI, which could be independently self-descriptive. So it's arguably not necessary to nail down the vocabulary system to reach a point of stability for bootstrapping.
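
To make the bootstrapping idea concrete, here is a small Python sketch that finds the core vocabulary URI in a meta-schema's $vocabulary. The custom meta-schema, its `example.com` URIs, and the function name `core_vocabulary` are made up for illustration, and the assumption that future core vocabulary URIs keep the `.../draft/<release>/vocab/core` shape of 2019-09 and 2020-12 is mine, not something the specification guarantees.

```python
import re
from typing import Optional

# Matches the published core vocabulary URIs, e.g.
# https://json-schema.org/draft/2020-12/vocab/core (shape assumed stable).
CORE_VOCAB = re.compile(r"^https://json-schema\.org/draft/\d{4}-\d{2}/vocab/core$")


def core_vocabulary(meta_schema: dict) -> Optional[str]:
    """Return the core vocabulary URI declared by a meta-schema, if any."""
    vocab = meta_schema.get("$vocabulary")
    if not isinstance(vocab, dict):
        return None
    for uri in vocab:
        if CORE_VOCAB.match(uri):
            return uri
    return None


# Hypothetical custom dialect: the $id and the second vocabulary are invented.
custom_meta_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/meta/my-dialect",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/core": True,
        "https://example.com/vocab/custom": False,
    },
}

core_uri = core_vocabulary(custom_meta_schema)
```

The point of the sketch is that the processing rules are recovered from the core vocabulary URI, so the implementation never needs to have seen the custom $schema URI before.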

But I don't think $schema alone can possibly do it.

jdesrosiers · 2022-06-08T02:24:39Z

That's a lot of good stuff to discuss, @handrews. I don't have time to fully address all of that, but I'll quickly address what I meant by fragmentation. I was referring to OpenAPI and MongoDB defining their own custom versions of JSON Schema. That doesn't include OpenAPI 3.1, which is a dialect of 2020-12, but does include OpenAPI 2.0 and 3.0, which make their own rules. I would prefer that this media type be defined in a way that is inclusive of those rogue versions because OpenAPI 3.0 users, for example, are likely to want to use this media type as well.

ioggstream added the jsonschema label Feb 17, 2022

ioggstream mentioned this issue Feb 17, 2022

Fix: #7. Add JSON Schema media types #19

Merged

ioggstream added a commit that referenced this issue Feb 21, 2022

Don't override external specifications. See #20.

8af3ee1

jdesrosiers linked a pull request Feb 23, 2022 that will close this issue

Don't override external specifications. See #20. #26

Open

darrelmiller linked a pull request Mar 19, 2022 that will close this issue

Don't override external specifications. See #20. #26

Open

ioggstream added the rest-api label Apr 1, 2022

handrews mentioned this issue Jun 6, 2022

Fix: #32. Reference OAS and jsonschema as spec. #43

Open

handrews mentioned this issue Jun 18, 2022

Draft 2019+ tests incorrectly depend on implementations supporting $schema-less schemas but they are not required to process them json-schema-org/JSON-Schema-Test-Suite#311

Closed
