JSON Schema Identifiers #382
There may be some cleanup we can do in this area, but I don't think it's as bad as you're making it out to be. The spec says that Beyond that, I accept your point that the concept of normalization is open-ended and we don't clearly define what is expected. It could be helpful to include guidance on which types of normalization are expected (case, %-encoding, path-segment) and which are not (schema, protocol).
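To make the three kinds of normalization named above (case, %-encoding, path-segment) concrete, here is a minimal Python sketch in the spirit of the syntax-based normalization of RFC 3986, section 6.2.2. The function names are hypothetical, the code assumes an authority without userinfo, and nothing here is something the JSON Schema specification currently requires.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 "unreserved" characters: these never need percent-encoding.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def normalize_percent(text):
    """Decode %XX for unreserved characters, uppercase the hex digits otherwise."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def remove_dot_segments(path):
    """Remove '.' and '..' segments, following RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


def normalize(uri):
    """Case, percent-encoding and path-segment normalization only.

    Assumption: the authority has no userinfo, so lowercasing it is safe.
    Scheme-specific and protocol-based normalization are deliberately left out.
    """
    parts = urlsplit(uri)
    path = remove_dot_segments(normalize_percent(parts.path))
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        path,
        normalize_percent(parts.query),
        normalize_percent(parts.fragment),
    ))
```

Under these assumptions, `normalize("HTTP://Example.COM/%7Efoo/./bar/../baz")` and `normalize("http://example.com/~foo/baz")` produce the same string, while `a%2fb` only becomes `a%2Fb` and is never turned into `a/b`.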
I believe that URI/IRI handling in JSON Schema is underdefined.
First I present examples where implementations behave inconsistently with one another, to show that the problem is of practical relevance.
The topic of this post is not "which implementation is correct". I believe that it is either impossible or unnecessarily hard to determine the correct behavior from the JSON Schema specification. I believe that with enough effort more such examples could be found.
Example 1
Consider this schema:
Consider this instance: 1
Let us try some implementations (all from here):
The implementations do not behave consistently.
Example 2
Consider this schema:
Consider this instance: 1
Let us try some implementations (all from here):
These implementations report the instance as valid:
These implementations reject the schema:
The implementations do not behave consistently.
Preliminary Subjective Analysis
The following are the instructions that I would currently give. The goal here is of course to reach a consensus. I hope the imperative wording below does not make this impossible.
Acknowledge that RFC 3986 (URI) and RFC 3987 (IRI) do not define 1 unique equivalence relation.
Acknowledge that RFC 3986 (URI) and RFC 3987 (IRI) do not define 1 unique, canonical normal form.
Acknowledge that RFC 3986 (URI) and RFC 3987 (IRI) define only a very limited set of operations. For example they explicitly define how to resolve relative references but they do not define how to add a fragment to a URI/IRI or how to add a JSON pointer segment to a URI/IRI fragment.
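To illustrate the gap: the following Python sketch shows one possible way to append a single JSON pointer segment to a URI fragment. It combines the reference-token escaping of RFC 6901 with the fragment character set of RFC 3986. The function names are hypothetical, and the sketch assumes that the existing fragment is either absent or already a JSON pointer; neither RFC defines this operation as such.

```python
from urllib.parse import quote

# Characters allowed unescaped in an RFC 3986 fragment besides unreserved ones:
# sub-delims, ":", "@" and "?". "/" is left out on purpose, because an escaped
# JSON pointer token never contains a literal "/".
FRAGMENT_SAFE = "!$&'()*+,;=:@?"


def escape_token(token):
    """RFC 6901 escaping: '~' becomes '~0', then '/' becomes '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def append_pointer_token(uri, token):
    """Append one JSON pointer reference token to the fragment of `uri`.

    Assumption: the current fragment is empty or already a JSON pointer.
    A plain-name fragment such as '#foo' would need a different rule.
    """
    base, _, fragment = uri.partition("#")
    encoded = quote(escape_token(token), safe=FRAGMENT_SAFE)
    return f"{base}#{fragment}/{encoded}"
```

For example, under these assumptions `append_pointer_token("http://example.com/schema.json#/properties", "a/b")` yields `http://example.com/schema.json#/properties/a~1b`. Whether an implementation should build such URIs at all is exactly the kind of question left open.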
Acknowledge that RFC 3986 (URI) and RFC 3987 (IRI) do not define a model / data structure that is free of percent-hex-hex sequences (URIs/IRIs and percent encoding/escaping are not analogous to JSON and backslash encoding/escaping in this respect). The percent-hex-hex sequences "bleed out".
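A small Python sketch of what "bleeding out" means in practice: decoding a JSON string escape loses nothing that matters, while percent-decoding a URI path can erase a distinction the URI syntax cares about. The helper name `decoded_path` and the example URIs are illustrative assumptions.

```python
import json
from urllib.parse import unquote, urlsplit


def decoded_path(uri):
    """Percent-decode the path, as a naive 'escape-free' model would."""
    return unquote(urlsplit(uri).path)


ENCODED_SLASH = "http://example.com/a%2Fb"  # one path segment: "a%2Fb"
PLAIN_SLASH = "http://example.com/a/b"      # two path segments: "a", "b"

# JSON: the escaped and the unescaped spelling decode to the same value,
# and the decoded value is a complete model of the string.
same_json_value = json.loads(r'"\u0041"') == json.loads('"A"')

# URI: after decoding, the two different URIs have become indistinguishable,
# so a model that wants to keep them apart must retain the %2F.
same_decoded_path = decoded_path(ENCODED_SLASH) == decoded_path(PLAIN_SLASH)
same_segments = (
    urlsplit(ENCODED_SLASH).path.split("/") == urlsplit(PLAIN_SLASH).path.split("/")
)
```

Here `same_json_value` and `same_decoded_path` are both true, while `same_segments` is false: the raw segments differ, but the decoded paths no longer do.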
Acknowledge that RFC 3986 (URI) and RFC 3987 (IRI) potentially contain open-ended dependencies / references. For example:
If JSON Schema depended on all scheme-specific rules:
Acknowledge that distrust is a major motivation for validation. Some people will look for weird edge cases to exploit. Others will produce them by accident. No URI is too weird to be forbidden.
Acknowledge that whatever the JSON Schema specification describes is a consumer and a producer of URIs/IRIs. Becoming more permissive and becoming more restrictive can both be breaking changes (analogy: "Container<Cat> and Container<Animal> are mutually unassignable"). For example widening the set of allowed characters might be a breaking change, just like switching from URIs to IRIs (even though "all URIs are IRIs").
Acknowledge that the JSON Schema specification does not assign 1 unique URI/IRI to each schema. The specification might even associate a schema with multiple equivalence classes of URIs/IRIs (the exact details are not clear to me). The meaning of phrases like "[...] the URI of the schema [...]" might not be obvious to everyone.
Acknowledge that RFC 6901 (JSON Pointer) does not directly deal with URI/IRI equivalence.
Acknowledge that the JSON Schema specification is the ultimate authority triggering the application of a JSON pointer.
Here I feel like: "Why not just tell implementations exactly when (and when not) to apply a JSON pointer?". Registering JSON pointers as fragment identifiers for application/schema+json is a good idea but this alone might not be enough. It would seem like dereferencing using JSON pointers can conflict with dereferencing relying on (some currently implicit) URI/IRI equivalence relation. Challenges: reconciling the W3C fragid best practices, "$id"s with empty fragments (when can we assume the media type application/schema+json?), "$id"-less root schemas, external "$id" "overriding" (seems very powerful/unrestricted!).
In my dreams, part I:
strings: they must conform to the absolute IRI syntax and they must be dereferenced to one of my schemas when used as value of "$ref".»
"$id" in all root schemas. This certainly should be really simple.»