Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

require $schema in schemas #1434

Merged
merged 12 commits into from
Sep 20, 2023
Merged

require $schema in schemas #1434

merged 12 commits into from
Sep 20, 2023

Conversation

gregsdennis
Copy link
Member

Resolves #1420

@gregsdennis gregsdennis marked this pull request as ready for review September 3, 2023 23:03
@gregsdennis gregsdennis requested a review from a team September 4, 2023 21:30
@gregsdennis gregsdennis added this to the stable-release milestone Sep 4, 2023
Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see these this change happening!

Something that's missing is dialect inheritance in embedded schemas. An embedded schema that doesn't declare $schema inherits the dialect of the schema it's embedded in (how ever the parent schema's dialect was determined, not necessarily $schema).

jsonschema-core.md Outdated Show resolved Hide resolved
jsonschema-core.md Outdated Show resolved Hide resolved
…are not required to support media type parameter inputs
Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could be organized a bit better. There are four rules for determining the dialect, but they're not all in the same place and not presented consistently. The first rule is in a different place than the other three, Those three are in the same place, but the third and fourth are presented differently (as a list) from the second.

I think this could be more clear if the rules for determining the dialect were presented consistently and in one place in a way implementers could follow like an algorithm. I think the third and fourth rules are presented in this way, but it would be nice if the first and second were included as well.

Also, it just occurred to me that this is all defined in the section about the $schema keyword. Since the $schema keyword is only one one part of dialect determination, it seems like it would be better to talk about dialect determination in it's own section that references to $schema keyword as one way the dialect could be determined.

jsonschema-core.md Outdated Show resolved Hide resolved
@gregsdennis
Copy link
Member Author

There are four rules for determining the dialect

This PR is a realization of the discussion we had in #1420. In that discussion, we identified three prioritized steps:

  1. The $schema keyword
  2. The schema media type parameter (if the schema was retrieved over HTTP) (optional)
  3. The user provides a default through an implementation's API in some way. (optional)

What is this fourth rule?

@jdesrosiers
Copy link
Member

The fourth rule is an embedded schema inheriting from its parent context. I think it belongs after the media-type parameter and before the user provided default.

I had forgotten about that one when I wrote that issue, but it's definitely part of the process. If you'd rather address that in a separate PR, I'm ok with that.

@gregsdennis

This comment was marked as outdated.

Copy link
Member

@Relequestual Relequestual left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't wait to see the tests for this get merged! =D

Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I left a couple notes, but either could be address separately if we want.

Comment on lines 899 to 901
3. Parent dialect - An embedded schema resource which does not itself contain a
`$schema` keyword MUST be processed using the same dialect as the schema
which contains it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this concept also applies to schemas embedded in other types of documents like OpenAPI. I'm not sure how this might be reworded to include that case, or if it should be considered a different case, or if that case should be addressed elsewhere entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can say something like:

If the schema is embedded in a larger document (such as OpenAPI), the semantics for determining the dialect may be specified by that document's specification. Unless explicitly dictated by that document's specification, the presence of a $schema keyword shall always override the dialect imposed by the containing document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that @karenetheridge. I'll try to work some of that in.

Unless explicitly dictated by that document's specification, the presence of a $schema keyword shall always override the dialect imposed by the containing document.

I think this may be redundant given this is a prioritized list and the $schema requirement is higher on the list.

Unless you're saying the containing document's spec can say that $schema should be ignored. That seems like we're saying it's okay for a consuming specification to override this rule. I'd prefer such specs simply disallow using $schema so that there's no confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Currently, the OpenAPI spec allows you to define a default dialect.

@karenetheridge What's the use case you see for allowing a containing document specification to override $schema for it's contained schemas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the other way around - the openapi spec has language about following jsonSchemaDialect to set the dialect for all contained schemas, but if one of those schemas has a $schema keyword in it, that overrides the jsonSchemaDialect. (Unless OpenAPI wanted to add a rule saying that $schema is no good here.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so that falls in line with the precedence order here. $schema first, then media type (if it exists), then parent document, then config.

I think Ben was wondering about the edge case where a document wanted to override $schema. I'm not sure we can define requirements for that.

jsonschema-core.md Show resolved Hide resolved
@jdesrosiers
Copy link
Member

Can't wait to see the tests for this get merged! =D

I'm not sure much, if any, of this testable with our current setup.

Copy link
Member

@karenetheridge karenetheridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's been discussed elsewhere but there is still a problem with allowing $schema keywords to appear in non-root locations in the document -- because we also require that the schema must validate against its metaschema, and this requirement can be violated if a contained sub-schema uses a different dialect than the document root that conflicts in some way.

Are we expecting that implementations need to have a special evaluation mode when validating schemas, and somehow switch to a different metaschema when $schema keywords are encountered? We don't spell that out in the spec in any way, and instead assume that when a schema is being evaluated against its metaschema, we treat it just like any other data instance.

Comment on lines +894 to +898
2. `application/schema+json` media type with a `schema` parameter -
Implementations which support media type parameter inputs MUST process the
schema according to the dialect the parameter declares. A media type will
generally only be available if the schema has been retrieved from an external
source and only applies to the document root.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can go stronger here and say that if the schema is fetched over HTTP, it MUST respect the media type parameter if it is present, and add a reference to the section that defines the name and syntax of this header. This may be implied by the existing wording, but we can spell it out more explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not what it's already saying? 895 contains the "MUST" requirement. I don't want to call out HTTP; I'd prefer to stay more general and say that if the media type parameter is available in general. It could come from anywhere, e.g. the OS or app.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I was confused what "Implementations which support media type parameter inputs" meant. Does that cover "if you used HTTP, you better check the Content-Type header and do what it says"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it covers any scenario, including http, where you might receive a media type. So, yeah.

Comment on lines 899 to 901
3. Parent dialect - An embedded schema resource which does not itself contain a
`$schema` keyword MUST be processed using the same dialect as the schema
which contains it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can say something like:

If the schema is embedded in a larger document (such as OpenAPI), the semantics for determining the dialect may be specified by that document's specification. Unless explicitly dictated by that document's specification, the presence of a $schema keyword shall always override the dialect imposed by the containing document.

jsonschema-core.md Outdated Show resolved Hide resolved
@Relequestual
Copy link
Member

Are we expecting that implementations need to have a special evaluation mode when validating schemas, and somehow switch to a different metaschema when $schema keywords are encountered?

Yup.

We don't spell that out in the spec in any way, and instead assume that when a schema is being evaluated against its metaschema, we treat it just like any other data instance.

First, we have to look at the definition of a Compound Schema Document...

A Compound Schema Document is defined as a JSON document (sometimes called a "bundled" schema) which has multiple embedded JSON Schema Resources bundled into the same document to ease transportation. - Core 9.3

I would argue that this defines any OpenAPI definition as a compound schema document.
While the examples in the apendix section is a JSON Schema, it doesn't have to be.

Then, the validation section...

Given that a Compound Schema Document may have embedded resources which identify as using different dialects, these documents SHOULD NOT be validated by applying a meta-schema to the Compound Schema Document as an instance. It is RECOMMENDED that an alternate validation process be provided in order to validate Schema Documents. Each Schema Resource SHOULD be separately validated against its associated meta-schema. - Core 9.3.3

Given this, we might want to tighten up the language of meta-schema useage for validation to make the target a "schema resource" to make this clearer? I don't know. Open to thoughts.

@gregsdennis
Copy link
Member Author

Are we expecting that implementations need to have a special evaluation mode when validating schemas, and somehow switch to a different metaschema when $schema keywords are encountered? - @karenetheridge

Yup. - @Relequestual

I'm with Karen. I think a meta-scheam evaluating a schema should be no different than a schema validating an instance. I don't think it's too hard to achieve this.

Still, let's work that out separately from this PR, please.

I don't know if it's been discussed elsewhere but there is still a problem with allowing $schema keywords to appear in non-root locations in the document -- because we also require that the schema must validate against its metaschema, and this requirement can be violated if a contained sub-schema uses a different dialect than the document root that conflicts in some way. - @karenetheridge

I would argue that this defines any OpenAPI definition as a compound schema document. - @Relequestual

I don't think the comment was about OpenAPI, but rather any time a $schema is used in a subschema resource, which is allowed currently.

If the parent schema declares 2020-12 (which disallows array items, and it contains a schema resource that declares draft 7 which also contains an array items (valid in draft 7), then a pure meta-schema evaluation of the root schema would fail.

I don't know that the Compound Document definition applies to the above scenario like it does for bundling. I think it could be argued both ways.

I agree that there's no real way to handle this right now. Let's take another PR for that and consider this PR as a mere iteration.

These two topics combined intersect somewhat with this discussion about $schema in instances.

@Relequestual
Copy link
Member

I don't think the comment was about OpenAPI, but rather any time a $schema is used in a subschema resource, which is allowed currently.

If the schema is embedded in a larger document (such as OpenAPI)... - One of @karenetheridge comments.

However, I agree, it applies more broadly.

Still, let's work that out separately from this PR, please.

I agree. As a reference, we discussed this at quite some length already: #936

@karenetheridge
Copy link
Member

I agree. As a reference, we discussed this at quite some length already: #936

That issue is talking about how to evaluate with a schema when there are embedded $schema keywords that change the dialect midway through. The problem I raised is how to evaluate that schema as a data instance against its metaschema -- because the metaschema you're evaluating with is changing midway through the evaluation, but the normal evaluation process doesn't know anything about switching schemas depending on what it finds in the data.

@gregsdennis
Copy link
Member Author

@karenetheridge I get what you're saying. I'll create an issue for it for us to discuss.

@gregsdennis gregsdennis merged commit 1db4168 into main Sep 20, 2023
3 checks passed
@gregsdennis gregsdennis deleted the gregsdennis/require-$schema branch September 20, 2023 06:13
@gregsdennis gregsdennis self-assigned this Sep 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Schema Update: Require dialect to be declared and understood
4 participants