Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract output to its own spec #1429

Merged
merged 16 commits into from
Oct 5, 2023
Merged

Extract output to its own spec #1429

merged 16 commits into from
Oct 5, 2023

Conversation

gregsdennis
Copy link
Member

@gregsdennis gregsdennis commented Aug 28, 2023

This is my proposal for extracting the output into its own specification. I've left some definitions in Core so that satellite specs can reference those and use a common language, but I think this still gives sufficient freedom to those satellite specs and implementations.

Probably renders #1428 unnecessary as it just deletes those changes.

There are still changes I want to make to the output spec itself, but this will do to get started.

It also incorporates aspects of annotation filtering and output unit pruning.

Resolves #1320

@gregsdennis gregsdennis requested a review from a team August 28, 2023 04:51
Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked closely at the content. I assume it's basically just extracted from "core" and moved almost as is. My comments are all style issues to maintain consistency throughout the spec and with the other spec documents.


The name of the new spec is odd. Is the "machine" part on purpose? What's that supposed to mean?


Instead of using jsonc to add titles to code blocks, you should use the remark-code-titles syntax I setup exactly for this situation.

```jsonschema "Example Schema"
```json "Failing instance"

I should also update that plugin to style jsonc the same as json in case there's a reason we do need jsonc.


When I converted the old spec, I went through all of our JSON and tried to normalize the code style. I'm sure I missed a lot of them and the changes I'm suggesting are probably things that I missed.


I noticed several places where you have comments. Comments would only show to authors, but these look like the audience is reader. Should these be footnotes (until we come up with a better alternative), or is this maybe a good use for a "note" block using the remark-flexible-contanier plugin?

::: note {title}
{content}
:::

The current code style used in the spec documents is to hard wrap long lines. I know that 's controversial and needs discussion, but until we decide otherwise, we should maintain consistency.


I too am old and learned to type in the time where two spaces after a period was the norm, but they've gone and changed the rules on us since and all the cool kids are using one space now. The old spec had a mix of both styles. I normalized to one space when converting so everything was consistent. We should keep up that consistency.

jsonschema-validation-output-machines.md Outdated Show resolved Hide resolved
jsonschema-validation-output-machines.md Outdated Show resolved Hide resolved
jsonschema-validation-output-machines.md Outdated Show resolved Hide resolved
jsonschema-validation-output-machines.md Outdated Show resolved Hide resolved
@gregsdennis
Copy link
Member Author

I assume it's basically just extracted from "core" and moved almost as is.

Yeah, mostly. The definitions remain in Core, and I've added some optional filtering stuff at the bottom to address some discussions.

The name of the new spec is odd. Is the "machine" part on purpose? What's that supposed to mean?

In discussions (primarily with Henry and @karenetheridge), it became apparent that this output format was intended to be consumed by other systems and applications. That's part of why this is being extracted: it allows @karenetheridge (or anyone else) to also create an output format for human consumption.

Instead of using jsonc to add titles to code blocks...

I use jsonc because the JSON has comments in it. It's not for titles.

I too am old and learned to type in the time where two spaces after a period was the norm, but they've gone and changed the rules on us since and all the cool kids are using one space now.

Maybe we should have some automatic linter. I'm not sure this is something I have the energy to change. Markdown doesn't care.

@jdesrosiers
Copy link
Member

I use jsonc because the JSON has comments in it. It's not for titles.

The comments are used only to label the individual JSON values. Where you have one code block with three JSON values each labeled with comments, I'm suggesting that you make a separate code block for each value and use the comment as the title. I think that would look a lot nicer.

Maybe we should have some automatic linter. I'm not sure this is something I have the energy to change. Markdown doesn't care.

We do have a linter, but it doesn't check for that one. But, I think you meant an auto-formatter. I hate those, but it would be possible to set something up to do that. You're right, the markdown to html conversion collapses it to one space, so no one would see it but us. But, I'd still prefer that we were consistent.

@gregsdennis
Copy link
Member Author

gregsdennis commented Aug 31, 2023

I might set up something locally to "unformat" the text so it's how I like it while I'm editing then reformat before I commit, kinda like how git manages line endings.

Edit: Found a Reflow VSCode extension that works fairly well. It only does the current paragraph, but it works.

@gregsdennis
Copy link
Member Author

@jdesrosiers I've updated the opening comment to include links to the discussions for the additional things this does (filtering).

@gregsdennis gregsdennis requested a review from a team September 4, 2023 21:33
Copy link
Member

@Relequestual Relequestual left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it difficult to review this PR, but broadly it looks OK.

This PR however does not just extract output to its own spec, but also changes requirements, removing SHOULD in terms of strongly recommending using at least "flag" output.

I didn't comment on #1320, but I feel now like I should have done.

In my mind, I imagined that a "human readable" version of validation output, would be computable from the (now dubbed) "machine readable" version of validation output. This, to me, was one of the main drivers to support and push for it.

My thought process was, implementations would then only need to worry about the machine readable version, and then (they or developers) could use external/additional libraries to provide human readable feedback. Not only would this allow for customization, but also for additional languages, or different algorithms to assess such output structures to find the helpful human readable error.

I looked in the associated Issue and Discussion, but couldn't see anyone suggesting removing the recomendation. (Please direct me to a specific comment if I've missed it.)

I would find it acceptable if, having put back such a recomendation, a clause like the following was added: "Implementations MAY choose not to support any standard output format."

@gregsdennis
Copy link
Member Author

removing SHOULD in terms of strongly recommending using at least "flag" output.

Output as a whole was a recommendation. The flag format was part of that.

Like vocabularies, output as a feature was added as a loose requirement so that we could iterate on it. This change is merely one of those iterations.

Aside from extracting it, the output formats themselves have changed significantly from 2020-12. The requirements are already broken. Moving output into its own spec doesn't change anything.

Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of defining the core concepts in the spec and having the actual formats described elsewhere. I didn't realize that's what you were doing here, but I like it. It's a similar concept to Structured Fields that's used as essentially a framework for defining new HTTP headers.

Comment on lines 2107 to 2111
The schema location is the absolute, dereferenced location of the schema object
that produced a result. The value MUST be expressed using the canonical IRI of
the relevant schema resource plus a JSON Pointer fragment that indicates the
schema object that produced the output. It MUST NOT include by-reference
applicators such as `$ref` or `$dynamicRef`.[^14]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saying that it MUST NOT include $ref is, I think, a little more restrictive than is intended. It makes sense for the pointer to include a $ref, but only as the terminating value. Since 2019-09, $ref is a normal keyword and it make sense for schema location to point to that keyword as much as any other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not a schema at a $ref, only a string (URI).

{
  "type": "array",
  "items": { "$ref": "foo.json" }
}

The schema location indicates the schema object that produced the output. There are only two schema objects here:

  • the root
  • the items subschema

If an evaluation result comes from inside the $ref, then the schema location is foo.json or some subschema contained by it (unless another $ref was followed).

Thus, "schema location" never contains a reference keyword. Really, this is a "by definition" requirement, but it felt helpful to state it explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that "schema location" meant the location in the schema rather than the location of a schema, but reading more carefully, I see I had that wrong. However, if this section is intended to be a set of universal concepts output format authors can use when creating custom output formats, it seems a bit limiting to not provide a concept that allows them to point to a keyword. I won't hold up this PR for that. It can be worked out later if necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the concepts currently listed are the ones that are currently used by the (now split) output spec. I don't mind additional things being added, but I think I'd want to see them being used first. You know, see the term in another output spec, and then import it into this spec to standardize it so that other output specs can use the same term.

jsonschema-core.md Outdated Show resolved Hide resolved
jsonschema-core.md Outdated Show resolved Hide resolved
jsonschema-core.md Show resolved Hide resolved
Comment on lines +2142 to +2146
Many keywords are defined to produce annotations, whether intended for
inter-keyword communication (e.g. between `properties` and
`unevaluatedProperties`) or for application consumption (e.g. `title` or
`readOnly`). Annotation values may be of any type and are defined by the
keywords that produced them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to have a link in here somewhere to the section that defines annotations.

Comment on lines 2158 to 2160
Implementations that wish to provide dropped annotations MUST NOT provide them
as their default behavior. Dropped annotations MUST only be included when the
implementations is explicitly configured to do so.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the spec is overstepping here. I think it's perfectly fine to allow an implementation to make this choice themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but I still want to emphasize that this data point is generally intended for debugging purposes, and making it available in a production scenario will likely lead to confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For a separate conversaion, but worth mentioning here.)

Moreover, if we drop annotations as an driving mechanism for evaluation, many of the keywords which now produce annotations won't need to. For example, if additionalProperties just looks at the adjacent properties keyword's keys to know what remains to evaluate, then properties doesn't need to emit its annotation.

That said, I'd probably still track that data internally and treat it as a sort of "internal annotation" in order to avoid duplicate effort like having to re-analyze the patternProperties keys against an instance's keys. But that doesn't mean that it needs to be exposed.

Additionally, I do like having some kind of non-prescriptive mechanism that we can use to describe how keywords interact. I think it's helpful to the reader, even if implementations aren't required to actually do it that way.

I still have to do some research here to find out what annotations are being used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of the highlighted content, I feel like SHOULD NOT and SHOULD would be enough here. There may be implementations that are specifically designed to do that by default, for example.

@gregsdennis to your comment, I don't think there's any harm in saying explicitly that this sort of behaviour is intended to aid debugging in non-production environments. Of course, that could mean for developer tooling, such a thing is wanted by default.

gregsdennis and others added 10 commits September 11, 2023 10:54
Co-authored-by: Jason Desrosiers <jdesrosi@gmail.com>
Co-authored-by: Jason Desrosiers <jdesrosi@gmail.com>
Co-authored-by: Jason Desrosiers <jdesrosi@gmail.com>
Co-authored-by: Jason Desrosiers <jdesrosi@gmail.com>
@Relequestual
Copy link
Member

removing SHOULD in terms of strongly recommending using at least "flag" output.

Output as a whole was a recommendation. The flag format was part of that.

Like vocabularies, output as a feature was added as a loose requirement so that we could iterate on it. This change is merely one of those iterations.

Aside from extracting it, the output formats themselves have changed significantly from 2020-12. The requirements are already broken. Moving output into its own spec doesn't change anything.

I guess I'm just a little hesitent to make no recomendation for output, but I think I understand the reasoning here.

I'd be less concerned if we aim to create a test suite for output, making it clear that we would like implementations to make use of the output spec.

As far as I can tell, we haven't seen much use of the output spec, and I'd like that to chage. You can't create downstream tools if stream isn't flowing.

jsonschema-core.md Outdated Show resolved Hide resolved
jsonschema-core.md Show resolved Hide resolved
@gregsdennis gregsdennis added this to the stable-release milestone Sep 11, 2023
@gregsdennis
Copy link
Member Author

I think all comments have been addressed either through replies or changes.

@Relequestual @jdesrosiers please re-review.

Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last round of changes look great. However, in one last attempt to try to convince you to to use code titles rather than jsonc for the output examples, here's some screenshots illustrating the difference. I think the one with code titles is much nicer. This was exactly the kind of thing we introduced the code title plugin for.

With jsonc
Screenshot from 2023-09-13 12-12-34


With code titles
Screenshot from 2023-09-13 12-12-55

@gregsdennis
Copy link
Member Author

I'm going ahead with merging this. @karenetheridge I'd still like your thoughts, but feel free to express those as a follow-up issue/PR or here if you like.

@gregsdennis gregsdennis merged commit 0d88b62 into main Oct 5, 2023
3 checks passed
@gregsdennis gregsdennis deleted the gregsdennis/output-spec2 branch October 5, 2023 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Extract output to a separate specification
3 participants