Blog entry about defects found in public JSON schemas. #40

zx80 · 2023-09-06T12:02:28Z

This adds a 3-minute blog entry, a cover image from Unsplash, and two small avatars.

gregsdennis

Every one of your examples is an illustration of a typographical error, specifically a keyword that is ineffectual because a closing brace was misplaced. This can hardly be attributed to the spec. This happens when writing JSON in general. It happens even more with YAML.

The analysis presented in this paper makes a false assumption about JSON Schema: that it's intended for data modelling. As such many of its conclusions are incorrect.

JSON Schema is a collection of constraints. Keywords are independent because that allows them to be combined however the user needs. If they want schemas that represent data modelling, that's possible, but then they need to understand how JSON Schema works in order to include the proper constraints that model that data.

Having a multitude of keywords enables users to isolate the behavior they want. Moreover, the vocabulary system allows them to create their own keywords and dialects in order to make JSON Schema into whatever they need. JSON Schema's flexibility allows people to have control over what they want to validate.

There is nothing inherent about JSON Schema that is causing authors to write bad schemas. Developers write bad code all of the time. C++ isn't inherently flawed because developers write code that manages memory poorly. That's just bad code. C++ simply offers more control over memory management. Sometimes you need that level of control.

The spec isn't at fault. What's lacking is proper tooling (and perhaps documentation) to help guide schema authors toward writing better schemas.

gregsdennis · 2023-09-07T21:25:39Z

pages/posts/analysis-of-json-schema-defects.md

+    photo: /img/avatars/claire.jpg
+    link: https://www.linkedin.com/in/claire-medrala/
+    byline: Research Engineer
+excerpt: Evidences show that schemas are hard to write, and suggest changes in the spec


I absolutely object to putting (or hinting at) third party recommendations for spec changes in our blog.

This is the JSON Schema blog. It is a place for us to show off what it can do, not a forum for discussion about change or shortcomings. The appropriate place for that is issues and discussions.

> I absolutely object to putting (or hinting at) third party recommendations for spec changes in our blog.

Ok. The blog is only for your recommendations and discussions. Fine, it is your blog after all. As academics, we are more used to open discussion and disagreement.

> This is the JSON Schema blog. It is a place for us to show off what it can do, not a forum for discussion about change or shortcomings. The appropriate place for that is issues and discussions.

Ok.

gregsdennis · 2023-09-07T21:27:04Z

pages/posts/analysis-of-json-schema-defects.md

+These findings suggest key changes in JSON Schema specification which would block most
+of encountered defects.


I disagree with the conclusion that, because people aren't using JSON Schema correctly (in many cases, these are typographical errors), JSON Schema is at fault and needs to change.

zx80 · 2023-09-08T15:09:00Z

> Every one of your examples is an illustration of a typographical error, specifically a keyword that is ineffectual because a closing brace was misplaced. This can hardly be attributed to the spec.

Yes and no. Misplacing a keyword leads the system to silently ignore the ineffectual keyword. That is a choice of language semantics. Different choices could lead to a "your schema is invalid" error in many (but not all) cases.
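
To make the "silently ignored" behaviour concrete, here is a minimal sketch in Python. It is a toy validator written only for this illustration, covering just `type`, `properties`, and `minLength`; the function name `validate` and both example schemas are assumptions, not taken from the paper or from any real library.

```python
def validate(schema, instance):
    """Toy validator for three keywords only: type, properties, minLength."""
    type_checks = {"object": dict, "string": str}
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, type_checks[expected]):
        return False
    # minLength constrains strings only; any other instance passes it.
    if "minLength" in schema and isinstance(instance, str):
        if len(instance) < schema["minLength"]:
            return False
    # properties constrains objects only, and only the members present.
    if "properties" in schema and isinstance(instance, dict):
        for name, subschema in schema["properties"].items():
            if name in instance and not validate(subschema, instance[name]):
                return False
    return True


# Hypothetical schema as the author meant it.
intended = {
    "type": "object",
    "properties": {"name": {"type": "string", "minLength": 3}},
}

# Same keywords, but minLength ended up on the enclosing object schema
# instead of on the "name" subschema.
misplaced = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "minLength": 3,
}

document = {"name": "ab"}
print(validate(intended, document))   # False: "ab" is shorter than 3
print(validate(misplaced, document))  # True: minLength never applies to an object
```

Both schemas are accepted as schemas; only the second one lets the too-short name through, and nothing reports that `minLength` had no effect.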

> The analysis presented in this paper makes a false assumption about JSON Schema: that it's intended for data modelling. As such many of its conclusions are incorrect.

I'm unclear on where you see this assumption. Our study does not assume a particular use case, whether data modelling or something else, because we do not really have any relevant information regarding this!

We just look at existing schemata, without knowing why/for what they were developed, and look for factual errors.

> JSON Schema is a collection of constraints. Keywords are independent because that allows them to be combined however the user needs. If they want schemas that represent data modelling, that's possible, but then they need to understand how JSON Schema works in order to include the proper constraints that model that data.

A lot of the allowed combinations do not make much sense. We did not find significant cases where such freedom was a requirement.

> Having a multitude of keywords enables users to isolate the behavior they want. Moreover, the vocabulary system allows them to create their own keywords and dialects in order to make JSON Schema into whatever they need. JSON Schema's flexibility allows people to have control over what they want to validate.

{
  "type": "object",
  "minLength": 10,
  "pattern": "^[0-9]*[a-z]*$",
  "maxItems": 42,
  "minSize": 17
}

Why allow the above nonsense?
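
As a rough illustration of what is ineffectual in the schema above, the following Python sketch flags keywords that cannot apply to the declared `type`. The keyword table is deliberately partial, and the names `KEYWORDS_BY_TYPE`, `GENERIC_KEYWORDS`, and `lint` are hypothetical; "unknown" here only means "not in this table".

```python
# Partial table of keywords grouped by the instance type they constrain.
KEYWORDS_BY_TYPE = {
    "string": {"minLength", "maxLength", "pattern"},
    "array": {"minItems", "maxItems", "uniqueItems"},
    "object": {"minProperties", "maxProperties", "required", "properties"},
    "number": {"minimum", "maximum", "multipleOf"},
}
# Keywords that are meaningful whatever the instance type is.
GENERIC_KEYWORDS = {"type", "enum", "const", "title", "description"}


def lint(schema):
    """Return one message per keyword that looks ineffectual in `schema`."""
    declared = schema.get("type")
    declared_types = {declared} if isinstance(declared, str) else set(declared or [])
    # "integer" instances are constrained by the numeric keywords.
    applicable = {"number" if t == "integer" else t for t in declared_types}
    findings = []
    for keyword in schema:
        if keyword in GENERIC_KEYWORDS:
            continue
        owners = [t for t, kws in KEYWORDS_BY_TYPE.items() if keyword in kws]
        if not owners:
            findings.append(f"{keyword}: not a keyword known to this table")
        elif applicable and not applicable & set(owners):
            findings.append(
                f"{keyword}: applies to {owners[0]} instances, "
                f"but the schema declares type {sorted(declared_types)}"
            )
    return findings


schema = {
    "type": "object",
    "minLength": 10,
    "pattern": "^[0-9]*[a-z]*$",
    "maxItems": 42,
    "minSize": 17,
}
for message in lint(schema):
    print(message)
```

On this schema the sketch reports four findings: `minLength` and `pattern` constrain strings, `maxItems` constrains arrays, and `minSize` is not a standard keyword at all.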

> There is nothing inherent about JSON Schema that is causing authors to write bad schemas.

Trivial errors are silently ignored because of the chosen semantics, so the user is likely never to find out. This does not cause bad schemas, but it contributes to them.

> Developers write bad code all of the time. C++ isn't inherently flawed because developers write code that manages memory poorly. That's just bad code. C++ simply offers more control over memory management. Sometimes you need that level of control.

A lot of errors are filtered out by a C++ compiler, thanks to type checks, mandatory declarations, and so on.

> The spec isn't at fault.

The evidence we gathered demonstrates that (1) people get it wrong quite often (>60%) and (2) some spec changes would improve this situation (we tested our proposals). AFAICS both of these points are facts.

I understand that the spec will be broken again on the next release, so you seem to also believe that it can be improved and that it is worth breaking compatibility. At last a point of agreement!

> What's lacking is proper tooling (and perhaps documentation) to help guide schema authors toward writing better schemas.

There are hundreds of existing tools, but the right one is still missing?

There is a suggestion that a linter would help. Sure, we implicitly wrote one to detect the various errors reported in the paper. Now, if a linter somehow restricts the language by filtering issues, then why not try to put at least some of these restrictions in the language itself, so that all conformant tools would check them?

gregsdennis · 2023-09-08T20:44:39Z

> I'm unclear on where you see this assumption. Our study does not assume a particular use case, whether data modelling or something else, because we do not really have any relevant information regarding this!

The assumption is present in the "invalid" cases you present. You're assuming that a schema has to align with data patterns in programming languages. You're assuming validation (which arguably is the primary purpose of JSON Schema). But there are many use cases, most of which we still don't know about.

For example, code generation. Many languages support union types. If I want to generate a union type, I might combine keywords that don't otherwise make sense.
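
A small Python sketch of that code-generation case, under stated assumptions: the `PYTHON_TYPES` mapping and the `annotation` helper are hypothetical and stand in for whatever a real generator does. With a `type` array, keywords for different types can each be effective on their own branch of the union.

```python
# Hypothetical mapping from JSON Schema type names to Python type names.
PYTHON_TYPES = {
    "string": "str",
    "integer": "int",
    "number": "float",
    "boolean": "bool",
    "array": "list",
    "object": "dict",
    "null": "None",
}


def annotation(schema):
    """Render the schema's declared type(s) as a Python union annotation."""
    declared = schema.get("type")
    if declared is None:
        return "Any"  # no type constraint at all
    names = [declared] if isinstance(declared, str) else list(declared)
    return " | ".join(PYTHON_TYPES[name] for name in names)


# minLength constrains the string branch, maxItems the array branch.
schema = {"type": ["string", "array"], "minLength": 1, "maxItems": 3}
print(annotation(schema))  # str | list
```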

> Why allow the above nonsense?

It may be nonsense to us, but we can't rule out that some user actually has a purpose for something like that.

JSON Schema is intentionally permissive in order to account for as many use cases as possible. Yes, many schemas appear to serve no purpose or contain ineffectual keywords; however, it's impossible for us to rule out the possibility that some user has a real use for such a schema.

This is where a linter comes in. A linter will warn the user that a specific construct doesn't generally make sense, but the user still has the option to ignore the warning and do it anyway. If JSON Schema disallows such things, then the user no longer has that choice, and we've prevented them from doing what they want to do.
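
The difference between the two policies can be sketched in Python with the standard `warnings` module. The `SchemaLintWarning` category, the `report` function, and its `strict` flag are assumptions made for this illustration only; they do not describe any existing JSON Schema tool.

```python
import warnings


class SchemaLintWarning(UserWarning):
    """Hypothetical warning category for suspicious schema constructs."""


def report(findings, strict=False):
    """Warn about each finding, or reject outright when strict is set.

    strict=False models a linter: the author is told and may carry on.
    strict=True models a spec that forbids the construct.
    Returns True when there was nothing to report.
    """
    if strict and findings:
        raise ValueError("schema rejected: " + "; ".join(findings))
    for finding in findings:
        warnings.warn(finding, SchemaLintWarning, stacklevel=2)
    return not findings


findings = ["minLength has no effect when type is object"]

# Linter mode: a warning is emitted and the schema remains usable.
report(findings)

# An author who wants the construct on purpose can silence the category.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SchemaLintWarning)
    report(findings)
```

Calling `report(findings, strict=True)` raises instead, which is the behaviour the author could no longer opt out of.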

The point is to allow users to find new use cases without restriction. The way to help the users you found is targeted tooling. Yes, some such tooling already exists, and we've partially built some. However, what's there is not well integrated into the common IDEs and editors, so it's not typically used.

> There are hundreds of existing tools, but the right one is still missing?

Yes. The vast majority of the "hundreds of existing tools" are validators, and many of them don't support even the latest version of the spec, which is almost three years old.

Beyond validators, there are various generators and a few other targeted/single-purpose tools. Few of them are editors, and we are actively working with those to help them improve their offerings.

gregsdennis · 2023-09-08T20:57:29Z

I also think the data sources leave something to be desired.

Ref - This is absolutely a poor choice, as it's a test suite. The schemas it contains are specifically crafted to verify that implementations meet the requirements of the various specs, and they're definitely NOT examples of real-world use.
Store - I don't have direct experience with this one and can't speak to it.
ODS - I wouldn't trust generated schemas. I say as much in my documentation on my own generation library. Generation is a tool to get you started. It shouldn't go directly to production.
JSC - This is actually a good source of real-world usage, but it's limited to public repos. Many higher-quality schemas are used by enterprises and are proprietary or otherwise private. It's understandable that you don't have access to these.
Misc - I believe many publicly-accessible systems like Kubernetes and AWS are stuck using old versions, which means that users can't use newer features. This will skew your results (e.g. your conclusion of "people don't use $dynamic* so those keywords should be removed").

I'm surprised there's no mention of OpenAPI or AsyncAPI, arguably the largest usage points of JSON Schema. I wouldn't be surprised if more people used JSON Schema indirectly through one of these specs than they do directly.

There's still a lot of good work done with this study, and it would be useful for creating linting tools. I just don't agree with some of the conclusions.

The biggest thing for me, though, is that I can't back putting third party spec change recommendations and advertising potential competitors or alternative proposals in our blog.

benjagm · 2023-09-13T14:12:51Z

> This adds a 3-minute blog entry, a cover image from Unsplash, and two small avatars.

We'd like to thank you for contributing this blog post proposal. We recognize the big effort behind the study backing the blog post, and we are sure we can extract great insights from it; however, this content differs from the Community-driven content we expected. We'd like to learn from your work and be able to discuss your conclusions, but most importantly, we want to make sure we serve the JSON Schema Community in the best way.

This is why we'd like to invite you to move the discussion to this Community discussion and continue a constructive conversation there, to make the most of this opportunity.

We'd like to thank you once again for this contribution. This situation inspired the community to work on publishing blog guidelines to make this experience better in the future.

Please @zx80, join us in this discussion.

karenetheridge · 2023-10-08T19:09:10Z

pages/posts/analysis-of-json-schema-defects.md

+Users have a hard time remembering the 60 keywords and writing schemas.
+We think that this can be significantly improved with limited changes to
+the spec.
+


They could also be found with a linter mode, which has been proposed here - https://github.com/orgs/json-schema-org/discussions/323 and json-schema-org/json-schema-spec#1079

Thanks for the pointers.

benjagm · 2023-10-11T09:15:54Z

@zx80 Do you mind sending this PR to the new website repository? I am asking because we just launched a new version of the JSON Schema website, and the blog and website are now in the same repository. As a consequence, this repository is going to be archived. Thanks a lot.

zx80 · 2023-10-12T07:19:04Z

> @zx80 Do you mind sending this PR to the new website repository? I am asking because we just launched a new version of the JSON Schema website, and the blog and website are now in the same repository. As a consequence, this repository is going to be archived. Thanks a lot.

Done as this PR.

add a small blog

ea20da5

gregsdennis suggested changes Sep 7, 2023

benjagm mentioned this pull request Sep 13, 2023

Open Community Working Meeting 2023-09-11 - 14:00 PT json-schema-org/community#474

Closed

4 tasks

Fabien Coelho added 8 commits September 26, 2023 20:42

expand, add some context

39cef2c

add caveats and other expansions

64377d6

proofreading

8d07bca

proofreading

b8a47bd

add reference to added example

020e39c

more proof reading

a80eced

add link and further caveats

489ebfd

typo--

3e4a48a

karenetheridge reviewed Oct 8, 2023

zx80 closed this Oct 12, 2023
