V2: JSON Schema #4666

Closed
samuelcolvin opened this issue Oct 27, 2022 · 14 comments · Fixed by #5029
Labels
feature request help wanted Pull Request welcome schema Related to JSON Schema

Comments

@samuelcolvin
Member

I think we should try to target a specific version of JSON Schema, I guess ideally the latest version, e.g. currently 2020-12.

See this thread on twitter for some pointers.

I've heard there are a set of test cases somewhere, would be good if we could test against all or some of them.

I know many people want OpenAPI support too. Maybe we have to build a simple converter, or switch to work with the latest version of OpenAPI, which I understand is very behind JSON Schema? Maybe a converter already exists?

In terms of how to implement this, my rough suggestion (once #4516 is merged) would be:

  • delete everything in pydantic/schema.py and start again
  • use the pydantic-core schema which is already being generated and saved on models as __pydantic_validation_schema__ and process that recursively building a JSON Schema
  • extra details required by JSON Schema, but not used in pydantic-core should be saved in the 'extra' key which is already available in pydantic-core schema
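A rough sketch of what the recursive approach in the list above could look like. The node shapes here are simplified stand-ins loosely modelled on pydantic-core schema, not the real thing, and `to_json_schema` and the "model"/"fields" layout are hypothetical names chosen for illustration.

```python
from typing import Any


def to_json_schema(core: dict[str, Any]) -> dict[str, Any]:
    """Recursively translate a (simplified) core schema node into JSON Schema."""
    kind = core["type"]
    if kind == "int":
        return {"type": "integer"}
    if kind == "str":
        return {"type": "string"}
    if kind == "list":
        # Recurse into the item schema of the list.
        return {"type": "array", "items": to_json_schema(core["items_schema"])}
    if kind == "model":
        # Assumed shape: {"fields": {name: {"schema": ..., "required": bool}}}
        fields = core["fields"]
        out: dict[str, Any] = {
            "type": "object",
            "properties": {
                name: to_json_schema(field["schema"]) for name, field in fields.items()
            },
        }
        required = [name for name, field in fields.items() if field.get("required", True)]
        if required:
            out["required"] = required
        return out
    raise NotImplementedError(f"unhandled core schema type: {kind!r}")


example = {
    "type": "model",
    "fields": {
        "id": {"schema": {"type": "int"}},
        "tags": {
            "schema": {"type": "list", "items_schema": {"type": "str"}},
            "required": False,
        },
    },
}
print(to_json_schema(example))
```

A real implementation would need to cover every pydantic-core node type and handle references and definitions; this only shows the recursion.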

You'll need to build something to respect the __modify_schema__ method which existed in V1. We should rename this, perhaps to __pydantic_modify_json_schema__, because:

  • we're trying to be good (or good-ish) and not add dunder attributes/methods which don't start with __pydantic
  • we need it to have a name that avoids confusion with pydantic-core schema
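To make the hook idea concrete, here is a minimal sketch of how a generator might respect such a method. The hook name is only the one proposed above, not a confirmed API, and the convention that the hook receives a copy of the schema and returns the modified dict is an assumption made for this example. `apply_modify_hook` and `PostCode` are hypothetical.

```python
from typing import Any

# Proposed name from this issue; treat it as a placeholder, not a settled API.
HOOK_NAME = "__pydantic_modify_json_schema__"


def apply_modify_hook(tp: type, json_schema: dict[str, Any]) -> dict[str, Any]:
    """Let a type customise its generated JSON Schema, if it defines the hook."""
    hook = getattr(tp, HOOK_NAME, None)
    if hook is None:
        return json_schema
    # Pass a copy so the hook cannot mutate the caller's dict by accident.
    return hook(dict(json_schema))


class PostCode(str):
    @classmethod
    def __pydantic_modify_json_schema__(cls, json_schema: dict[str, Any]) -> dict[str, Any]:
        # Assumed convention: return a new dict with the extra keywords added.
        return {**json_schema, "pattern": "^[A-Z0-9 ]+$"}


print(apply_modify_hook(PostCode, {"type": "string"}))
print(apply_modify_hook(int, {"type": "integer"}))
```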

@tiangolo I know you originally implemented JSON Schema in pydantic when you were first working on fastapi 🙏, would you be able to implement it again? If not, please unassign yourself and hopefully someone else will take it on.

I've had some interactions with @Relequestual about JSON Schema, maybe he would be able to help too ❓ 😄 there's also a JSON Schema slack org which might be able to help.

@samuelcolvin samuelcolvin added feature request help wanted Pull Request welcome schema Related to JSON Schema labels Oct 27, 2022
@samuelcolvin samuelcolvin changed the title JSON Schema for V2 V2: JSON Schema Oct 27, 2022
@samuelcolvin
Member Author

Need to think about #1270 while working on this, it has a lot of interest.

@samuelcolvin
Member Author

As discussed with @Relequestual, the plan is to target JSON Schema 2020-12, which is the version OpenAPI 3.1 is compatible with.

FastAPI uses redoc or swagger-ui to generate API docs, so it would be good if they both supported OpenAPI 3.1:

Regarding testing, we should try to leverage the JSON Schema test suite, see https://github.com/json-schema-org/JSON-Schema-Test-Suite.

The Slack channel for JSON Schema should be helpful when planning/working on this issue, and I'm sure some of them would be willing to review a PR. Ben mentioned that @Julian might be able to help us a little?

Of course it would be amazing if someone from the JSON Schema community could take a stab at a PR ... but maybe that's too much to ask? 🤞

@Julian

Julian commented Nov 1, 2022

Hi hi! Indeed happy to hear a bit more about what I can help with -- I'm graciously paid full-time to work on JSON Schema-related things, and whilst there's an everlasting backlog of things I'm already engaged with, if there are things I can easily help with (especially vis-a-vis the test suite) I'm more than happy to. I also honestly have never looked at Pydantic's JSON Schema support so I have a bunch of catching up to do, but separately my goals with the python-jsonschema org are to create JSON Schema ecosystem tooling for Python beyond just tools for my own jsonschema implementation. That may not be directly relevant to Pydantic's v2 goals, but I may as well mention it in case any tools emerge which are "obviously" beneficial in a wider sense. But yeah, should I start by just reading this issue a bit more carefully? Apologies for not making the call this morning, I have literally no good excuse other than "my alarm didn't go off" :(

@samuelcolvin
Member Author

Apologies for not making the call this morning I have literally no good excuse other than "my alarm didn't go off" :(

No worries, I was very late, with equally little excuse.

I also honestly have never looked at Pydantic's JSON Schema support so I have a bunch of catching up to do

Not really, we basically need to delete what's there now and re-implement the whole thing to translate the pydantic-core schema generated for a model/other validation target into JSON Schema.

See #4516 for the switch to pydantic-core. https://github.com/pydantic/pydantic-core/blob/main/pydantic_core/core_schema.py is probably the best explanation of what pydantic-core schema looks like; the pydantic-core tests might help too.


After that, basically someone needs to have a crack at implementing it, and others can review.

If no one else is willing, I'll get around to it as soon as I can.

In the meantime, any pointers or suggestions from JSON Schema experts would be much appreciated.

@ulimanaio

Hi, I am not sure if this is helpful but https://schema.org/Recipe offers a lot of information and is also used by Bring! (https://sites.google.com/getbring.com/bring-import-dev-guide/web-to-app-integration).

@Kludex
Member

Kludex commented Nov 30, 2022

On the twitter thread mentioned in the description, I suggested focusing on the JSON Schema version that OpenAPI 3.1.0 complies with (the latest JSON Schema version). But...

There's one problem with that, which you've already pointed out: Swagger doesn't support OpenAPI 3.1.0 yet. And that's a very important issue for FastAPI. If Swagger doesn't support it, FastAPI will not be able to bump its OpenAPI version either.

So... Maybe we are blocked here because of that? And if so... Should we change our strategy?

It looks like OpenAPI 3.0.2 (the one FastAPI uses) is based on an extended subset of JSON Schema Wright Draft 00.

I personally don't think we should be blocked by Swagger. For me, it's much more interesting to have a valid OpenAPI schema, which I can use already with ReadMe, ReDoc, to generate HTTP clients, and to use with tools like schemathesis.

In any case, I'd be happy to help here, but I think @tiangolo wants to do this himself (I'll talk to him).

@samuelcolvin
Member Author

Thanks @Kludex. My understanding from talking this through with @Relequestual was that we can formally target 3.1.0, but we might have to not use all the new features/types.

We might just have to call it 3.0.2 or similar to satisfy Swagger?

I suspect we won't really know what works and what doesn't until we get into the work and try it out with Swagger.

@tiangolo
Member

Thanks @Kludex! So, Pydantic v1 actually had JSON Schema Draft-07, that was the latest at the moment I added it. Then it is included as-is in OpenAPI, and the outputted version in FastAPI is 3.0.2, even though it's slightly incorrect, but that allows Swagger UI to interpret and show it. If I remember correctly, the main case where it doesn't work is with exclusiveMinimum and exclusiveMaximum.
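To illustrate the exclusiveMinimum/exclusiveMaximum mismatch: in newer JSON Schema drafts these keywords hold the numeric bound itself, while the OpenAPI 3.0 Schema Object treats them as booleans that modify minimum/maximum. A minimal sketch of a converter between the two forms follows. `downgrade_exclusive_bounds` is a hypothetical helper; it only looks at the top level of a schema and does not try to reconcile a schema that already has both `minimum` and a numeric `exclusiveMinimum`.

```python
from typing import Any


def downgrade_exclusive_bounds(schema: dict[str, Any]) -> dict[str, Any]:
    """Rewrite numeric exclusive bounds into the boolean form used by OpenAPI 3.0."""
    out = dict(schema)
    for exclusive, inclusive in (
        ("exclusiveMinimum", "minimum"),
        ("exclusiveMaximum", "maximum"),
    ):
        bound = out.get(exclusive)
        # bool is a subclass of int, so skip values already in boolean form.
        if isinstance(bound, (int, float)) and not isinstance(bound, bool):
            out[inclusive] = bound
            out[exclusive] = True
    return out


# Newer drafts: {"type": "integer", "exclusiveMinimum": 0}
print(downgrade_exclusive_bounds({"type": "integer", "exclusiveMinimum": 0}))
```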

I think Pydantic v2 could target the latest JSON Schema, and then it could be included in FastAPI as-is, still saying "OpenAPI 3.0.2" just to convince Swagger UI to render it. I think that would be an okay balance.

Otherwise, having support for an old JSON Schema would mean work that definitely has to be done again soon. Another option, but one that implies more work, is to make it configurable, and to be able to generate different versions of JSON Schema from Pydantic. There are not many changes between versions, but it would still be quite a bit more work than just supporting a single one.


The other thing is, if I remember correctly, the Swagger team was working on the next version of the Swagger Editor, with a bunch of features, I think re-writing it from scratch. And I understand that a lot of that work would be inherited by Swagger UI. So, probably (hopefully) soon, there would be a new Swagger UI that supports OpenAPI 3.1.0.


It would probably take me a bit of time to be able to work on this properly, so feel free to take this @Kludex! 🙇 Please just keep me posted. 🤓 😉

@Kludex
Member

Kludex commented Dec 22, 2022

extra details required by JSON Schema, but not used in pydantic-core should be saved in the 'extra' key which is already available in pydantic-core schema

I didn't get this part. You mean to save it for caching purposes?

(Should I ask questions elsewhere, or is it fine here?)

EDIT: Got it, you mean this https://github.com/pydantic/pydantic-core/blob/03a2adf24dd5f0cea5d07b1f956c42805d54fe92/pydantic_core/core_schema.py#L1038-L1042.

@Kludex
Member

Kludex commented Dec 22, 2022

Regarding testing, we should try to leverage the JSON Schema test suite, see json-schema-org/JSON-Schema-Test-Suite.

I don't understand how you see this. Can you elaborate?

I'm asking because currently the test suite is based on having an input, which is mainly a python object, usually a BaseModel class, and an output, which is a dictionary, and then we make an assertion. How does the JSON-Schema-Test-Suite fit here?

@samuelcolvin
Member Author

I didn't get this part. You mean to save it for caching purposes?

Sorry if I wasn't quite clear; the JsonSchema you refer to is not quite the right thing. JsonSchema there refers to the pydantic-core schema for the Json "meta-type", e.g. how we would define the json_data type below:

from pydantic import BaseModel, Json

class MyModel(BaseModel):
    json_data: Json[list[int]]

What I meant by

extra details required by JSON Schema, but not used in pydantic-core should be saved in the 'extra' key which is already available in pydantic-core schema

is this: the new JSON Schema generation logic will recursively process the pydantic-core schema to convert it to JSON Schema. However, sometimes there's information required to generate the JSON Schema which isn't required for the pydantic-core schema (e.g. title and description). Since the pydantic-core schema is validated and extra keys are forbidden, we need a way to include this info so it can be used to generate the JSON Schema. So we use the extra key, which is allowed in pydantic-core schema but not used to generate the validator.
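As a small illustration of that idea, the sketch below copies title and description from an 'extra' key onto the generated JSON Schema. The node shape is a simplified stand-in rather than the real pydantic-core schema, and `with_extra_metadata` is a hypothetical helper name.

```python
from typing import Any


def with_extra_metadata(core: dict[str, Any], json_schema: dict[str, Any]) -> dict[str, Any]:
    """Copy JSON-Schema-only metadata from the core node's 'extra' key."""
    extra = core.get("extra") or {}
    out = dict(json_schema)
    # Only keys relevant to JSON Schema are picked up; validation ignores them.
    for key in ("title", "description"):
        if key in extra:
            out[key] = extra[key]
    return out


# Assumed node shape: validation only needs "type", the rest rides along in "extra".
core_node = {"type": "int", "extra": {"title": "Age", "description": "Age in years"}}
print(with_extra_metadata(core_node, {"type": "integer"}))
```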


How does the JSON-Schema-Test-Suite fit here?

No idea yet, I just saw that there were some standard JSONSchema tests and thought it might help to try and implement them. I haven't yet looked into how best to do this.

@DanielNoord
Contributor

How does the JSON-Schema-Test-Suite fit here?

No idea yet, I just saw that there were some standard JSONSchema tests and thought it might help to try and implement them. I haven't yet looked into how best to do this.

Note that these are meant to test validators of JSON Schema. For pydantic it would be better to have a set of python objects and their expected JSON Schema.

We/you could probably do something like:

  1. Take test data from JSON-Schema-Test-Suite
  2. Convert data to python object
  3. Make pydantic model from object
  4. Create JSON Schema for model
  5. Assert model's generated schema is same as test data's schema

However, you would probably still need a mapping of test data to models, or perhaps a bit more easily, a mapping of models to JSON Schemas. I don't immediately see how we/you could use the JSON-Schema-Test-Suite directly without modification, as it serves another purpose (schema validation vs. schema generation).
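For context on why the suite targets validators: each suite file is a JSON array of cases, where a case has a schema and a list of tests, and each test has some data plus a valid flag. A minimal harness sketch follows. The inline case mimics that layout, while `toy_validate` and `run_suite` are hypothetical names, and the toy validator only understands `{"type": "integer"}`.

```python
import json
from typing import Any, Callable

# One case in the layout used by the suite files (inlined here for the example).
SUITE_FILE = json.loads("""
[
  {
    "description": "integer type matches integers",
    "schema": {"type": "integer"},
    "tests": [
      {"description": "an integer is an integer", "data": 1, "valid": true},
      {"description": "a string is not an integer", "data": "foo", "valid": false}
    ]
  }
]
""")


def toy_validate(schema: dict[str, Any], data: Any) -> bool:
    """Stand-in validator; a real run would plug in the thing under test."""
    if schema.get("type") == "integer":
        return isinstance(data, int) and not isinstance(data, bool)
    return True


def run_suite(cases: list[dict[str, Any]], validate: Callable[[dict[str, Any], Any], bool]) -> list[str]:
    """Return a description for every test whose outcome differs from 'valid'."""
    failures = []
    for case in cases:
        for test in case["tests"]:
            if validate(case["schema"], test["data"]) != test["valid"]:
                failures.append(f'{case["description"]}: {test["description"]}')
    return failures


print(run_suite(SUITE_FILE, toy_validate))
```

The harness needs a schema-to-validator direction, which is why using it for schema generation requires an extra step such as building a model from each schema first.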

@spacether

spacether commented Jan 23, 2023

openapi-generator and openapi-json-schema-generator ingest the JSON Schema test suite and generate an OpenAPI test spec that includes the test cases, then use it to auto-generate tests. pydantic could do the same.

@koxudaxi
Contributor

Make pydantic model from object

I'm not sure this is the best way.
datamodel-code-generator can generate Pydantic code from the JSON Schemas in the JSON-Schema-Test-Suite.
Currently, the code generator creates v1 pydantic models.
Once it supports v2, we can create test cases.
We may be able to use the generator for the tests.
