Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema for serialised model, rather than data required for validation #4697

Closed
samuelcolvin opened this issue Nov 1, 2022 Discussed in #4577 · 10 comments
Closed

JSON Schema for serialised model, rather than data required for validation #4697

samuelcolvin opened this issue Nov 1, 2022 Discussed in #4577 · 10 comments
Assignees
Labels

Comments

@samuelcolvin
Copy link
Member

Discussed in #4577

Originally posted by JonasKs October 1, 2022
Hi,

When I have a model like this:

class X(BaseModel):
    list_item: list[str] = Field(default=[])

The list will always be there, it won't be None. I have also not type annotated it as list[str] | None. However, the OpenAPI documentation does not mark it as required.

The solution is to do this:

class X(BaseModel):
    list_item: list[str] = Field(default=...)

but that also forces me to always create X(list_item=[]) instead of just X(). Not the end of the world, but for some large models this is a bit annoying.

Is OpenAPI documentation generated without using type hints, since this don't work?
Any clean workarounds?

@samuelcolvin samuelcolvin changed the title Json Schema for serialised model, rather than data required for validation JSON Schema for serialised model, rather than data required for validation Nov 1, 2022
@samuelcolvin
Copy link
Member Author

Thanks so much for this, see #4666 for some context on what we're doing with Json Schema.

You pose (indirectly) an interest question - right now the schema generated for a model represents "the data required to pass validation and create a model", in this scenario marking list_item as not required is correct.

But what I think you're talking about is generating a JSON Schema for the data generated from .dict() or .json() - the serialised model.

  • I've changed the title of this issue to reflect this, if I'm wrong, please clarify.

This is not something we currently do (or claim we do AFAIK).

There are two options:

  • leave this case for to be patched via Config.schema_extra or similar
  • add explicit support for "serialisation JSON Schema" - this seems like it could be pretty hard to get right

My immediate thinking is that we should add docs on this, and leave users to patch this themselves rather than put the effort into supporting this properly, but I'm open to a discussion.

@JonasKs
Copy link

JonasKs commented Nov 2, 2022

right now the schema generated for a model represents "the data required to pass validation and create a model", in this scenario marking list_item as not required is correct.

I agree - this is correct from that point of view, but we have two "views". In FastAPI it is common to have InputBody and ResponseModel Pydantic models.

  • For InputBody a default field should be marked as not required.
  • For ResponseModel, a default field should be marked as required, so that a list will always be a list, not list | None

My use case: We generate TypeScript models based on OpenAPI/Swagger JSON (which I assume is a pretty common use case). This is typed as list_item: list | None, instead of list_item: list. Yes, the field is not required to create a model, but it also will never be None from the consumers perspective, which it seems like it will be when looking at .schema_json()

I don't know if it would make sense to have a response=True-flag to alter this behavior, but usage of Config.schema_extra works too.

@JonasKs
Copy link

JonasKs commented Nov 2, 2022

This could also potentially be altered in FastAPI without any involvement of Pydantic itself? If a model is defined in response_model, go through and mark fields with default values as required in the schema_json()

@samuelcolvin
Copy link
Member Author

Well, if someone - me, @tiangolo, you, or some other volunteer is going to do the work to add this, we might as well do it in pydantic so everyone can benefit from the result.

Thinking about this, the only differences between the input schema and output schema is: fields with a default are "not required" in the input schema, but "required" in the output schema.

Note - as per #4516, in pydantic V2, Union[X, None] no longer means not required - to be not required a field has to have an explicit default. This is consistent with dataclasses, much cleaner and should help reduce the difference between input and output schema.

Apart from that, I can't think of any other differences between input and output schema (Of course, you can create a model that won't match the schema - with a post validation validator, or with assignment with validate_assignment=False, but that's your fault and your problem). Can anyone think of any other differences?

If not, this sounds like a reasonable feature request, happy to try to add it in V2.

@JonasKs
Copy link

JonasKs commented Nov 2, 2022

Thinking about this, the only differences between the input schema and output schema is: fields with a default are "not required" in the input schema, but "required" in the output schema.

Agree 😊

Apart from that, I can't think of any other differences between input and output schema. Can anyone think of any other differences?

I guess the use case for load_alias and dump_alias mentioned in #624 comes from the same place: Input, (working with the model), output.

Well, if someone - me, @tiangolo, you, or some other volunteer is going to do the work to add this, we might as well do it in pydantic so everyone can benefit from the result.

I'm not familiar with pydantic-core nor pydantic after the split. Where would this be implemented in V2? Would you accept PRs into #4516? Any thoughts on how the API should work? I won't promise anything (going on vacation soon), but I'd love to try.

@samuelcolvin
Copy link
Member Author

I agree on #624.

In terms of how to do this, #4666 is the best description of what I plan.

#4516 will be merged today (:tada: :sweat:), so any PR can be against main.

@JonasKs
Copy link

JonasKs commented Nov 2, 2022

Thank you, I'll read up on this after work.

Oh wow, you're that far already! 🎉 😊 Good job!

@JonasKs
Copy link

JonasKs commented Jan 17, 2023

I see #4666 has been assigned to @kennethreitz 🎉 Let me know if I can help with anything, I'd love to have this issue and #624 implemented.

@samuelcolvin samuelcolvin added the Schema JSON schema label Apr 17, 2023
@samuelcolvin samuelcolvin mentioned this issue Apr 17, 2023
1 task
@dmontagu dmontagu assigned adriangb and hramezani and unassigned adriangb Apr 28, 2023
@dmontagu
Copy link
Contributor

dmontagu commented Apr 28, 2023

Quick update: we do have a plan for distinguishing between the validation and serialization JSON schemas; I'm hoping to get to this at some point in the next week or so (no promises...).

@adriangb
Copy link
Member

I think this was completed in #5707

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants