Add JSON schema support (v2) #5029

dmontagu · 2023-02-09T01:54:23Z

Things left to resolve:

dmontagu · 2023-02-11T20:39:39Z

@samuelcolvin there's still a lot of work to be done before the JSON schema work is "finished", but I think I am starting to add too much to reasonably review. I think it would be better to stop new work on this branch, and just get everything it adds reviewed and merged before continuing.

There are already enough decisions in here that may be controversial that I think it's probably a mistake to continue building on them before establishing more thorough agreement.

I will be happy to remove things from this branch (such as my possibly-half-baked implementation of handling discriminator), even if it means more tests fail / etc., so that we can merge whatever fraction of it seems acceptable for now. This will make it easier to keep up-to-date, and make reviewing future improvements easier as well.

pydantic/_internal/_std_types_schema.py

pydantic/_internal/_core_metadata.py

pydantic/_internal/_typing_extra.py

pydantic/main.py

tiangolo · 2023-02-21T08:39:42Z

@dmontagu ah, very clever!

And good point that subclassing GenerateJsonSchema might work (from the comment in the other PR). Although it's true that doing it in FastAPI would not allow users to intuitively subclass it themselves.

I think this idea of including errors in the schema with a flag should work. My only fear is including two types of things in the same object (schema and errors), but at the same time, this would only happen when enabling the flag, so whoever does that (me) would have to know what they are doing and know what to expect, so I think that might be enough.

And this is probably the simplest solution, not affecting anyone else nor changing much the return type annotations.

I was also thinking about an alternative implementation separating the errors and typing it with @overload, but thinking about it, your solution of including it in the schema would naturally preserve the exact spot of the error without needing any other tricks. So, the more I think about it, the more I get convinced your idea is better than what I was thinking originally. 🧠

Not sure if you would rather keep the conversation here in a central place, or there in the other PR to avoid my extra noise here, let me know if you would prefer to switch over there!

samuelcolvin · 2023-02-21T12:25:46Z

`Field` - `example`, `const`, `regex`

I agree we should remove or change them, while providing backwards compatibility with a deprecation warning. I guess we should also allow **kwargs and raise a warning for that too, and add those **kwargs to the JSON Schema for backwards compatibility.

`Config.schema_extra`

I guess we should support it, with a warning

`default` values

I guess we should use the same logic as for fields with a type that can't be defined in JSON - e.g. Callable and IsSubClass - I don't really like UserWarning, but since we use it already I'm happy to keep it.

Update: @dmontagu I don't see this code, did we decide to remove it?

`__pydantic_modify_json_schema__`

I'm actually inclined here to change the behaviour and name to def __pydantic_json_schema__(cls, schema: Dict[str, Any], info: Info) -> Dict[str, Any], i.e. replace the slightly odd "modify" behaviour with a much more obvious "take the value, return the new value" signature that matches what we're doing with pydantic_core.

I know this is another change, however:

the method name has changed, we can continue to support the current behaviour with __modify_schema__ albeit with a warning
we can raise a warning or error if __pydantic_modify_json_schema__ "returns None" and hence avoid mistakes
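
The "take the value, return the new value" contract, together with a guard against a hook that mutates in place and returns None, might look like the minimal sketch below. `apply_json_schema_hook` and `Money` are hypothetical names used only for illustration, and the `info` argument from the proposed signature is left out; this is not pydantic's implementation.

```python
from typing import Any, Callable, Dict

JsonSchema = Dict[str, Any]


def apply_json_schema_hook(hook: Callable[[JsonSchema], Any], schema: JsonSchema) -> JsonSchema:
    """Call a return-style hook and reject the in-place ("modify") style.

    Hypothetical helper: the hook receives a copy, so in-place edits are lost,
    and a None return value is treated as a mistake.
    """
    result = hook(dict(schema))
    if result is None:
        raise TypeError('JSON schema hook returned None; it must return the new schema')
    return result


class Money:
    @classmethod
    def __pydantic_json_schema__(cls, schema: JsonSchema) -> JsonSchema:
        # Return a new value instead of mutating the argument.
        return {**schema, 'examples': ['12.50']}


new_schema = apply_json_schema_hook(Money.__pydantic_json_schema__, {'type': 'string'})
```

An old-style hook that only mutates its argument would hit the `TypeError` branch, which is the "avoid mistakes" behaviour described in the second point.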

`Decimal`

My preference would be to add a config setting for JSON Schema generation, something like mode: Literal['validation', 'serialisation'] or 'input' | 'output' which indicates whether we're building a JSON schema for what's required for MyModel(**data) vs. my_model.model_json() etc.

Then the JSON Schema for validation should be number, but should be string for serialisation where we're going to default to returning a string as per #1511.
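
As a rough illustration of the mode idea, assuming a generator-level setting with the two values proposed above: `decimal_json_schema` is a hypothetical function, not an existing pydantic API.

```python
from typing import Any, Dict, Literal

JsonSchemaMode = Literal['validation', 'serialisation']


def decimal_json_schema(mode: JsonSchemaMode) -> Dict[str, Any]:
    """Hypothetical: pick the Decimal schema for the requested mode."""
    if mode == 'validation':
        # Input side: what MyModel(**data) needs to accept.
        return {'type': 'number'}
    # Output side: assumes Decimal is serialised as a string by default.
    return {'type': 'string'}
```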

@tiangolo errors feature request

My solution would be this:

always collect the errors/warnings on the instance of GenerateJsonSchema, use a method on GenerateJsonSchema to create the error/warning, and another method to construct the JSON Schema value to use in these cases (both so they can be customised fairly easily)
Change the default code to generate, then warn/raise an error if the errors exist, this would allow FastAPI to do something different

So

s = schema_generator(by_alias=by_alias, ref_template=ref_template).generate(cls.__pydantic_core_schema__)

Becomes (by default)

schema_generator = schema_generator_cls(by_alias=by_alias, ref_template=ref_template)
s = schema_generator.generate(cls.__pydantic_core_schema__)
if schema_generator.errors:
    raise PydanticInvalidForJsonSchema(...)

Then FastAPI can have its own logic to generate JSON Schema with very little duplication.
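
A self-contained sketch of that flow, under the assumption that errors are always collected on the generator instance and the default caller decides what to do with them. `SketchGenerator`, `SketchInvalidForJsonSchema` and `default_json_schema` are stand-in names, and the toy core schema only knows the `int` type.

```python
from typing import Any, Dict, List


class SketchInvalidForJsonSchema(Exception):
    """Stand-in for the PydanticInvalidForJsonSchema error named above."""


class SketchGenerator:
    """Toy generator that records problems instead of raising immediately."""

    def __init__(self) -> None:
        self.errors: List[str] = []

    def add_error(self, message: str) -> None:
        # Overridable: how an error/warning is recorded.
        self.errors.append(message)

    def error_schema(self, message: str) -> Dict[str, Any]:
        # Overridable: the JSON Schema value used where generation failed.
        return {}

    def generate(self, core_schema: Dict[str, Any]) -> Dict[str, Any]:
        kind = core_schema.get('type')
        if kind == 'int':
            return {'type': 'integer'}
        message = f'cannot generate a JSON schema for core schema type {kind!r}'
        self.add_error(message)
        return self.error_schema(message)


def default_json_schema(core_schema: Dict[str, Any]) -> Dict[str, Any]:
    # Default behaviour: generate first, then raise if anything was collected.
    generator = SketchGenerator()
    schema = generator.generate(core_schema)
    if generator.errors:
        raise SketchInvalidForJsonSchema('; '.join(generator.errors))
    return schema
```

A framework that wants different behaviour could call `generate` itself and inspect `errors`, or subclass and override `add_error` / `error_schema`.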

@tiangolo migration tool

see #5013

samuelcolvin

I think this is looking awesome.

I'm in favour of merging it asap and creating separate issues for outstanding problems & discussions.

pydantic/json_schema.py

pydantic/schema.py

pydantic/json_schema.py

tiangolo · 2023-02-21T17:53:09Z

Thanks for the replies to all the points @samuelcolvin!

All looks good, and agreed it makes sense to handle these in subsequent PRs.

`Field()` with `**kwargs`

About Field() supporting **kwargs, I would like a way to make mypy/editors complain about them, so that developers can avoid using things that are not supported while writing the code and not only at runtime with a warning.

I was playing around and found a way to achieve both, I think. This works at runtime (with a deprecation warning), but shows an error in editors. It's a bit of extra code duplication because of the overload, but I guess it's the cost of backwards compatibility. 😅

Field() implementation

@overload
def Field(
    default: Any = Undefined,
    *,
    default_factory: typing.Callable[[], Any] | None = None,
    alias: str = None,
    title: str = None,
    description: str = None,
    exclude: typing.AbstractSet[int | str] | typing.Mapping[int | str, Any] | Any = None,
    include: typing.AbstractSet[int | str] | typing.Mapping[int | str, Any] | Any = None,
    gt: float = None,
    ge: float = None,
    lt: float = None,
    le: float = None,
    multiple_of: float = None,
    allow_inf_nan: bool = None,
    max_digits: int = None,
    decimal_places: int = None,
    min_items: int = None,
    max_items: int = None,
    min_length: int = None,
    max_length: int = None,
    frozen: bool = None,
    pattern: str = None,
    discriminator: str = None,
    repr: bool = True,
    strict: bool | None = None,
    json_schema_extra: dict[str, Any] | None = None,
) -> Any:
    ...

def Field(
    default: Any = Undefined,
    *,
    default_factory: typing.Callable[[], Any] | None = None,
    alias: str = None,
    title: str = None,
    description: str = None,
    examples: list[Any] = None,
    exclude: typing.AbstractSet[int | str] | typing.Mapping[int | str, Any] | Any = None,
    include: typing.AbstractSet[int | str] | typing.Mapping[int | str, Any] | Any = None,
    gt: float = None,
    ge: float = None,
    lt: float = None,
    le: float = None,
    multiple_of: float = None,
    allow_inf_nan: bool = None,
    max_digits: int = None,
    decimal_places: int = None,
    min_items: int = None,
    max_items: int = None,
    min_length: int = None,
    max_length: int = None,
    frozen: bool = None,
    pattern: str = None,
    discriminator: str = None,
    repr: bool = True,
    strict: bool | None = None,
    json_schema_extra: dict[str, Any] | None = None,
    **kwargs: Any,
) -> Any:
    """
    Used to provide extra information about a field, either for the model schema or complex validation. Some arguments
    apply only to number fields (``int``, ``float``, ``Decimal``) and some apply only to ``str``.

    :param default: since this is replacing the field's default, its first argument is used
      to set the default, use ellipsis (``...``) to indicate the field is required
    :param default_factory: callable that will be called when a default value is needed for this field
      If both `default` and `default_factory` are set, an error is raised.
    :param alias: the public name of the field
    :param title: can be any string, used in the schema
    :param description: can be any string, used in the schema
    :param examples: can be any list of json-encodable data, used in the schema
    :param exclude: exclude this field while dumping.
      Takes same values as the ``include`` and ``exclude`` arguments on the ``.dict`` method.
    :param include: include this field while dumping.
      Takes same values as the ``include`` and ``exclude`` arguments on the ``.dict`` method.
    :param gt: only applies to numbers, requires the field to be "greater than". The schema
      will have an ``exclusiveMinimum`` validation keyword
    :param ge: only applies to numbers, requires the field to be "greater than or equal to". The
      schema will have a ``minimum`` validation keyword
    :param lt: only applies to numbers, requires the field to be "less than". The schema
      will have an ``exclusiveMaximum`` validation keyword
    :param le: only applies to numbers, requires the field to be "less than or equal to". The
      schema will have a ``maximum`` validation keyword
    :param multiple_of: only applies to numbers, requires the field to be "a multiple of". The
      schema will have a ``multipleOf`` validation keyword
    :param allow_inf_nan: only applies to numbers, allows the field to be NaN or infinity (+inf or -inf),
        which is a valid Python float. Default True, set to False for compatibility with JSON.
    :param max_digits: only applies to Decimals, requires the field to have a maximum number
      of digits within the decimal. It does not include a zero before the decimal point or trailing decimal zeroes.
    :param decimal_places: only applies to Decimals, requires the field to have at most a number of decimal places
      allowed. It does not include trailing decimal zeroes.
    :param min_items: only applies to lists, requires the field to have a minimum number of
      elements. The schema will have a ``minItems`` validation keyword
    :param max_items: only applies to lists, requires the field to have a maximum number of
      elements. The schema will have a ``maxItems`` validation keyword
    :param min_length: only applies to strings, requires the field to have a minimum length. The
      schema will have a ``minLength`` validation keyword
    :param max_length: only applies to strings, requires the field to have a maximum length. The
      schema will have a ``maxLength`` validation keyword
    :param frozen: a boolean. When True, the field raises a TypeError if the field is
      assigned on an instance.  The BaseModel Config must set validate_assignment to True
    :param pattern: only applies to strings, requires the field match against a regular expression
      pattern string. The schema will have a ``pattern`` validation keyword
    :param discriminator: only useful with a (discriminated a.k.a. tagged) `Union` of sub models with a common field.
      The `discriminator` is the name of this common field to shorten validation and improve generated schema
    :param repr: show this field in the representation
    :param json_schema_extra: extra dict to be merged with the JSON Schema for this field
    :param strict: enable or disable strict parsing mode
    """
    # Start from the explicit extras; deprecated **kwargs are merged underneath them.
    current_json_schema_extra: dict[str, Any] | None = json_schema_extra
    if kwargs:
        warnings.warn(
            'Arbitrary Field keywords (**kwargs) have been deprecated, to extend the '
            'generated JSON Schema use the new dict parameter json_schema_extra, the '
            f'invalid keyword arguments are: {kwargs}',
            DeprecationWarning,
        )
        # Keep the legacy kwargs even when json_schema_extra is None;
        # explicit json_schema_extra entries win on key conflicts.
        current_json_schema_extra = {**kwargs, **(json_schema_extra or {})}
    return FieldInfo.from_field(
        default,
        default_factory=default_factory,
        alias=alias,
        title=title,
        description=description,
        examples=examples,
        exclude=exclude,
        include=include,
        gt=gt,
        ge=ge,
        lt=lt,
        le=le,
        multiple_of=multiple_of,
        allow_inf_nan=allow_inf_nan,
        max_digits=max_digits,
        decimal_places=decimal_places,
        min_items=min_items,
        max_items=max_items,
        min_length=min_length,
        max_length=max_length,
        frozen=frozen,
        pattern=pattern,
        discriminator=discriminator,
        repr=repr,
        json_schema_extra=current_json_schema_extra,
        strict=strict,
    )
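
Stripped down to its essentials, the trick above can be sketched as follows. `field` is a hypothetical stand-in for `Field` that returns a plain dict, and the sketch declares two overloads (one for `default`, one for `default_factory`) because type checkers typically expect more than one; that split is a choice of this sketch, not part of the code above.

```python
import warnings
from typing import Any, Callable, Dict, Optional, overload


@overload
def field(
    default: Any = None,
    *,
    title: Optional[str] = None,
    json_schema_extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]: ...


@overload
def field(
    *,
    default_factory: Callable[[], Any],
    title: Optional[str] = None,
    json_schema_extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]: ...


def field(
    default: Any = None,
    *,
    default_factory: Optional[Callable[[], Any]] = None,
    title: Optional[str] = None,
    json_schema_extra: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    # Only the implementation accepts **kwargs, so type checkers flag them
    # while the call still works at runtime.
    extra = dict(json_schema_extra or {})
    if kwargs:
        warnings.warn(
            f'extra keyword arguments are deprecated: {sorted(kwargs)}',
            DeprecationWarning,
            stacklevel=2,
        )
        # Explicit json_schema_extra entries win over legacy kwargs.
        extra = {**kwargs, **extra}
    return {
        'default': default,
        'default_factory': default_factory,
        'title': title,
        'json_schema_extra': extra,
    }


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    legacy = field(1, example=2)  # a type checker should reject `example`
```

At runtime the legacy keyword ends up in `legacy['json_schema_extra']` and one `DeprecationWarning` is recorded in `caught`.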

errors feature request

I think that should work too, and it could probably be another PR on top. The main important thing is that the current approach doesn't make that impossible to achieve/add later. 🎉

dmontagu · 2023-02-21T17:55:45Z

@samuelcolvin

My preference would be to add a config setting for JSON Schema generation, something like mode: Literal['validation', 'serialisation'] or 'input' | 'output' which indicates whether we're building a JSON schema for what's required for MyModel(**data) vs. my_model.model_json() etc.

Then the JSON Schema for validation should be number, but should be string for serialisation where we're going to default

There is an important consideration here that I discuss in #5072, which is that when generating clients based on an OpenAPI spec (one of the biggest benefits of FastAPI imo, and one of the main reasons I have used it for years), models are frequently used both as inputs (so will be "validated"), and as outputs (so will be "serialized"). And unless you make two separate models (which comes with its own issues), they will share a schema. So it's not uncommon that you'll have to have one schema for both inputs and outputs, and therefore need to resolve this.

And even if we were okay with creating two separate schemas for the model based on whether it is an "input" or an "output" of the API — and I am not sure we should be, at least in the context of FastAPI — there's still the issue that we'd probably want a single generator instance to produce the "output" format in some places and the "input" format in others, within the same schema (at least that's the closest to how FastAPI does it today, I think). So I'm thinking it may make more sense to somehow make it an annotation on the core schema (that would be set by FastAPI/similar), as opposed to on the generator.
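
To make the "annotation on the core schema" idea concrete, here is a toy sketch in which each node may carry its own mode and a single generator call honours it. The `json_schema_mode` metadata key, the dict layout and `toy_json_schema` are all assumptions made up for this illustration; they do not describe the real CoreSchema.

```python
from typing import Any, Dict


def toy_json_schema(core_schema: Dict[str, Any], default_mode: str = 'validation') -> Dict[str, Any]:
    """Hypothetical: a per-node mode in metadata overrides the generator default."""
    mode = core_schema.get('metadata', {}).get('json_schema_mode', default_mode)
    kind = core_schema['type']
    if kind == 'decimal':
        return {'type': 'number'} if mode == 'validation' else {'type': 'string'}
    if kind == 'object':
        return {
            'type': 'object',
            'properties': {
                name: toy_json_schema(sub, default_mode)
                for name, sub in core_schema['fields'].items()
            },
        }
    raise ValueError(f'unsupported toy core schema type: {kind!r}')


# One schema mixing an "input" field and an "output" field.
mixed = toy_json_schema({
    'type': 'object',
    'fields': {
        'price_in': {'type': 'decimal'},
        'price_out': {'type': 'decimal', 'metadata': {'json_schema_mode': 'serialisation'}},
    },
})
```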

I'm not sure, but I'd be inclined to postpone addressing this in this PR and hopefully have some discussion in #5072

dmontagu · 2023-02-21T17:58:48Z

I was playing around and found a way to achieve both, I think.

@tiangolo Yeah this was my plan for how to do this. We can remove the overload down the line after people have had time to migrate, and it will still cause mypy/IDE errors for them now.

👍

dmontagu · 2023-02-21T22:38:54Z

@samuelcolvin

from your suggestion to @tiangolo's feature request:

schema_generator = schema_generator_cls(by_alias=by_alias, ref_template=ref_template)
s = schema_generator.generate(cls.__pydantic_core_schema__)
if schema_generator.errors:
    raise PydanticInvalidForJsonSchema(...)

Then FastAPI can have its own logic to generate JSON Schema with very little duplication.

The main shortcoming I see with this is providing context about where the error was raised. Right now we don't have a good system for understanding the "path" that was taken through the crazy recursive function calls to produce the error, which I think may be necessary for the kind of errors @tiangolo wants to show. I am not opposed to implementing that logic in principle, but that logic would end up reflecting the structure of the CoreSchema more than the JsonSchema (when they differ anyway), which I could imagine leading to some confusion. (And would require some logic..) Maybe you've thought of a good way to do it.

Another downside to always deferring error raising is that it makes it a lot more annoying to debug when you want it to raise, and see where it was raised. If we add a deferred-error-collection mode I would suggest we still retain the ability to raise immediately (through one choice of an 'errors' kwarg or similar) so that we can easily get a stack trace to where the exception was raised when desired.
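
One possible shape for that switch, sketched with hypothetical names (`ModeGenerator`, `handle_invalid`) and an `errors` kwarg taking `'raise'` or `'collect'`:

```python
from typing import Any, Dict, List, Literal


class ModeGenerator:
    """Toy generator whose error handling is chosen by an 'errors' kwarg."""

    def __init__(self, errors: Literal['raise', 'collect'] = 'raise') -> None:
        self.errors_mode = errors
        self.collected: List[ValueError] = []

    def handle_invalid(self, message: str) -> Dict[str, Any]:
        error = ValueError(message)
        if self.errors_mode == 'raise':
            # Immediate raise keeps the stack trace pointing at the failure site.
            raise error
        # Deferred mode: remember the error and return a placeholder schema.
        self.collected.append(error)
        return {}
```

With `errors='raise'` the traceback leads straight to the call that failed; with `errors='collect'` the caller inspects `collected` after generation finishes.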

Either way, I think let's create a separate issue for this. (Actually I might just open a PR making the change I suggested above that at least retains information about where the error is within the final json schema, at least then the alternative is made concrete.)

samuelcolvin · 2023-02-22T21:11:41Z

Thanks so much for this @dmontagu, amazing to have this merged.

On the error stuff, my instinct is that the traceback might not be that useful anyway, we could even store the traceback with the errors if really necessary, but I'm also not that bothered about it.

It might be easier to just add a kwarg config setting to GenerateJsonSchema to do all the things with errors that @tiangolo wants, rather than aiming for absolute flexibility and making a horrible API.

dmontagu · 2023-02-23T20:20:05Z

@Julian I've done some work on JSON schema for discriminated unions in #5051; I know it's more OpenAPI than JSON schema, so not sure if it's something you can help with, but I would appreciate any insight there. (And thanks for your offer to look at the JSON schema stuff, whether or not you can help in this particular case.)

David Montague added 3 commits February 8, 2023 07:55

WIP

ae04868

Initial work

93ed68e

Ready to work on tests

64300c7

dmontagu changed the title ~~JSON schema support~~ Add JSON schema support (v2) Feb 9, 2023

David Montague and others added 7 commits February 8, 2023 20:28

Merge branch 'main' into dm-json-schema

70ec62d

Make more tests pass

69ffeba

Get some more tests passing

918441f

Refactor the approach to dead-definition-removal

b5d011c

Address some TODOs

f2f0137

Add more improvements and get more tests passing

a956406

More fixes

6084033

dmontagu mentioned this pull request Feb 10, 2023

Move ref from TypedDictSchema to ModelSchema (v2) #5038

Closed

dmontagu added 5 commits February 10, 2023 14:03

More improvements

56b1c5c

Merge branch 'main' into dm-json-schema

1666d26

WIP

37d3afd

Improve enum support

acfda10

Get all tests passing

efaead6

dmontagu marked this pull request as ready for review February 11, 2023 20:36

dmontagu added 2 commits February 11, 2023 14:06

Attempt to get CI passing for other python versions

2be6c04

Make the diff a bit smaller

0916451

Kludex reviewed Feb 13, 2023

pydantic/_internal/_std_types_schema.py (outdated, resolved)

samuelcolvin mentioned this pull request Feb 13, 2023

JSON Schema 2020-12 [work in progress] #4947

Closed

5 tasks

dmontagu added 3 commits February 14, 2023 15:42

Some clean-up

a845cff

Fix issues with Python<3.10

db6a5e1

Add some more docs, rename some things

b390b32

dmontagu commented Feb 15, 2023

pydantic/_internal/_core_metadata.py (resolved)

dmontagu commented Feb 15, 2023

pydantic/_internal/_core_metadata.py (outdated, resolved)

dmontagu commented Feb 15, 2023

pydantic/_internal/_typing_extra.py (outdated, resolved)

dmontagu commented Feb 15, 2023

pydantic/main.py (outdated, resolved)

Attempt to fix CI again

b45728b

delete empty pydantic/json_schema_misc.py

56f180e

samuelcolvin approved these changes Feb 21, 2023

pydantic/json_schema.py (outdated, resolved)

pydantic/schema.py (resolved)

pydantic/json_schema.py (outdated, resolved)

pydantic/json_schema.py (resolved)

samuelcolvin reviewed Feb 21, 2023

pydantic/json_schema.py (outdated, resolved)

dmontagu added 5 commits February 21, 2023 15:06

Address some feedback

1381041

Merge branch 'main' into dm-json-schema

ef1d3aa

Merge main

f84f805

Fix issues after merging with main

fa28dbb

Delete schema.py; address other feedback

197904f

Rename test_schema to test_json_schema

ffd2bf5

dmontagu merged commit 73373c3 into pydantic:main Feb 21, 2023

dmontagu deleted the dm-json-schema branch February 23, 2023 02:28

This was referenced Feb 24, 2023

Disable description generation for jsonschema #5106

Closed

tests/test_types_color.py::test_model_validation gets deprecation warning pydantic/pydantic-extra-types#19

Closed

samuelcolvin mentioned this pull request Mar 9, 2023

Generate proper JSON Schema for nullable properties #1611

Closed

4 tasks
