Add additional parameters to Schema for validation and annotation #311

Merged
merged 15 commits into samuelcolvin:json-schema-extra on Dec 3, 2018

Conversation

3 participants
@tiangolo
Contributor

tiangolo commented Nov 22, 2018

Change Summary

Add additional parameters to Schema for validation and annotation, in a single declaration.

Integrates and uses constr, conint, confloat and condecimal. Doesn't change any of the current functionality.

As a result, a current model created as:

from pydantic import BaseModel, constr, Schema

class Foo(BaseModel):
    a: constr(min_length=2, max_length=10) = Schema('bar')

can also be created as:

from pydantic import BaseModel, Schema

class Foo(BaseModel):
    a: str = Schema('bar', min_length=2, max_length=10)

Being able to declare a as a plain str makes it simpler for some editors to get the hint that the property a is a str and provide support for it (autocomplete, type error checking, etc.).

Note

The generated JSON Schema will output validation keywords of minLength and maxLength in this case.

For number validations like gt, it outputs exclusiveMinimum.

Another option for the parameters would be to use the literal names from JSON Schema, such as exclusiveMinimum, but that wouldn't be very "Pythonic".

Yet another option would be to use the JSON Schema names converted to snake case, as in exclusive_minimum.

But I imagine the preferred option is to keep the same parameter names used in the rest of the code, as in conint, etc. Otherwise, let me know and I'll change it.
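
For illustration, here's a rough sketch (not the PR's literal output) of the JSON Schema fragment that the a field of the Foo example above would produce, written as a Python dict; exact keys such as title may differ:

field_a_schema = {
    'title': 'A',
    'default': 'bar',
    'type': 'string',
    'minLength': 2,
    'maxLength': 10,
}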

Specific parameters added to Schema

  • description
  • gt
  • ge
  • lt
  • le
  • min_length
  • max_length
  • regex
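
As an illustration of how these can be combined in a single declaration, a small sketch (the model and field names here are made up, not part of the PR):

from pydantic import BaseModel, Schema

class Item(BaseModel):
    # ... (Ellipsis) as the first argument means the field is required
    name: str = Schema(..., min_length=2, max_length=50, description='Display name')
    price: float = Schema(..., gt=0, le=1000000, description='Price, must be positive')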

Related issue number

I didn't create a specific issue for this. But the changes were discussed here: #291 (comment)

Performance Changes

pydantic cares about performance; if there's any risk that performance changed in this PR,
please run make benchmark-pydantic before and after the change:

  • before:
                                pydantic time=0.775s, success=49.95%
                                pydantic time=0.794s, success=49.95%
                                pydantic time=0.792s, success=49.95%
                                pydantic time=0.773s, success=49.95%
                                pydantic time=0.790s, success=49.95%
                                pydantic best=0.773s, avg=0.785s, stdev=0.010s

                                pydantic best=25.770μs/iter avg=26.155μs/iter stdev=0.332μs/iter
  • after:
                                pydantic time=0.794s, success=49.95%
                                pydantic time=0.801s, success=49.95%
                                pydantic time=0.809s, success=49.95%
                                pydantic time=0.765s, success=49.95%
                                pydantic time=0.797s, success=49.95%
                                pydantic best=0.765s, avg=0.793s, stdev=0.017s

                                pydantic best=25.507μs/iter avg=26.441μs/iter stdev=0.554μs/iter

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes
  • No performance deterioration (if applicable)
  • HISTORY.rst has been updated
    • if this is the first change since a release, please add a new section
    • include the issue number or this pull request number #<number>
    • if you're not a regular contributor please include your GitHub username @whatever
@codecov


codecov bot commented Nov 22, 2018

Codecov Report

Merging #311 into json-schema-extra will not change coverage.
The diff coverage is 100%.

@@               Coverage Diff                @@
##           json-schema-extra   #311   +/-   ##
================================================
  Coverage                100%   100%           
================================================
  Files                     13     14    +1     
  Lines                   1708   1770   +62     
  Branches                 323    338   +15     
================================================
+ Hits                    1708   1770   +62
:param gt: only applies to numbers, requires the field to be "greater than". The schema
will have an ``exclusiveMinimum`` validation keyword
:param ge: only applies to numbers, requires the field to be "greater than or equal to". The
schema will have a ``minimum`` validation keyword


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

subsequent lines need to be indented as above.

schema will have a ``maxLength`` validation keyword
:param regex: only applies to strings, requires the field to match against a regular expression
pattern string. The schema will have a ``pattern`` validation keyword
:param **extra: any additional keyword arguments will be added as is to the schema


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

is this still true, is it valid in json-schema?


@tiangolo

tiangolo Nov 27, 2018

Contributor

TL;DR: Yes, it's still valid JSON Schema. Extensibility by adding extra keywords is part of the standard.


Additionally, there are some keywords that are part of the standard and are not declared as parameters of the Schema class. So, passing them via **extra is currently the only option to include them.

For example, the examples keyword could have a list of JSON "objects" (dicts) with examples of valid JSON "documents". I might create a new PR later with auto-generation of an example for each generated schema. But for now, they can be declared by hand by developers / Pydantic users through **extra.

Also, I didn't add all the possible extra keywords as parameters to the Schema class, because some keywords would imply a validation that doesn't have an equivalent in Pydantic yet; for example, the max and min size of an array (list). So, to avoid confusing users, I only added as parameters of Schema the things that actually implement a validation on data, plus what I think are the most common informational (non-validation) fields: title and description.
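
For example, a sketch of passing a non-parameter keyword like examples through **extra (the field contents here are illustrative):

from pydantic import BaseModel, Schema

class User(BaseModel):
    # 'examples' is not a declared Schema parameter, so it is collected by **extra
    # and added as-is to the generated JSON Schema for this field
    name: str = Schema('John Doe', examples=['Jane Doe', 'Alice'])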


@samuelcolvin
@@ -36,16 +51,117 @@ class Validator(NamedTuple):
class Schema:


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

Maybe best to move this into schema.py?


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

Given that pydantic now requires dataclasses, why don't we make Schema a dataclass? It's just what they're made for.


@tiangolo

tiangolo Nov 27, 2018

Contributor

Sure, I'll move it to schema.py. Makes sense.


I would like to use dataclasses, and avoid all that code / parameter duplication. The problem is that editors don't seem to support them very well yet. So, completion and type checks still depend on / expect a __init__.

I guess at some point PyCharm, Jedi and the Python Language Server will end up supporting them, and we'll be able to use them.

But for now, I would say it's better to keep the __init__. Nevertheless, if you want me to, I can add the decorator to make it explicit.


@samuelcolvin

samuelcolvin Nov 27, 2018

Owner

Let's use dataclasses.


@tiangolo

tiangolo Nov 28, 2018

Contributor

Done. I made it a dataclass with the decorator.

It still has the __init__, let me know if you want me to remove it (although that would limit some of the editor's support 😨 ).


So, I tried moving Schema to schema.py, and I ran into several issues with circular imports. I tried importing the whole module (as it seems to work with some type declarations) and other "tricks". But I didn't find another way to solve it.

I'm putting the Schema class in its own schema_class.py file that is then imported by the rest, to reduce the noise in fields.py while avoiding circular imports. It is still exported by schema.py, but it is actually declared in schema_class.py and imported from there. I would like to put it in schema.py as it seems like the place it should live, but I don't know how to avoid the circular imports.

Let me know if that sounds OK, if you want me to put it back into fields.py, or what we should do.


@tiangolo

tiangolo Nov 28, 2018

Contributor

@samuelcolvin

Leave it here and I'll move it.

Oops, I had already moved it to schema_class.py; I'm not sure what you're referring to by "here". Let me know if you want me to move it somewhere else.

Yes, please remove __init__, that's the whole point of dataclasses. PyCharm does lots of things badly; that shouldn't constrain what we do with Python.

So, I was removing the __init__ and leaving it as a pure dataclass, and now I see that we can't declare **kwargs using pure dataclasses. We need __init__ for that. How do you want me to proceed?
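
For context, a minimal sketch of the limitation (SchemaSketch is a made-up name): a dataclass can only declare named fields, so its generated __init__ has a fixed signature and rejects arbitrary keyword arguments:

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class SchemaSketch:
    default: Any = None
    title: Optional[str] = None
    # there's no way to declare a catch-all **extra field here, so the generated
    # __init__ raises, e.g.:
    # SchemaSketch(examples=['x'])  ->  TypeError: unexpected keyword argument 'examples'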


@pauleveritt

We (PyCharm) have more dataclasses work scheduled for the next release. Let me know some specifics (separate ticket?) and I can get the right person involved.

That sounds great! Where should we have the conversation?

In summary:

from dataclasses import dataclass

@dataclass
class A:
    a: str
    b: int

my_a = A()

When calling my_a = A(), we should see autocompletion and type checking for the arguments passed to A(), as if the class were declared like:

class A:
    def __init__(self, a: str, b: int):
        self.a = a
        self.b = b


@pauleveritt

pauleveritt Nov 28, 2018

@tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in the PyCharm repo and get Seymon (the one owning dataclasses) involved.


@tiangolo

tiangolo Nov 28, 2018

Contributor

@tiangolo Can we move it to another ticket in this repo? Then, if needed, I'll make a ticket in the PyCharm repo and get Seymon (the one owning dataclasses) involved.

Yeah, I guess. Although it wouldn't be specifically related to this repo... @samuelcolvin would that be ok?


@pauleveritt

pauleveritt Nov 28, 2018

Or email me directly, paul dot everitt at jetbrains dot com


@tiangolo

tiangolo Nov 30, 2018

Contributor

@pauleveritt thanks for the interest.

I was writing and documenting everything with screenshots, etc. But during the process, I realized that I had misconfigured the PyCharm environment in which I did the previous tests, and that PyCharm actually does support dataclasses. All I was asking for is already implemented.

So, as of now, PyCharm actually has the best dataclass support; the rest of the editors are the ones still behind.

def get_annotation_from_schema(annotation, schema):
"""Get an annotation with validation implemented for numbers and strings based on the schema.


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

(sorry to be a pedant) please move the first line of the docstring onto its own line.


@tiangolo

tiangolo Nov 28, 2018

Contributor

No worries. Done.

isinstance(annotation, type)
and issubclass(annotation, (str,) + _numeric_types)
and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

surely add bool to _blacklist; also, you don't need square brackets here, and indeed the square brackets make this line slower by meaning issubclass has to be evaluated for each item in _blacklist even if the first is True
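
A sketch of the suggested form, reusing the names from the diff above and assuming bool has been folded into _blacklist (is_constrainable is just an illustrative name); the generator expression lets any() short-circuit on the first match:

is_constrainable = (
    isinstance(annotation, type)
    and issubclass(annotation, (str,) + _numeric_types)
    and not any(issubclass(annotation, c) for c in _blacklist)  # no list: short-circuits
)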

def get_field_schema_validations(field):
"""Get the JSON Schema validation keywords for a ``field`` with an annotation of


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

again docstrings in the format

    """
    this is the first line.
    """
a Pydantic ``Schema`` with validation arguments.
"""
f_schema = {}
if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

we do this a lot, should we have a method lenient_issubclass?


@tiangolo

tiangolo Nov 28, 2018

Contributor

Agreed.
I added it to utils.py.
Added a couple of tests for it.
Updated the code that used that check to use the new function.
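
For reference, a minimal sketch of what such a helper could look like (the actual implementation added to utils.py may differ):

def lenient_issubclass(cls, class_or_tuple):
    # like issubclass(), but returns False instead of raising TypeError
    # when the first argument is not a class
    return isinstance(cls, type) and issubclass(cls, class_or_tuple)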

and not issubclass(annotation, bool)
and not any([issubclass(annotation, c) for c in _blacklist])
):
if issubclass(annotation, str):


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

(str, bytes)?


@tiangolo

tiangolo Nov 28, 2018

Contributor

Yeah, but as I'm calling constr to generate the final type, it would always output str, instead of bytes.

I guess one option would be adding a conbytes, or using the validators directly, like anystr_length_validator, but I wanted to use internally the same functions that already exist.

What do you think we should do about that?
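
For reference, a purely hypothetical conbytes could mirror the constr pattern quoted further down in this review; neither ConstrainedBytes nor conbytes exists in this PR, and this sketch omits the actual validators:

class ConstrainedBytes(bytes):
    # would carry the length constraints and matching validators,
    # analogous to ConstrainedStr
    min_length = None
    max_length = None

def conbytes(*, min_length=None, max_length=None):
    namespace = dict(min_length=min_length, max_length=max_length)
    return type('ConstrainedBytesValue', (ConstrainedBytes,), namespace)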

@@ -140,6 +146,38 @@ def field_schema(
return s, f_definitions
numeric_types = (int, float, Decimal)
_str_types_attrs = {
'max_length': (numeric_types, 'maxLength'),


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

maybe these should be 3 element tuples?


@tiangolo

tiangolo Nov 28, 2018

Contributor

You're right. I was initially planning on using it differently. But now it doesn't make sense to have it as a dict.

f_schema = {}
if isinstance(field.type_, type) and issubclass(field.type_, (str, bytes)):
for attr, (t, keyword) in _str_types_attrs.items():
if getattr(field._schema, attr) and isinstance(getattr(field._schema, attr), t):


@samuelcolvin

samuelcolvin Nov 26, 2018

Owner

call getattr once not 3 times, also surely isinstance(None, str) == False anyway?
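
A sketch of the refactor combining both suggestions (the 3-element tuples from above and a single getattr per attribute); the names follow the diff being reviewed:

for attr, type_, keyword in _str_types_attrs:
    value = getattr(field._schema, attr)
    # isinstance(None, type_) is False, so no separate truthiness check is needed
    if isinstance(value, type_):
        f_schema[keyword] = value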

@tiangolo


Contributor

tiangolo commented Nov 26, 2018

Thanks for the review! I'll come back, check and fix in a couple of days.

def get_annotation_from_schema(annotation, schema):
"""Get an annotation with validation implemented for numbers and strings based on the schema.
:param annotation: an annotation from a field specification, as ``str``, ``ConstrainedStr`` or the


@samuelcolvin

samuelcolvin Nov 28, 2018

Owner

no need for "or the return value of constr(max_length=10)" since that is a ConstrainedStr


@tiangolo

tiangolo Nov 28, 2018

Contributor

Sure.

I had it that way as I understand the return of constr would be of type ConstrainedStrValue, which inherits from ConstrainedStr, because of the:

return type('ConstrainedStrValue', (ConstrainedStr,), namespace)

But I also think that it sounds difficult to understand and would be an overcomplication there in the docstrings. So, if it's OK for you to mention just ConstrainedStr, then I'm going to remove that last part.


@samuelcolvin

@samuelcolvin samuelcolvin changed the base branch from master to json-schema-extra Nov 28, 2018

@samuelcolvin


Owner

samuelcolvin commented Nov 28, 2018

I've changed the base for this PR so we can merge it without it going into master.

Fix the stuff that makes sense above, then I'll work on anything I want to change. Easier than going back and forth forever.

@tiangolo


Contributor

tiangolo commented Nov 28, 2018

I've changed the base for this PR so we can merge it without it going into master.

Fix the stuff that makes sense above, then I'll work on anything I want to change. Easier than going back and forth forever.

Great.


Do you want me to mark as "resolved" the comments in the review that are already done?

I'm leaving them there so you can come back, check and confirm, before marking them as resolved. But let me know if you want me to mark the ones that are done and don't have a pending discussion.

@samuelcolvin samuelcolvin referenced this pull request Nov 30, 2018

Open

conbytes #315

@samuelcolvin samuelcolvin merged commit 38fba73 into samuelcolvin:json-schema-extra Dec 3, 2018

3 checks passed

codecov/project 100% (+0%) compared to a0aa9e7
continuous-integration/travis-ci/pr The Travis CI build passed
deploy/netlify Deploy preview ready!

samuelcolvin added a commit that referenced this pull request Dec 3, 2018

Add additional parameters to Schema for validation and annotation (#319)
* Add additional parameters to Schema for validation and annotation (#311)

* Add tests for validation declared via Schema class in defaults

* Add validations to field from declarations in Schema

* Add annotations in generated JSON Schema from validations via Schema

* Augment tests for schema generation

* Simplify validations from Schema in fields, from coverage hints

* Update schema test to use the spec plural "examples"

* Add docs and simple example of the additional parameters for Schema

* Update history

* Fix number of PR in HISTORY

* Refactor check for numeric types, remove break to make coverage happy

* Fix typo in docs, I confused gt with maximum

* Finish docstring for Schema (I had forgotten about it)

* Implement code review requests and lenient_issubclass with tests

* Move Schema to its own file to extract it from fields.py but avoid circular imports

* Control coverage

* Schema fixes (#318)

* rearrange code

* cleanup get_annotation_from_schema

* fix typo

* rename _schema to schema