New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema update/refactor/augment, to conform to spec #308

Merged
merged 66 commits into from Nov 22, 2018

Conversation

2 participants
@tiangolo
Contributor

tiangolo commented Nov 17, 2018

Update/augment JSON Schema implementation to conform to JSON Schema spec

Note

I renamed BaseModel.schema_json to BaseModel.schema_str, as it was a bit confusing for me the first time I tried Pydantic. As this is a change mostly related to "taste", let me know if you want me to revert it. Or maybe rename it to something like schema_dump.

Description

  • Generated JSON schema from a model is now fully conforming to the JSON Schema spec
  • Added a lot of logic to declare special and custom types, including:
    • Date and time types
    • Pydantic's types, including constraints, as max_length, regex, etc.
  • Types are mapped to match as possible the specs for JSON Schema Core, JSON Schema Validation and OpenAPI. E.g. EmailStr generates a "type": "string", "format": "email", equivalent for datetime. bytes get a "format": "binary", although there is no equivalent in JSON Schema, that is the format defined in OpenAPI, etc.
  • Refactor all schema handling out of the main classes
  • Models are defined in a separate definitions key, this allows better re-use of the definitions
    • Only applies for models and when not overwritten (with default values, custom title, etc)
  • Models defined in definitions key are checked and processed to have unique names, so modulea.Model will be named Model if there is no competing class, but if there is a moduleb.Model, both models will have corresponding unique names (modulea__Model and moduleb__Model)
  • utility function to generate a single JSON Schema with definitions from several models that can be re-utilized (e.g. in OpenAPI components).
  • Update all (or most of) the tests for JSON Schema
  • Add many new tests to check all the new functionality
  • I plan on reviewing the tests again and augment them if necessary once we check that these changes seem acceptable. After that, I'll update the docs for Schema generation.

Related issue number

#291

Performance Changes

pydantic cares about performance, if there's any risk performance changed on this PR,
please run make benchmark-pydantic before and after the change:

  • before: ?
  • after: ?

JSON Schemas are generaly only run once, and are also cached. As they are more like a "colateral" feature that is not involved in the normal worflow of Pydantic, I don't think performance could be affected or relevant for this PR.

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%: Test coverage for the new module at 100%. Added tests for utils to cover stuff that was covered with the previous implementation.
  • Documentation reflects the changes: Docs for Schema JSON were updated accordingly, as thoroughly and clear as possible.
  • [ 馃 ] No performance deterioration (if applicable): I think it doesn't apply to this PR.
  • HISTORY.rst has been updated: Updated.
    • if this is the first change since a release, please add a new section
    • include the issue number or this pull request number #<number>
    • if you're not a regular contributer please include your github username @whatever
@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Nov 17, 2018

  • Conflicts
  • not sure why tests aren't running, commit again and see if it's fixed
  • Do we really need the new test module? I doubt it.
  • Coverage should be 100% as it is on Master.

@tiangolo tiangolo force-pushed the tiangolo:json-schema-flat branch from a7669b8 to 8c555ba Nov 17, 2018

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 17, 2018

  • Rebased and pushed.
  • The tests are passing locally. Let me fix the remote errors that seem to be flake, I was not running it locally, I was just running make test.
  • The new test module are just dummy models. I don't like having an extra module with sub-modules just for testing either. But that is the way we can test having two models named Model and check that the JSON Schema generation creates unique names for both of them based on the module path. And that it doesn't duplicate an imported model that was already created somewhere else.
  • I'll work on the coverage.
@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Nov 17, 2018

Run just make to run both tests and linting.

@codecov

This comment has been minimized.

codecov bot commented Nov 18, 2018

Codecov Report

Merging #308 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #308    +/-   ##
======================================
  Coverage     100%   100%            
======================================
  Files          12     13     +1     
  Lines        1532   1703   +171     
  Branches      285    321    +36     
======================================
+ Hits         1532   1703   +171

@tiangolo tiangolo changed the title from [WIP] JSON Schema update/refactor/augment, to conform to spec to JSON Schema update/refactor/augment, to conform to spec Nov 18, 2018

@samuelcolvin

Looking good initially, but quite a lot to change.

I'll probably have some more stuff I want fixing once this is resolved but hard to see without fixing my initial concerns.

If there's loads more to do I might submit a PR against this PR to save you some time.

Show resolved Hide resolved pydantic/main.py Outdated
Show resolved Hide resolved pydantic/schema.py Outdated
def get_flat_models_from_model(model: Type['main.BaseModel']) -> Set[Type['main.BaseModel']]:
flat_models: Set[Type[main.BaseModel]] = set()

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

we generally don't both with type annotations inside methods.

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

Done. I removed the others too.

from .utils import clean_docstring
def get_flat_models_from_model(model: Type['main.BaseModel']) -> Set[Type['main.BaseModel']]:

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

is main.* enough for mypy or ids to work out what BaseModel is? Surely pydantic.BaseModel or pydantic.main.BaseModel is required?

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

also if we can import BaseModel, just use that rather than a string.

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

I can't import it directly as these functions are imported and used inside of BaseModel. I had to use that "trick" from PEP 484.

If you want I can change it to:

def get_flat_models_from_model(model: Type) -> Set[Type]:

But for it to work it needs the models to be Pydantic, i.e, to have __fields__. That's why I annotated it like that.

Let me know what do you prefer.

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

you already import main above, so therefore surely you can use main.BaseModel or better import BaseModel from main and use it here.

This comment has been minimized.

@tiangolo

tiangolo Nov 21, 2018

Contributor

It can't be done with circular imports. main.BaseModel already depends on these methods, so BaseModel is not yet defined by the moment these methods are defined, as they are imported and used by main.BaseModel first, check the errors in CI: https://travis-ci.org/samuelcolvin/pydantic/jobs/457945798

From PEP 484: Forward references:

When a type hint contains names that have not been defined yet, that definition may be expressed as a string literal, to be resolved later.

A situation where this occurs commonly is the definition of a container class, where the class being defined occurs in the signature of some of the methods. For example, the following code (the start of a simple binary tree implementation) does not work:

class Tree:
    def __init__(self, left: Tree, right: Tree):
        self.left = left
        self.right = right

To address this, we write:

class Tree:
    def __init__(self, left: 'Tree', right: 'Tree'):
        self.left = left
        self.right = right

Later in the same section:

[...] The solution is to switch to module-only imports and reference the models by their module.class name:

# File models/a.py
from models import b
class A(Model):
    def foo(self, b: 'b.B'): ...
return flat_models
def get_flat_models_from_sub_fields(fields) -> Set[Type['main.BaseModel']]:

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

rename to get_flat_models_from_fields and reuse in get_flat_models_from_model

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

Done

if field.type_.le is not None:
f_schema.update({'maximum': field.type_.le})
# Sub-classes of str must go before str
if issubclass(field.type_, EmailStr):

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

again some kind of iterable and loop would be much better here.

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

Done

from .types import DSN, UUID1, UUID3, UUID4, UUID5, DirectoryPath, EmailStr, FilePath, Json, NameEmail, UrlStr
from .utils import clean_docstring

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

best to put all these "helper" methods at the bottom of the file and the "primary" methods (those people might want to read first) at the top.

Also, you should probably implement __all__ here.

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

Done

@@ -0,0 +1,5 @@
from pydantic import BaseModel

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

please remove all this and put it into the tests.

If you really need some python modules to test use tmpdir and write the files required within the test.

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

Done

}
def test_date_types():

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

all these multiple type tests would be better rewritten as parameterised tests - if one case here fails it'll be a lot easier to work out which one an fix if they're 4 separate tests.

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

Done

class Pizza(BaseModel):
name: str
ingredients: List[Ingredient]

This comment has been minimized.

@samuelcolvin

samuelcolvin Nov 18, 2018

Owner

do we really need all these models to test the functionality? Unit tests should be as simple as possible while testing the desired behaviour.

This comment has been minimized.

@tiangolo

tiangolo Nov 18, 2018

Contributor

I removed one of the submodels. But having at least two models with submodels would verify that the submodels are included not only for the first model in the list. Let me know if you still want me to reduce more.

Show resolved Hide resolved HISTORY.rst Outdated
@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Nov 18, 2018

Byt the way,

  1. You can run make format to call black and isort and fix all formatting, best to do before every commit.
  2. you can rerun tests from within travis, also with empty commits useing --allow-empty

@samuelcolvin samuelcolvin force-pushed the samuelcolvin:master branch from 2e10b34 to 9ff946d Nov 18, 2018

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Nov 18, 2018

@tiangolo so far I have 51 emails from you (via github) in the last few days.

Please get the PR to the point where you'd like me to review it and let me know.

I don't need replies to every single code comment and every line changed doesn't need its own commit.

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 18, 2018

  1. You can run make format to call black and isort and fix all formatting, best to do before every commit.
  2. you can rerun tests from within travis, also with empty commits useing --allow-empty

Thanks for the tips.

@tiangolo so far I have 51 emails from you (via github) in the last few days.
Please get the PR to the point where you'd like me to review it and let me know.
I don't need replies to every single code comment and every line changed doesn't need its own commit.

Sorry for that. Now I tried to avoid pushing before finishing with the rest of the review.


I see now that there are new conflicts with HISTORY.rst and pydantic/fields.py. Should I rebase or merge master?

Edit: I see you requested a rebase in another PR so I did that.

@tiangolo tiangolo force-pushed the tiangolo:json-schema-flat branch from 517bad9 to a73818e Nov 18, 2018

tiangolo added some commits Nov 18, 2018

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 18, 2018

OK, that's it for now.

Python 3.7-dev keeps failing with some random race condition or something that comes from time to time. I could try re-pushing empty commits to try and make it run again, but I guess I that would trigger more emails. And we non-owners don't see the "Re-start build" button in Travis CI.

I think I already implemented all your requested changes.

If something else is missing or should be changed, let me know. I'll (hopefully) check it in the next days. But it's almost midnight here and I can't "extend" the weekend more. ...and my wife is not loving the project right now 馃槀

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 21, 2018

@samuelcolvin let me know if there's anything missing.

samuelcolvin added some commits Nov 22, 2018

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Nov 22, 2018

frustratingly although it should let me, github isn't letting me push to your branch and therefore make changes to the PR.

Instead, I've submitted a PR tiangolo#1 to your repo on this branch. If you could review that, ask any questions and merge it.

If that's all ok, I think we'll be ok to merge this.

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 22, 2018

frustratingly although it should let me, github isn't letting me push to your branch and therefore make changes to the PR.

Strange...

Instead, I've submitted a PR tiangolo#1 to your repo on this branch. If you could review that, ask any questions and merge it.

Awesome! I'll check it right away.

samuelcolvin and others added some commits Nov 22, 2018

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 22, 2018

I changed the links to the specs. If you want, check them.

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Nov 22, 2018

I'm sure they're right, are you happy for me to merge this?

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 22, 2018

Yes! 馃尞

@samuelcolvin samuelcolvin merged commit 94706bc into samuelcolvin:master Nov 22, 2018

3 checks passed

codecov/project 100% (+0%) compared to 19ec7c6
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
deploy/netlify Deploy preview ready!
Details
@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented Nov 22, 2018

I'll release in a couple of days. If anyone has any further feedback on this, let us know here.

@tiangolo

This comment has been minimized.

Contributor

tiangolo commented Nov 22, 2018

Awesome, thanks. 馃帀

I'll start working on a PR to integrate validation and schema annotations passed to the Schema class.

@tiangolo tiangolo deleted the tiangolo:json-schema-flat branch Nov 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment