Strict configuration #1098

Open
maxrothman opened this issue Dec 15, 2019 · 32 comments
Labels: feature request, strictness

@maxrothman commented Dec 15, 2019

Feature request: strict mode configuration flag

This issue has been a sensitive one in the past, so I'm trying to tread lightly here. Pydantic's position so far has been that because it's a parsing, not a validation library, fields should coerce their values to the specified type if possible (e.g. a field type-hinted with float given "1" coerces it to 1.0), and that Strict* fields (e.g. StrictFloat, StrictBool, …) are available if users prefer to use them. Requests for a way to make default types behave strictly have been closed due to implementation concerns.

However, the codebase has evolved significantly since #578, #360, and #284 were closed, and I think that some of the earlier difficulties in building this feature are no longer present today.

In #284, the reason given for not including a strict mode config was that pydantic would no longer be able to just call float(x) and pass the errors along, and that there are edge cases around bools and ints. But in 4f4e22e, the validator for float types became validators.float_validator() rather than float(); strict_int_validator() was added in 1b467da and distinguishes between bools and ints; and on master, float_validator() actually does extra work compared to strict_float_validator() to accept non-float values.

As far as I can tell, the only thing required to implement this flag now would be to build validators._VALIDATORS conditionally for each model based on a config value, using the strict_* validators rather than the standard ones when the flag was set.
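As a rough sketch of what "building the validator table conditionally" could mean, here is a self-contained toy version. The validator names mirror the ones mentioned above, but their bodies, the dict-shaped LENIENT_VALIDATORS/STRICT_VALIDATORS tables and the `strict` flag are simplified assumptions for illustration, not pydantic's actual internals.

```python
from typing import Any, Callable, Dict, List

Validator = Callable[[Any], Any]


def int_validator(v: Any) -> int:
    """Lenient: coerce anything int() accepts ("1", 1.2 and True all pass)."""
    if isinstance(v, int) and not isinstance(v, bool):
        return v
    try:
        return int(v)
    except (TypeError, ValueError):
        raise TypeError(f"value is not a valid integer: {v!r}")


def strict_int_validator(v: Any) -> int:
    """Strict: only real ints; bool is rejected even though it subclasses int."""
    if isinstance(v, int) and not isinstance(v, bool):
        return v
    raise TypeError(f"value is not a valid integer: {v!r}")


def float_validator(v: Any) -> float:
    """Lenient: coerce anything float() accepts (e.g. "1" -> 1.0)."""
    if isinstance(v, float):
        return v
    try:
        return float(v)
    except (TypeError, ValueError):
        raise TypeError(f"value is not a valid float: {v!r}")


def strict_float_validator(v: Any) -> float:
    """Strict: only float instances pass."""
    if isinstance(v, float):
        return v
    raise TypeError(f"value is not a valid float: {v!r}")


# Two parallel lookup tables keyed by field type (toy stand-ins for _VALIDATORS).
LENIENT_VALIDATORS: Dict[type, List[Validator]] = {
    int: [int_validator],
    float: [float_validator],
}
STRICT_VALIDATORS: Dict[type, List[Validator]] = {
    int: [strict_int_validator],
    float: [strict_float_validator],
}


def validators_for(strict: bool) -> Dict[type, List[Validator]]:
    """Pick a table once per model, based on a hypothetical config flag."""
    return STRICT_VALIDATORS if strict else LENIENT_VALIDATORS


def validate(value: Any, field_type: type, strict: bool) -> Any:
    """Run every validator registered for field_type in the chosen table."""
    for validator in validators_for(strict)[field_type]:
        value = validator(value)
    return value
```

With this shape, validate("1", float, strict=False) returns 1.0, while the same call with strict=True raises TypeError.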

There are several different ways this feature could be implemented: as a global flag, as another property on the model config, etc. I don't have strong opinions on this topic, since in my code I would probably enable this configuration universally.

Thoughts? The lack of this feature is the only thing keeping me from strongly recommending the use of Pydantic in my workplace, and it's clearly a feature that others are interested in having as well, as evidenced by the prior requests. I would be interested in contributing a PR for this change if you're interested in pursuing it.

@dmontagu (Collaborator) commented Dec 16, 2019

Could you describe the use case you have in mind for this feature? I'm trying to understand whether your goal is closer to using it for runtime type-checking when instantiating BaseModel subclasses, or if it is specific to the parsing process (e.g., you always want to raise errors when parsing a json payload containing "1.0" instead of 1.0).


I would be in favor of supporting this if we could come up with a simple implementation with minimal impact on existing logic. I can't tell right now from just thinking about it whether I think the approach you have described would meet the bar in terms of simplicity; @samuelcolvin might have different opinions.


One other possible approach might be to introduce a generic type called Strict that specifically just adds an issubclass check. We could probably tweak the mypy plugin so that Strict[MyType] is treated by mypy as an alias for MyType. (It would still make sense to keep StrictInt and StrictFloat since they offer constraint functionality.)

@maxrothman (Author) commented Dec 17, 2019

My use case is that I'm planning on using Pydantic's JSON Schema integration to autogenerate clients for my frontend applications. In that setup, it should theoretically be impossible for my backend to receive "1.0" instead of 1.0, so if it does, it indicates that there's a weird bug in the frontend, and I want it exposed quickly and loudly.

I'm not a huge fan of the Strict generic type approach, because I'd still have to wrap it around the types for all of my fields. I'd rather set a global config flag or set a class-level flag on a base class and inherit from it.

Sounds like you're at least open to seeing a PR? Since you indicated others might have differing opinions, I'll wait a few days, and if nothing changes, I'll give it a shot and we can go from there?

@dmontagu (Collaborator) commented Dec 17, 2019

For what it's worth, I think you could implement what you are describing through the use of a shared base class:

from typing import Any
from pydantic import BaseModel


class PossiblyStrictModel(BaseModel):
    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if getattr(cls.__config__, "strict", False):
            def __init__(__pydantic_self__, **data: Any) -> None:
                for k, v in data.items():
                    field = __pydantic_self__.__fields__.get(k)
                    # isinstance() against field.type_ is not right for container types,
                    # and bool still passes for int fields because bool subclasses int
                    if field is not None and not isinstance(v, field.type_):
                        raise TypeError(f"Received argument {v!r} of incorrect type for field {field}")
                super().__init__(**data)
            cls.__init__ = __init__


class Model(PossiblyStrictModel):
    a: int

    class Config:
        strict = True


Model(a=1)
Model(a="1")  # error; stops erroring if you set Config.strict = False

I'm not super excited about the prospect of officially supporting this in pydantic because 100% correctness would probably come with a large number of edge cases to test, and an on-going burden of ensuring both approaches work for every possible field type.

I'm certainly open to seeing a PR, I just think acceptance would be dependent on reaching a relatively high bar of simplicity in order to ensure it doesn't increase the on-going maintenance burden, and doesn't result in discouraging future feature work because of backwards compatibility challenges. If you want to take a shot, I'll review it. @samuelcolvin tends to have strong opinions on these sorts of issues though so you might want to wait for his response first.

@samuelcolvin (Owner) commented Dec 18, 2019

Hi, sorry for not replying earlier. Thanks for submitting the request and taking the time to explain why you need it.

I'm basically pro implementing this if it's possible for the following reasons:

  • people want the feature. Too many open source projects get supercilious about refusing what they perceive to be "bad" feature requests, while ignoring that people wanting a feature is a reason in itself, e.g. psf/black#118.
  • it would resolve the confusion and resulting caveats about what pydantic is/does.
  • I'm intending to work on a type checking decorator as per #347, and to me it would be very confusing if a type checking decorator silently coerced arguments.

One point where I think you're incorrect, @maxrothman: validating an input to a float field does just call float(v) inside float_validator, and that hasn't changed significantly for a long time.

In terms of implementation, I think the simplest solution would be to simply call isinstance() before all values are validated. I think that would be simpler to implement, reason with and maintain than switching validators._VALIDATORS.

We'd need to store the expected type on a field, and might need some special logic for cases like booleans, but I think that's doable.

This would make strict mode slightly slower than normal mode, but I think that's bearable, if that becomes a problem I can think of a number of ways of working around it.
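A minimal sketch of the "isinstance() before validation" idea, assuming a hypothetical validate_field() helper that receives the field's expected type together with its usual (coercing) validator. It shows the special logic booleans need and why strict mode does strictly more work than the default mode.

```python
from typing import Any, Callable


def strict_precheck(value: Any, expected_type: type) -> None:
    """Raise TypeError unless value is already an instance of expected_type."""
    # bool subclasses int, so isinstance(True, int) is True; special-case it.
    if expected_type is int and isinstance(value, bool):
        raise TypeError("expected int, got bool")
    if not isinstance(value, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, got {type(value).__name__}")


def validate_field(
    value: Any, expected_type: type, validator: Callable[[Any], Any], strict: bool
) -> Any:
    """Run the usual validator, preceded by an isinstance() check in strict mode."""
    if strict:
        # This check runs in addition to the normal validator, which is why
        # strict mode would be slightly slower than the default mode.
        strict_precheck(value, expected_type)
    return validator(value)
```

With plain int() as the validator, validate_field(1.2, int, int, strict=False) silently returns 1, while the same call with strict=True raises TypeError.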

In terms of a switch to enable this, I think a config flag would be the easiest solution, we could even have a StrictBaseModel class which just sets that config flag.

Thoughts?

@Pastromhaug commented Dec 18, 2019

I'm going to second that having a strict mode would be very useful. I would like to default to strict while allowing certain type coercions when specified.

To illustrate, yesterday I spent some time debugging a unit test that was broken in a non-obvious way. It turned out that Pydantic was coercing a UUID to an int 😐. In practice, we almost always know exactly what the type should be and would like to enforce that.

@samuelcolvin (Owner) commented Dec 18, 2019

For the foreseeable future (at least until another major release) strict mode will not be the default. But usage should be as simple as changing your import to from pydantic import StrictBaseModel as BaseModel.

@maxrothman (Author) commented Dec 18, 2019

@samuelcolvin thank you for being open to this proposal! Your approach towards this ("bad") request is an exemplar of great open source management.

I'm not sure I understand how the isinstance checks of "strict mode" would be slower than the default mode, most of the validators already call isinstance. Am I missing something?


Having given this a little more thought, I've realized a slight problem in the straight-up "isinstance" approach: it'll result in a worse dev experience when deserializing complex objects (e.g. datetimes). To pass the isinstance check, you'd have to deserialize the datetime before passing it to the model, losing the benefits of Pydantic's unified approach to validation. I suspect this is why @Pastromhaug suggested allowing certain coercions.

To fix that, I think we need a split interface: one for instantiating model instances internally, and one for deserializing external data.

The deserialization API would be strict-ish, in that several Python types would share the same wire type in JSON. For example, UUID and str would both be represented as JSON strings. The key difference is that there'd be only one valid (de)serialization for each type, such that the following invariant holds:

forall x: x == serialize(deserialize(x))

This could mostly be implemented by using the existing strict_* validators and basic isinstance checks, but there'd be some special cases to sort out. For example, datetimes currently accept multiple formats, which would break the above invariant. Defaulting to a specific format (probably ISO-8601) would resolve the issue (the format could be overridable somehow, maybe via Field()).
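To make the round-trip invariant concrete for datetimes, here is a small stdlib-only sketch. It assumes the output of isoformat() is the one canonical wire format; serialize(), lenient_parse() and strict_parse() are hypothetical helpers, not pydantic functions.

```python
from datetime import datetime


def serialize(value: datetime) -> str:
    # The single canonical wire format: ISO-8601 as produced by isoformat().
    return value.isoformat()


def lenient_parse(raw: str) -> datetime:
    # Accepts more than one spelling, e.g. a space instead of "T" as separator.
    return datetime.fromisoformat(raw)


def strict_parse(raw: str) -> datetime:
    # Only accept input that is already in canonical form, so that
    # serialize(strict_parse(x)) == x holds for every x that parses.
    value = datetime.fromisoformat(raw)
    if serialize(value) != raw:
        raise ValueError(f"not in canonical ISO-8601 form: {raw!r}")
    return value
```

For x = "2019-12-18 10:00:00", lenient_parse(x) succeeds but serialize() turns the result into "2019-12-18T10:00:00", so the invariant fails. strict_parse(x) raises ValueError instead, while "2019-12-18T10:00:00" itself round-trips unchanged.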

There's already a parse_obj method on models, I think it'd be a natural fit for this deserialization API.

The interface for instantiating instances internally could just be __init__. I'd even potentially support doing no runtime type checking in this interface, since mypy should be able to do it statically. If people wanted to do runtime type checking, it could be added via a separate feature, such as the decorator @samuelcolvin proposed.


With all that out of my brain, here's my proposal:

Add a strict flag to model config. This flag changes the behavior of BaseModel.__init__ and BaseModel.parse_obj (or enables some other method, I'm not married to the parse_obj part).

  • __init__: disables coercion, this method becomes a dumb constructor
  • parse_obj: performs strict deserialization, such that forall json_value, Model: json_value == Model.parse_raw(json_value).json()

Some method is built to deal with types that have multiple useful serializations. Looking through _VALIDATORS, only the datetime types seem to be the issue, so it might be easy enough to just add a format parameter to Field().

@dmontagu (Collaborator) commented Dec 18, 2019

I'm not sure I understand how the isinstance checks of "strict mode" would be slower than the default mode, most of the validators already call isinstance

I think @samuelcolvin's point is that the isinstance checks would be run in addition to all the usual validation checks, so it would be strictly slower.

@dmontagu (Collaborator) commented Dec 18, 2019

To fix that, I think we need a split interface: one for instantiating model instances internally, and one for deserializing external data.

This awkwardness has come up various times recently. I think the problem boils down to different "degrees" of desired parsing, and the fact that there are a huge number of edge cases and desired behaviors to handle if you try to go beyond "parse everything" and "parse nothing".

For what it's worth, if you are "instantiating model instances internally", there is the BaseModel.construct method which will perform no validation whatsoever, and I've personally seen it run >30x faster than BaseModel.__init__ even for small models (with some nested lists and dicts).


__init__: disables coercion, this method becomes a dumb constructor

This was the reason that BaseModel.construct was created, and is probably a better fit for this case if you are trying to accomplish disabled validation for performance reasons.

(For what it's worth, I totally think it would make sense to offer runtime type-checking of BaseModel.construct via config flag during testing, and it could be done in the metaclass to ensure zero runtime cost if not used, but that's a separate feature request. If anyone wants it please open an issue!)


I could see an argument being made for having a config setting that lets you toggle whether __init__ behaves more like parse_obj or more like construct. In practice, I think this could amount to having config settings for disabling the "default" validators (i.e., for built-in or pydantic-provided types) and/or the "custom" validators (i.e., user-defined using @validator) in calls to __init__.

This would retain the ability to use custom validators to parse missing fields, perform non-idempotent transformations, etc., and would make it easy to turn on the usual validators for testing, but would also let you get closer to raw BaseModel.construct speed in production by disabling the "default" validations via config where you are confident it is safe due to your testing.
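A toy illustration of the two toggles described above, using a plain Python class rather than pydantic. ToyModel, run_default_validators, run_custom_validators, Account and FastAccount are all hypothetical names; the point is only that disabling the "default" checks can leave user-defined validators running.

```python
from typing import Any, Callable, Dict


class ToyModel:
    """Not pydantic: a plain class showing two hypothetical config toggles."""

    run_default_validators = True  # built-in coercion/type checks
    run_custom_validators = True   # user-defined hooks (the @validator equivalent)
    default_validators: Dict[str, Callable[[Any], Any]] = {}
    custom_validators: Dict[str, Callable[[Any], Any]] = {}

    def __init__(self, **data: Any) -> None:
        for name, value in data.items():
            if self.run_default_validators and name in self.default_validators:
                value = self.default_validators[name](value)
            if self.run_custom_validators and name in self.custom_validators:
                value = self.custom_validators[name](value)
            setattr(self, name, value)


class Account(ToyModel):
    default_validators = {"age": int}         # stands in for the built-in int check
    custom_validators = {"name": str.strip}   # stands in for a user's @validator


class FastAccount(Account):
    # e.g. in production, once testing has shown the input is already well-typed
    run_default_validators = False
```

Account(age="3", name=" bob ") ends up with age == 3 and name == "bob"; FastAccount with the same input skips the int() coercion (age stays "3") but still strips the name.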

@maxrothman (Author) commented Dec 18, 2019

Oh, I didn't know about BaseModel.construct. Is it type-checkable by mypy? If so, then it definitely serves the role I intended for __init__ in my proposal. I don't have strong feelings about what __init__ actually does; I'm more concerned about the separation between the two, which it seems already exists.


My proposal again, but edited taking the above into account:

Add a strict flag to model config. This flag changes the parsing behavior of BaseModel (whether in __init__, parse_obj, or otherwise; I don't have strong feelings about the interface).

The parsing behavior would be altered to perform strict deserialization, such that forall json_value, Model: json_value == Model.parse_raw(json_value).json()

Some method is built to deal with types that have multiple useful serializations. Looking through _VALIDATORS, only the datetime types seem to be the issue, so it might be easy enough to just add a format parameter to Field().

@dmontagu (Collaborator) commented Dec 18, 2019

Yes, it can be properly type checked, but you have to use the pydantic mypy plugin (I believe the latest release still requires the use of mypy 0.740; hopefully we can push a new release soon to be compatible with latest).

Docs here.

@samuelcolvin (Owner) commented Dec 19, 2019

I'm pretty opposed to changing the behaviour of __init__, even via a config flag.

I think one of the reasons pydantic is popular is that for many cases it "just works" - it's also (I think) relatively easy to use for inexperienced developers. I'm opposed to changes which increase the cognitive burden of getting started with pydantic, even if it helps some people.

Looking through _VALIDATORS, only the datetime types seem to be the issue

I'm afraid that's not the whole story: there are also all the custom pydantic types (I count 33). Each would need consideration, tests and in some cases modification.


I think to keep things simple we should have two modes:

  1. The current "coercion inclined" mode. With strict types when you want to constrain things.
  2. A new "strict" mode (used via StrictBaseModel), where all type checks are "hard", e.g. use isinstance(v, field_type) or in some cases v.__class__ == field_type

Then we add two generic types (or at least types that can be parameterised as Json can now):

  • Strict[], usage: Strict[int], equivalent of StrictInt, which can be used in "coercion inclined" mode to make a field strict - I think this was discussed somewhere but I can't find the issue right now.
  • Coerce[], usage: Coerce[int], equivalent of int in normal mode, which can be used in strict mode to disable the strict check and therefore act like pydantic does now.
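A rough, runtime-only sketch of how the two modes and the Strict[]/Coerce[] overrides could interact. Strict, Coerce, check() and validate_model() are hypothetical stand-ins; this does nothing for mypy or PyCharm, and "coercion" here is simply calling the type, which is far cruder than pydantic's real validators.

```python
from typing import Any, Dict


class _Marker:
    """Shared base for the hypothetical Strict[...] and Coerce[...] wrappers."""

    def __init__(self, inner: type) -> None:
        self.inner = inner

    def __class_getitem__(cls, inner: type) -> "_Marker":
        # Strict[int] / Coerce[int] simply record the wrapped type.
        return cls(inner)


class Strict(_Marker):
    """Strict[T]: type-check the value even when the model is in coercion mode."""


class Coerce(_Marker):
    """Coerce[T]: coerce the value even when the model is in strict mode."""


def check(value: Any, annotation: Any, model_strict: bool) -> Any:
    # A per-field wrapper overrides the model-wide mode.
    if isinstance(annotation, Strict):
        strict, field_type = True, annotation.inner
    elif isinstance(annotation, Coerce):
        strict, field_type = False, annotation.inner
    else:
        strict, field_type = model_strict, annotation

    if strict:
        # bool subclasses int, so reject it explicitly for int fields.
        if (field_type is int and isinstance(value, bool)) or not isinstance(value, field_type):
            raise TypeError(f"expected {field_type.__name__}, got {type(value).__name__}")
        return value
    # Coercion mode: naively call the type, e.g. int("2") -> 2.
    return field_type(value)


def validate_model(data: Dict[str, Any], fields: Dict[str, Any], strict: bool) -> Dict[str, Any]:
    """Validate each declared field; `strict` plays the role of StrictBaseModel."""
    return {name: check(data[name], ann, strict) for name, ann in fields.items()}
```

For example, validate_model({"a": 1, "b": "2"}, {"a": int, "b": Coerce[int]}, strict=True) returns {"a": 1, "b": 2}, whereas passing "1" for a raises TypeError.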

Does that make sense?

@dmontagu do you think Strict[] and Coerce[] can be made to play nicely with mypy via the plugin (or perhaps without the plugin)?

@koxudaxi do you think Strict[] and Coerce[] can be made to play nicely with pycharm via the plugin?

There are still some weird cases like AnyUrl, where I imagine people would expect a string to work, but technically strict mode would refuse a string since it's not an instance of AnyUrl. Do we make a few exceptions even in strict mode for things like this?


If you want a way to parse datetimes from ISO-8601 strings but nothing else, either:

  • you can implement your own custom type
  • or we could add an IsoDateTime type to pydantic; please create a separate issue

@dmontagu (Collaborator) commented Dec 19, 2019

@samuelcolvin I agree with the point about having it "just work", not to mention how much pain would be involved in migrating if we fundamentally changed the behavior of __init__ 😄.


I'm fine with the idea of StrictBaseModel and Strict/Coerce.

In some ways I think it could actually simplify the mypy plugin, potentially allowing us to remove the "strict" config setting since it would be specified on the actual object.

That said, I also think it would be a substantial amount of work to get it working with mypy, but probably not more than it would take to carefully and correctly implement the feature anyway.

@koxudaxi (Sponsor, Contributor) commented Dec 20, 2019

@samuelcolvin
I think it would be great to support StrictBaseModel and Strict/Coerce in the PyCharm plugin, because the plugin can inspect values, and type-checking with these classes would be comfortable.

I agree with your thoughts. I want pydantic to keep a simple design.
I'm afraid that handling individual cases would decrease the performance of the plugin.

BTW, I feel Coerce is a problematic word for non-native English speakers. I know type coercion is an ordinary term in computer science.
I guess a lot of people who use Python and Pydantic might not be experts. Also, I understand a lot of non-native English speakers use Pydantic.

Of course, I know Cast is not a perfect word choice.
If it is implemented as Coerce, then we should write documentation that is easy for everyone to understand.

@maxrothman (Author) commented Dec 20, 2019

I think that by using float as an example in my original posting I may have muddied the waters, and that suggesting changing how __init__ and parse_obj work added to the confusion, so I apologize for that. I think we're talking past each other a little here, let's take a step back and I'll reiterate my use case.

My goal is to use Pydantic to define domain types within my backend system that can be bridged to systems in other languages (e.g. Javascript) over a network boundary utilizing Pydantic's JSON Schema features. The idea is that I'll be able to define, say, a User that has a UUID, a hair color, and a number of shirts:

from enum import Enum, auto

from pydantic import UUID4, BaseModel, Field


class HairColor(Enum):
    RED = auto()
    BROWN = auto()
    BLONDE = auto()
    BLUE = auto()

class User(BaseModel):
    uuid: UUID4
    hair_color: HairColor
    num_shirts: int = Field(..., le=100)

Thanks to Pydantic's JSON Schema feature, I can now use OpenAPI's codegen library to autogenerate a client library for Javascript. That way, it'll never even be possible for my frontend to send an invalid hair color or an illegally-large value for num_shirts over the wire. Those constraints might be critical to my system, and by ensuring that all of the domain types are correct-by-construction, and through the use of type checkers like Typescript and Mypy, I can extend the type-safety of those constraints across those network boundaries and prove important properties about my system.

Basically, I can make the shape of my objects over the wire an implementation detail and pretend the network boundary doesn't exist for the purposes of my application logic, which greatly simplifies development while improving correctness.

However, imagine that a weird bug appeared in my frontend application that somehow circumvented my autogenerated client and put some other value into num_shirts that happened to be 1.2, but in fact represented some other type entirely. As written, the models above would silently ignore that error, coercing 1.2 : float into 1 : int. Since it accidentally passes the maximum constraint, this bug could persist indefinitely and I could end up with a nasty data corruption problem. I could use StrictInt, but then I'd have to remember (and get my whole team to remember) to always use StrictInt instead of the more natural int.

Regardless of whether I use int or StrictInt, I can deserialize Users as suggested in the Pydantic documentation:

>>> import json
>>> data = '{"uuid": "f84ede9d-fb19-4f35-8223-a209a858df57", "hair_color": 2, "num_shirts": 5}'
>>> User(**json.loads(data))
User(uuid=UUID4(f84ede9d-fb19-4f35-8223-a209a858df57), hair_color=HairColor.BROWN, num_shirts=5)

So in essence, I'm looking for a (de)serializer that's capable of easily parsing wire formats into type-safe domain types, including non-primitive types (such as the UUID in the example above), and that can parse without information loss. I have no need for a runtime type checker, and if I did, either I could write one, or I could use one of the other projects on PyPI that provides that functionality. Pydantic is almost what I'm looking for, except that it is lenient when parsing certain types (e.g. bool, float, datetime) in such a way that information loss can occur, which could hide bugs (which would be bad).

A class-level flag (or a different base class) that made non-strict types act like strict types would fulfill my use case: I'd still be able to parse complex types from raw data easily, and the risk of information loss would be removed. A flag that replaced the non-strict parsers with isinstance checks would remove my ability to parse complex types:

>>> data = {"uuid": "f84ede9d-fb19-4f35-8223-a209a858df57", "hair_color": 2, "num_shirts": 5}
>>> User(**data)
Error: uuid is not of type UUID4, hair_color is not of type HairColor
>>> # I'd have to take extra steps to parse this type:
>>> User(uuid=UUID4(data['uuid']), hair_color=[k for k in HairColor if k.value == data['hair_color']][0], num_shirts=5)
User(uuid=UUID4(f84ede9d-fb19-4f35-8223-a209a858df57), hair_color=HairColor.BROWN, num_shirts=5)

In short, I still want Pydantic to be a parsing library, I just want to be able to configure it to be pickier.

Obviously this wouldn't achieve my desire to have all parsing comply with forall x: x == serialize(deserialize(x)) immediately: all the datetime types have more lenient parsers, and I suspect there are other edge cases. But strict versions of those types could be added over time, and their addition is orthogonal to the ability to switch between the lenient and strict versions with a class-level flag.

@samuelcolvin (Owner) commented Dec 20, 2019

@maxrothman I understand where you're coming from but what you're asking for is a very specific mixture of strictness and coercion.

There are hundreds of potential cases where it would be reasonable to be strict in one case and lenient in another, for example:

  • should a tuple be coerced to a list?
  • should an int-valid string be coerced to an int? Or to a float?
  • should a string be coerced to a path?
  • should a path be coerced to a str?
  • should a list of pairs be coerced to a dict?
  • should a set be coerced to a frozenset but not to a list?
  • ... and on and on

These questions aren't specific to pydantic or even Python; JavaScript and Ruby grapple with them too. Remember this?

I suspect our misunderstanding comes from the following: you're using JSON to transfer data. You therefore want to coerce types when there's no JSON equivalent type (e.g. string to UUID or str to datetime) but not between JSON types (e.g. from float to int or str to float).

The problem is that pydantic isn't just used for HTTP APIs and therefore the input data isn't always JSON. Look at popular projects using pydantic:

| URL | Stars | JSON/API |
|---|---|---|
| https://github.com/tiangolo/fastapi | 6909 | yes |
| https://github.com/uber/ludwig | 6100 | no |
| https://github.com/awslabs/gluon-ts | 827 | no |
| https://github.com/nornir-automation/nornir | 484 | no |
| https://github.com/BGmi/BGmi | 266 | no |
| https://github.com/samuelcolvin/arq | 233 | no |
| https://github.com/gammapy/gammapy | 104 | no |
| https://github.com/MolSSI/QCFractal | 81 | no |
| https://github.com/awesometoolbox/ormantic | 70 | yes? |
| https://github.com/Magdoll/cDNA_Cupcake | 68 | no |

Most of them are much closer to runtime type checking than HTTP APIs. Or more correctly: pydantic is generally used to "secure" data types at code boundaries, but those boundaries take numerous forms and we can't make assumptions about the types people will be passing across them.


I see a few options here, I'm genuinely not sure which is best:

  1. Change "strict mode" to "slightly stricter mode" and default to coercion if the field type is not one of the basic JSON types, but default to strictness otherwise. You could still use Strict[] if you wanted to enforce full type checks.
  2. Have both "stricter" and "strict" mode
  3. You implement what you want using a wildcard validator and ignore StrictBaseModel (partially implemented here to demonstrate the idea; I'm not copying the code here since this conversation is already too verbose)

Edit: poll of users and what they want:

@samuelcolvin (Owner) commented Dec 20, 2019

"Plus one" me if you want full strictness: effectively isinstance(input_value, field_type), close to the type checks in statically typed languages.

@samuelcolvin (Owner) commented Dec 20, 2019

"Plus one" me if you want partial strictness;

  • coerce from JSON types to more "complex" python types, e.g. string to UUID
  • but validation errors if you pass the wrong JSON type, e.g. a float for an int field

@maxrothman (Author) commented Dec 21, 2019

Ahh, I understand now. I hadn't made the connection that Pydantic was agnostic as to the wire format. I can see now why my original request doesn't make much sense: stable de/serialization is a problem that can only really be solved for specific formats.

In light of that, I definitely don't support a class-level "strictish" config flag now. I don't have an opinion about a truly strict (isinstance) flag since it doesn't solve my use case. What I would be interested in seeing would be a flag on a BaseModel method that parses raw JSON that could flip it into a stable/strict mode. I'd be hesitant to put it on parse_raw since that does double-duty for pickle as well as JSON, but I could imagine a parse_json method with a strict flag that swapped in a different _VALIDATORS list. If you're not interested in building that into Pydantic directly, I'd also be open to building it as a companion library, provided you'd be open to some refactoring to make such a library possible. I could imagine other similar libraries for other formats, such as form-encoded data, URL-encoded data, XML, etc.

Thoughts?

@virtuald commented Dec 31, 2019

Ignore my last comment, apparently that's a different strictness that is already supported. It's late. I'm tired. This is a pretty cool library though, thank you.

@maxrothman (Author) commented Jan 2, 2020

Happy new year everyone! Just wanted to get this discussion moving again. Any thoughts on my last comment @samuelcolvin @dmontagu?

@samuelcolvin (Owner) commented Jan 2, 2020

It sounds like you want the partial strictness I suggested above.

parse_raw is just decoding followed by parse_obj. I don't want that to change and do more complex things like switching validators, since that could be very confusing.

Out of interest, what's wrong with the wildcard validator approach partially demonstrated here? I know it'll be slightly slower than proper validators, but I very much doubt you'd notice the difference.

You might also be interested in #1130 (comment)

@maxrothman (Author) commented Jan 3, 2020

Out of interest, what's wrong with the wildcard validator approach partially demonstrated here? I know it'll be slightly slower than proper validators, but I very much doubt you'd notice the difference.

It'd get me part of the way there, but it would depend on validators/parsers for complex types working the way I'd want them to with their JSON representations. I'd end up with a case for every type, which would start to look a whole lot like the _VALIDATORS dict.


I think my most recent comment might've gotten a bit lost in the shuffle, so I'm going to lay out exactly what I am and am not suggesting:

What I'm NOT suggesting

  • Changing Pydantic's parsing behavior, regardless of which flags are enabled
  • Adding any configuration flags, class-level or otherwise
  • Modifying the behavior of parse_obj, parse_raw, or any other methods, regardless of which flags are enabled

What I am suggesting

  • Adding a new method to BaseModel, named something like parse_json
  • The method does de/serialization such that forall x: serialize(deserialize(x)) == x, probably by swapping in a different _VALIDATORS
    • The "strict" parsing behavior could be hidden behind a kwarg if necessary

My questions

  • Are you interested in having this feature in Pydantic?
  • If not, would you be open to the refactoring PRs that would enable this feature to be built as an external companion library?

Idle thoughts:

  • One could imagine other similar libraries for other formats, e.g. form encoding, URL encoding, xml, etc.
  • Could be an opportunity for Pydantic to become a parsing platform

Thoughts?

@samuelcolvin (Owner) commented Jan 3, 2020

Are you interested in having this feature in Pydantic

Yes, I want the functionality that I think you're looking for. That's why I'm spending time on this issue; if I didn't want it, I'd just close the issue.

It's just that we disagree about how to go about making it available, and what work might be required.

I'm afraid I don't like the idea of parse_json. It's another decision for people when they start using pydantic: "do I use Model(**data) or Model.parse_obj(data) or Model.parse_raw(body) or Model.parse_json(body)?" There are probably already too many options.

I would accept:

  • an alternative base model e.g. StrictBaseModel
  • a config flag
  • reusable wildcard validators, perhaps accessible via a short cut of an alternative base model
  • a refactor of pydantic's internals to make custom functionality easier to implement, provided:
    • it didn't affect performance adversely
    • it didn't add significant extra complexity to the library and thereby an extra long term maintenance burden
    • it didn't change functionality at all or we waited until a new major version

forall x: serialize(deserialize(x)) == x

Making pydantic parsing/serialization idempotent is a whole different story. It's heavily related to #951, #624, and the whole way we do serialisation; it's another subject altogether. I have some ideas I want to work on, closely related to #317, but I haven't had time yet. I already spend too much time on pydantic; for example, I've spent multiple hours on this issue alone: reading, replying, thinking, writing gists.


I'd end up with a case for every type

I don't think that's true. If I understand you correctly, your input data is JSON, so it can only have 7 types, and you want to make sure the data has the correct type out of those 7. This should be possible in a relatively short function.


Please have a go at implementing this as a validator and come back when it's working or not working. More verbose discussion on this issue without trying something isn't moving things forward.
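As a rough illustration of the "relatively short function" suggested above, here is a pydantic-free sketch that classifies the seven Python types `json.loads` can return and requires an exact match. The names `json_kind` and `strict_check` are hypothetical, and wiring it into a model (for example through a wildcard validator) is not shown. It treats JSON integers and floats as separate kinds, matching how `json.loads` returns them, and never lets `True` pass as an integer.

```python
import json
from typing import Any, Dict

# The seven Python types json.loads can produce, mapped to JSON-ish names.
# Lookups use the exact type, so bool (a subclass of int) stays distinct.
JSON_KINDS: Dict[type, str] = {
    dict: "object",
    list: "array",
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    type(None): "null",
}


def json_kind(value: Any) -> str:
    """Return the JSON kind of a value produced by json.loads."""
    try:
        return JSON_KINDS[type(value)]
    except KeyError:
        raise TypeError(f"{value!r} is not a JSON value") from None


def strict_check(value: Any, expected: type) -> Any:
    """Return value unchanged if its kind matches expected (a JSON_KINDS key), else raise."""
    want, got = JSON_KINDS[expected], json_kind(value)
    if got != want:
        raise TypeError(f"expected {want}, got {got}: {value!r}")
    return value


data = json.loads('{"count": "1", "ok": true, "ratio": 0.5}')
strict_check(data["ok"], bool)      # passes
strict_check(data["ratio"], float)  # passes
try:
    strict_check(data["count"], int)
except TypeError as exc:
    print(exc)  # expected integer, got string: '1'
```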

@samuelcolvin samuelcolvin added the strictness label Jan 3, 2020
@Pastromhaug

@Pastromhaug Pastromhaug commented Jan 8, 2020

I think to keep things simple we should have two modes:

  1. The current "coercion inclined" mode. With strict types when you want to constrain things.
  2. A new "strict" mode (used via StrictBaseModel), where all type checks are "hard", e.g. use isinstance(v, field_type) or in some cases v.__class__ == field_type

Then we add two generic types (or at least types that can be parameterised as Json can now):

  • Strict[], usage: Strict[int], equivalent of StrictInt, which can be used in "coercion inclined" mode to make a field strict - I think this was discussed somewhere but I can't find the issue right now.
  • Coerce[], usage: Coerce[int], equivalent of int in normal mode, which can be used in strict mode to disable the strict check and therefore act like pydantic does now.

This sounds like a really elegant solution to the problem to me. I'd be pretty excited if this got implemented!
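A toy, pydantic-free sketch of how the two proposed modes and the `Strict[]`/`Coerce[]` overrides could interact: a per-field marker wins, otherwise the model-wide mode decides. `Strict`, `Coerce`, `_Mode` and `check_field` are hypothetical names, and the coercion branch simply calls the target type, which is far cruder than pydantic's real coercion rules.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class _Mode:
    type_: type
    strict: bool


class Strict:
    # Strict[int] evaluates to a marker meaning "always check strictly"
    def __class_getitem__(cls, type_: type) -> _Mode:
        return _Mode(type_, strict=True)


class Coerce:
    # Coerce[int] evaluates to a marker meaning "always coerce"
    def __class_getitem__(cls, type_: type) -> _Mode:
        return _Mode(type_, strict=False)


def check_field(value: Any, annotation: Any, model_strict: bool) -> Any:
    """Validate one value; a Strict[]/Coerce[] marker overrides the model-wide mode."""
    if isinstance(annotation, _Mode):
        type_, strict = annotation.type_, annotation.strict
    else:
        type_, strict = annotation, model_strict
    if strict:
        # The "v.__class__ == field_type" flavour, so True is not accepted as an int
        if value.__class__ is not type_:
            raise TypeError(f"expected {type_.__name__}, got {value.__class__.__name__}")
        return value
    return type_(value)  # crude stand-in for pydantic's coercion


check_field("1", int, model_strict=False)         # 1
check_field("1", Coerce[int], model_strict=True)  # 1, field opts out of strict mode
check_field(1, Strict[int], model_strict=False)   # 1, field opts into strict mode
# check_field(True, int, model_strict=True) raises TypeError
```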

@maxrothman
Author

@maxrothman maxrothman commented Jan 9, 2020

Are you interested in having this feature in Pydantic

Yes, I want the functionality that I think you're looking for. That's why I'm spending time on this issue; if I didn't want it, I'd just close the issue.

I'm sorry if I came off as frustrated; there's no body language on the internet 😅 Since my most recent proposal departed fairly significantly from my original one, I wanted to make sure you were still on board. It sounds like you are, which is great!

I'm afraid I don't like the idea of parse_json. It's another decision for people when they start using pydantic: "do I use Model(**data) or Model.parse_obj(data) or Model.parse_raw(body) or Model.parse_json(body)?" There are probably already too many options.

I can understand that; it definitely would create a more confusing API. My thinking was that if you assume a user wants stable/idempotent/whatever-you-want-to-call-it parsing, then it's impossible to support that without a way to designate which format you want to de/serialize into. You could put that designation on the model, but that would preclude having multiple serializations of the same object: to JSON, to querystring, to form encoding, etc. That's why I ended up with named methods; it's the only design I could think of that supports both stable serialization and multiple serialization formats on a single model.
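To make the multiple-formats argument concrete, here is a small pydantic-free sketch in which each wire format gets its own validator table, roughly what a per-format swapped-in `_VALIDATORS` could look like. `FORMAT_VALIDATORS` and `parse_as` are hypothetical names and only `int`/`str` are covered. The same field type needs opposite rules: a query string can only carry text, so `"2"` must be parsed, while in JSON `"2"` for an integer field is a type error.

```python
from typing import Any, Callable, Dict, Mapping

Validator = Callable[[Any], Any]


def _exact(t: type) -> Validator:
    """JSON has native ints and strings, so require exactly that class (no bool for int)."""
    def check(v: Any) -> Any:
        if type(v) is not t:
            raise TypeError(f"expected {t.__name__}, got {type(v).__name__}")
        return v
    return check


def _from_text(t: type) -> Validator:
    """Query strings carry only text, so parsing the string into t is the only option."""
    def convert(v: Any) -> Any:
        if type(v) is not str:
            raise TypeError(f"expected str, got {type(v).__name__}")
        return t(v)
    return convert


# One validator table per wire format (hypothetical; only int and str covered).
FORMAT_VALIDATORS: Dict[str, Dict[type, Validator]] = {
    "json": {int: _exact(int), str: _exact(str)},
    "querystring": {int: _from_text(int), str: _from_text(str)},
}


def parse_as(fmt: str, fields: Mapping[str, type], data: Mapping[str, Any]) -> Dict[str, Any]:
    """Validate each field of data using the table for the given format."""
    table = FORMAT_VALIDATORS[fmt]
    return {name: table[tp](data[name]) for name, tp in fields.items()}


fields = {"page": int, "q": str}
parse_as("querystring", fields, {"page": "2", "q": "x"})  # {'page': 2, 'q': 'x'}
parse_as("json", fields, {"page": 2, "q": "x"})           # {'page': 2, 'q': 'x'}
# parse_as("json", fields, {"page": "2", "q": "x"}) raises TypeError
```

A model exposing one named method per format would just pick the matching table, which is the trade-off described above: more entry points, but each one can be strict in the way its format allows.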

In any case, I thought there was a chance that the API change would be both niche enough and inconvenient enough to most users that it'd make sense to split it out into a separate library, hence that suggestion.

I would accept:
...

  • a refactor of pydantic's internals to make custom functionality easier to implement, provided:
    • it didn't affect performance adversely
    • it didn't add significant extra complexity to the library and thereby an extra long term maintenance burden
    • it didn't change functionality at all or we waited until a new major version

Ok, so it sounds like the companion library option is definitely on the table. Great! I would have those same requirements too, so we're on the same page. To be clear, if I did go for the companion library approach, it'd be to expand the Pydantic ecosystem while having a way to experiment without affecting the majority of users. A fork, hostile or otherwise, is not an option for me.

It also sounds like you're interested in tackling stable de/serialization at some point, which is also great!


I think my next step will be to attempt a proof-of-concept to get a feel for the design space, then check in again on where Pydantic is with stable de/serialization. I'm more than happy to prove a concept out independently and present it for inclusion once it's better developed.

In any case, it sounds like stable de/serialization is a problem you're thinking about, so we'll just see who gets there first 🙂

@francoposa

@francoposa francoposa commented Feb 15, 2021

This would be wonderful, especially being able to specify it per field.

I am fine if a "1" is coerced into a 1 for an integer field, but there is no way I want a "1" coerced into a datetime field. I don't see myself using Pydantic for anything until I can prevent random numbers from becoming datetimes.
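One way to express the datetime half of this requirement, sketched in plain Python: accept `datetime` objects and ISO 8601 strings, but reject ints, floats and numeric strings, which pydantic v1 otherwise reads as Unix timestamps. `datetime_no_numbers` is a hypothetical helper; in pydantic v1 it could presumably be attached to a field with `@validator("field_name", pre=True)`, but that wiring is not shown. Note that `datetime.fromisoformat` accepts only a limited set of ISO formats before Python 3.11.

```python
from datetime import datetime
from typing import Any


def _looks_numeric(text: str) -> bool:
    try:
        float(text)
    except ValueError:
        return False
    return True


def datetime_no_numbers(v: Any) -> datetime:
    """Accept datetime objects or ISO 8601 strings; reject numbers and numeric strings."""
    if isinstance(v, datetime):
        return v
    if isinstance(v, str) and not _looks_numeric(v):
        return datetime.fromisoformat(v)  # raises ValueError on malformed text
    raise TypeError(f"{v!r} is not a datetime or ISO 8601 string")


datetime_no_numbers("2021-02-15T10:30:00")  # datetime(2021, 2, 15, 10, 30)
# datetime_no_numbers(1) and datetime_no_numbers("1") both raise TypeError
```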

@PrettyWood
Collaborator

@PrettyWood PrettyWood commented Feb 15, 2021

I've posted a possible workaround here #2079 (comment).
You can create a custom generic Strict type that is compliant with mypy and should work in most cases:

from types import new_class
from typing import Any, Callable, Generator, Generic, TypeVar, cast

from pydantic.utils import display_as_type
from typingx import isinstancex

T = TypeVar("T")


class Strict(Generic[T]):
    __typeform__: T

    @classmethod
    def __class_getitem__(cls, typeform: T) -> T:
        new_cls = new_class(
            f"Strict[{display_as_type(typeform)}]",
            (cls,),
            {},
            lambda ns: ns.update({"__typeform__": typeform}),
        )
        return cast(T, new_cls)

    @classmethod
    def __get_validators__(cls) -> Generator[Callable[..., Any], None, None]:
        yield cls.validate

    @classmethod
    def validate(cls, value: Any) -> Any:
        if not isinstancex(value, cls.__typeform__):
            raise TypeError(f"{value!r} is not a valid {display_as_type(cls.__typeform__)}")
        return value

Then use it wherever you want

from typing import Dict, List

from pydantic import BaseModel


class M(BaseModel):
    x: int
    x_strict: Strict[int]
    y: List[Dict[str, int]]
    y_strict: Strict[List[Dict[str, int]]]


M(x='1', x_strict='1', y=[{'a': 1}, {'b': '2'}], y_strict=[{'a': 1}, {'b': '2'}])
# pydantic.error_wrappers.ValidationError: 2 validation errors for M
# x_strict
#   '1' is not a valid int (type=type_error)
# y_strict
#   [{'a': 1}, {'b': '2'}] is not a valid List[Dict[str, int]] (type=type_error)

But as Samuel explained, "strictness" depends a lot on types. For example here, setting x_strict=True will work since isinstance(True, int) is True. So it would probably require extra code for some types and edge cases.

Anyway hope it helps! 😄
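As an example of the extra code those edge cases need, here is a dependency-free sketch of a recursive check that could stand in for `typingx.isinstancex` inside `validate`. It is a swapped-in technique, not what `isinstancex` does internally: it only understands plain classes, `Any`, `List[...]` and `Dict[...]`, treats `bool` as distinct from `int`, and assumes Python 3.8+ for `typing.get_origin`/`get_args`. `is_strict_instance` is a hypothetical name.

```python
from typing import Any, Dict, List, get_args, get_origin


def is_strict_instance(value: Any, tp: Any) -> bool:
    """Recursively check value against tp without letting bool pass as int."""
    if tp is Any:
        return True
    origin = get_origin(tp)
    if origin is None:  # a plain class such as int, str or float
        if tp is int:
            return isinstance(value, int) and not isinstance(value, bool)
        return isinstance(value, tp)
    args = get_args(tp)
    if not args:  # bare List / Dict: only the container type is checked
        return isinstance(value, origin)
    if origin is list:
        return isinstance(value, list) and all(is_strict_instance(v, args[0]) for v in value)
    if origin is dict:
        key_tp, val_tp = args
        return isinstance(value, dict) and all(
            is_strict_instance(k, key_tp) and is_strict_instance(v, val_tp)
            for k, v in value.items()
        )
    raise NotImplementedError(f"no strict rule for {tp!r}")


is_strict_instance(True, int)                                      # False
is_strict_instance([{"a": 1}], List[Dict[str, int]])               # True
is_strict_instance([{"a": 1}, {"b": "2"}], List[Dict[str, int]])   # False
```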

@MVrachev
Copy link

@MVrachev MVrachev commented Mar 9, 2021

@samuelcolvin what is the status of this issue?

@PrettyWood PrettyWood mentioned this issue Mar 11, 2021
@amirfru

@amirfru amirfru commented Mar 14, 2021

Hi, I've prepared a tiny PR #2509 which adds an option to treat any int field as StrictInt. Basically, if someone puts a float into an int configuration field, it will raise an exception instead of clipping the value to its integer part.
This lowers the overhead of going through each configuration line and explicitly setting the types of the ints to StrictInt, especially since in my projects I tend not to explicitly set the type of each parameter in the config, but let Pydantic infer the type from the input.

MVrachev added a commit to MVrachev/tuf that referenced this issue Mar 15, 2021
Strict types are available in pydantic:
https://pydantic-docs.helpmanual.io/usage/types/#strict-types
it's just sad there is no class-wide strict mode implemented yet:
See: samuelcolvin/pydantic#1098

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@PrettyWood PrettyWood added this to the Version 2 milestone Aug 29, 2021
@Varriount

@Varriount Varriount commented Sep 13, 2021

Rather than provide a "strictness" flag for Pydantic, a more flexible approach would be to allow overriding the "default" set of validators/converters used by Pydantic, which are currently hardcoded. This would reduce the need for one-off additions to Pydantic's configuration for default validation characteristics such as maximum string length, whitespace stripping, etc.
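A rough, pydantic-free sketch of that idea: the library keeps a default table of validator chains per type, and a model layers its own overrides on top with `collections.ChainMap`, so strictness, whitespace stripping or length limits become ordinary entries rather than new config flags. `DEFAULT_VALIDATORS`, `validators_for`, `run` and the helper validators are hypothetical names, not pydantic's API.

```python
from collections import ChainMap
from typing import Any, Callable, Dict, List, Mapping

Validator = Callable[[Any], Any]


def coerce_str(v: Any) -> str:
    return v if isinstance(v, str) else str(v)


def strict_str(v: Any) -> str:
    if not isinstance(v, str):
        raise TypeError(f"expected str, got {type(v).__name__}")
    return v


def strip(v: str) -> str:
    return v.strip()


def max_length(n: int) -> Validator:
    def check(v: str) -> str:
        if len(v) > n:
            raise ValueError(f"longer than {n} characters")
        return v
    return check


# Library-wide defaults: one validator chain per type.
DEFAULT_VALIDATORS: Dict[type, List[Validator]] = {
    str: [coerce_str],
    int: [int],
}


def validators_for(overrides: Mapping[type, List[Validator]]) -> ChainMap:
    """Model-level overrides shadow the defaults; unlisted types fall through."""
    return ChainMap(dict(overrides), DEFAULT_VALIDATORS)


def run(chain: List[Validator], value: Any) -> Any:
    for validator in chain:
        value = validator(value)
    return value


table = validators_for({str: [strict_str, strip, max_length(10)]})
run(table[str], "  hello  ")  # 'hello'
run(table[int], "3")          # 3, int still uses the default chain
# run(table[str], 42) raises TypeError instead of returning '42'
```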

@erlendvollset

@erlendvollset erlendvollset commented Jan 11, 2022

Sharing this workaround I made in case it might come in handy for anyone else.

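from typing import Any

import pydantic.errors
import pydantic.validators

# Workaround for pydantic v1: it monkeypatches the private, module-level
# _VALIDATORS table, so it affects every model in the process and must run
# before any model class is defined (validators are looked up at class creation).


def _strict_bool_validator(v: Any) -> bool:
    # Accept only the two bool singletons; 1, "true", "yes" etc. are rejected.
    if v is True or v is False:
        return v
    raise pydantic.errors.BoolError()


# Swap the default (coercing) validator list for each basic type with a strict one.
for i, (type_, _) in enumerate(pydantic.validators._VALIDATORS):
    if type_ is int:
        pydantic.validators._VALIDATORS[i] = (int, [pydantic.validators.strict_int_validator])
    elif type_ is float:
        pydantic.validators._VALIDATORS[i] = (float, [pydantic.validators.strict_float_validator])
    elif type_ is str:
        pydantic.validators._VALIDATORS[i] = (str, [pydantic.validators.strict_str_validator])
    elif type_ is bool:
        pydantic.validators._VALIDATORS[i] = (bool, [_strict_bool_validator])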
