Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StrictBool #579

Closed
cazgp opened this issue Jun 5, 2019 · 36 comments
Closed

StrictBool #579

cazgp opened this issue Jun 5, 2019 · 36 comments

Comments

@cazgp
Copy link
Contributor

@cazgp cazgp commented Jun 5, 2019

Thank you for bringing types into python -- it makes daily life so much easier :)

However, I think there's a pretty serious bug with the bool validator.

Minimal working example:

from pydantic import BaseModel

class Model(BaseModel):
    a: bool

Model(a=1) # should cause exception
Model(a="snap") # should cause exception

a should be a bool and only a bool. If a user provides any other value to the endpoint (using fastapi), the database throws because it's the wrong type.

I cannot think of a single type that can be passed to bool(X) that won't return a boolean, and yet bool_validator returns that coercion. e.g. bool({"moo": 1}) == True. This makes the validator completely useless :(

Is there a reason behind this? And if it needs to stay as is, do you think adding a strict mode to the validator is an option?

Also, as an aside, there are no unit tests for the function.

I'm very happy to make a PR (adding some unit tests as well!) because it shouldn't take long, but I need to understand if there's a reason for this first.

Thanks!

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jun 5, 2019

Yes there's a reason for this.

Please read #578, #284 and #360. You are not the first person to ask about this - pydantic is a parsing library and not a validation library.

There definitely are tests for this, hence the 100% coverage, here are 13 at least.

If you do not want this behaviour:

  • you can implement a validator with pre=True that only accepts a set list of things
  • you can implement your own StrictBool type.

Regarding your examples, since bool in python are a sub-type of int, I think most people would except 1 (and therefore probably '1') to be valid as a boolean value.

Other strings are trickier, in python bool('any string with length > 0') == True, but that's awkward since bool('false') == True, hence it's not trivial to decide which values are truey, eg. DRF and yaml accept values like yes.

I'd accept a PR to add support for StrictBool.

@samuelcolvin samuelcolvin changed the title Type coercion in bool validator is far too liberal StrictBool Jun 5, 2019
@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jun 5, 2019

There's also the case of non strings, most people would (I imagine) expect any positive int to be coersed to True, "special" values like -1 are trickier.

What about dictionaries and lists? Same behaviour as python? Or maybe same as javascript if we've parsed JSON - it's not that simple.

What about arbitrary class? I imagine most people would expect most of them to be truey like in python, but that can get tricky.

@cazgp
Copy link
Contributor Author

@cazgp cazgp commented Jun 5, 2019

Thanks for responding so quickly!

Thanks for pointing out the tests. I grepped an awful lot and couldn't find anything, and then was in test_validators for a very long time.

StrictBool sounds like it could be useful, however I still think that the coercion at the end is not expected behaviour. I cannot imagine that someone would expect {"test": 1} to be parsed as a bool. It really seems like that's user error which should be caught.

So, in your examples, I would say throw an error for all of them! Unless the value is a bool, or it matches a particular set of strings, I say reject it.

@cazgp
Copy link
Contributor Author

@cazgp cazgp commented Jun 5, 2019

Oh and 1 and 0 should probably be exceptions too

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jun 5, 2019

I can of understand that {"test": "john doe"} being parsed to {'test': False} would be confusing for a lot of people.

However this is a breaking change, and people don't like breaking changes.

So what do other people think?

In particular:

  • which strings should be true/false?
  • what should we do with numbers other than 0/1?
  • what should we do with lists and dicts, what about if they're empty? true/false/error?
  • what should we do with other classes? true/false/empty? Should we look for the __bool__ method?

@cazgp cazgp mentioned this issue Jun 5, 2019
4 tasks
@jasonkuhrt
Copy link
Contributor

@jasonkuhrt jasonkuhrt commented Jun 11, 2019

which strings should be true/false?

We could parse based on Yaml's boolean syntax spec;

Regexp:
 y|Y|yes|Yes|YES|n|N|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF

what should we do with numbers other than 0/1?

Error, or, if going with the above, not integers whatsoever

what should we do with lists and dicts, what about if they're empty? true/false/error?

Error

what should we do with other classes?

Error

@jasonkuhrt jasonkuhrt mentioned this issue Jun 11, 2019
5 tasks
@jasonkuhrt
Copy link
Contributor

@jasonkuhrt jasonkuhrt commented Jun 11, 2019

Oh and 1 and 0 should probably be exceptions too

Not 100% sure about this, but I get it. I think for a StrictBool type it is less acceptable than a bool type.

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jun 11, 2019

Definitely 0 and 1 should be allowed.

And also the strings of those values.

@samuelcolvin samuelcolvin added this to the Version 1 milestone Jun 11, 2019
@jasonkuhrt
Copy link
Contributor

@jasonkuhrt jasonkuhrt commented Jun 11, 2019

Sure, and yeah the string variants too then

@dmontagu
Copy link
Collaborator

@dmontagu dmontagu commented Jun 11, 2019

My two cents regarding how I'd personally like to see these classes implemented:


  • For StrictBool:
    • bools pass validation (duh)
    • The only strings that don't raise validation errors are the yaml strings and "0" and "1"
    • The only ints that don't raise validation errors are 0 and 1
    • Everything else is a validation error -- if you use the StrictBool type, then you take responsibility for explicitly casting to bool.
      • I think checking for __bool__ would be okay, but if used, should probably require it to be explicitly implemented by the user (rather than inherited). But I don't know if there is a clean way to accomplish this, and would probably rather not have to worry about it.

  • For regular bool:
    • Follow standard python bool behavior except parse "0" and the yaml false strings as False.
    • (If this differs from the current implementation, e.g., other special cases are handled, I'm fine keeping it as is.)

I can of understand that {"test": "john doe"} being parsed to {'test': False} would be confusing for a lot of people.

Yeah, I would definitely vote that this case be treated as a validation error before it get parsed as False. True would be okay if parsed as a regular bool.

@cazgp
Copy link
Contributor Author

@cazgp cazgp commented Jun 12, 2019

Follow standard python bool behavior except parse "0" and the yaml false strings as False.

Standard python behaviour is to parse literally anything as True, so I really don't think that's sensible.

class Test:
    pass

def test():
    pass

bool(Test())       # true
bool({"test": 1})  # true
bool(test)         # true

It all but renders this type as useless IMHO.

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jun 12, 2019

It all but renders this type as useless IMHO.

Happy to have a discussion and hear opposing views, but please let's not resort to hysterics. 99.9% of pydantic usage is in parsing data from "dumb" formats, eg. JSON, YAML, form data, URL arguments where classes and functions can't be defined, so parsing the above as True would manifestly NOT render the type "useless".


In answer to the question, I think @jasonkuhrt's answer plus 0, 1, '0', '1' is probably the best option.

Have a look here for how DRF does this.

@cazgp
Copy link
Contributor Author

@cazgp cazgp commented Jun 12, 2019

hysterics

That wasn't a wildly exaggerated and emotional reaction. I believe that a bool parser which permits nearly anything to be coerced into a boolean (particularly one named after a portmanteau of "pedantic") is not that useful. I think the behaviour is unexpected and leads to problems for people relying on the library to act as a type guard between their python functions. In my case, a non-boolean value was hitting the database, causing it to throw. Luckily it was caught in development.

Unfortunately github is unhappy with the DRF link.

Happy to go with @jasonkuhrt + 0, 1 as it seems straightforward and expected.

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jun 12, 2019

Weird, the link works fine for me. It was line 636 and below in /rest_framework/fields.py.

@cazgp
Copy link
Contributor Author

@cazgp cazgp commented Jun 12, 2019

Sweet, that works now!

Yeah that looks sound.

I might not get a chance to do it this week, but happy to have a pop next week :)

@dmontagu
Copy link
Collaborator

@dmontagu dmontagu commented Jun 12, 2019

It all but renders this type as useless IMHO.

My reasoning here was that if StrictBool were an alternative option, then I would expect the non-strict version to just follow python's built-in behavior (seems like the least surprising option, and is consistent with how str vs StrictStr works). If StrictBool were an option, and I cared to be strict, I would just use that.

Validation isn't pydantic's only capability -- I would still want to annotate fields as bool for type hinting / schema building, etc., even if I didn't care much about how it was validated (more complex validation logic may incur unwanted overhead; if that were the case I'd be fine with minimal validation in most cases). And I may care more about validating other fields, while still wanting type hints, etc. So I do think this implementation of this type could be useful.

But this is a weakly held opinion, and I'm okay with a stricter implementation.

@layday
Copy link
Contributor

@layday layday commented Jul 10, 2019

Here we have a bool that accepts any value but only calls bool on that value in a limited number of cases. We also have a str validator that accepts a variety of types but converts all of them to strs. We have ints that only accept ints but not bools though bool derives from int. It's not surprising that people are taken by surprise by Pydantic's handling of built-in types!

My opinion is this is scope creep. bool is used as a type hint. The semantics is straightforward: you can't subclass bool so anything other than True and False is not a bool. All of this back and forth about type casting is irrelevant. If you want to have a bool-like type that accepts all possible values in YAML, you should not call it a bool - the alternative is to smash everyone's type checkers to smithereens.

@jasonkuhrt
Copy link
Contributor

@jasonkuhrt jasonkuhrt commented Jul 10, 2019

@layday I don't think you're taking BaseSettings into account?

@layday
Copy link
Contributor

@layday layday commented Jul 10, 2019

@jasonkuhrt It makes sense that you'd want to convert env vars in BaseSettings (it's strings all the way down) but in routine BaseModels I don't think we should lump type hints together with preprocessing - at least not when using built-in types. Depending on your requirements you might find this behaviour very unintituive.

@jasonkuhrt
Copy link
Contributor

@jasonkuhrt jasonkuhrt commented Jul 10, 2019

@layday I'm not sure what your proposal is, though. Examples? My take on what you're saying:

class A(BaseModel):
    foo: bool

A(foo="true") # ERR
A.parse_obj(dict(foo="true")) # OK A<foo=True>

This seems intuitive, to me.

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jul 10, 2019

@layday what about parsing url queries? bool has to be able to accept strings on many scenarios.

@dmontagu
Copy link
Collaborator

@dmontagu dmontagu commented Jul 10, 2019

@layday

the alternative is to smash everyone's type checkers to smithereens.

What issues do you see/foresee with type checking? The value will be True or False once parsed (so will have the specified type), and pre-parsed values are already not expected to be the correct type in many other reasonable cases; e.g., annotating a type as int will cause '3' to be cast to 3. I am not aware of any issues that this is currently causing with type checking.

It's not surprising that people are taken by surprise by Pydantic's handling of built-in types!

I view pydantic's goal not as maintaining 100% consistency with how python handles casting, but rather to be as broadly useful for parsing applications as possible. Sometimes, I think it is justified to have parsing logic that differs from standard python, especially when

  1. the logic is more-or-less standard in other contexts (yaml, web APIs)
  2. the python-default parsing behavior is surprising / not useful ("false" would be cast to True by default), and
  3. it improves/simplifies the day-to-day usage of the library (rather than needing to import some CustomBool type every time I want to have a reasonably-parsed bool type)

This thread's discussion has convinced me that each of these points is met by this change.

While this approach to bool parsing may be unintuitive* for pydantic newcomers, I personally would much prefer the default behavior to be more useful in most cases over marginally more intuitive, and I believe this is what the discussed modification to default bool parsing achieves.

* For what it's worth, I think this thread has already highlighted that python's standard bool parsing behavior is viewed as less intuitive by many users than the proposed bool parsing anyway.

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jul 10, 2019

@dmontagu thank you. You've said all the I was thinking.

@layday
Copy link
Contributor

@layday layday commented Jul 10, 2019

What issues do you see/foresee with type checking?

Pydantic encourages the following pattern:

class Foo(BaseModel):
    some_url: URL = 'http://example.com/'

... where the string will be cast to URL on init.

I view pydantic's goal not as maintaining 100% consistency with how python handles casting ...

Type casting is orthogonal to type checking.

... but rather to be as broadly useful as possible for parsing applications as possible. ... While this approach to bool parsing may be unintuitive* for pydantic newcomers, I personally would much prefer the default behavior to be more useful in most cases over marginally more intuitive, and I believe this is what the discussed modification to default bool parsing achieves.

The premise here doesn't really hold. The behaviour is not unintuive to Pydantic newcomers; it's unintuitive in specific contexts. Let's take a real-world example - suppose you want to implement JSON-RPC in Pydantic. JSON-RPC says the value of id can be a number, a string or null. So in Python you write:

class JsonRpcRequest(BaseModel):
    id: Union[None, int, str]


class JsonRpcResponse(BaseModel):
    id: Union[None, int, str]


request = JsonRpcRequest(id='1')
response = JsonRpcResponse(id=request.id)
send(response.json())    # '{"id": 1}'

... and your client can no longer reconcile their request with your response. To me it'd make more sense to default to a simple isinstance check for built-in types and have something akin to:

from pydantic.types import LooseInt

class DataMungingModel(BaseModel):
    id: LooseInt

... for type casting. This would not only be easier than using custom validators to reverse quirks in Pydantic's handling of built-in types but it would also encourage people to use the appropriate type/validator for their use case.

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jul 10, 2019

I respect your opinion but this is simply not what pydantic is about, switching everything to isinstance checks would not only break almost every single existing use of pydantic, but it would also IMHO yield a less useful library.

You might get a better understanding from #578 and the issues referenced from there, particularly the "strict mode" referenced in #284.

The only option I can think of to accommodate your view of what pydantic should be, would be a hook in validators.get_validators which allows you to provide your own validators for builtin types? If you'd like that, please create a new issue.

@dmontagu
Copy link
Collaborator

@dmontagu dmontagu commented Jul 10, 2019

@layday

... where the string will be cast to URL on init.
Your example (adapted to use the pydantic type UrlStr) raises a mypy error:

from pydantic import BaseModel, UrlStr

class Foo(BaseModel):
    some_url: UrlStr = 'http://example.com/'
# error: Incompatible types in assignment (expression has type "str", variable has type "UrlStr")

While pydantic would cast the string to a UrlStr, I would argue that the right way to accomplish this in a type-checked way would be to replace it with:

from pydantic import BaseModel, UrlStr

class Foo(BaseModel):
    some_url: UrlStr = UrlStr('http://example.com/')

(which does not raise a mypy error).

I don't view pydantic as opinionated either way here; if you want your code to pass static type checking, there's no obstruction. If not using mypy, I don't see much benefit to the cast; if using mypy, you'll be made aware of the problem immediately.

... and your client can no longer reconcile their request with your response.

I think this example is a valid issue with how the Union type is handled -- I think it is totally reasonable to expect that if you provide a value with a type in the Union that it prefer to retain its original type, rather than casting to the first type it can. (While I could imagine arguments in favor of alternatives, I think this would be less surprising in almost all cases.) This is probably worth a separate issue.

@layday
Copy link
Contributor

@layday layday commented Jul 10, 2019

The only option I can think of to accommodate your view of what pydantic should be, would be a hook in validators.get_validators which allows you to provide your own validators for builtin types? If you'd like that, please create a new issue.

Fair enough. A get_validators hook sounds like a nice compromise solution.

I think this example is a valid issue with how the Union type is handled ...

I'm not sure if special-casing Union would be better than using a strict variant of the type. The presence of Union implies to me that you require types to be parsed 'strictly,' which you'd have an easier time accomplishing universally with the hook suggested above.

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jul 10, 2019

I agree the Union handling is not ideal, maybe a hook for processing Union too would help?

@jasonkuhrt
Copy link
Contributor

@jasonkuhrt jasonkuhrt commented Jul 10, 2019

What do people think about:

class A(BaseModel):
    foo: bool

A(foo="true") # ERR
A.parse_obj(dict(foo="true")) # OK A<foo=True>

?

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Jul 10, 2019

These two cases are exactly the same and should behave the same as they do now.

@dmontagu
Copy link
Collaborator

@dmontagu dmontagu commented Jul 10, 2019

To be fair, I think the union issue is explicitly addressed in #436:

In the case of pydantic the order is not ignored, we could better document this, but I don't think it's worth fixing since it would require lots more checking and would slow things down.

Given the recent string of github issues related to Union handling, it might be worth revisiting.


@jasonkuhrt I think your suggestion is potentially related to #650 as well -- I think I personally could get on board with that sort of change, though it would obviously break the majority of pydantic code currently in existence.

@jasonkuhrt
Copy link
Contributor

@jasonkuhrt jasonkuhrt commented Jul 10, 2019

I agree that pydantic has been a parsing library. I just don't see the following as a parsing API:

A(foo="true")

That looks like straight forward in-app (e.g. not at IO boundary) model creation to me. Wondering now why pydantic cannot be both:

  1. A replacement for dataclasses
  2. A parsing library

parsing will invariable come from input sources like http requests, stdin, ... and will be passed to e.g. parse_obj?

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Aug 12, 2019

fixed by #617

@samuelcolvin
Copy link
Owner

@samuelcolvin samuelcolvin commented Aug 12, 2019

Sorry, @jasonkuhrt that ship has sailed I'm afraid. 🚢

Maybe A(foo="true") shouldn't have done parsing/validation (don't think I agree, but I can understand the argument), but that was how I designed pydantic back when it was single class in a file somewhere parsing some http headers. It has stayed the same ever since and most people seem quite happy with it that way, this isn't going to change now.

@ievgennaida
Copy link

@ievgennaida ievgennaida commented May 3, 2021

Hello @samuelcolvin,
I have tried the latest version and behavior is still quite strange.
You have stated above that library suppose to be used for the conversion, not for the validation, but current approach is failing to really convert original JSON where bool and str are clearly defined.

My example JSON is:

{values:[false, 'F', 'P', null]}

pydantic:

class Filter(BaseSchema):
    values: Optional[List[Union[bool, str, None]]] = []

How "F" can be False? In the JSON coming type is clearly a string but parsed as bool.
Types are unexpectedly converted when union of allowed types are used.
In a case if str becomes first than all are converted as string including first bool value.

class Filter(BaseSchema):
    values: Optional[List[Union[str, bool, None]]] = []

What I want is just to preserve a type, any workaround for this one?

@ievgennaida
Copy link

@ievgennaida ievgennaida commented May 3, 2021

I have changed to use StrictBool, works like a charm!
Thank you!


from pydantic import (
    StrictBool
)

class Filter(BaseSchema):
    values: Optional[List[Union[StrictBool, None str]]] = []

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants