string constraints are applied after pattern #8577
Comments
Hi @dhofstetter, thanks for the example and the question. Here's a snippet from the …

As to the motivation behind this ordering, I think the thought is that after the length and pattern checks, you can standardize the string to some extent with settings like `to_lower` or `strip_whitespace`.

I see your argument for having these transformations take place before the pattern check. However, changing this behavior now would be a breaking change for many users. One workaround you could use in the meantime is a validator with `mode` set to `'before'`.

@samuelcolvin, is this something we'd consider changing in V3, or are we set on this order of validation checks and transformations for strings?
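A minimal stdlib-only sketch of the workaround mentioned in the comment above, not pydantic's actual implementation: `normalize` plays the role of a validator that runs on the raw input before the built-in checks, and `check_constraints` stands in for the length and pattern checks. Both function names, and the length bounds, are hypothetical.

```python
import re

PATTERN = re.compile(r"^[A-Z]+$")


def check_constraints(value: str, min_length: int = 1, max_length: int = 10) -> str:
    """Stand-in for the built-in checks: length first, then the pattern."""
    if not min_length <= len(value) <= max_length:
        raise ValueError(f"length {len(value)} outside [{min_length}, {max_length}]")
    if not PATTERN.match(value):
        raise ValueError(f"{value!r} does not match {PATTERN.pattern!r}")
    return value


def normalize(value: str) -> str:
    """Hypothetical 'before' step: runs on the raw input, ahead of the checks."""
    return value.strip().upper()


def validate_with_before(value: str) -> str:
    """Normalize first, then run the constraint checks on the result."""
    return check_constraints(normalize(value))


# Without the 'before' step, the raw input fails the pattern check.
try:
    check_constraints(" hello ")
except ValueError as exc:
    print("rejected:", exc)

# With it, the normalized value passes.
print(validate_with_before(" hello "))
```

In pydantic V2 itself, a step like `normalize` would typically be attached as a `BeforeValidator` inside `Annotated`, or through `@field_validator(..., mode='before')`.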
Alright, marking this with the … label. I don't think that changing the default (current) behavior would be a great idea, though I can see an argument for supporting customization of the order in which string constraints are applied 🚀
I see the argument for changing it, but it would be a very annoying breaking change to understand, so my instinct is that we don't change this, even in a major update. As for the reason: my guess is that the best explanation is "because that's how it was first done", and there hasn't been a strong (enough) argument to change it.
@samuelcolvin Thanks for your explanation. My strong argument is simply this: writing a pattern that also accepts leading and trailing whitespace is not a big deal, but it makes the regex pattern even harder to understand. The same goes for `to_upper` and `to_lower`: writing a pattern that accepts both cases when I only want one creates a pattern that is harder to understand.

@sydney-runkle Thanks for digging into it and providing such a detailed response.

Conclusion: …

Best regards,
Daniel
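To make the readability argument concrete, here is a small sketch with Python's `re` module. The patterns and inputs are illustrative, not taken from the issue: if stripping and upper-casing ran first, `^[A-Z]+$` would be enough; when the pattern sees the raw input, it also has to tolerate surrounding whitespace and lowercase letters.

```python
import re

# Pattern you could write if strip/upper ran before the check.
STRICT = re.compile(r"^[A-Z]+$")

# Pattern needed when the check sees the raw input: it must also accept
# surrounding whitespace and lowercase letters that to_upper fixes later.
PERMISSIVE = re.compile(r"^\s*[A-Za-z]+\s*$")

raw_inputs = [" hello ", "World", "ABC", "abc1"]

for raw in raw_inputs:
    transformed = raw.strip().upper()
    # For these inputs both approaches accept and reject the same values;
    # only the readability of the regex differs.
    assert bool(STRICT.match(transformed)) == bool(PERMISSIVE.match(raw))
    print(repr(raw), "->", repr(transformed), bool(STRICT.match(transformed)))
```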
@dhofstetter I can think of one pretty compelling reason not to change it: changing it as you suggest would make any OpenAPI schema generated from your model WRONG, since the pattern included in that schema would no longer necessarily match the strings your model accepts.
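A minimal sketch of that concern, assuming a client that validates values against the published schema before sending them. The `schema` dict is hand-written here rather than generated by pydantic, and `server_accepts` is a hypothetical model of a transform-first pipeline. JSON Schema's `pattern` keyword is not implicitly anchored, hence `re.search`.

```python
import re

# Hand-written stand-in for the schema of a field declared with
# strip_whitespace=True, to_upper=True, pattern=r'^[A-Z]+$'.
schema = {"type": "string", "pattern": r"^[A-Z]+$"}


def client_accepts(value: str) -> bool:
    """A schema-driven client only sees the pattern, applied to the raw value."""
    return re.search(schema["pattern"], value) is not None


def server_accepts(value: str) -> bool:
    """Hypothetical transform-first server: strip and upper-case, then match."""
    return re.search(schema["pattern"], value.strip().upper()) is not None


for value in [" hello ", "HELLO"]:
    print(repr(value), "client:", client_accepts(value), "server:", server_accepts(value))
```

For `" hello "` the two disagree: the server would accept it, but the published pattern says it is invalid. With pattern-first ordering, every string the model accepts also matches the published pattern.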
Could you provide an example? I don't think I get your point.
See below: `to_lower` (or `to_upper`) and `strip_whitespace` are set to `True`. Interestingly, only the second part of the email (the domain) comes out lowercase. This is very strange behavior. I could drop `strip_whitespace`, since `EmailStr` already takes care of stripping whitespace, but I'm a bit confused about when to use it and when not to.

```python
from typing import Annotated

from pydantic import BaseModel, EmailStr, StringConstraints

EStr = Annotated[EmailStr, StringConstraints(to_lower=True, strip_whitespace=True)]


class Foo(BaseModel):
    bar: EStr


Foo(bar="uSeR@ExAmPlE.com")
#> Foo(bar='uSeR@example.com')
```
Hmph, that looks like a bug; feel free to open a new issue with that report!
@sydney-runkle and @samuelcolvin, you are saying that fixing/implementing this would introduce a breaking change, but this is how it worked in Pydantic V1, so it seems that Pydantic V2 is what introduced the breaking change. Looking at the Pydantic V1 source code, I think it comes down to https://github.com/pydantic/pydantic/blob/v1.10.17/pydantic/types.py#L428-L429, where validators are applied before the regex check:

```python
@classmethod
def __get_validators__(cls) -> 'CallableGenerator':
    yield strict_str_validator if cls.strict else str_validator
    yield constr_strip_whitespace
    yield constr_upper
    yield constr_lower
    yield constr_length_validator
    yield cls.validate  # last

@classmethod
def validate(cls, value: Union[str]) -> Union[str]:
    # ...
    if cls.regex:
        if not re.match(cls.regex, value):
            raise errors.StrRegexError(pattern=cls._get_pattern(cls.regex))
```

Below is an MRE taken from the official docs.

Working example using Pydantic V1:

```python
from pydantic import BaseModel, constr


class Foo(BaseModel):
    bar: constr(strip_whitespace=True, to_upper=True, regex=r'^[A-Z]+$')


foo = Foo(bar=' hello ')
print(foo)
#> bar='HELLO'
```

Same (failing) example using Pydantic V2:

```python
from pydantic import BaseModel, constr


class Foo(BaseModel):
    bar: constr(strip_whitespace=True, to_upper=True, pattern=r'^[A-Z]+$')


foo = Foo(bar=' hello ')
print(foo)
```
Version:

```text
$ python -c "import pydantic.version; print(pydantic.version.version_info())"
pydantic version: 2.8.2
pydantic-core version: 2.20.1
pydantic-core build: profile=release pgo=true
install path: /tmp/wksp/venv/lib/python3.10/site-packages/pydantic
python version: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
related packages: typing_extensions-4.12.2 fastapi-0.111.1
commit: unknown
```

Please note that the above code is taken from the official docs and raises an error, whereas the docs say it prints `bar='HELLO'`.
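A stripped-down, stdlib-only model of the V1 ordering quoted earlier in this comment: validators are yielded in sequence and applied one after another, so the regex check sees the already-transformed value. All names here are illustrative (`check_regex` stands in for V1's `validate`), and the type coercion, `to_lower` and length steps are omitted for brevity.

```python
import re
from typing import Callable, Iterator

Validator = Callable[[str], str]

REGEX = re.compile(r"^[A-Z]+$")


def strip_whitespace(value: str) -> str:
    return value.strip()


def to_upper(value: str) -> str:
    return value.upper()


def check_regex(value: str) -> str:
    # Mirrors V1's `validate`: the regex runs last, on the transformed value.
    if not REGEX.match(value):
        raise ValueError(f"string does not match regex {REGEX.pattern!r}")
    return value


def get_validators() -> Iterator[Validator]:
    """Yield validators in V1's order: transformations first, regex last."""
    yield strip_whitespace
    yield to_upper
    yield check_regex


def run(value: str) -> str:
    """Feed the value through each validator in turn, like V1's chain."""
    for validator in get_validators():
        value = validator(value)
    return value


print(run(" hello "))
```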
Initial Checks
Description
Hi guys,
I found out that string constraints (like `to_lower` or `strip_whitespace`) are applied after a defined pattern check. IMO that doesn't make sense, as operations like `to_lower` or `strip_whitespace` could make a string that fails the pattern start to match it. See the code example below to get an idea of what I mean.
Is there a special reason for the order of operations?
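A small stdlib-only sketch of the two orderings being asked about, using `strip_whitespace` followed by `to_lower`. The pattern `^[a-z]+$`, the inputs and the function names are illustrative, not the reporter's example and not pydantic's implementation.

```python
import re
from typing import Optional

PATTERN = re.compile(r"^[a-z]+$")


def transform(value: str) -> str:
    # strip_whitespace followed by to_lower
    return value.strip().lower()


def pattern_then_transform(value: str) -> Optional[str]:
    """Order reported in the issue: check the raw value, then transform."""
    return transform(value) if PATTERN.match(value) else None


def transform_then_pattern(value: str) -> Optional[str]:
    """Order the issue proposes: transform first, then check."""
    value = transform(value)
    return value if PATTERN.match(value) else None


for raw in ["hello", " Hello ", "HELLO"]:
    print(repr(raw), pattern_then_transform(raw), transform_then_pattern(raw))
```

Only `"hello"` passes when the pattern is checked first; with the transformations applied first, all three inputs normalize to `"hello"` and pass.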
Example Code
Python, Pydantic & OS Version