Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string constraints are applied after pattern #8577

Open
1 task done
dhofstetter opened this issue Jan 18, 2024 · 9 comments
Open
1 task done

string constraints are applied after pattern #8577

dhofstetter opened this issue Jan 18, 2024 · 9 comments
Labels
change Suggested alteration to pydantic, not a new feature nor a bug v3 Under consideration for V3

Comments

@dhofstetter
Copy link

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

Hi guys,
I found out that string constraints (like to_loweror strip_whitespace are applied after a defined pattern check.

I.m.o that doesn't make sense, as operations like to_loweror strip_whitespace would make a string eventually starting to match the pattern.
See the code example below to get an idea what I mean.

Is there a special reason for the order of operations?

Example Code

from typing import Annotated

from pydantic import GetPydanticSchema, TypeAdapter, ValidationError
from pydantic_core import PydanticCustomError, core_schema


class MyStringType(str):
    @classmethod
    def from_string(cls, v: str) -> "MyStringType":
        if not v.startswith("my"):
            raise PydanticCustomError("invalid_format")
        return MyStringType(v)


MyString = Annotated[
    MyStringType,
    # Expectation is, that whitespace and to_lower are applied before the pattern, as
    # the string might match a pattern after going through the strip and lowercase process?!
    GetPydanticSchema(
        lambda tp, handler: core_schema.no_info_after_validator_function(
            MyString.from_string, handler(core_schema.str_schema(
                to_lower=True, strip_whitespace=True, min_length=1, pattern=r"^my.*$",
            ))
        )
    ),
]

adapter = TypeAdapter(MyString)

assert adapter.validate_python("mystring") == "mystring"   # OK
assert adapter.validate_python("mystring ") == "mystring"  # OK
try:
    assert adapter.validate_python(" MYSTRING") == "mystring"  # NOK
except ValidationError as e:
    print(e)
    '''
    1 validation error for function-after[from_string(), constrained-str]
        String should match pattern '^my.*$' [type=string_pattern_mismatch, input_value=' MYSTRING', input_type=str]
            For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch
    '''

Python, Pydantic & OS Version

pydantic version: 2.5.3
        pydantic-core version: 2.14.6
          pydantic-core build: profile=release pgo=true
                 install path: /Users/dh/Development/yio/yio/yio-fastapi-utils/venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.3 (main, Apr  7 2023, 20:13:31) [Clang 14.0.0 (clang-1400.0.29.202)]
                     platform: macOS-14.2.1-arm64-arm-64bit
             related packages: fastapi-0.108.0 typing_extensions-4.8.0 mypy-1.8.0 pydantic-settings-2.1.0 pydantic-extra-types-2.3.0 email-validator-2.1.0.post1
@dhofstetter dhofstetter added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Jan 18, 2024
@sydney-runkle
Copy link
Member

Hi @dhofstetter,

Thanks for the example + question. Here's a snippet from the pydantic-core string validation logic that shows the order in which string constraints are checked / applied:

https://github.com/pydantic/pydantic-core/blob/4da7192ffc104cd6c424d102bc1c9c5bfdad543e/src/validators/string.rs#L75-L142.

As to the motivation behind this ordering, I think the thought is that after the length + pattern checks, you can standardize the string to some extend with settings like to_lower, etc.

I see your argument re having these transformations take place before the pattern check. However, changing this behavior now would constitute a breaking change for many users. One workaround that you could use in the meantime would be to use a validator with mode set to before to preprocess your data.

@samuelcolvin, is this something that we'd consider changing in V3, or are we set on this order of validation checks + transformations for strings?

@sydney-runkle sydney-runkle added change Suggested alteration to pydantic, not a new feature nor a bug v3 Under consideration for V3 and removed bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Jan 18, 2024
@sydney-runkle
Copy link
Member

Alright, marking this with the change and V3 tags, as I think this is something we could at least revisit for V3.

I don't think that changing the default (current behavior) would be a great idea, though I can see an argument for supporting customization of the order in which string constraints are applied 🚀

@samuelcolvin
Copy link
Member

I see the argument for changing it, but it's a very annoying breaking change to understand, so my instinct is we don't change this, even in a major update.

As for the reason - my guess is that the best explanation is "because that's how it was first done" and there hasn't been a strong (enough) argument to change it.

@dhofstetter
Copy link
Author

@samuelcolvin Thanks for your explanation. My strong argument is simply: That writing a pattern that accepts leading/and trailing whitespace is not a big deal, but makes the regex pattern even harder to understand. The same is for to_upper and lower, writing a pattern accepting both if I just want one, creates a pattern that is harder to understand

@sydney-runkle Thanks for digging into it and providing such a detailed response.

Conclusio
I totally agree, that changing how things were first implemented is a strong argument. But from my point of view, v2 introduced a lot of "breaking" changes. And if one hits yourself by accident (meaning that your own codebase is affected by that change) its personally considered as annoying ;)
So I thing a nice way would be to define the order of StringConstraints by using some list of literals maybe.

BR Daniel

@HansBrende
Copy link

@dhofstetter I can think of one pretty compelling reason not to change it:

Changing it as you suggest would make any OpenAPI schema generated from your model WRONG, since the pattern included in that schema would no longer necessarily match strings that your model supports.

@dhofstetter
Copy link
Author

@dhofstetter I can think of one pretty compelling reason not to change it:

Changing it as you suggest would make any OpenAPI schema generated from your model WRONG, since the pattern included in that schema would no longer necessarily match strings that your model supports.

May you provide an example? I guess I don't get your point.

@SDAravind
Copy link

SDAravind commented Apr 24, 2024

I see the argument for changing it, but it's a very annoying breaking change to understand, so my instinct is we don't change this, even in a major update.

As for the reason - my guess is that the best explanation is "because that's how it was first done" and there hasn't been a strong (enough) argument to change it.

see below, lowercase/uppercase is set to true and strip whitespaces is set to true. Interestingly, we see the second part of email is always lowercase. This is very strange behavior. Though, I can exclude strip_whitespace as EmailStr takes care of striping whitespaces.

I'm bit confused when to use and not to use strip_whitespace. Should we use strip_whitespace with others params?

from pydantic import BaseModel, EmailStr, StringConstraints
from typing import Annotated

EStr = Annotated[EmailStr, StringConstraints(to_lower=True, strip_whitespace=True)]


class Foo(BaseModel):
    bar: EStr


Foo(bar="uSeR@ExAmPlE.com")
>>>  Foo(bar='uSeR@example.com')

@sydney-runkle
Copy link
Member

@SDAravind,

Hmph, that looks like a bug, feel free to open a new issue with that report!

@angely-dev
Copy link

angely-dev commented Jul 25, 2024

@sydney-runkle and @samuelcolvin you are saying that fixing/implementing this would introduce a breaking change, but this was the way it worked with Pydantic V1. So it seems Pydantic V2 introduced a breaking change.

Looking at the Pydantic V1 source code, I think it has to do with https://github.com/pydantic/pydantic/blob/v1.10.17/pydantic/types.py#L428-L429 where validators are applied before the regex check:

    @classmethod
    def __get_validators__(cls) -> 'CallableGenerator':
        yield strict_str_validator if cls.strict else str_validator
        yield constr_strip_whitespace
        yield constr_upper
        yield constr_lower
        yield constr_length_validator
        yield cls.validate  # last

    @classmethod
    def validate(cls, value: Union[str]) -> Union[str]:
        # ...
        if cls.regex:
            if not re.match(cls.regex, value):
                raise errors.StrRegexError(pattern=cls._get_pattern(cls.regex))

Here is below an MRE taken from the official doc:

Working example using Pydantic V1

from pydantic import BaseModel, constr                                                             
                                                                                                   
class Foo(BaseModel):                                                                              
    bar: constr(strip_whitespace=True, to_upper=True, regex=r'^[A-Z]+$')                           
                                                                                                   
                                                                                                   
foo = Foo(bar='  hello  ')                                                                         
print(foo)
#> bar='HELLO'

Same (failing) example using Pydantic V2

from pydantic import BaseModel, constr

class Foo(BaseModel):
    bar: constr(strip_whitespace=True, to_upper=True, pattern=r'^[A-Z]+$')


foo = Foo(bar='  hello  ')
print(foo)
Traceback (most recent call last):
  File "/tmp/wksp/pattern.py", line 7, in <module>
    foo = Foo(bar='  hello  ')
  File "/tmp/wksp/venv/lib/python3.10/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Foo
bar
  String should match pattern '^[A-Z]+$' [type=string_pattern_mismatch, input_value='  hello  ', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/string_pattern_mismatch

Version:

$ python -c "import pydantic.version; print(pydantic.version.version_info())"
             pydantic version: 2.8.2
        pydantic-core version: 2.20.1
          pydantic-core build: profile=release pgo=true
                 install path: /tmp/wksp/venv/lib/python3.10/site-packages/pydantic
               python version: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
                     platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
             related packages: typing_extensions-4.12.2 fastapi-0.111.1
                       commit: unknown

Please note the above code is taken from the official doc and throws an error, whereas the doc says it prints HELLO:

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change Suggested alteration to pydantic, not a new feature nor a bug v3 Under consideration for V3
Projects
None yet
Development

No branches or pull requests

6 participants