string constraints are applied after pattern #8577

dhofstetter · 2024-01-18T07:15:32Z

Initial Checks

I confirm that I'm using Pydantic V2

Description

Hi guys,
I found out that string constraints (like to_loweror strip_whitespace are applied after a defined pattern check.

I.m.o that doesn't make sense, as operations like to_loweror strip_whitespace would make a string eventually starting to match the pattern.
See the code example below to get an idea what I mean.

Is there a special reason for the order of operations?

Example Code

from typing import Annotated

from pydantic import GetPydanticSchema, TypeAdapter, ValidationError
from pydantic_core import PydanticCustomError, core_schema


class MyStringType(str):
    @classmethod
    def from_string(cls, v: str) -> "MyStringType":
        if not v.startswith("my"):
            raise PydanticCustomError("invalid_format")
        return MyStringType(v)


MyString = Annotated[
    MyStringType,
    # Expectation is, that whitespace and to_lower are applied before the pattern, as
    # the string might match a pattern after going through the strip and lowercase process?!
    GetPydanticSchema(
        lambda tp, handler: core_schema.no_info_after_validator_function(
            MyString.from_string, handler(core_schema.str_schema(
                to_lower=True, strip_whitespace=True, min_length=1, pattern=r"^my.*$",
            ))
        )
    ),
]

adapter = TypeAdapter(MyString)

assert adapter.validate_python("mystring") == "mystring"   # OK
assert adapter.validate_python("mystring ") == "mystring"  # OK
try:
    assert adapter.validate_python(" MYSTRING") == "mystring"  # NOK
except ValidationError as e:
    print(e)
    '''
    1 validation error for function-after[from_string(), constrained-str]
        String should match pattern '^my.*$' [type=string_pattern_mismatch, input_value=' MYSTRING', input_type=str]
            For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch
    '''

Python, Pydantic & OS Version

pydantic version: 2.5.3
        pydantic-core version: 2.14.6
          pydantic-core build: profile=release pgo=true
                 install path: /Users/dh/Development/yio/yio/yio-fastapi-utils/venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.3 (main, Apr  7 2023, 20:13:31) [Clang 14.0.0 (clang-1400.0.29.202)]
                     platform: macOS-14.2.1-arm64-arm-64bit
             related packages: fastapi-0.108.0 typing_extensions-4.8.0 mypy-1.8.0 pydantic-settings-2.1.0 pydantic-extra-types-2.3.0 email-validator-2.1.0.post1

The text was updated successfully, but these errors were encountered:

sydney-runkle · 2024-01-18T13:18:42Z

Hi @dhofstetter,

Thanks for the example + question. Here's a snippet from the pydantic-core string validation logic that shows the order in which string constraints are checked / applied:

https://github.com/pydantic/pydantic-core/blob/4da7192ffc104cd6c424d102bc1c9c5bfdad543e/src/validators/string.rs#L75-L142.

As to the motivation behind this ordering, I think the thought is that after the length + pattern checks, you can standardize the string to some extend with settings like to_lower, etc.

I see your argument re having these transformations take place before the pattern check. However, changing this behavior now would constitute a breaking change for many users. One workaround that you could use in the meantime would be to use a validator with mode set to before to preprocess your data.

@samuelcolvin, is this something that we'd consider changing in V3, or are we set on this order of validation checks + transformations for strings?

sydney-runkle · 2024-01-18T13:25:09Z

Alright, marking this with the change and V3 tags, as I think this is something we could at least revisit for V3.

I don't think that changing the default (current behavior) would be a great idea, though I can see an argument for supporting customization of the order in which string constraints are applied 🚀

samuelcolvin · 2024-01-18T13:39:54Z

I see the argument for changing it, but it's a very annoying breaking change to understand, so my instinct is we don't change this, even in a major update.

As for the reason - my guess is that the best explanation is "because that's how it was first done" and there hasn't been a strong (enough) argument to change it.

dhofstetter · 2024-01-18T15:07:13Z

@samuelcolvin Thanks for your explanation. My strong argument is simply: That writing a pattern that accepts leading/and trailing whitespace is not a big deal, but makes the regex pattern even harder to understand. The same is for to_upper and lower, writing a pattern accepting both if I just want one, creates a pattern that is harder to understand

@sydney-runkle Thanks for digging into it and providing such a detailed response.

Conclusio
I totally agree, that changing how things were first implemented is a strong argument. But from my point of view, v2 introduced a lot of "breaking" changes. And if one hits yourself by accident (meaning that your own codebase is affected by that change) its personally considered as annoying ;)
So I thing a nice way would be to define the order of StringConstraints by using some list of literals maybe.

BR Daniel

HansBrende · 2024-01-20T16:21:29Z

@dhofstetter I can think of one pretty compelling reason not to change it:

Changing it as you suggest would make any OpenAPI schema generated from your model WRONG, since the pattern included in that schema would no longer necessarily match strings that your model supports.

dhofstetter · 2024-01-22T05:48:30Z

@dhofstetter I can think of one pretty compelling reason not to change it:

Changing it as you suggest would make any OpenAPI schema generated from your model WRONG, since the pattern included in that schema would no longer necessarily match strings that your model supports.

May you provide an example? I guess I don't get your point.

SDAravind · 2024-04-24T15:38:52Z

I see the argument for changing it, but it's a very annoying breaking change to understand, so my instinct is we don't change this, even in a major update.

As for the reason - my guess is that the best explanation is "because that's how it was first done" and there hasn't been a strong (enough) argument to change it.

see below, lowercase/uppercase is set to true and strip whitespaces is set to true. Interestingly, we see the second part of email is always lowercase. This is very strange behavior. Though, I can exclude strip_whitespace as EmailStr takes care of striping whitespaces.

I'm bit confused when to use and not to use strip_whitespace. Should we use strip_whitespace with others params?

from pydantic import BaseModel, EmailStr, StringConstraints
from typing import Annotated

EStr = Annotated[EmailStr, StringConstraints(to_lower=True, strip_whitespace=True)]


class Foo(BaseModel):
    bar: EStr


Foo(bar="uSeR@ExAmPlE.com")
>>>  Foo(bar='uSeR@example.com')

sydney-runkle · 2024-04-28T21:31:59Z

@SDAravind,

Hmph, that looks like a bug, feel free to open a new issue with that report!

angely-dev · 2024-07-25T17:54:38Z

@sydney-runkle and @samuelcolvin you are saying that fixing/implementing this would introduce a breaking change, but this was the way it worked with Pydantic V1. So it seems Pydantic V2 introduced a breaking change.

Looking at the Pydantic V1 source code, I think it has to do with https://github.com/pydantic/pydantic/blob/v1.10.17/pydantic/types.py#L428-L429 where validators are applied before the regex check:

    @classmethod
    def __get_validators__(cls) -> 'CallableGenerator':
        yield strict_str_validator if cls.strict else str_validator
        yield constr_strip_whitespace
        yield constr_upper
        yield constr_lower
        yield constr_length_validator
        yield cls.validate  # last

    @classmethod
    def validate(cls, value: Union[str]) -> Union[str]:
        # ...
        if cls.regex:
            if not re.match(cls.regex, value):
                raise errors.StrRegexError(pattern=cls._get_pattern(cls.regex))

Here is below an MRE taken from the official doc:

Working example using Pydantic V1

from pydantic import BaseModel, constr                                                             
                                                                                                   
class Foo(BaseModel):                                                                              
    bar: constr(strip_whitespace=True, to_upper=True, regex=r'^[A-Z]+$')                           
                                                                                                   
                                                                                                   
foo = Foo(bar='  hello  ')                                                                         
print(foo)
#> bar='HELLO'

Same (failing) example using Pydantic V2

from pydantic import BaseModel, constr

class Foo(BaseModel):
    bar: constr(strip_whitespace=True, to_upper=True, pattern=r'^[A-Z]+$')


foo = Foo(bar='  hello  ')
print(foo)

Traceback (most recent call last):
  File "/tmp/wksp/pattern.py", line 7, in <module>
    foo = Foo(bar='  hello  ')
  File "/tmp/wksp/venv/lib/python3.10/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Foo
bar
  String should match pattern '^[A-Z]+$' [type=string_pattern_mismatch, input_value='  hello  ', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/string_pattern_mismatch

Version:

$ python -c "import pydantic.version; print(pydantic.version.version_info())"
             pydantic version: 2.8.2
        pydantic-core version: 2.20.1
          pydantic-core build: profile=release pgo=true
                 install path: /tmp/wksp/venv/lib/python3.10/site-packages/pydantic
               python version: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
                     platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
             related packages: typing_extensions-4.12.2 fastapi-0.111.1
                       commit: unknown

Please note the above code is taken from the official doc and throws an error, whereas the doc says it prints HELLO:

dhofstetter added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Jan 18, 2024

sydney-runkle added change Suggested alteration to pydantic, not a new feature nor a bug v3 Under consideration for V3 and removed bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Jan 18, 2024

davidpeckham mentioned this issue Jan 22, 2024

Literals don't accept inputs with whitespace when str_strip_whitespace is True #8597

Closed

1 task

sydney-runkle mentioned this issue Mar 26, 2024

(WIP) high priority / high impact work 🚀 #9102

Open

21 tasks

SDAravind mentioned this issue May 3, 2024

StringConstraints: Bug #9381

Closed

sydney-runkle mentioned this issue Jun 10, 2024

Fix annotation application logic #9623

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

string constraints are applied after pattern #8577

string constraints are applied after pattern #8577

dhofstetter commented Jan 18, 2024

sydney-runkle commented Jan 18, 2024

sydney-runkle commented Jan 18, 2024

samuelcolvin commented Jan 18, 2024

dhofstetter commented Jan 18, 2024

HansBrende commented Jan 20, 2024

dhofstetter commented Jan 22, 2024

SDAravind commented Apr 24, 2024 •

edited

Loading

sydney-runkle commented Apr 28, 2024

angely-dev commented Jul 25, 2024 •

edited

Loading

string constraints are applied after pattern #8577

string constraints are applied after pattern #8577

Comments

dhofstetter commented Jan 18, 2024

Initial Checks

Description

Example Code

Python, Pydantic & OS Version

sydney-runkle commented Jan 18, 2024

sydney-runkle commented Jan 18, 2024

samuelcolvin commented Jan 18, 2024

dhofstetter commented Jan 18, 2024

HansBrende commented Jan 20, 2024

dhofstetter commented Jan 22, 2024

SDAravind commented Apr 24, 2024 • edited Loading

sydney-runkle commented Apr 28, 2024

angely-dev commented Jul 25, 2024 • edited Loading

Working example using Pydantic V1

Same (failing) example using Pydantic V2

SDAravind commented Apr 24, 2024 •

edited

Loading

angely-dev commented Jul 25, 2024 •

edited

Loading