EmailStr & friends error as Custom Data Types with typing.Annotated. #6506

Closed
1 task done
zakstucke opened this issue Jul 7, 2023 · 10 comments
Assignees
Labels
bug V2 Bug related to Pydantic V2

Comments

@zakstucke
Contributor

zakstucke commented Jul 7, 2023

Initial Checks

  • I confirm that I'm using Pydantic V2 installed directly from the main branch, or equivalent

Description

When EmailStr (and, it looks like, most of the types in pydantic/networks.py) is used as metadata in typing.Annotated, it errors with the traceback below.

I believe all that's needed is to add _handler: _annotated_handlers.GetCoreSchemaHandler as a third argument to each __get_pydantic_core_schema__() method in pydantic/networks.py. That seems to fix it for EmailStr at least. The problem might be prevalent in types outside networks.py as well.

https://docs.pydantic.dev/latest/usage/types/custom/#creating-custom-classes-using-__get_pydantic_core_schema__

Traceback (most recent call last):
  File "/tmp/ipykernel_2374349/3082715650.py", line 7, in <module>
    class Foo(BaseModel):
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 172, in __new__
    complete_model_class(
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_model_construction.py", line 420, in complete_model_class
    schema = cls.__get_pydantic_core_schema__(cls, handler)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/main.py", line 533, in __get_pydantic_core_schema__
    return __handler(__source)
           ^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 82, in __call__
    schema = self._handler(__source_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 266, in generate_schema
    return self._generate_schema_for_type(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 287, in _generate_schema_for_type
    schema = self._generate_schema(obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 477, in _generate_schema
    return self._model_schema(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 353, in _model_schema
    {k: self._generate_md_field_schema(k, v, decorators) for k, v in fields.items()},
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 353, in <dictcomp>
    {k: self._generate_md_field_schema(k, v, decorators) for k, v in fields.items()},
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 618, in _generate_md_field_schema
    common_field = self._common_field_schema(name, field_info, decorators)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 660, in _common_field_schema
    schema = self._apply_annotations(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1318, in _apply_annotations
    schema = get_inner_schema(source_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 82, in __call__
    schema = self._handler(__source_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1400, in new_handler
    schema = metadata_get_schema(source, get_inner_schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: EmailStr.__get_pydantic_core_schema__() takes 2 positional arguments but 3 were given
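
The TypeError on the last line of the traceback is an ordinary arity mismatch, and it can be reproduced without pydantic. The sketch below is only a model of the call shape seen in the traceback (`metadata_get_schema(source, get_inner_schema)`); the names `TwoArgHook`, `ThreeArgHook` and `call_as_metadata` are hypothetical, and the dict "schemas" are placeholders rather than real core schemas.

```python
class TwoArgHook:
    """Hypothetical stand-in for a type whose hook accepts only the source type."""

    @classmethod
    def __get_pydantic_core_schema__(cls, source):
        return {"type": "str"}


class ThreeArgHook:
    """Hypothetical stand-in for a type whose hook also accepts a handler."""

    @classmethod
    def __get_pydantic_core_schema__(cls, source, handler):
        # Delegate to the handler, which builds the schema for the source type.
        return handler(source)


def call_as_metadata(metadata, source):
    """Call the hook the way the traceback shows: with a source and a handler."""

    def get_inner_schema(tp):
        return {"type": tp.__name__}

    return metadata.__get_pydantic_core_schema__(source, get_inner_schema)


# cls + source + handler is three positional arguments, one more than the
# two-argument hook accepts.
try:
    call_as_metadata(TwoArgHook, str)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(call_as_metadata(ThreeArgHook, str))
```

This only illustrates why adding a handler parameter makes the error go away; it says nothing about whether using such a type as metadata is supported.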

Example Code

import typing as tp
from pydantic import BaseModel, EmailStr

class Foo(BaseModel):
    email: tp.Annotated[str, EmailStr]

Python, Pydantic & OS Version

pydantic version: 2.0.2
        pydantic-core version: 2.1.2 release build profile
                 install path: venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.4 (main, Jun  7 2023, 12:45:48) [GCC 11.3.0]
                     platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
     optional deps. installed: ['devtools', 'email-validator', 'typing-extensions']

Selected Assignee: @dmontagu

@zakstucke zakstucke added bug V2 Bug related to Pydantic V2 unconfirmed Bug not yet confirmed as valid/applicable labels Jul 7, 2023
@hramezani
Member

hramezani commented Jul 7, 2023

Thanks @zakstucke for reporting this 🙏

Would you like to work on it and open a PR? Otherwise I will create a PR.

@hramezani hramezani removed the unconfirmed Bug not yet confirmed as valid/applicable label Jul 7, 2023
@Kludex
Member

Kludex commented Jul 7, 2023

EmailStr isn't supposed to be used as metadata for Annotated.

You can achieve what you want with the following, since EmailStr is already a str.

import typing as tp
from pydantic import BaseModel, EmailStr

class Foo(BaseModel):
    email: EmailStr

@zakstucke
Contributor Author

zakstucke commented Jul 7, 2023

@Kludex it's definitely a contrived example, but from what I understand this is a valid thing to do? The use case is to potentially end up with a different type than EmailStr, i.e. doing further processing and ending up with a completely different type from str, like a user object.

I've gotten my understanding from
https://docs.pydantic.dev/latest/usage/types/custom/#creating-custom-classes-using-__get_pydantic_core_schema__
where PostCodeAnnotation is used in the same way with

class Model(BaseModel):
    post_code: Annotated[str, PostCodeAnnotation]

It implements the same method, __get_pydantic_core_schema__(), which is what gets called in both cases, just with the added _handler argument, which fixes the error.
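
For readers less familiar with how metadata travels inside Annotated, here is a small standard-library sketch. `PostCodeMarker` and `Model` are hypothetical stand-ins (the class is a plain class, not a pydantic model); the point is only that the first argument of Annotated is the static type and everything after it is metadata that a library may inspect.

```python
import typing as tp


class PostCodeMarker:
    """Hypothetical marker class standing in for a metadata object."""


class Model:
    post_code: tp.Annotated[str, PostCodeMarker]


# By default the metadata is stripped and only the static type remains.
plain = tp.get_type_hints(Model)

# With include_extras=True the Annotated wrapper is preserved.
rich = tp.get_type_hints(Model, include_extras=True)

# get_args() on an Annotated type yields the base type followed by the metadata.
base_type, *metadata = tp.get_args(rich["post_code"])

print(plain["post_code"])
print(base_type, metadata)
```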

@adriangb
Member

adriangb commented Jul 7, 2023

It's not really a valid thing to do. PostCodeAnnotation is designed to be used as metadata in Annotated but EmailStr is not.

The lack of the handler argument is a quirk of us never really deciding if we want to support two signatures or one, I'd ignore the fact that it works at all for now until we document it.

Perhaps you want one of these two options:

from typing import Annotated, Any
from pydantic import AfterValidator, GetCoreSchemaHandler, TypeAdapter, ValidationError
from pydantic.networks import EmailStr
from pydantic_core import CoreSchema, core_schema


UpperEmailStr = Annotated[EmailStr, AfterValidator(str.upper)]

ta = TypeAdapter(UpperEmailStr)
assert ta.validate_python('adrian@example.com') == 'ADRIAN@EXAMPLE.COM'
try:
    ta.validate_python('not an email')
except ValidationError as e:
    print(e)
"""
1 validation error for function-after[upper(), function-after[_validate(), str]]
  value is not a valid email address: The email address is not valid. It must have exactly one @-sign. [type=value_error, input_value='not an email', input_type=str]
"""


class MyType(str):
    @classmethod
    def __get_pydantic_core_schema__(cls, _source_type: Any, handler: GetCoreSchemaHandler) -> CoreSchema:
        email_schema = handler.generate_schema(EmailStr)
        return core_schema.no_info_after_validator_function(MyType, email_schema)


ta = TypeAdapter(MyType)
assert isinstance(ta.validate_python('adrian@example.com'), MyType)
try:
    ta.validate_python('not an email')
except ValidationError as e:
    print(e)
"""
1 validation error for function-after[upper(), function-after[_validate(), str]]
  value is not a valid email address: The email address is not valid. It must have exactly one @-sign. [type=value_error, input_value='not an email', input_type=str]
"""

@zakstucke
Contributor Author

zakstucke commented Jul 8, 2023

@adriangb okay I get what you’re saying. So I think this issue becomes more general:

Why shouldn’t all custom pydantic types be usable as intermediate validation types in Annotated metadata, as part of a chain?

Annotated is the unsung hero of V2. In a lot of scenarios it’s much more ergonomic and concise than creating custom types for reusable validation. We’ve been able to remove loads of custom V1 types that were a pain and interfered with type hints, and it’s so nice to have validation while keeping the original type statically.

At the minute, with the PostCodeAnnotation example in the docs and this conversation, the behaviour of these custom types seems unclear and unintuitive: how they work as metadata in Annotated, and how they interoperate with BeforeValidators.

Pitch:

  • Just as BeforeValidators are read from right to left, custom types in the metadata would also be read right to left, interleaved with them and treated as BeforeValidators. This shim does exactly what I'm proposing:
def as_b_v(t: type) -> BeforeValidator:
    return BeforeValidator(lambda v: TypeAdapter(t).validate_python(v))
  • All custom data types are by definition valid to be used as metadata. I get what you’re saying about them not all being valid syntax-wise, like EmailStr at the minute, but why shouldn’t they be? I think it's impossible to argue that some types have no use case as part of a chain of validation.
  • The final proper type is on the left, so it is validated correctly and intuitively with this logic, after the other types and BeforeValidators. AfterValidators then run left to right on the final type as normal.
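
The ordering described in the pitch can be modelled in a few lines of plain Python. This is a sketch of the proposed evaluation order only, not of pydantic's implementation: `Before`, `After`, `run_chain` and `tag` are hypothetical names, and `core` is any callable playing the role of the final type on the left.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Before:
    """Hypothetical marker for a step that runs before the core type."""
    func: Callable[[Any], Any]


@dataclass(frozen=True)
class After:
    """Hypothetical marker for a step that runs after the core type."""
    func: Callable[[Any], Any]


def run_chain(value, core, metadata):
    # Before-steps run right to left ...
    for item in reversed(metadata):
        if isinstance(item, Before):
            value = item.func(value)
    # ... then the core type on the left is applied ...
    value = core(value)
    # ... and After-steps run left to right on the result.
    for item in metadata:
        if isinstance(item, After):
            value = item.func(value)
    return value


calls = []


def tag(name):
    """Return a pass-through step that records when it was called."""
    def step(v):
        calls.append(name)
        return v
    return step


result = run_chain(
    " 42 ",
    int,
    [Before(tag("b1")), After(tag("a1")), Before(tag("b2")), After(tag("a2"))],
)
print(calls, result)
```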

This seems so intuitive and catch-all to me!
I can open a new feature request if there aren’t any objections, as its scope is a bit different from this issue.

Here's a complete example using the as_b_v() shim to make concrete what I'm getting at:

import typing as tp
from pydantic_core import core_schema
from pydantic import EmailStr, BeforeValidator, TypeAdapter

encrypt = decrypt = lambda s: "".join(reversed(s))

class User:
    email: str

    def __init__(self, email: str):
        self.email = email

    def __repr__(self):
        return "User(email='{}')".format(self.email)

    @classmethod
    def __get_pydantic_core_schema__(cls, *args, **kwargs) -> core_schema.CoreSchema:
        return core_schema.no_info_after_validator_function(
            cls.from_email,
            core_schema.any_schema(),
        )

    @classmethod
    def from_email(cls, email: str) -> tp.Self:
        """Returns the user model from an email address"""

        print("User.from_email() run with email: {}".format(email))
        return cls(email=email)


class StrCustomDecoding(str):
    """Decodes str with custom decoding"""

    @classmethod
    def __get_pydantic_core_schema__(cls, *args, **kwargs) -> core_schema.CoreSchema:
        return core_schema.no_info_after_validator_function(
            cls.decoder,
            core_schema.str_schema(),
        )
    
    @classmethod
    def decoder(cls, v):
        print("StrCustomDecoding.decoder() run with v: {}".format(v))
        return decrypt(v)
    
def as_b_v(t: type) -> BeforeValidator:
    return BeforeValidator(lambda v: TypeAdapter(t).validate_python(v))

UserFromEmail = tp.Annotated[User, as_b_v(EmailStr), as_b_v(StrCustomDecoding)]

user = TypeAdapter(UserFromEmail).validate_python(encrypt("foo@bar.com"))

print(user)
StrCustomDecoding.decoder() run with v: moc.rab@oof
User.from_email() run with email: foo@bar.com
User(email='foo@bar.com')

Note: without the as_b_v() shims in this example it seems to produce undefined behaviour, ignoring the User type and finishing after validating with EmailStr (no errors), outputting a string rather than the user object.

@zakstucke
Contributor Author

zakstucke commented Jul 9, 2023

Conveniently #6531 suffers from exactly this generic problem, and would be easily answered if this concept was supported with:

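import typing as tp
from pydantic import Field, constr

# Hypothetical: this only works if types are accepted as chained metadata.
tp.Annotated[str, Field(min_length=3, max_length=10), constr(strip_whitespace=True)]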

This can also be shimmed as-is with as_bv():

import typing as tp
from pydantic import BeforeValidator, TypeAdapter, constr, Field

def as_bv(t: type) -> BeforeValidator:
    """Converts a type into a BeforeValidator"""
    ta = TypeAdapter(t)
    return BeforeValidator(lambda v: ta.validate_python(v))

MyT = tp.Annotated[str, Field(min_length=3, max_length=10), as_bv(constr(strip_whitespace=True))]

res = TypeAdapter(MyT).validate_python("   123456789   ")

print(f"res: '{res}'")
res: '123456789'
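
One difference between this as_bv() and the as_b_v() shim earlier in the thread is where the adapter gets built: here it is created once, outside the lambda, whereas as_b_v() creates a new one on every validation. The pydantic-free sketch below just counts constructions to show the effect of that hoisting; `CountingAdapter`, `per_call` and `hoisted` are hypothetical names, and I am assuming that constructing the real adapter is the costlier step.

```python
class CountingAdapter:
    """Hypothetical stand-in for an object that is costly to construct."""

    built = 0

    def __init__(self, target):
        type(self).built += 1
        self.target = target

    def validate(self, value):
        return self.target(value)


def per_call(target):
    # Builds a new adapter on every validation, like as_b_v() earlier.
    return lambda v: CountingAdapter(target).validate(v)


def hoisted(target):
    # Builds the adapter once and reuses it, like as_bv() here.
    adapter = CountingAdapter(target)
    return lambda v: adapter.validate(v)


slow = per_call(int)
for raw in ("1", "2", "3"):
    slow(raw)
built_per_call = CountingAdapter.built

CountingAdapter.built = 0
fast = hoisted(int)
for raw in ("1", "2", "3"):
    fast(raw)
built_hoisted = CountingAdapter.built

print(built_per_call, built_hoisted)
```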

@adriangb
Member

adriangb commented Jul 9, 2023

I probably should have posted #6531 (comment) here, but for those reasons, I don't think this is a viable approach and I'm going to have to close the issue. @zakstucke I really appreciate you thinking about this and am happy to discuss other approaches that solve the problem for you, but I just don't think this is a viable path forward for Pydantic.

@adriangb adriangb closed this as completed Jul 9, 2023
@zakstucke
Contributor Author

zakstucke commented Jul 9, 2023

@adriangb fair enough, all valid critiques!

Just to note, I've been using this shim a lot today and it seems to do the job, so it's okay if you don't think it's practical internally, as it is easy to replicate outside as a user. I'm just sure it's less performant than it could be :)

Having said that, here are my thoughts on all your critiques:

We've worked very hard to make the entirety of validation happen within a single Rust call

  • The as_bv() shim using TypeAdapter is just a way of getting this concept to work outside pydantic. I would expect it could be made much more performant in-house: at a high level, I think it's just a matter of analysing the tp.Annotated[...]/core_schema.CoreSchema(s) from the chains in a similar way to n-depth types. I don't think this idea is mutually exclusive with a single Rust call.

but it's also going to wreak havoc with exceptions (the exception from that validation call will end up just being a repr(ValueError()) instead of being collected into the top level ValueError)

On a perhaps more philosophical note the metadata in Annotated is not intended to be types, and while there are no runtime errors by putting a type in there it does seem just wrong to do.

  • Definitely arguable, but I would push back: this is somewhat watered down by the usage of BeforeValidators and AfterValidators. Reading https://docs.python.org/3/library/typing.html#typing.Annotated, it does seem to leave the usage almost entirely up to the implementor. I struggle to see a huge difference between the validators added in V2 and accessing the validator from a type.

I think the main point of this whole thing is that I'm finding really great value in as_bv(), and haven't yet found anything simpler or as concise that is more "pydantic" for reusing the validation of an existing custom type in ways other than the type's core use, without necessarily becoming that type.

Feels like there's this wall between the two conceptually that doesn't need to be there!

Let me know if it's worth discussing further and if so where that should be done :)
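
The exception critique quoted above can be pictured with a pydantic-free model. In this sketch the inner validator raises a structured error with one entry per problem, but the outer layer only sees a generic ValueError and records its repr, so the individual entries are lost. `CollectedError`, `inner_validate` and `outer_validate` are hypothetical names, and this is my reading of the critique rather than a description of pydantic's internals.

```python
class CollectedError(ValueError):
    """Hypothetical error carrying one entry per failing check."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} validation error(s)")
        self.errors = errors


def inner_validate(value):
    errors = []
    if "@" not in value:
        errors.append({"loc": ("email",), "msg": "missing @-sign"})
    if " " in value:
        errors.append({"loc": ("email",), "msg": "contains whitespace"})
    if errors:
        raise CollectedError(errors)
    return value


def outer_validate(value, steps):
    errors = []
    for step in steps:
        try:
            value = step(value)
        except ValueError as exc:
            # The outer layer treats the nested failure as an opaque
            # ValueError and keeps only its repr, not its entries.
            errors.append({"loc": (), "msg": repr(exc)})
    if errors:
        raise CollectedError(errors)
    return value


try:
    outer_validate("not an email", [inner_validate])
except CollectedError as exc:
    collected = exc.errors

print(collected)
```

Two inner errors go in and a single flattened message comes out, which is the loss of detail the quote describes.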

@adriangb
Member

adriangb commented Jul 9, 2023

How about we wait a bit to see what other solutions come up or how many others need something like this?

@zakstucke
Contributor Author

Sounds good!
