V2: The `validate` general method #4669

samuelcolvin · 2022-10-27T16:40:06Z

Pydantic V2 will do a massively better job of validating arbitrary objects.

To accomplish this without adding many methods, we should provide one function which can:

- be used as a decorator on a function to validate its arguments and, optionally, its return type
- be used as a decorator on a dataclass, NamedTuple or TypedDict
- be used to create a "validated" version of anything, e.g. ValidatedTuple = validate(tuple[int, int, int])

With this, the pydantic version of the dataclass decorator will effectively just become:

import dataclasses

def dataclass(*args, **kwargs):
    return validate(dataclasses.dataclass(*args, **kwargs))

Dataclasses need work (#4670), but validation and validation-schema generation for the other types should already work.
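A minimal stdlib-only sketch of the dataclass case, to make the "validate wraps dataclasses.dataclass" idea concrete. It assumes a hypothetical `validate` that only performs strict `isinstance` checks on fields annotated with plain classes (no coercion, no generics), which is far less than pydantic would actually do.

```python
import dataclasses
import functools
from typing import get_origin, get_type_hints


def validate(cls):
    """Hypothetical: wrap a dataclass's __init__ so field values are type-checked."""
    hints = get_type_hints(cls)
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        for field in dataclasses.fields(cls):
            expected = hints[field.name]
            value = getattr(self, field.name)
            # Only plain classes are checked; generics, Unions, etc. are skipped here.
            if isinstance(expected, type) and get_origin(expected) is None:
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{field.name}: expected {expected.__name__}, got {type(value).__name__}"
                    )

    cls.__init__ = __init__
    return cls


def dataclass(cls=None, **kwargs):
    """Mirrors the proposal: validate() applied on top of the stdlib dataclass decorator."""
    def wrap(c):
        return validate(dataclasses.dataclass(c, **kwargs))
    # Supports both bare @dataclass and @dataclass(frozen=True)-style usage.
    return wrap if cls is None else wrap(cls)


@dataclass
class Point:
    x: int
    y: int
```

Here `Point(1, 2)` constructs normally, while `Point(1, "2")` raises `TypeError`. Handling the bare and parametrized decorator forms separately is needed because `dataclasses.dataclass(**kwargs)` returns a decorator rather than a class.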


samuelcolvin · 2022-10-27T17:02:42Z

So usage would be:

from dataclasses import dataclass
from typing import Literal, TypedDict

from pydantic import validate, BaseModel

@validate(...config)
def my_method(...):
    ...

@validate(...config)
@dataclass
class MyDataclass:
    ...

@validate(...config)
class MyTypedDict(TypedDict):
    ...

ValidatedTuple = validate(tuple[int, int, int])
ValidatedIntStr = validate(int | str)

class Cat(BaseModel):
    pet_type: Literal['cat']

class Dog(BaseModel):
    pet_type: Literal['dog']

ValidatedPet = validate(Cat | Dog, discriminator='pet_type')

Main question: How do we define validators, do we continue to extract them as methods, or do we let them be provided as kwargs to validate, or both?

Update: added more examples.

marcoo47 · 2022-11-18T19:49:13Z

Is this issue still open? If so, I was wondering if it would be a reasonable first contribution for me and my partner @smhavens. Thanks!

samuelcolvin · 2022-11-18T21:31:38Z

I'm afraid this is probably quite a challenging first task.

How about something like #4675? It's still pretty hard and complicated, but not as wide ranging as this.

gavindsouza · 2022-12-19T11:59:41Z

Q: Would the validate API do more than raise exceptions? Would it try to parse/coerce objects as parse_obj_as does now (e.g. parse_obj_as(str, 1) => "1")?

samuelcolvin · 2022-12-19T23:14:59Z

Yes, unless you use strict mode.
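A toy illustration of the lax-versus-strict distinction for a single target type. `coerce_int` and its coercion rules are hypothetical simplifications chosen for the example, not pydantic's actual coercion table.

```python
def coerce_int(value, *, strict=False):
    """Hypothetical: validate `value` as an int, coercing only in lax mode."""
    # bool is a subclass of int, but this sketch treats it as a distinct type.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        raise TypeError(f"strict mode: expected int, got {type(value).__name__}")
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            raise ValueError(f"invalid integer string: {value!r}") from None
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise TypeError(f"cannot coerce {type(value).__name__} to int")


print(coerce_int("42"))  # 42
```

In strict mode the same call, `coerce_int("42", strict=True)`, raises `TypeError` instead of converting.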

samuelcolvin · 2023-01-04T11:44:53Z

(added a few more examples in the above usage example)

So the question is: what does validate return?

Well, it has to depend on what it's called with:

- if it's used as a decorator on a dataclass, it needs to return a valid dataclass with a custom __init__ method
- similarly if it's used as a decorator on a NamedTuple or TypedDict
- if it's used as a decorator on a function, it can simply return another function; this is the easiest case
- if it's called with a Union, I'm not sure if we can create something that looks like a valid typing.Union; otherwise it should just return a function
- if it's called on a tuple, like validate(tuple[int, int, int]), hopefully we can create a tuple subclass, and the same for list, dict, etc.
- some things can't be subclassed, e.g. validate(None); in this case, as with Union, I guess we return a new function
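A rough sketch of how that dispatch could look, using a hypothetical `validate` that handles three of the cases listed: functions (a wrapper checking annotated arguments; the return type is not checked), fixed-length `tuple[...]` (a tuple subclass), and everything else, such as a `Union` or `None` (a plain function). Strict `isinstance` checks stand in for real validation.

```python
import functools
import inspect
import types
from typing import Any, Union, get_args, get_origin, get_type_hints

# typing.Union[...] and, on Python 3.10+, PEP 604 `X | Y` unions.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def _matches(value: Any, tp: Any) -> bool:
    """Strict check: plain classes via isinstance, Unions member by member."""
    if get_origin(tp) in _UNION_ORIGINS:
        return any(_matches(value, arg) for arg in get_args(tp))
    if isinstance(tp, type) and get_origin(tp) is None:
        return isinstance(value, tp)
    return True  # anything more complex is accepted unchecked in this sketch


def validate(target: Any) -> Any:
    """Hypothetical: return a validating counterpart whose shape depends on `target`."""
    if target is None:
        target = type(None)

    # Case 1: a function -> a wrapper that checks annotated arguments.
    if inspect.isfunction(target):
        signature = inspect.signature(target)
        hints = get_type_hints(target)

        @functools.wraps(target)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = signature.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                if name in hints and not _matches(value, hints[name]):
                    raise TypeError(f"argument {name!r}: {value!r} does not match {hints[name]}")
            return target(*args, **kwargs)

        return wrapper

    origin, item_types = get_origin(target), get_args(target)
    # Case 2: a fixed-length tuple[...] -> a tuple subclass that validates on construction.
    if origin is tuple and item_types and Ellipsis not in item_types:

        class ValidatedTuple(tuple):
            def __new__(cls, items):
                items = tuple(items)
                if len(items) != len(item_types) or not all(map(_matches, items, item_types)):
                    raise TypeError(f"{items!r} does not match {target}")
                return super().__new__(cls, items)

        return ValidatedTuple

    # Case 3: things that cannot be subclassed (e.g. a Union) -> a plain function.
    def validator(value: Any) -> Any:
        if not _matches(value, target):
            raise TypeError(f"{value!r} does not match {target}")
        return value

    return validator


@validate
def add(a: int, b: int) -> int:
    return a + b


Triple = validate(tuple[int, int, int])
IntOrStr = validate(Union[int, str])
```

With this, `Triple([1, 2, 3])` is a real `tuple` instance, whereas `IntOrStr` is only a callable; that asymmetry is exactly the Union problem raised in the list.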

In all these cases, we should hopefully attach enough information for the following:

- json_schema(validate(...)): generate JSON Schema from the return type
- serialize_python(validate(...)(...)): serialize the result of running validation on the return type, the equivalent of my_model.dict()/my_model.model_dict()
- serialize_json(validate(...)(...)): JSON-serialize the result of running validation on the return type, the equivalent of my_model.json()/my_model.model_json()
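To show what "enough information" for `json_schema(...)` might amount to, here is a hedged sketch that derives a schema directly from a handful of Python types. `json_schema` here is a hypothetical helper working on raw types rather than on the return value of `validate`, and the schema shapes are assumptions (fixed-length tuples use the JSON Schema 2020-12 `prefixItems` keyword).

```python
import types
from typing import Any, Union, get_args, get_origin

_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))
_SCALARS = {int: "integer", float: "number", str: "string", bool: "boolean", type(None): "null"}


def json_schema(tp: Any) -> dict:
    """Hypothetical: derive a JSON Schema dict from a small set of Python types."""
    if tp is None:
        tp = type(None)
    if tp in _SCALARS:
        return {"type": _SCALARS[tp]}
    origin, args = get_origin(tp), get_args(tp)
    if origin in _UNION_ORIGINS:
        return {"anyOf": [json_schema(arg) for arg in args]}
    # Homogeneous sequences: list[X] and tuple[X, ...].
    if (origin is list and args) or (origin is tuple and len(args) == 2 and args[1] is Ellipsis):
        return {"type": "array", "items": json_schema(args[0])}
    # Fixed-length tuples: one schema per position.
    if origin is tuple and args:
        items = [json_schema(arg) for arg in args]
        return {"type": "array", "prefixItems": items, "minItems": len(items), "maxItems": len(items)}
    raise NotImplementedError(f"no schema rule for {tp!r} in this sketch")
```

For example, `json_schema(Union[int, str])` yields `{"anyOf": [{"type": "integer"}, {"type": "string"}]}`.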

adriangb · 2023-02-27T18:14:32Z

For non-models we currently support or plan to support two approaches:

from typing import Annotated
from annotated_types import Predicate
from pydantic import Field

NonNegativeInt = Annotated[int, Field(ge=0)]
EvenInt = Annotated[int, Predicate(lambda x: x % 2 == 0)]

The first one already works with @validate_args (a real world use case of non-model validations/constraints like this).
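For readers less familiar with how `Annotated` metadata is consumed, a stdlib-only sketch: the base type is checked first, then each metadata item is applied in order and unknown items are ignored. `Ge` and `Predicate` below are hypothetical local stand-ins (not the `annotated_types` classes or pydantic's `Field`), and `validate_annotated` is an invented helper.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Callable, get_args, get_origin


@dataclass(frozen=True)
class Ge:
    """Stand-in for a 'greater than or equal' constraint."""
    bound: Any


@dataclass(frozen=True)
class Predicate:
    """Stand-in for a boolean check on the value."""
    func: Callable[[Any], bool]


def validate_annotated(tp: Any, value: Any) -> Any:
    """Check `value` against the base type, then apply each metadata item in order."""
    if get_origin(tp) is Annotated:
        base, *metadata = get_args(tp)
    else:
        base, metadata = tp, []
    if not isinstance(value, base):
        raise TypeError(f"expected {base.__name__}, got {type(value).__name__}")
    for item in metadata:
        if isinstance(item, Ge) and not value >= item.bound:
            raise ValueError(f"{value!r} is not >= {item.bound!r}")
        if isinstance(item, Predicate) and not item.func(value):
            raise ValueError(f"{value!r} failed predicate")
        # Metadata this sketch does not recognise is ignored, as Annotated intends.
    return value


NonNegativeInt = Annotated[int, Ge(0)]
EvenInt = Annotated[int, Predicate(lambda x: x % 2 == 0)]
```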

So I think we should make validate something like:

from functools import wraps
import inspect
from typing import Annotated, Any, TypeVar

T = TypeVar("T")

def validate(__type_or_func: T, *args: Any, **kwargs: Any) -> T:
    if inspect.isfunction(__type_or_func) or inspect.ismethod(__type_or_func):
        @wraps(__type_or_func)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            # do some validation
            return __type_or_func(*args, **kwargs)
        return wrapped  # type: ignore
    else:
        return Annotated[__type_or_func, "some metadata"]  # type: ignore


# tests
from dataclasses import dataclass
from typing import Literal, TypedDict
from pydantic import BaseModel

@validate
def my_func(a: int) -> int:
    return a

_1: int = my_func(1)

@validate
@dataclass
class Foo:
    a: int

_2: Foo = Foo(123)

@validate
class MyDict(TypedDict):
    a: int

_3: MyDict = {"a": 123}
_4: MyDict = MyDict(a=123)

ValidatedTuple = validate(tuple[int, int, int])
ValidatedIntStr = validate(int | str)

_5: ValidatedTuple = (1, 2, 3)
_6: ValidatedTuple = ValidatedTuple([1, 2, 3])
_7: ValidatedIntStr = 1
_8: ValidatedIntStr = "1"


class Cat(BaseModel):
    pet_type: Literal['cat']

class Dog(BaseModel):
    pet_type: Literal['dog']

ValidatedPet = validate(Cat | Dog, discriminator='pet_type')

_9: ValidatedPet = Cat(pet_type="cat")

This doesn't seem to work with unions. I think Pylance special-cases unions because the result is not a type but some sort of "special form". We could just say you need to do ValidatedPet = Annotated[Cat | Dog, pydantic.DiscriminatedUnion(discriminator="pet_type")]. That sounds pretty reasonable given that discriminated unions are a somewhat complex use case.

This also does not make TypedDict.__init__ perform validation. IMO we should either:

- make it explicit that we are returning not the original thing but a validator for it (validate(__thing: T) -> Validated[T], where Validated[T] ~= Callable[..., T]), or
- have two functions: one that adds the metadata necessary for validation (which is enough for the type to be used in an already-validated context like a field of a BaseModel, a function argument with @validate_arguments, a parameter in a FastAPI endpoint, etc.), and another that either performs the validation or creates a Validated[T], which is not the same type as the original.
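The discriminated-union alternative mentioned above boils down to dispatching on a tag field instead of trying each union member in turn. A sketch under stated assumptions: stdlib dataclasses stand in for the `BaseModel` subclasses, each discriminator field is annotated with a single-value `Literal`, and `discriminated_union` is a hypothetical helper (the thread's `pydantic.DiscriminatedUnion` is itself only a proposal).

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal, get_args, get_type_hints


@dataclass
class Cat:
    pet_type: Literal["cat"]
    meows: int = 0


@dataclass
class Dog:
    pet_type: Literal["dog"]
    barks: int = 0


def discriminated_union(*classes: type, discriminator: str):
    """Hypothetical: build a validator that picks the class from a Literal tag."""
    lookup = {}
    for cls in classes:
        (tag,) = get_args(get_type_hints(cls)[discriminator])
        lookup[tag] = cls

    def validate(data: Dict[str, Any]):
        try:
            cls = lookup[data[discriminator]]
        except KeyError:  # tag missing or not one of the known values
            raise ValueError(f"{discriminator!r} must be one of {sorted(lookup)}") from None
        return cls(**data)

    return validate


validate_pet = discriminated_union(Cat, Dog, discriminator="pet_type")
```

A single dictionary lookup selects the member, so an invalid input produces one targeted error rather than one error per union member.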

samuelcolvin · 2023-02-27T20:39:27Z

I agree with most of your examples.

We should definitely make it explicit that we're returning a new thing, specifically an instance of Validate.

With that, a user could do:

validate_pet = Validate(Cat | Dog, discriminator='pet_type')

cat: Cat = validate_pet({'pet_type': 'cat'})
validate_pet.whatever_we_call_to_json(cat)

With that, the general usage would be:

- Validate creates an instance of Validate via __init__ (very traditional)
- validate is used as a decorator: it returns a function which in turn gets called with something and returns an instance of Validate

On the point of how to define validators, we should support:

1. BeforeValidator, AfterValidator, WrapValidator as arguments to Annotated
2. PlainValidator, which can be used as an argument to Annotated but actually ignores the type annotation in Annotated; the function is entirely responsible for validation and returning the right thing
3. Predicate, which is just an alias to (I guess) AfterValidator
4. on dataclasses, named tuples, and typeddicts, validators defined using the validator decorator, as they are on BaseModel
5. some kind of sequence or dict of validators as an argument to Validate or validator
6. we could also (possibly instead of 5.) allow validators to be defined in Config, which would be a bit less flat to define but more versatile

WDYT?
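A sketch of how the Before/After/Wrap/Plain kinds from items 1 and 2 could compose around a core type check. The class names follow the list, but their constructors, the `build_validator` helper, and the ordering rule (later entries wrap earlier ones) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class BeforeValidator:
    func: Callable[[Any], Any]  # runs on the raw input, before the inner handler


@dataclass(frozen=True)
class AfterValidator:
    func: Callable[[Any], Any]  # runs on the value the inner handler returned


@dataclass(frozen=True)
class WrapValidator:
    func: Callable[[Any, Callable[[Any], Any]], Any]  # decides whether/how to call the inner handler


@dataclass(frozen=True)
class PlainValidator:
    func: Callable[[Any], Any]  # replaces everything inside it, including the type's own check


def _before(inner, func):
    return lambda value: inner(func(value))


def _after(inner, func):
    return lambda value: func(inner(value))


def _wrap(inner, func):
    return lambda value: func(value, inner)


def build_validator(core: Callable[[Any], Any], *validators: Any) -> Callable[[Any], Any]:
    """Fold validators around `core`; each later entry wraps the handler built so far."""
    handler = core
    for v in validators:
        if isinstance(v, PlainValidator):
            handler = v.func
        elif isinstance(v, BeforeValidator):
            handler = _before(handler, v.func)
        elif isinstance(v, AfterValidator):
            handler = _after(handler, v.func)
        elif isinstance(v, WrapValidator):
            handler = _wrap(handler, v.func)
        else:
            raise TypeError(f"unknown validator {v!r}")
    return handler


def strict_int(value: Any) -> int:
    """The 'core' validation for int in this sketch."""
    if not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    return value


double_from_str = build_validator(
    strict_int,
    BeforeValidator(lambda v: int(v) if isinstance(v, str) else v),
    AfterValidator(lambda v: v * 2),
)
print(double_from_str("21"))  # 42
```

The sketch also shows why PlainValidator is different in kind: it discards the core check entirely, whereas the other three only add behaviour around it.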

adriangb · 2023-02-27T21:37:05Z

> validate is used as a decorator - it returns a function which in turn gets called with something and returns an instance of Validate

I think this would be a bit problematic: it would erase the original type and using an instance of Validate as a type would not be valid. Let me know if this is what you were thinking:

from dataclasses import dataclass
from typing import Annotated, Any, Callable, Generic, TypeVar, ParamSpec, reveal_type

P = ParamSpec("P")
T = TypeVar("T")

class Validate(Generic[P, T]):
    def __init__(self, __thing: Callable[P, T], *args: Any, **kwargs: Any) -> None:
        ...

    def __call__(self, *args: P.args, **kwargs: P.kwargs) -> T:
        ...

def validate(__thing: Callable[P, T], *args: Any, **kwargs: Any) -> Validate[P, T]:
    return Annotated[__thing, "metadata"]   # type: ignore

@validate
@dataclass
class MyCls:
    a: int

reveal_type(MyCls(a=1))  # MyCls
reveal_type(MyCls)  # Validate[(*args: Any, **kwargs: Any), MyCls]

# Expected type expression but received "Validate[(*args: Any, **kwargs: Any), MyCls]"
def foo(thing: MyCls) -> None:
    pass

So I don't think @validate should return an instance of Validate; instead, it should attach enough metadata to create one:

@validated(after=lambda x: x.a > 2)
@dataclass
class MyCls:
    a: int

my_cls_validator = Validate(MyCls)

reveal_type(MyCls(a=1))  # MyCls
reveal_type(MyCls)  # type[MyCls]

def foo(thing: MyCls) -> None:
    pass

foo(my_cls_validator({"a": "3"}))
my_cls_validator({"a": "1"})  # fails
reveal_type(my_cls_validator({"a": "1"}))  # MyCls

But if used in a validated context, you would never need to create a Validate instance:

@app.get("/foo")
def endpoint(thing: MyCls):  # FastAPI creates the Validate instance internally
    ...

class MyModel(BaseModel):
    field: MyCls  # MyCls' validation logic is applied to this field
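A stdlib sketch of the split proposed in this comment: the decorator only attaches metadata and returns the class unchanged (so `MyCls` stays `type[MyCls]` for type checkers), while a separate `Validate` object reads that metadata when it is built. `validated`, the `_validation_metadata` attribute and `Validate` are hypothetical names, and "lax coercion" is approximated by calling each field's annotated type on the raw value.

```python
import dataclasses
from typing import Any, Callable, Dict, Generic, Optional, Type, TypeVar, get_type_hints

T = TypeVar("T")


def validated(*, after: Optional[Callable[[Any], bool]] = None):
    """Hypothetical decorator: attach metadata and return the class itself."""
    def decorator(cls):
        cls._validation_metadata = {"after": after}
        return cls  # unchanged, so static type checkers still see the original class
    return decorator


class Validate(Generic[T]):
    """Hypothetical: a validator built from a dataclass plus its attached metadata."""

    def __init__(self, cls: Type[T]) -> None:
        self.cls = cls
        self.hints = get_type_hints(cls)
        self.after = getattr(cls, "_validation_metadata", {}).get("after")

    def __call__(self, data: Dict[str, Any]) -> T:
        # Coercion approximated by calling each field's annotated type, e.g. int("3").
        kwargs = {f.name: self.hints[f.name](data[f.name]) for f in dataclasses.fields(self.cls)}
        instance = self.cls(**kwargs)
        if self.after is not None and not self.after(instance):
            raise ValueError(f"{instance!r} failed the 'after' check")
        return instance


@validated(after=lambda x: x.a > 2)
@dataclasses.dataclass
class MyCls:
    a: int


my_cls_validator = Validate(MyCls)
print(my_cls_validator({"a": "3"}))  # MyCls(a=3)
```

Plain `MyCls(a=1)` still constructs without any validation; only code that goes through a `Validate` instance (or a framework that builds one internally) applies the checks.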

adriangb · 2023-02-27T21:48:47Z

> BeforeValidator, AfterValidator, WrapValidator - as arguments to Annotated

Yeah this sounds good to me. Something like:

from typing import Annotated

from pydantic.validators import AfterValidator, WrapValidator, PlainValidator

Number = float | int
ValidatedNumberWrapped = Annotated[Number, WrapValidator(Number, lambda v, handler: handler(v))]
ValidatedNumberAfter = Annotated[Number, AfterValidator(Number, lambda v: v)]
ValidatedNumberPlain = Annotated[Number, PlainValidator(Number, lambda v: v)]

Note that I'm requiring that WrapValidator take in the type as an argument so that it can be type checked.
There is no way for arguments to Annotated to statically type check the type they receive.

> PlainValidator - which can be used as an argument to Annotated, but actually ignores the type annotation in Annotated - the function is entirely responsible for validation and returning the right thing

I'm not sure I'm understanding what is special about PlainValidator. I actually may not understand the use case for plain at all.

> Predicate which is just an alias to (I guess) AfterValidator

The only difference is that Predicate accepts a function that returns a boolean, so it can't modify the value, whereas Pydantic validators return a value or raise an exception.
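To make the distinction concrete, a predicate can be adapted into an after-validator by raising when it returns `False` and otherwise passing the value through untouched. `predicate_to_validator` is a hypothetical helper for illustration, not an existing API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def predicate_to_validator(predicate: Callable[[T], bool]) -> Callable[[T], T]:
    """Turn a bool-returning check into a validator that returns the value or raises."""
    def validator(value: T) -> T:
        if not predicate(value):
            name = getattr(predicate, "__name__", repr(predicate))
            raise ValueError(f"{value!r} failed predicate {name}")
        return value  # a predicate can never change the value
    return validator


def is_even(x: int) -> bool:
    return x % 2 == 0


check_even = predicate_to_validator(is_even)
```

The reverse direction does not exist in general: a validator that transforms its input cannot be expressed as a predicate.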

> On dataclasses, named tuples, and typeddicts, we should support validators defined using the validator decorator as they are on BaseModel

Yes agreed:

from dataclasses import dataclass
from typing import Callable

from pydantic import validator

@dataclass
class MyCls:
    a: int

    @validator("a", mode="wrap")
    def validate_a(cls, v: int, handler: Callable[[int], int]) -> int:
        return handler(v)

Although part of me wants there to be only "one way" to do this (i.e. a: Annotated[int, WrapValidator(...)]).
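A sketch of how method-style validators might be collected on a plain stdlib dataclass: a hypothetical `field_validator` marks methods, and a hypothetical `with_validators` class decorator installs a `__post_init__` that runs them in wrap mode, passing a strict `isinstance` check as the `handler`. Both names and all mechanics are assumptions for illustration, not how pydantic implements validators.

```python
import dataclasses
from typing import Any, Callable, get_type_hints


def field_validator(field_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical: mark a function as a wrap-mode validator for `field_name`."""
    def mark(func: Callable[..., Any]) -> Callable[..., Any]:
        func._validates_field = field_name
        return func
    return mark


def with_validators(cls: type) -> type:
    """Hypothetical: install a __post_init__ that runs the marked validators.

    Must be applied before @dataclasses.dataclass (listed below it), because the
    generated __init__ only calls __post_init__ if it exists at that point.
    Overwrites any existing __post_init__ and does not support frozen dataclasses.
    """
    hints = get_type_hints(cls)
    marked = [f for f in vars(cls).values() if hasattr(f, "_validates_field")]

    def __post_init__(self: Any) -> None:
        for method in marked:
            name = method._validates_field
            expected = hints[name]

            def handler(value: Any, expected: type = expected, name: str = name) -> Any:
                # The inner validation in this sketch: a strict isinstance check.
                if not isinstance(value, expected):
                    raise TypeError(f"{name}: expected {expected.__name__}")
                return value

            # As in the thread's example, the validator receives the class first.
            setattr(self, name, method(cls, getattr(self, name), handler))

    cls.__post_init__ = __post_init__
    return cls


@dataclasses.dataclass
@with_validators
class MyCls:
    a: int

    @field_validator("a")
    def validate_a(cls, v: int, handler: Callable[[int], int]) -> int:
        return handler(v) + 1
```

Here `MyCls(a=1).a` is `2`, while `MyCls(a="x")` raises `TypeError` from inside the handler.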

> we should support some kind of sequence or dict of validators as an argument to Validate or validator

Yep that's the idea 👍🏻 .

adriangb · 2023-03-07T22:03:31Z

What semantics do we expect Validator to have with respect to forward references? Let's look at some simple cases:

from typing import List

from pydantic import BaseModel

IntList = List[int]
OuterList = List["IntList"]

class MyModel(BaseModel):
    x: OuterList

This works on both V1 and V2. It works because MyModel assumes that any forward references it encounters while recursively parsing its annotations exist either in its local namespace or in its global namespace. There are lots of situations where this breaks down (models defined in nested functions, forward references imported from another module, etc.), but in general this makes sense.

Take the same situation with Validator:

from typing import List

from pydantic import Validator

IntList = List[int]
OuterList = List["IntList"]

Validator(OuterList)

I think in this case it makes sense for it to work the same as a model. But I expect Validator is going to be used quite differently from a model:

IntList = List[int]
OuterList = List["IntList"]

async def endpoint(body: OuterList): ...

In this case I'd expect Validator to get called from somewhere inside the web framework (something like what FastAPI does here). So the globals or locals from where it is called are pretty much meaningless. In other words: I think that Validator is much closer to create_model than it is to BaseModel. And indeed create_model does not work here:

from typing import List

from pydantic import create_model

IntList = List[int]
OuterList = List["IntList"]

MyModel = create_model("MyModel", x=(OuterList, ...))
MyModel.update_forward_refs()

If you pass in __module__ to create_model it does work:

from typing import List

from pydantic import create_model

IntList = List[int]
OuterList = List["IntList"]

MyModel = create_model("MyModel", __module__=__name__, x=(OuterList, ...))
MyModel.update_forward_refs()

So I think what we should do for Validator is emulate create_model, not BaseModel. It will look something like:

from typing import List

from pydantic import Validator

IntList = List[int]
OuterList = List["IntList"]

Validator(OuterList, globalns=globals(), localns=locals())
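A stdlib sketch of what "emulate `create_model`" means for name resolution: forward references nested inside a type are evaluated only against the namespaces passed in. `resolve_type` is a hypothetical helper built on `typing.get_type_hints`; it copies the namespaces so that each call is evaluated against exactly what the caller supplied.

```python
from typing import Any, List, Mapping, Optional, get_type_hints


def resolve_type(
    tp: Any,
    globalns: Mapping[str, Any],
    localns: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Hypothetical: evaluate forward references in `tp` using only the given namespaces."""
    def holder() -> None:
        pass

    # get_type_hints resolves ForwardRefs recursively, including inside List[...].
    holder.__annotations__ = {"value": tp}
    return get_type_hints(holder, dict(globalns), dict(localns or {}))["value"]


IntList = List[int]
OuterList = List["IntList"]

resolved = resolve_type(OuterList, globalns=globals(), localns=locals())
```

`resolved` is `List[List[int]]`; calling `resolve_type(OuterList, globalns={})` instead raises `NameError`, because `IntList` is not in the supplied namespace.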

dmontagu · 2023-03-07T23:24:58Z

> Validator(OuterList, globalns=globals(), localns=locals())

I think if, in practice, a significant fraction of usages of Validator would pass globalns=globals() and localns=locals(), I'd be inclined to treat those values as defaults, in case users (and not just frameworks) do end up wanting to use Validator in their own code.

(As soon as I see APIs where I am forced to pass a namespace in, my eyes glaze over and I assume that I will need to put in more effort to understand what is required than I want to. On the other hand, if I see APIs where I have the option to pass in a namespace but it "just works" if I don't, I proceed with much less fear.)

As a pydantic user I have frequently found pydantic.tools.parse_obj_as (from v1) useful, and would like to retain that functionality in v2. I was under the impression that Validator was meant, at least in part, to replace parse_obj_as in a more reusable manner. Since parse_obj_as literally just creates a temporary model with a field of the provided type, I think there may be a category of use cases where it would help to use the same "standard" namespace-inference behavior that is used when building a BaseModel subclass. So unless there is some challenge with that, or there is a plan for a different API to replace parse_obj_as (perhaps wrapping Validator), I would be inclined to match the BaseModel-subclass namespace handling by default.

Either way, I agree that it makes sense to allow namespace overrides for exactly the reasons indicated above.

adriangb · 2023-03-08T00:50:57Z

I haven't used parse_obj_as much but I just tried it on the 1.10.X-fixes branch:

from typing import List

from pydantic import parse_obj_as

IntList = List[int]
OuterList = List["IntList"]

parse_obj_as(OuterList, [[1]])

pydantic.errors.ConfigError: field "__root__" not yet prepared so type is still a ForwardRef, you might need to call ParsingModel[List[ForwardRef('IntList')]].update_forward_refs()

So this doesn't seem to work at all with parse_obj_as 😞

adriangb · 2023-03-08T01:23:30Z

@dmontagu pointed out that we can use sys._getframe(1).f_globals to get the globals of the frame where we were called from. While this is not the same as the globals of the frame where the forward reference is bound or even where it is defined, it is often the same, so we're going to use that as a first step at least 😄
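A sketch of that default, again with a hypothetical `resolve_type` helper: when no namespace is passed it falls back to `sys._getframe(1).f_globals`, the calling module's globals. `sys._getframe` is a CPython implementation detail, and, as the comment notes, this is the caller's namespace, not where the forward reference was defined, so names local to the calling function stay invisible.

```python
import sys
from typing import Any, List, Mapping, Optional, get_type_hints


def resolve_type(tp: Any, globalns: Optional[Mapping[str, Any]] = None) -> Any:
    """Hypothetical: resolve forward refs, defaulting to the caller's module globals."""
    if globalns is None:
        # Globals of the frame that called resolve_type (not where `tp` was defined).
        globalns = sys._getframe(1).f_globals

    def holder() -> None:
        pass

    holder.__annotations__ = {"value": tp}
    return get_type_hints(holder, dict(globalns), {})["value"]


IntList = List[int]
OuterList = List["IntList"]
resolved = resolve_type(OuterList)  # works: IntList is a module-level name


def inside_function() -> Any:
    LocalInts = List[int]  # a function-local name, not part of f_globals
    return resolve_type(List["LocalInts"])
```

Calling `inside_function()` raises `NameError`, which is the "models defined in nested functions" limitation mentioned earlier in the thread.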

tiangolo · 2023-03-08T15:27:50Z

I wish I had a stronger/useful opinion but I don't. I'm not really sure of the advantages/disadvantages down the line. 🤔

I suspect that stringified forward references will be much less needed/problematic after PEP 649 (which seems likely to be the one accepted). But that mainly solves things for places that take only type annotations, not so much for Python expressions. I'm not even sure that's relevant here.

I'm not sure I have a lot of clarity about the difference in semantics between create_model() and BaseModel with respect to Validator.

For FastAPI, it's true that it's important to be able to declare Pydantic fields, not just models, as function parameters, to allow all the extra types (e.g. list[str]), but you already pointed that out above.

I agree that something decorated with @validate should probably not change its actual type when possible, e.g. MyClass should keep being type[MyClass] and not Validator[type[MyClass]] or something similar, so as not to break type checkers, autocompletion, etc. But I suspect we all agree on that, right?

Sorry if I'm misunderstanding something and talking nonsense/unrelated stuff in some of the points above. 😅

Thanks @adriangb for pinging me in DM! (I wouldn't have noticed a tag in GitHub, I have too many GitHub notifications).

dmontagu · 2023-03-08T16:15:01Z

> I agree that something decorated with @validate should probably not change its actual type when possible, e.g. MyClass should keep being type[MyClass] and not Validator[type[MyClass]] or something similar. To not break type checkers, autocompletion, etc. But I suspect we all agree on that, right?

I'll just note that thanks to improvements in mypy, specifically with improved support for decorator typing, I suspect you could get this to type-check properly. That said, I don't have a strong opinion about what the best approach is; not trying to argue that we should go down that route.

adriangb · 2023-03-22T17:02:42Z

Per discussion on #5235 we might table the Annotated[..., PlainValidator(...)] feature until after the initial V2 release since it is not strictly necessary to replace deprecated functionality as long as we implement #5246

adriangb · 2023-05-15T05:53:05Z

I think what we have now with TypeAdapter is good enough for now. We can circle back once we have feedback on that.

samuelcolvin added feature request help wanted Pull Request welcome labels Oct 27, 2022

samuelcolvin changed the title ~~The validate general method`~~ The validate general method Oct 27, 2022

samuelcolvin changed the title ~~The validate general method~~ V2: The validate general method Oct 27, 2022

samuelcolvin mentioned this issue Oct 27, 2022

V2: Make Config a dict not a class #4673

Closed

samuelcolvin mentioned this issue Nov 12, 2022

V2: Strict mode for types defined in python #4664

Closed

samuelcolvin mentioned this issue Dec 15, 2022

feat(minor): Allow passing config to parse_obj_as #4805

Closed

samuelcolvin mentioned this issue Jan 4, 2023

V2: Descriminated Unions #4675

Closed

This was referenced Jan 4, 2023

Dataclass inheritance breaks default_factory field #4906

Closed

Protected namespaces #4915

Closed

samuelcolvin mentioned this issue Jan 26, 2023

Pydantic Helper Functions Raise json.JSONDecodeError #4985

Closed


samuelcolvin self-assigned this Feb 20, 2023

adriangb self-assigned this Mar 2, 2023

adriangb mentioned this issue Mar 7, 2023

Introduce Validator for arbitrary non-model types #5145

Merged

samuelcolvin mentioned this issue Mar 17, 2023

create free standing functions to match model methods #5213

Open

adriangb closed this as completed May 15, 2023
