
V2: The validate general method #4669

Closed
samuelcolvin opened this issue Oct 27, 2022 · 18 comments
Labels
feature request help wanted Pull Request welcome


@samuelcolvin
Member

samuelcolvin commented Oct 27, 2022

Pydantic V2 will do a massively better job of validating arbitrary objects.

To accomplish this without a proliferation of methods, we should provide one function that can:

  • be used as a decorator on a function to validate its arguments, and optionally its return type
  • be used as a decorator on a dataclass, NamedTuple or TypedDict
  • be used to create a "validated" version of anything, e.g. ValidatedTuple = validate(tuple[int, int, int])

With this, pydantic's version of the dataclass decorator effectively becomes just:

import dataclasses

def dataclass(*args, **kwargs):
    # `validate` is the proposed general-purpose function described above
    return validate(dataclasses.dataclass(*args, **kwargs))

Dataclasses need work (#4670), but validation and validation-schema generation for the rest of these types should already work.
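To make the first bullet concrete without depending on pydantic internals, here is a rough stdlib-only sketch of the function-decorator case. `validate_call_sketch` is a hypothetical name, not a proposed API; it only does isinstance checks against plain-class annotations, whereas the proposed `validate` would also coerce values and understand generic types.

```python
import functools
import inspect
from typing import Any, Callable, TypeVar, get_origin, get_type_hints

F = TypeVar("F", bound=Callable[..., Any])


def validate_call_sketch(func: F) -> F:
    """Hypothetical sketch: check arguments and the return value against
    plain-class annotations using isinstance (no coercion, no generics)."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)

    def check(name: str, value: Any) -> None:
        expected = hints.get(name)
        # only plain classes (int, str, a dataclass, ...) are checked; generics are skipped
        if isinstance(expected, type) and get_origin(expected) is None:
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name}: expected {expected.__name__}, got {type(value).__name__}"
                )

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            kind = sig.parameters[name].kind
            if kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
                continue  # *args / **kwargs are not checked in this sketch
            check(name, value)
        result = func(*bound.args, **bound.kwargs)
        check("return", result)  # "return" can never clash with a parameter name
        return result

    return wrapper  # type: ignore[return-value]


@validate_call_sketch
def add(a: int, b: int = 1) -> int:
    return a + b


print(add(2, 3))  # 5
```

Because the wrapper is built with functools.wraps, the decorated function keeps its name and docstring, which matches the goal of `validate` returning something that still looks like the original function.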

@samuelcolvin samuelcolvin changed the title The validate general method` The validate general method Oct 27, 2022
@samuelcolvin samuelcolvin changed the title The validate general method V2: The validate general method Oct 27, 2022
@samuelcolvin
Member Author

samuelcolvin commented Oct 27, 2022

So usage would be:

from dataclasses import dataclass
from typing import Literal, TypedDict

from pydantic import BaseModel, validate  # `validate` is the function proposed here

@validate(**config)  # `config`: placeholder for validation config options
def my_method(*args, **kwargs):
    ...

@validate(**config)
@dataclass
class MyDataclass:
    ...

@validate(**config)
class MyTypedDict(TypedDict):
    ...

ValidatedTuple = validate(tuple[int, int, int])
ValidatedIntStr = validate(int | str)

class Cat(BaseModel):
    pet_type: Literal['cat']

class Dog(BaseModel):
    pet_type: Literal['dog']

ValidatedPet = validate(Cat | Dog, discriminator='pet_type')

Main question: how do we define validators? Do we continue to extract them as methods, let them be provided as kwargs to validate, or both?


Update: added more examples.

@marcoo47

Is this issue still open? If so, I was wondering whether it would be a reasonable first contribution for me and my partner @smhavens. Thanks!

@samuelcolvin
Member Author

I'm afraid this is probably quite a challenging first task.

How about something like #4675? It's still pretty hard and complicated, but not as wide-ranging as this.

@gavindsouza

Q: Would the validate API do more than raise exceptions? Would it try to parse/coerce objects the way parse_obj_as does now (e.g. parse_obj_as(str, 1) => "1")?

@samuelcolvin
Member Author

Yes, unless you use strict mode.
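A minimal sketch of what "unless you use strict mode" means for a single target type. `validate_str` is hypothetical and covers only a few input types; the actual coercion table in V2 is much larger and is not reproduced here.

```python
from decimal import Decimal
from typing import Any


def validate_str(value: Any, *, strict: bool = False) -> str:
    """Hypothetical sketch of lax vs strict validation for a `str` target."""
    if isinstance(value, str):
        return value
    if strict:
        # strict mode: only real str instances are accepted
        raise TypeError(f"expected str, got {type(value).__name__}")
    if isinstance(value, (int, float, Decimal)):
        # lax mode: numbers are coerced, as in parse_obj_as(str, 1) == "1" on V1
        return str(value)
    raise TypeError(f"cannot coerce {type(value).__name__} to str")


print(validate_str(1))  # 1
```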

@samuelcolvin
Member Author

(added a few more examples in the above usage example)

So the question is: what does validate return?

Well, it has to depend on what it's called with:

  • if it's used as a decorator on a dataclass, it needs to return a valid dataclass with a custom __init__ method
  • similarly if it's used as a decorator on a NamedTuple or TypedDict
  • if it's used as a decorator on a function, it can simply return another function; this is the easiest case
  • if it's called with a Union, I'm not sure whether we can create something that looks like a valid typing.Union; if not, it should just return a function
  • if it's called on a tuple, like validate(tuple[int, int, int]), hopefully we can create a tuple subclass, and likewise for list, dict, etc.
  • some things can't be subclassed, e.g. validate(None); in this case, as with Union, I guess we return a new function
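The tuple case is the most concrete of these. Below is a rough sketch, assuming validation means only a length check plus an isinstance check per position; `make_validated_tuple` is a hypothetical helper, and it ignores variable-length tuple[int, ...] as well as any coercion.

```python
from typing import Any, Iterable, get_args


def make_validated_tuple(alias: Any) -> type:
    """Hypothetical sketch: turn e.g. tuple[int, int, int] into a tuple subclass
    whose constructor checks the length and the type of every item.
    Only fixed-length tuples of plain classes are handled."""
    item_types = get_args(alias)

    class ValidatedTuple(tuple):
        def __new__(cls, items: Iterable[Any]) -> "ValidatedTuple":
            values = tuple(items)
            if len(values) != len(item_types):
                raise ValueError(f"expected {len(item_types)} items, got {len(values)}")
            for index, (value, expected) in enumerate(zip(values, item_types)):
                if not isinstance(value, expected):
                    raise TypeError(f"item {index}: expected {expected.__name__}")
            return super().__new__(cls, values)

    return ValidatedTuple


ValidatedTuple = make_validated_tuple(tuple[int, int, int])
point = ValidatedTuple([1, 2, 3])
print(point, isinstance(point, tuple))  # (1, 2, 3) True
```

Since the result is a genuine tuple subclass, values it produces can be passed anywhere a tuple is expected; the open question above is whether the same trick can work for Union and None, which cannot be subclassed.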

In all these cases, we should hopefully attach enough information for the following:

  • json_schema(validate(...)) - generate JSON Schema from the return type
  • serialize_python(validate(...)(...)) - serialize the result of running validation on the return type - the equivalent of my_model.dict()/my_model.model_dict()
  • serialize_json(validate(...)(...)) - JSON-serialize the result of running validation on the return type - the equivalent of my_model.json()/my_model.model_json()
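To give a feel for the information json_schema(validate(...)) would need, here is a rough sketch that derives a schema from the type alone. `json_schema_sketch` is hypothetical; it emits JSON Schema draft 2020-12 keywords (`prefixItems` for fixed-length tuples) and handles only a handful of types.

```python
from typing import Any, Dict, get_args, get_origin

_PRIMITIVES: Dict[Any, str] = {int: "integer", float: "number", str: "string", bool: "boolean"}


def json_schema_sketch(tp: Any) -> Dict[str, Any]:
    """Hypothetical sketch: derive a JSON Schema (2020-12 keywords) from a type."""
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    origin, args = get_origin(tp), get_args(tp)
    if origin is list:
        return {"type": "array", "items": json_schema_sketch(args[0])}
    if origin is tuple and args and args[-1] is not Ellipsis:
        # fixed-length tuple: one schema per position
        return {
            "type": "array",
            "prefixItems": [json_schema_sketch(arg) for arg in args],
            "minItems": len(args),
            "maxItems": len(args),
        }
    raise NotImplementedError(f"no schema for {tp!r}")
```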

@adriangb
Member

For non-models we currently support or plan to support two approaches:

from typing import Annotated
from annotated_types import Predicate
from pydantic import Field

NonNegativeInt = Annotated[int, Field(ge=0)]
EvenInt = Annotated[int, Predicate(lambda x: x % 2 == 0)]

The first one already works with @validate_args (a real-world use case of non-model validations/constraints like this).
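What both forms share is that the constraint travels as metadata on the type, so whatever performs validation can read it back. A stdlib-only sketch of that read-back; `Ge` and `Predicate` below are hypothetical stand-ins for Field(ge=0) and annotated_types.Predicate, not those classes, and `check_annotated` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Callable, get_args, get_origin


@dataclass(frozen=True)
class Ge:
    """Hypothetical stand-in for a `ge` constraint such as Field(ge=0)."""
    bound: int


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in mirroring annotated_types.Predicate."""
    func: Callable[[Any], bool]


def check_annotated(tp: Any, value: Any) -> Any:
    """Validate `value` against Annotated[base, *metadata] (or a plain class)."""
    base, metadata = tp, ()
    if get_origin(tp) is Annotated:
        base, *rest = get_args(tp)  # first arg is the type, the rest is metadata
        metadata = tuple(rest)
    if not isinstance(value, base):
        raise TypeError(f"expected {base.__name__}")
    for item in metadata:
        if isinstance(item, Ge) and not value >= item.bound:
            raise ValueError(f"{value} is not >= {item.bound}")
        if isinstance(item, Predicate) and not item.func(value):
            raise ValueError(f"{value} failed predicate")
    return value


NonNegativeInt = Annotated[int, Ge(0)]
EvenInt = Annotated[int, Predicate(lambda x: x % 2 == 0)]
```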

So I think we should make validate something like:

from functools import wraps
import inspect
from typing import Annotated, Any, TypeVar

T = TypeVar("T")

def validate(__type_or_func: T, *args: Any, **kwargs: Any) -> T:
    if inspect.isfunction(__type_or_func) or inspect.ismethod(__type_or_func):
        @wraps(__type_or_func)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            # do some validation
            return __type_or_func(*args, **kwargs)
        return wrapped  # type: ignore
    else:
        return Annotated[__type_or_func, "some metadata"]  # type: ignore


# tests
from dataclasses import dataclass
from typing import Literal, TypedDict
from pydantic import BaseModel

@validate
def my_func(a: int) -> int:
    return a

_1: int = my_func(1)

@validate
@dataclass
class Foo:
    a: int

_2: Foo = Foo(123)

@validate
class MyDict(TypedDict):
    a: int

_3: MyDict = {"a": 123}
_4: MyDict = MyDict(a=123)

ValidatedTuple = validate(tuple[int, int, int])
ValidatedIntStr = validate(int | str)

_5: ValidatedTuple = (1, 2, 3)
_6: ValidatedTuple = ValidatedTuple([1, 2, 3])
_7: ValidatedIntStr = 1
_8: ValidatedIntStr = "1"


class Cat(BaseModel):
    pet_type: Literal['cat']

class Dog(BaseModel):
    pet_type: Literal['dog']

ValidatedPet = validate(Cat | Dog, discriminator='pet_type')

_9: ValidatedPet = Cat(pet_type="cat")

This doesn't seem to work with unions. I think Pylance special-cases unions because the result is not a type but a "special form". We could just say you need to write ValidatedPet = Annotated[Cat | Dog, pydantic.DiscriminatedUnion(discriminator="pet_type")]. That seems reasonable given that discriminated unions are a somewhat complex use case.
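Whichever spelling is chosen, the runtime idea behind a discriminated union is small: read the tag field and hand the data to the one member whose Literal matches, instead of trying every member. A stdlib-only sketch; `build_discriminated` is hypothetical (it is not the `pydantic.DiscriminatedUnion` suggested above), and dataclasses stand in for BaseModel.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Literal, Mapping, get_args, get_type_hints


# dataclasses stand in for the BaseModel subclasses in the example
@dataclass
class Cat:
    pet_type: Literal["cat"]


@dataclass
class Dog:
    pet_type: Literal["dog"]


def build_discriminated(*choices: type, discriminator: str) -> Callable[[Mapping[str, Any]], Any]:
    """Hypothetical sketch: index each member by its Literal tag, then dispatch on the tag."""
    lookup: Dict[Any, type] = {}
    for choice in choices:
        for tag in get_args(get_type_hints(choice)[discriminator]):
            lookup[tag] = choice

    def validate(data: Mapping[str, Any]) -> Any:
        tag = data.get(discriminator)
        if tag not in lookup:
            raise ValueError(f"unknown {discriminator}: {tag!r}")
        # only the selected member is tried, unlike a plain union
        return lookup[tag](**data)

    return validate


validate_pet = build_discriminated(Cat, Dog, discriminator="pet_type")
print(validate_pet({"pet_type": "dog"}))  # Dog(pet_type='dog')
```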

This also does not make TypedDict.__init__ perform validation. IMO we should either make it explicit that we are returning something that is not the original thing but a validator for it (validate(__thing: T) -> Validated[T] where Validated[T] ~= Callable[..., T]), or have two functions: one to add the metadata necessary for validation (which is enough for the type to be used in an already-validated context like a field of a BaseModel, a function argument with @validate_arguments, a parameter in a FastAPI endpoint, etc.) and another to either perform the validation or create a Validated[T], which is not the same type as the original type.

@samuelcolvin
Member Author

I agree with most of your examples.

We should definitely make it explicit that we're returning a new thing, specifically an instance of Validate.

With that, a user could do:

validate_pet = Validate(Cat | Dog, discriminator='pet_type')

cat: Cat = validate_pet({'pet_type': 'cat'})
validate_pet.whatever_we_call_to_json(cat)

With that, the general usage would be:

  • Validate creates an instance via __init__, very traditional
  • validate is used as a decorator: it returns a function which in turn gets called with something and returns an instance of Validate

On the point of how to define validators, we should support:

  1. BeforeValidator, AfterValidator, WrapValidator - as arguments to Annotated
  2. PlainValidator - which can be used as an argument to Annotated, but actually ignores the type annotation in Annotated - the function is entirely responsible for validation and returning the right thing
  3. Predicate which is just an alias to (I guess) AfterValidator
  4. On dataclasses, named tuples, and typeddicts, we should support validators defined using the validator decorator as they are on BaseModel
  5. we should support some kind of sequence or dict of validators as an argument to Validate or validator
  6. we could also (possibly instead of 5.) allow all validators to be defined in Config, which would be a bit less flat to define but more versatile

WDYT?
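The semantics of the first three items can be sketched in plain Python, which also shows what sets PlainValidator apart: before, after and wrap validators all still call the inner handler (the type's own validation), while a plain validator discards it. Everything below is hypothetical: the classes are local stand-ins rather than pydantic's, and `compose` simply applies validators in list order, each one wrapping whatever came before.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Handler = Callable[[Any], Any]


# Hypothetical stand-ins for the proposed validator kinds (not pydantic's classes).
@dataclass
class BeforeValidator:
    func: Handler  # runs first and may transform the raw input


@dataclass
class AfterValidator:
    func: Handler  # runs on the value the inner handler returned


@dataclass
class WrapValidator:
    func: Callable[[Any, Handler], Any]  # receives the value and the inner handler


@dataclass
class PlainValidator:
    func: Handler  # replaces the inner handler entirely


def predicate(check: Callable[[Any], bool]) -> AfterValidator:
    """A predicate is an after-validator that can only accept or reject."""
    def run(value: Any) -> Any:
        if not check(value):
            raise ValueError(f"predicate failed for {value!r}")
        return value
    return AfterValidator(run)


def _apply(validator: Any, inner: Handler) -> Handler:
    if isinstance(validator, PlainValidator):
        return validator.func  # `inner` (the type's own validation) never runs
    if isinstance(validator, BeforeValidator):
        return lambda value: inner(validator.func(value))
    if isinstance(validator, AfterValidator):
        return lambda value: validator.func(inner(value))
    if isinstance(validator, WrapValidator):
        return lambda value: validator.func(value, inner)
    raise TypeError(f"unknown validator: {validator!r}")


def compose(core: Handler, validators: List[Any]) -> Handler:
    """Apply validators in list order, each one wrapping what came before."""
    handler = core
    for validator in validators:
        handler = _apply(validator, handler)
    return handler


def int_core(value: Any) -> int:
    # stand-in for the type's own (lax) validation
    return int(value)


even_int = compose(int_core, [BeforeValidator(str.strip), predicate(lambda x: x % 2 == 0)])
```

With this framing, a predicate needs no separate machinery: it is an after-validator whose function returns the value unchanged or raises.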

@adriangb
Member

validate is used as a decorator - it returns a function which in turn gets called with something and returns an instance of Validate

I think this would be a bit problematic: it would erase the original type and using an instance of Validate as a type would not be valid. Let me know if this is what you were thinking:

from dataclasses import dataclass
from typing import Annotated, Any, Callable, Generic, TypeVar, ParamSpec, reveal_type

P = ParamSpec("P")
T = TypeVar("T")

class Validate(Generic[P, T]):
    def __init__(self, __thing: Callable[P, T], *args: Any, **kwargs: Any) -> None:
        ...

    def __call__(self, *args: P.args, **kwargs: P.kwargs) -> T:
        ...

def validate(__thing: Callable[P, T], *args: Any, **kwargs: Any) -> Validate[P, T]:
    return Annotated[__thing, "metadata"]   # type: ignore

@validate
@dataclass
class MyCls:
    a: int

reveal_type(MyCls(a=1))  # MyCls
reveal_type(MyCls)  # Validate[(*args: Any, **kwargs: Any), MyCls]

# Expected type expression but received "Validate[(*args: Any, **kwargs: Any), MyCls]"
def foo(thing: MyCls) -> None:
    pass

So I don't think @validate should return an instance of Validate; it should attach enough metadata to create one:

@validated(after=lambda x: x.a > 2)
@dataclass
class MyCls:
    a: int

my_cls_validator = Validate(MyCls)

reveal_type(MyCls(a=1))  # MyCls
reveal_type(MyCls)  # type[MyCls]

def foo(thing: MyCls) -> None:
    pass

foo(my_cls_validator({"a": "3"}))
my_cls_validator({"a": "1"})  # fails
reveal_type(my_cls_validator({"a": "1"}))  # MyCls

But if used in a validated context, you would never need to create a Validate instance:

@app.get("/foo")
def endpoint(thing: MyCls):  # FastAPI creates the Validate instance internally
    ...

class MyModel(BaseModel):
    field: MyCls  # MyCls' validation logic is applied to this field
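The split described here, a decorator that only records metadata plus a separate object that builds a validator from it, can be sketched without pydantic. `validated`, `Validate` and the `__validation_after__` attribute are hypothetical names for illustration, and "coercion" is reduced to calling each field's annotated type on the raw value.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Generic, Mapping, Type, TypeVar

T = TypeVar("T")


def validated(*, after: Callable[[Any], bool]) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical: record validation metadata and return the class unchanged,
    so type checkers still see `type[MyCls]`."""
    def decorator(cls: Type[T]) -> Type[T]:
        setattr(cls, "__validation_after__", after)
        return cls
    return decorator


class Validate(Generic[T]):
    """Hypothetical: build a validator from the metadata attached above."""

    def __init__(self, cls: Type[T]) -> None:
        self.cls = cls
        self.after = getattr(cls, "__validation_after__", lambda _: True)
        self.field_types = {f.name: f.type for f in fields(cls)}

    def __call__(self, data: Mapping[str, Any]) -> T:
        # crude lax coercion: call each field's annotated type on the raw value
        kwargs = {name: tp(data[name]) for name, tp in self.field_types.items()}
        instance = self.cls(**kwargs)
        if not self.after(instance):
            raise ValueError(f"validation failed for {instance!r}")
        return instance


@validated(after=lambda x: x.a > 2)
@dataclass
class MyCls:
    a: int


my_cls_validator = Validate(MyCls)
print(my_cls_validator({"a": "3"}))  # MyCls(a=3)
```

Note that MyCls(a=1) still constructs without complaint: only the Validate instance (or a validated context such as a model field) applies the rules, which is the trade-off being discussed.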

@adriangb
Member

  1. BeforeValidator, AfterValidator, WrapValidator - as arguments to Annotated

Yeah this sounds good to me. Something like:

from typing import Annotated

from pydantic.validators import AfterValidator, WrapValidator, PlainValidator

Number = float | int
ValidatedNumberWrapped = Annotated[Number, WrapValidator(Number, lambda v, handler: handler(v))]
ValidatedNumberAfter = Annotated[Number, AfterValidator(Number, lambda v: v)]
ValidatedNumberPlain = Annotated[Number, PlainValidator(Number, lambda v: v)]

Note that I'm requiring that WrapValidator take in the type as an argument so that it can be type checked.
There is no way for arguments to Annotated to statically type check the type they receive.

  2. PlainValidator - which can be used as an argument to Annotated, but actually ignores the type annotation in Annotated - the function is entirely responsible for validation and returning the right thing

I'm not sure I'm understanding what is special about PlainValidator. I actually may not understand the use case for plain at all.

  3. Predicate which is just an alias to (I guess) AfterValidator

The only difference is that Predicate accepts a function that returns a boolean. It can't modify the value. Pydantic validators return a value or raise an exception.

  4. On dataclasses, named tuples, and typeddicts, we should support validators defined using the validator decorator as they are on BaseModel

Yes agreed:

from dataclasses import dataclass
from typing import Callable

from pydantic import validator

@dataclass
class MyCls:
    a: int

    @validator("a", mode="wrap")
    def validate_a(cls, v: int, handler: Callable[[int], int]) -> int:
        return handler(v)

Although part of me wants there to be only "one way" to do this (i.e. a: Annotated[int, WrapValidator(...)]).

  5. we should support some kind of sequence or dict of validators as an argument to Validate or validator

Yep that's the idea 👍🏻 .

@adriangb adriangb self-assigned this Mar 2, 2023
@adriangb
Member

adriangb commented Mar 7, 2023

What semantics do we expect Validator to have with respect to forward references? Let's look at some simple cases:

from typing import List

from pydantic import BaseModel

IntList = List[int]
OuterList = List["IntList"]

class MyModel(BaseModel):
    x: OuterList

This works on both V1 and V2. It works because MyModel assumes that any forward references it encounters while recursively parsing its annotations exist either in its local namespace or its global namespace. There are lots of situations where this breaks down (models defined in nested functions, forward references imported from another module, etc.). But in general this makes sense.

Take the same situation with Validator:

from typing import List

from pydantic import Validator

IntList = List[int]
OuterList = List["IntList"]

Validator(OuterList)

I think in this case it makes sense for it to work the same as a model. But I expect Validator is going to be used quite differently from a model:

IntList = List[int]
OuterList = List["IntList"]

async def endpoint(body: OuterList): ...

In this case I'd expect Validator to get called from somewhere inside the web framework (something like what FastAPI does here). So the globals or locals from where it is called are pretty much meaningless. In other words: I think that Validator is much closer to create_model than it is to BaseModel. And indeed create_model does not work here:

from typing import List

from pydantic import create_model

IntList = List[int]
OuterList = List["IntList"]

MyModel = create_model("MyModel", x=(OuterList, ...))
MyModel.update_forward_refs()

If you pass __module__ to create_model, it does work:

from typing import List

from pydantic import create_model

IntList = List[int]
OuterList = List["IntList"]

MyModel = create_model("MyModel", __module__=__name__, x=(OuterList, ...))
MyModel.update_forward_refs()

So I think what we should do for Validator is emulate create_model, not BaseModel. It will look something like:

from typing import List

from pydantic import Validator

IntList = List[int]
OuterList = List["IntList"]

Validator(OuterList, globalns=globals(), localns=locals())

@dmontagu
Contributor

dmontagu commented Mar 7, 2023

Validator(OuterList, globalns=globals(), localns=locals())

I think if, in practice, there is some significant fraction of usages of Validator where globalns=globals() and localns=locals() are the values that should be used, I'd be inclined to treat those values as defaults in case users (and not just frameworks) do end up wanting to use Validator in their own code.

(As soon as I see APIs where I am forced to pass a namespace in, my eyes glaze over and I assume that I will need to put in more effort to understand what is required than I want to. On the other hand, if I see APIs where I have the option to pass in a namespace but it "just works" if I don't, I proceed with much less fear.)

As a pydantic user I have frequently found pydantic.tools.parse_obj_as (from v1) useful, and would like to retain that functionality in v2. I was under the impression that Validator was meant, at least in part, to replace parse_obj_as in a more reusable manner, and considering parse_obj_as literally just creates a temp-model with a field of the provided type, I think there may be a category of use cases where it would be helpful to use the same/"standard" namespace-inference behavior that is used when building a BaseModel subclass. So unless there is some challenge with that OR if there is a plan for a different API to replace parse_obj_as (perhaps wrapping Validator), I would be inclined to try to match the BaseModel-subclass namespace handling by default.

Either way, I agree that it makes sense to allow namespace overrides for exactly the reasons indicated above.

@adriangb
Member

adriangb commented Mar 8, 2023

I haven't used parse_obj_as much but I just tried it on the 1.10.X-fixes branch:

from typing import List

from pydantic import parse_obj_as

IntList = List[int]
OuterList = List["IntList"]

parse_obj_as(OuterList, [[1]])
pydantic.errors.ConfigError: field "__root__" not yet prepared so type is still a ForwardRef, you might need to call ParsingModel[List[ForwardRef('IntList')]].update_forward_refs()

So this doesn't seem to work at all with parse_obj_as 😞

@adriangb
Member

adriangb commented Mar 8, 2023

@dmontagu pointed out that we can use sys._getframe(1).f_globals to get the globals of the frame where we were called from. While this is not the same as the globals of the frame where the forward reference is bound or even where it is defined, it is often the same, so we're going to use that as a first step at least 😄
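A minimal sketch of that first step, combined with the explicit-namespace override discussed above. `Validator` here is a hypothetical stand-in (stdlib only) that leans on typing.get_type_hints to resolve the ForwardRef nested inside List["IntList"]; sys._getframe is a CPython implementation detail, so a real implementation would need a fallback.

```python
import sys
from typing import Any, Dict, List, Optional, get_type_hints


class Validator:
    """Hypothetical sketch: resolve string forward references in `tp`,
    defaulting to the caller's globals via sys._getframe(1).f_globals but
    letting callers (e.g. a web framework) pass explicit namespaces."""

    def __init__(
        self,
        tp: Any,
        globalns: Optional[Dict[str, Any]] = None,
        localns: Optional[Dict[str, Any]] = None,
    ) -> None:
        if globalns is None:
            # globals of the frame that called Validator(...); this is not
            # necessarily where the forward reference was written
            globalns = sys._getframe(1).f_globals
        # get_type_hints resolves ForwardRef objects nested inside generics
        holder = type("_Holder", (), {"__annotations__": {"x": tp}})
        self.type = get_type_hints(holder, globalns=globalns, localns=localns)["x"]


IntList = List[int]
OuterList = List["IntList"]

print(Validator(OuterList).type)  # typing.List[typing.List[int]]
```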

@tiangolo
Member

tiangolo commented Mar 8, 2023

I wish I had a stronger/useful opinion but I don't. I'm not really sure of the advantages/disadvantages down the line. 🤔


I suspect that stringified forward references will be much less needed/problematic after PEP 649 (which seems like it's probably what's gonna be accepted). But that solves mainly for places that take only type annotations, not that much for Python expressions. Not even sure if that's irrelevant here.


I'm not sure I have a lot of clarity about the difference in semantics between create_model() and BaseModel with respect to Validator.

For FastAPI, it's true that it's important to be able to define in function parameters Pydantic fields, not models, to allow all the extra types (e.g. list[str]), but you pointed it out already above.


I agree that something decorated with @validate should probably not change its actual type when possible, e.g. MyClass should keep being type[MyClass] and not Validator[type[MyClass]] or something similar. To not break type checkers, autocompletion, etc. But I suspect we all agree on that, right?


Sorry if I'm misunderstanding something and talking nonsense/unrelated stuff in some of the points above. 😅


Thanks @adriangb for pinging me in DM! (I wouldn't have noticed a tag in GitHub, I have too many GitHub notifications).

@dmontagu
Contributor

dmontagu commented Mar 8, 2023

I agree that something decorated with @validate should probably not change its actual type when possible, e.g. MyClass should keep being type[MyClass] and not Validator[type[MyClass]] or something similar. To not break type checkers, autocompletion, etc. But I suspect we all agree on that, right?

I'll just note that thanks to improvements in mypy, specifically with improved support for decorator typing, I suspect you could get this to type-check properly. That said, I don't have a strong opinion about what the best approach is; not trying to argue that we should go down that route.

@adriangb
Member

Per discussion on #5235 we might table the Annotated[..., PlainValidator(...)] feature until after the initial V2 release since it is not strictly necessary to replace deprecated functionality as long as we implement #5246

@adriangb
Member

I think what we have now with TypeAdapter is good enough for now. We can circle back once we have feedback on that.

Projects
Status: Done

6 participants