Description
I'm wondering about making a major change to pydantic's internals in v2.
Validators are currently a list of functions that are called one after another; here's the actual code.
I want to switch to an "Onion", in much the same way many web frameworks implement middleware; here is a description from Django's docs.
The idea is that validation would be done by one function which, when called, calls the next layer down, until you reach the innermost function, which just does some parsing/validation and returns the value.
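To make the layering concrete, here is a minimal sketch in plain Python. Nothing here is pydantic API: `parse_int`, `strip_layer`, `double_layer` and `validate` are made-up names, and the layers are nested by hand purely for illustration.

```python
from typing import Any, Callable

Handler = Callable[[Any], Any]


def parse_int(v: Any) -> int:
    # Innermost function: parses/validates and returns the value.
    return int(v)


def strip_layer(v: Any, handler: Handler) -> Any:
    # Work done on the way in, before the inner layers run.
    if isinstance(v, str):
        v = v.strip()
    return handler(v)


def double_layer(v: Any, handler: Handler) -> Any:
    # Work done on the way out, after the inner layers have run.
    return handler(v) * 2


def validate(v: Any) -> Any:
    # Outermost layer first; each layer is handed the next one down.
    return strip_layer(v, handler=lambda x: double_layer(x, handler=parse_int))


print(validate(' 21 '))  # 42
```

Each layer decides what happens before and after the call to `handler`, which is the whole point of the onion.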
Advantages of this approach:
No more pre=True, pre=False
Your custom validators can now look like:

    @validator('foobar')
    def foobar_validator(cls, v, handler):
        v = v * 2
        v = handler(v)
        return v.upper()

(or something more sane)
This replaces the choice between pre=True and pre=False, and the need for multiple validators if you want to do both.
Skipping further validation
The validator can also choose not to call handler and thereby skip heavy validation in some cases, or implement custom behaviour.

    @validator('ts')
    def ts_validator(cls, v, handler):
        if v == 'now':
            return datetime.now()
        else:
            return handler(v)

This might be useful for the case of None, which currently has lots of custom logic around not calling further validators when the value is None.
Catching errors
You could also catch exceptions from inner validation and modify the error or continue with some default value.
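A rough sketch of what that could look like, assuming the inner validation raises `ValueError`. `int_validator` and `port_validator` are hypothetical stand-ins, not existing pydantic functions, and the default of 8000 is arbitrary.

```python
from typing import Any, Callable


def int_validator(v: Any) -> int:
    # Stand-in for the inner validation; raises ValueError on bad input.
    return int(v)


def port_validator(v: Any, handler: Callable[[Any], Any]) -> Any:
    try:
        return handler(v)
    except ValueError as exc:
        if v == '':
            # Continue with a default value instead of failing.
            return 8000
        # Or modify the error before letting it propagate.
        raise ValueError(f'invalid port {v!r}') from exc


print(port_validator('80', handler=int_validator))  # 80
print(port_validator('', handler=int_validator))  # 8000
```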
Maybe faster in simple cases
For very simple cases (which are very common) e.g. a plain string, validation would be as simple as calling one function.
This should be much faster than the current "iterate over a list of one element". Edit: or perhaps not, hard to say.
Much faster with lots of validators
Code like
can be modified to

    @classmethod
    def __validate__(cls, value: Any, **kwargs) -> Decimal:
        value = decimal_validator(value, **kwargs)
        value = number_size_validator(value, **kwargs)
        value = number_multiple_validator(value, **kwargs)
        # ... do custom decimal validation
        return value

Which Cython should be able to compile to be much faster.
Simplify the interface to custom types
Instead of the current __get_validators__ interface, custom types could simply have a __validate__ method, which is called for validation. We could do this anyway I guess, but it would make more sense with the onion.
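For illustration only, a custom type under this proposal might look something like the sketch below. `__validate__` is the hook being proposed here, not something pydantic calls today, and `PositiveInt` is a made-up example type; the method is called directly to show the behaviour.

```python
from typing import Any


class PositiveInt(int):
    @classmethod
    def __validate__(cls, value: Any, **kwargs: Any) -> 'PositiveInt':
        # Extra arguments (config, field, values, ...) are ignored via **kwargs.
        value = int(value)
        if value <= 0:
            raise ValueError('value must be positive')
        return cls(value)


print(PositiveInt.__validate__('3'))  # 3
```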
Better stack trace
If we ever provide a way to get access to the stack trace from the exception when validation fails, it should be longer but clearer than the current one would be. (This is not currently a feature request, maybe I'm the only person who would want it).
Potential disadvantages
A big rewrite (obviously)
But it might not be too bad if we do it in a major version change and use it as an opportunity to rewrite a lot of fields.py which I think could do with an improvement (e.g. the way we deal with None)
Backwards compatibility
I guess for decorated validators we have to provide some backwards compatibility, just a wrapper function which calls the actual validator either before or after handler() depending on pre.
To keep the interface as simple as possible, it would be useful if handler was always an optional argument to a validator, so I guess this could stay for good.
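The compatibility wrapper could be roughly as small as this. It is a sketch under simplifying assumptions: `wrap_legacy_validator` is a hypothetical helper, and old-style validators are reduced to "value in, value out" with `cls` and the other keyword arguments left out.

```python
from typing import Any, Callable


def wrap_legacy_validator(func: Callable[[Any], Any], pre: bool = False) -> Callable[..., Any]:
    # Adapt an old-style validator to the handler style.
    def wrapper(v: Any, handler: Callable[[Any], Any]) -> Any:
        if pre:
            # pre=True: run the old validator before the inner validation.
            return handler(func(v))
        # pre=False: run the old validator on the already-validated value.
        return func(handler(v))

    return wrapper


strip_commas = wrap_legacy_validator(lambda v: v.replace(',', ''), pre=True)
absolute = wrap_legacy_validator(abs)

print(strip_commas('1,000', handler=int))  # 1000
print(absolute('-3', handler=int))  # 3
```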
Need to change the signature of all the reusable validators (advantage & disadvantage)
Currently we have lots of clever (and ugly) logic so that all the keyword arguments to validators are optional.
As the decimal example above shows, this will be very confusing and brittle when validators are manually calling other validators.
I think best that all validators take config, fields and values (and perhaps context #1549) and pass them all on to other validators. Validators that don't need access to some arguments can just ignore them with **kwargs. This should remove a lot of fluff from _generic_validator_basic
Decorated validators are a special case and keyword arguments should still all be optional.
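A sketch of that calling convention, with the caveat that everything in it is illustrative: the validator names are invented, and `config` is shown as a plain dict with a hypothetical `max_value` key rather than a real pydantic config class.

```python
from typing import Any, Dict


def int_validator(value: Any, **kwargs: Any) -> int:
    # Needs none of the extra arguments, so it swallows them with **kwargs.
    return int(value)


def max_size_validator(value: int, *, config: Dict[str, Any], **kwargs: Any) -> int:
    # Uses only config; fields and values are ignored via **kwargs.
    limit = config['max_value']
    if value > limit:
        raise ValueError(f'value must be at most {limit}')
    return value


def bounded_int_validator(value: Any, **kwargs: Any) -> int:
    # Passes everything it was given on to the validators it calls.
    value = int_validator(value, **kwargs)
    return max_size_validator(value, **kwargs)


print(bounded_int_validator('5', config={'max_value': 10}, fields={}, values={}))  # 5
```

Because every validator receives the same keyword arguments, a validator that calls others never has to work out which arguments each one accepts.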
Performance?
I guess this will be faster in some cases and slower in others. Will it make pydantic faster overall?
The main problem is avoiding dynamic functions which Cython can't compile.
My first proposal for an implementation of Onion is very simple:
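
    from typing import Callable, Iterator


    class Onion:
        def __init__(self, *functions: Callable):
            # All but the last function are wrapping layers; the last is the innermost validator.
            self.outer_layers = list(functions)
            self.inner = self.outer_layers.pop()
            self.layer_iter: Iterator[Callable]

        def __call__(self, v):
            # Start again from the outermost layer on every call.
            self.layer_iter = iter(self.outer_layers)
            return self.call_layer(v)

        def call_layer(self, v):
            try:
                func = next(self.layer_iter)
            except StopIteration:
                # No layers left: run the innermost validator.
                return self.inner(v)
            else:
                return func(v, handler=self.call_layer)

Can we do much better than that performance-wise?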
Most systems, I think, do something more like

    validator = partial(validator, handler=inner_validator)

Is that faster with Cython?
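For comparison, here is a sketch of how the `partial` approach could build the whole chain once, up front. `compose` and the example layers are hypothetical names, and this has not been benchmarked, with or without Cython.

```python
from functools import partial, reduce
from typing import Any, Callable


def compose(inner: Callable[[Any], Any], *layers: Callable[..., Any]) -> Callable[[Any], Any]:
    # Layers are given outermost first; bind each layer's handler exactly once.
    return reduce(
        lambda handler, layer: partial(layer, handler=handler),
        reversed(layers),
        inner,
    )


def parse_int(v: Any) -> int:
    # Innermost validator.
    return int(v)


def comma_layer(v: Any, handler: Callable[[Any], Any]) -> Any:
    # Pre-processing: drop thousands separators before parsing.
    if isinstance(v, str):
        v = v.replace(',', '')
    return handler(v)


def clamp_layer(v: Any, handler: Callable[[Any], Any]) -> Any:
    # Post-processing: cap the parsed value at 999.
    return min(handler(v), 999)


validate = compose(parse_int, comma_layer, clamp_layer)
print(validate('1,000'))  # 999
print(validate('12'))  # 12
```

Unlike the iterator-based Onion, the composed callable holds no per-call state, so nothing needs resetting between calls.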
More generally, does this sound like a good idea?
@PrettyWood @tiangolo @StephenBrown2 @dmontagu, anyone else?