
Decompose Field components into Annotated #2129

Closed
4 tasks done
JacobHayes opened this issue Nov 17, 2020 · 11 comments

Comments

@JacobHayes
Contributor

JacobHayes commented Nov 17, 2020

Feature Request

PEP 593's typing.Annotated (or typing_extensions.Annotated) provides some scaffolding to do a lot of things that are currently handled with Field. This was recognized in #837, which suggested adding support for var: Annotated[type, Field(...)] in addition to the current var: type = Field(...). I'm proposing we take this one step further and decompose Field into a collection of classes (one for each Field kwarg) that could be instantiated and used in Annotated directly. The benefits would be:

  • the usage may be shorter (ie: Annotated[int, Description("...")] vs Annotated[int, Field(description="...")])
  • they'd play/look nicer with non-pydantic metadata and could replace **extra
    • caveat: **extra kwargs are explicitly meant for Field, whereas arbitrary Annotated values may not be. If this is an issue, perhaps we can define a small interface/FieldAnnotation base class that could be used to check.
  • we (maybe?) wouldn't have to deal with a default set here vs assigned if we didn't add a Default to replace the positional arg
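That FieldAnnotation check could look roughly like this minimal sketch (all names here — FieldAnnotation, Description, GE, split_metadata — are hypothetical, not existing pydantic API):

```python
from typing import Annotated, get_args


# Hypothetical marker base class so pydantic could pick out its own
# metadata from an Annotated hint and ignore third-party values (per
# PEP 593, unrecognized metadata should simply be ignored).
class FieldAnnotation:
    """Marker for pydantic-recognized Annotated metadata."""


class Description(FieldAnnotation):
    def __init__(self, text: str) -> None:
        self.text = text


class GE(FieldAnnotation):
    def __init__(self, bound: float) -> None:
        self.bound = bound


def split_metadata(hint):
    """Return (base type, pydantic metadata, everything else)."""
    base, *meta = get_args(hint)
    ours = [m for m in meta if isinstance(m, FieldAnnotation)]
    other = [m for m in meta if not isinstance(m, FieldAnnotation)]
    return base, ours, other


base, ours, other = split_metadata(
    Annotated[int, Description("Count of FooBars"), GE(0), "3rd-party tag"]
)
# base is int; ours holds the Description and GE; other holds the tag
```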

One thing we may have to deal with is that there could be multiple instances of any of the classes, requiring some conflict resolution (likely "take last" or "take outer", however this manifests with Annotated). Arguably, we'd also have to handle this for Annotated[type, Field(...), Field(...)] too, but this decomposition of Field would make it much easier to override only a small piece (or perhaps support multiple, in the case of Alias for example).

Unless this were a v2 only change (not sure on timeline), we'll of course have to support both this and the existing var: type = Field(...) patterns. Separate from this vs Annotated[type, Field(...)], we need to support Annotated metadata in general. I have no familiarity with the pydantic code base, but would be happy to help out with some guidance.

It seems like with a little extra work, we could replace all of the con* with these annotations, which would make mypy much happier. We might be able to do that in a backwards compatible-ish way if we make the con* calls wrappers returning the appropriate Annotated hint. Probably tricky for v1.
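A minimal sketch of that con* wrapper idea (GT and LT are hypothetical stand-ins for the decomposed classes; this is not pydantic's actual conint):

```python
from typing import Annotated, Optional, get_args


# Hypothetical decomposed constraint classes, not pydantic API.
class GT:
    def __init__(self, bound: int) -> None:
        self.bound = bound


class LT:
    def __init__(self, bound: int) -> None:
        self.bound = bound


def conint(*, gt: Optional[int] = None, lt: Optional[int] = None):
    """Return an Annotated hint instead of a ConstrainedInt subclass."""
    validators = [cls(v) for cls, v in ((GT, gt), (LT, lt)) if v is not None]
    if not validators:
        return int
    # Annotated accepts a tuple subscript, so we can splat the validators in
    return Annotated[(int, *validators)]


hint = conint(gt=0, lt=256)
base, *meta = get_args(hint)  # base is int, meta is [GT(0), LT(256)]
```

Since the call returns a real type hint, mypy-visible usage like `x: conint(gt=0)` would still need plugin support, but introspection via get_args becomes standard.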

Example of "my custom code outside pydantic" if implemented:

from typing import Annotated

from pydantic import GE, Alias, BaseModel, Description, FieldAnnotation


# Subclass FieldAnnotation so pydantic will identify the type. Perhaps these could define additional
# validation/checks on the field (eg: `count` below), similar to how GE would presumably work (or is
# that validation separate)?
class PII(BaseModel, FieldAnnotation):
    status: bool


class FooBar(BaseModel):
    count: Annotated[
        int, Alias("cnt"), Description("Count of FooBars"), GE(0), PII(True)
    ]


# Then, FooBar.schema_json would contain my custom PII field and I could of course inspect these things at runtime.

WDYT?

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn't find anything
  • I have read and followed the docs and still think this feature/change is needed
  • After submitting this, I commit to one of:
    • Look through open issues and help at least one other person
    • Hit the "watch" button on this repo to receive notifications and I commit to help at least 2 people that ask questions in the future
    • Implement a Pull Request for a confirmed bug
@mpkocher

I mentioned some similar ideas here.

#1984 (comment)

@JacobHayes
Contributor Author

Working prototype up at https://github.com/JacobHayes/pydantic-annotated

@samuelcolvin
Member

High level: This is something I've been meaning to think about for some time, very happy you've done most of the thinking for me!

I think basic support for Annotated would be great.


In more detail:

var: Annotated[type, Field(...)] sounds fine, but I'm not really sure how far it moves the dial in terms of clarity or elegance.

Really we need PEP-637 when presumably we'll be able to do var: Annotated[type, description='...', etc=whatever]. I guess that's the holy grail (yes that's a python pun) of model field definitions.

Annotated[int, Description("...")] or Annotated[int, Alias("...")] is a bit helpful but means a lot of imports.

Given the following options

  1. var: int = Field(description="...", alias="...")
  2. var: Annotated[int, Field(description="...", alias="...")]
  3. var: Annotated[int, Description("..."), Alias("...")]

Which is really clearest or quickest to type? I would say 1. or 2.

With regard to changes in V2 - I doubt we can really make significant changes to con* methods or the usage of Field() even in V2. What we have now makes intuitive sense to most developers, I think abandoning that and forcing all code to use Annotated would piss a lot of people off.

I think it would be better to have sensible positional arguments to constrained types, e.g. like #2104, and add more arguments to Field()


In summary, let's provide support for var: Annotated[type, Field(...)] asap, but hold off on Description("..."), Alias("...") unless lots of people really want it.

@JacobHayes
Contributor Author

Sounds good, happy to take a stab at that. Any preference for "take right", "merge right", or "error" for the (edge) cases of:

ShortInt = Annotated[int, Field(..., lt=256)]

class Model(BaseModel):
    x1: Annotated[int, Field(..., lt=256)] = Field(..., gt=0)
    x2: Annotated[int, Field(..., lt=256), Field(..., gt=0)]
    x3: Annotated[ShortInt, Field(..., gt=0)] # more subtle, but exact same as x2

"error" seems like a fine first pass that would allow us to open up over time (but "merge right" would be very cool 😁).
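As a side note on x3: PEP 593 specifies that nested Annotated hints are flattened, so x3 really does present the same two Fields as x2 at runtime. A rough sketch of the "error" policy (FieldInfo here is an illustrative stand-in, not pydantic's actual class):

```python
from typing import Annotated, get_args


# Stand-in for pydantic's field metadata class, for illustration only.
class FieldInfo:
    def __init__(self, **kwargs) -> None:
        self.kwargs = kwargs


def single_field(hint):
    """Return the one FieldInfo on the hint, or raise on duplicates."""
    fields = [a for a in get_args(hint)[1:] if isinstance(a, FieldInfo)]
    if len(fields) > 1:
        raise TypeError("multiple `Field`s in a single Annotated hint")
    return fields[0] if fields else None


ShortInt = Annotated[int, FieldInfo(lt=256)]
# Nested Annotated is flattened, so x3 carries both FieldInfo instances:
x3 = Annotated[ShortInt, FieldInfo(gt=0)]
assert len(get_args(x3)) == 3  # (int, FieldInfo(lt=256), FieldInfo(gt=0))
```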


Ah, glad to see PEP-637 was revived! Not pertinent now, but with kwargs, we may still need to identify pydantic and non-pydantic hints, given:

If a library (or tool) encounters a typehint Annotated[T, x] and has no special logic for metadata x, it should ignore it and simply treat the type as T.

but that might fall inline with separate custom FieldAnnotation-style classes if it's a concern. Similarly, I wonder about kwarg conflicts, but presumably that will come up in a followup to PEP-637 that extends it to Annotated.

@samuelcolvin
Member

samuelcolvin commented Nov 30, 2020

Great, regarding errors:

class Model(BaseModel):
    x1: Annotated[int, Field(..., lt=256)] = Field(..., gt=0) # error surely?
    x2: Annotated[int, Field(..., lt=256), Field(..., gt=0)]  # don't really mind, probably error, but we shouldn't document this
    x3: Annotated[ShortInt, Field(..., gt=0)]  # same as now with "x3: ShortInt = Field(..., gt=0)"

@JacobHayes
Contributor Author

JacobHayes commented Dec 2, 2020

I think the above should be implemented in the PR now (pending some iteration on error messages and docs).


For further/separate discussion (motivated by being very new to pydantic), I'd love to pick your brain on two more implementation/interface things:

  1. What is the line between con* and Field from a user's perspective in an ideal world?
  2. What is the distinction between a "type" and a "model"?
    • At quick glance, the "types" seem to largely be a binding of a python type to parsers/validators? Does this tight binding force an explosion in the type space as we need to support new combinations of validation?
    • Complex types (eg: HttpUrl, et al) seem a great fit as a model and simple types could be __root__ models (with a slight improvement of kwarg-less init for __root__ models only)
    • I think Model(BaseModel) has the same validation mechanism as the (mostly scalar) core types?

Knowing I'm missing a lot of context (and risking orthogonal topics), I wonder if Annotated opens the door to some tidier interfaces and perhaps simplified implementation. For example:

  1. Annotated[...] passes mypy checking (pending release) while con*() doesn't

    • this seems key to solving mypy: invalid type comment or annotation #156 and inability to create complex models dynamically #1493 and perhaps stricter checks in pydantic.mypy (eg: confirm literals validate gt at mypy time)

    • Ex - mypy
      from pydantic import BaseModel, Field, conint
      from typing_extensions import Annotated
      
      
      class Model(BaseModel):
          x1: Annotated[int, Field(..., gt=5)]
          x2: conint(gt=5)
      $ mypy annotated_vs_con.py
      annotated_vs_con.py:7: error: Invalid type comment or annotation
      annotated_vs_con.py:7: note: Suggestion: use conint[...] instead of conint(...)
      Found 1 error in 1 file (checked 1 source file)
  2. Detour - Decomposing Fields into Validator (+ other composable Metadata classes) would ease adding new validation

    • Instead of new Constrained* types and/or more kwargs to Fields, one could just subclass Validator and it'd be "first class"

      • Reducing __modify_schema__ boilerplate for each Constrained* type (ex: minLength=cls.min_length x4 in types.py) - it'd be on the Validator.
      • This likely means folks would be less dependent on pydantic to implement/expand con*/Fields (though we'd of course provide the same shortcuts we have today)
    • Ex - Validator
      from decimal import Decimal
      from enum import Enum
      from typing import Any, Union
      
      from pydantic import BaseModel, errors
      from typing_extensions import Annotated
      
      Number = Union[int, float, Decimal]
      
      
      # Should it be called 'Parser'/`.parse`? c.f. https://github.com/samuelcolvin/pydantic/issues/578
      class Validator:
          # Probably other args, as per #2034. Avoid conflict with BaseModel.validate
          def validate_value(self, value: Any) -> Any:
              return value
      
      
      # Free to subclass BaseModel for complex validation args
      class LT(Validator, BaseModel):
          max_value: Number
      
          def validate_value(self, value: Number) -> Number:
              if value > self.max_value:
                  raise ValueError(...)
              return value
      
      class Model(BaseModel):
          x: Annotated[int, LT(max_value=100)]
    • We could also add a small decorator that would wrap simple function (like some in pydantic/validators.py) into a Validator

    • Ex - Validator Decorator
      # Continued from above
      
      # May want to take other args too.
      def validator(fn):
          # TODO: Register the validator based on the fn's return signature
          return type(fn.__name__, (Validator,), {"validate_value": staticmethod(fn)})
      
      
      @validator
      def str_validator(v: Any) -> str:
          if isinstance(v, str):
              if isinstance(v, Enum):
                  return v.value
              else:
                  return v
          elif isinstance(v, (float, int, Decimal)):
              # is there anything else we want to add here? If you think so, create an issue.
              return str(v)
          elif isinstance(v, (bytes, bytearray)):
              return v.decode()
          else:
              raise errors.StrError()
      
      
      class Model(BaseModel):
          y: str  # Use default str validation (str_validator) applied as per `find_validators`
  3. Annotated supports easy extension/validation composition (particularly with the decomposed classes, but also possible if we "merge" Fields)

    • which might be able to replace an explosion of Constrained* types

    • since con* functions are used in type hints, they could instead return an equivalent Annotated[<base type>, validators...] value to preserve (some) backwards compatibility (but still likely break many pydantic introspection libs).

    • Ex - Limited con* Compatibility
      from typing import Optional, Type
      
      from pydantic import BaseModel
      from typing_extensions import Annotated
      
      # Noop for now
      StripWhitespace = Strict = MinLength = MaxLength = CurtailLength = Regex = lambda x: x
      
      CON_STR_BYTES_KWARG_TO_VALIDATOR = {
          'strip_whitespace': StripWhitespace,
          'strict': Strict,
          'min_length': MinLength,
          'max_length': MaxLength,
          'curtail_length': CurtailLength,
          'regex': Regex,
      }
      
      
      def constr(
          *,
          strip_whitespace: bool = False,
          strict: bool = False,
          min_length: Optional[int] = None,
          max_length: Optional[int] = None,
          curtail_length: Optional[int] = None,
          regex: Optional[str] = None,
      ) -> Type[str]:
          validators = [
              CON_STR_BYTES_KWARG_TO_VALIDATOR[key](value)
              for key, value in {
                  'strip_whitespace': strip_whitespace,
                  'strict': strict,
                  'min_length': min_length,
                  'max_length': max_length,
                  'curtail_length': curtail_length,
                  'regex': regex,
              }.items()  # Or whatever order
              if value not in (None, False)  # skip unset kwargs
          ]
          return Annotated[(str, *validators)] if validators else str
      
      
      class Model(BaseModel):
          x: constr(min_length=5)
    • I think this would effectively decouple types (python builtins or models) and parsing/validation - but allow very easy coupling by creating Annotated aliases.

    • this might also solve things like BaseModel change type of field #2041 (the base type is always str, we're just adding/composing validators, not changing types)

    • Ex - Composition

      This example uses the "decomposed" Field validation pieces, but in theory we could do Field merging.

      from decimal import Decimal
      from typing import Any, Generic, Tuple, TypeVar, Union
      
      from pydantic import BaseModel
      from typing_extensions import Annotated, get_args
      
      Number = Union[int, float, Decimal]
      
      
      class Validator:  # Dup from above
          def validate_value(self, value: Any) -> Any:
              return value
      
      
      class LT(Validator, BaseModel):  # Dup from above
          max_value: Number
      
          def validate_value(self, value: Number) -> Number:
              if value > self.max_value:
                  raise ValueError(...)
              return value
      
      
      class GT(Validator):
          def __init__(self, min_value: Number):
              self.min_value = min_value
      
          def validate_value(self, value: Number) -> Number:
              if value < self.min_value:
                  raise ValueError(...)
              return value
      
      
      class IsEven(Validator):
          def __init__(self, status: bool = True):
              self.status = status
      
          def validate_value(self, value: Number) -> Number:
              if self.status and value % 2 != 0:
                  raise ValueError(...)
              return value
      
      
      SmallInt = Annotated[int, LT(max_value=256)]
      PositiveInt = Annotated[int, GT(0)]
      
      # Extending a "custom" type is just as easy as builtins
      MyInt = Annotated[PositiveInt, IsEven()]
      
      # To combine two existing "custom" types (ie: compose two types with existing validation), we can add a small
      # ComposedAnnotation helper:
      #
      # typing doesn't support variadic generics [1], so we'll "constrain" the Generic to 2 but can support more at runtime
      # (mypy will just complain when >2).
      #
      # 1: https://github.com/python/typing/issues/193
      AnnotatedLeft = TypeVar("AnnotatedLeft")
      AnnotatedRight = TypeVar("AnnotatedRight")
      
      
      class ComposedAnnotation(Generic[AnnotatedLeft, AnnotatedRight]):
          def __class_getitem__(cls, hints: Tuple[Any, ...]) -> Any:
              if not isinstance(hints, tuple):
                  hints = (hints,)
              hint, *others = hints
              hint_type = get_args(hint)[0]
              for other in others:
                  type_, *annotations = get_args(other)
                  if type_ is not hint_type:  # Probably get smarter/looser here
                      raise ValueError(f"Mismatched types: {hint_type} and {type_}!")
                  hint = Annotated[(hint, *annotations)]
              return hint
      
          # This is only called when ComposedAnnotation is unsubscripted - subscription returns an Annotated. Otherwise, it
          # satisfies mypy when the returned hint is "initialized".
          def __new__(cls, *args, **kwargs):
              raise TypeError("Type ComposedAnnotation cannot be instantiated.")
      
      
      SmallPositiveInt = ComposedAnnotation[SmallInt, PositiveInt]
      SmallPositiveInt(5)  # confirm mypy is ok
      
      
      class Model(BaseModel):
          x: SmallPositiveInt
      
      
      print(Model(x=5))  # Doesn't actually run validate yet
      $ mypy composition.py
      Success: no issues found in 1 source file
      $ python3 composition.py
      x=5
  4. Annotated[<type>, chain, of, validators] seems to fit in quite cleanly with The Big Onion

    • As you note in that issue, these are orthogonal, but compatible changes.
    • The sequence of validation steps shown in the type hint (left -> right) would match the nesting inside the onion
      • Annotated[str, GT(...), LT(...), ...] -> Onion(*[a for a in get_args(annotation) if isinstance(a, Validator)])
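The left-to-right ordering could be sketched like this (Validator is the hypothetical class from the examples above; Strip and Upper are made-up validators just to show the ordering, and the real onion would wrap steps rather than loop):

```python
from typing import Annotated, Any, get_args


class Validator:
    def validate_value(self, value: Any) -> Any:
        return value


class Strip(Validator):
    def validate_value(self, value: str) -> str:
        return value.strip()


class Upper(Validator):
    def validate_value(self, value: str) -> str:
        return value.upper()


def run_validators(hint, value):
    """Apply Validator metadata in the order it appears in the hint."""
    for ann in get_args(hint)[1:]:
        if isinstance(ann, Validator):  # ignore non-pydantic metadata (PEP 593)
            value = ann.validate_value(value)
    return value


CleanStr = Annotated[str, Strip(), Upper()]
result = run_validators(CleanStr, "  ok  ")  # Strip first, then Upper
```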

Regarding:

What we have now makes intuitive sense to most developers, I think abandoning that and forcing all code to use Annotated would piss a lot of people off.

This makes sense, though I wonder how Annotated will be picked up in the wider community (it seems largely dependent on projects like this). I think there are likely ways we could keep backwards compatibility, though defining some of the order-of-operations and error cases will be a bit tricky. If this is too big an issue, I think a handful of these things could still be built as a wrapper around pydantic (eg: JacobHayes/pydantic-annotated), though I'd of course love to see things pulled upstream!

...

If you made it this far, thanks for the time. I know I'm likely getting quite ahead of myself here (both in designs and general "new to the project" etiquette), but as you can tell, I'm quite excited about the project! I was planning to build (a much worse subset of) this as part of a new data engineering tool before finding the project and seeing how strong a foundation there is. For my uses, I'll be adding dozens/hundreds of new validators and metadata (ex: validators like "reference" and metadata/statistics like "count"), many of which are type independent. That's a large reason I find the "decomposed" pattern so helpful. Anyway, I hope this is well received (even if turned down 😁)!

@PrettyWood
Member

Just my 2 cents on this topic.
I feel like splitting into many classes will in fact be more verbose.
If I had to write it down, I guess I would like only 2 classes like this

class Pokemon(BaseModel):
    name: Annotated[str, Constraints(min_length=1)] = Field('pikachu', alias='Name')
With this kind of code
from dataclasses import dataclass
from enum import Enum
from typing import Annotated, Any, Callable, Optional

from pydantic import BaseModel

NoArgAnyCallable = Callable[[], Any]


class UnsetEnum(Enum):
    v = 0

Unset = UnsetEnum.v


@dataclass
class Constraints:
    immutable: bool = False

    # number
    gt: Optional[float] = None
    ge: Optional[float] = None
    lt: Optional[float] = None
    le: Optional[float] = None
    multiple_of: Optional[float] = None

    # str
    regex: Optional[str] = None

    # str, list, ... (collections)
    min_length: Optional[int] = None
    max_length: Optional[int] = None


@dataclass
class Field:
    default: Any = Unset
    default_factory: Optional[NoArgAnyCallable] = None
    alias: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None

This way we keep all restrictions, constraints on the type on the left side and all the metadata of the field on the right side.

@JacobHayes
Contributor Author

JacobHayes commented Jul 14, 2021

Agreed that many classes will be more verbose - particularly on the author side but I think less so on the user side.

One downside with a Constraints style class is that it still requires either changes to pydantic for additional checks OR splitting validation between "builtin" checks and "second class" @validator ones (perhaps Constraints could take *args of these to bridge?). The @validator ones also don't have a chance to influence the rendered .schema or other metadata, unlike the "builtin" checks. I think we could further tidy/unify those two ways of doing the ~same thing, but that may not be a huge motivator/barrier to folks.

That issue is probably less prominent for Field because of the **extra passthrough - we don't need to modify any behavior aside from the automatic rendering of the new metadata.

@selimb
Contributor

selimb commented Nov 17, 2021

I've been looking through the issues on constrained types, Annotated and mypy. Here's a thought, one which I think hasn't come up yet -- apologies if it has and I missed it.

This way we keep all restrictions, constraints on the type on the left side and all the metadata of the field on the right side.

I think that's a great idea. Additionally, what if instead of passing pydantic.Field (or its arguments) to Annotated, we could just pass constr & friends:

class Pokemon(BaseModel):
    name: Annotated[str, constr(min_length=1)] = Field('pikachu', alias='Name')
    abilities: Annotated[str, conlist(constr(to_lower=True), min_length=4)]

or generally Annotated[result_type, constrained_type].

Here's why I think that's better than effectively passing jsonschema properties (like min_length and gt):

  • Discoverability. I personally find it odd that Field includes constraints for all the types, and I've been bit by this a few times:
class Box(pydantic.BaseModel):
    stuff: List[str] = pydantic.Field(min_length=1)  # OOPS, should be min_items
  • Nested types. As brought up in Constrained types and mypy support #3080, I fail to see how one would nest validators (as in my first snippet) with Field/Annotated at the moment, or with the suggestions above -- apologies if I missed something!
  • Reuse? While subclassing is verbose for ad-hoc types, it's potentially nice for reuse, and plays nicely with nested types as well.

The only downside (that I can see) is that the type is somewhat duplicated and may be out of sync, e.g. Annotated[int, constr(...)]. It might be possible to verify this in the mypy plugin though?
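A runtime version of that consistency check could look like this sketch (ConstrainedStr is a stand-in for pydantic v1's constrained classes, which subclass their base type, so an issubclass test suffices here):

```python
from typing import Annotated, get_args


# Stand-in for a v1-style constrained type that subclasses its base.
class ConstrainedStr(str):
    min_length = 1


def check_consistent(hint) -> None:
    """Raise if a constrained-type annotation doesn't match the base type."""
    base, *meta = get_args(hint)
    for m in meta:
        if isinstance(m, type) and not issubclass(m, base):
            raise TypeError(f"{m.__name__} does not constrain {base.__name__}")


check_consistent(Annotated[str, ConstrainedStr])  # consistent: no error
```

A mypy plugin could do the same comparison statically, which would also catch the duplication going stale as types are refactored.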

@JacobHayes
Contributor Author

Not sure exactly where https://github.com/annotated-types/annotated-types is going, but if that ends up being compatible with pydantic, I think that would cover the OP ("decompose field components into Annotated")!

@JacobHayes
Contributor Author

Gonna close this as it looks like it will be supported with a very similar pattern come v2 using annotated-types! 🎉
