
Add support for external annotations in the typing module #600

Open
till-varoquaux opened this Issue Dec 13, 2018 · 10 comments


till-varoquaux commented Dec 13, 2018

We propose adding an Annotated type to the typing module to decorate existing types with context-specific metadata. Specifically, a type T can be annotated with metadata x via the typehint Annotated[T, x]. This metadata can be used for either static analysis or at runtime. If a library (or tool) encounters a typehint Annotated[T, x] and has no special logic for metadata x, it should ignore it and simply treat the type as T. Unlike the no_type_check functionality that currently exists in the typing module, which completely disables typechecking annotations on a function or a class, the Annotated type allows for both static typechecking of T (e.g., via MyPy or Pyre, which can safely ignore x) together with runtime access to x within a specific application. We believe that the introduction of this type would address a diverse set of use cases of interest to the broader Python community.

Motivating examples:

Reading binary data

The struct module provides a way to read and write C structs directly from their byte representation. It currently relies on a string representation of the C type to read in values:

from struct import unpack

record = b'raymond   \x32\x12\x08\x01\x08'
name, serialnum, school, gradelevel = unpack('<10sHHb', record)

The documentation suggests using a named tuple to unpack the values and make this a bit more tractable:

from collections import namedtuple
Student = namedtuple('Student', 'name serialnum school gradelevel')
Student._make(unpack('<10sHHb', record))
# Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

However, this recommendation is somewhat problematic; as we add more fields, it's going to get increasingly tedious to match the properties in the named tuple with the arguments in unpack.

Instead, annotations can provide better interoperability with a type checker or an IDE without adding any special logic outside of the struct module:

import struct
from typing import Annotated, NamedTuple
UnsignedShort = Annotated[int, struct.ctype('H')]
SignedChar = Annotated[int, struct.ctype('b')]

@struct.packed
class Student(NamedTuple):
  # MyPy typechecks 'name' field as 'str' 
  name: Annotated[str, struct.ctype("<10s")]
  serialnum: UnsignedShort
  school: UnsignedShort
  gradelevel: SignedChar

# 'unpack' only uses the metadata within the type annotations
Student.unpack(record)
# Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
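
A minimal runnable sketch of this idea, assuming an Annotated with the semantics proposed here (typing.Annotated in Python 3.9+ behaves this way). CType, format_code and unpack_as are hypothetical stand-ins for the proposed struct.ctype and Student.unpack, not real struct APIs. The sketch types name as bytes (what the '10s' code yields) and gives school the unsigned-short code so the derived format matches '<10sHHb':

```python
import struct
from typing import Annotated, NamedTuple, get_type_hints


class CType:
    """Hypothetical marker standing in for the proposed struct.ctype."""

    def __init__(self, fmt: str) -> None:
        self.fmt = fmt


UnsignedShort = Annotated[int, CType("H")]
SignedChar = Annotated[int, CType("b")]


class Student(NamedTuple):
    name: Annotated[bytes, CType("10s")]
    serialnum: UnsignedShort
    school: UnsignedShort
    gradelevel: SignedChar


def format_code(hint) -> str:
    # Plain (non-Annotated) hints have no __metadata__, hence the default.
    for meta in getattr(hint, "__metadata__", ()):
        if isinstance(meta, CType):
            return meta.fmt
    raise TypeError(f"no CType metadata on {hint!r}")


def unpack_as(cls, data: bytes, byte_order: str = "<"):
    # include_extras=True keeps the Annotated wrapper; the default strips it.
    hints = get_type_hints(cls, include_extras=True)
    fmt = byte_order + "".join(format_code(hints[f]) for f in cls._fields)
    return cls(*struct.unpack(fmt, data))


record = b"raymond   \x32\x12\x08\x01\x08"
student = unpack_as(Student, record)
print(student)
```

The format string is derived from the field order of the class, so adding a field only requires touching one place.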

dataclasses

Here's an example with dataclasses that is problematic from the typechecking standpoint:

from dataclasses import dataclass, field
from typing import List

@dataclass
class C:
  myint: int = 0
  # the field tells the @dataclass decorator that the default action in the
  # constructor of this class is to set "self.mylist = list()"
  mylist: List[int] = field(default_factory=list)

Even though one might expect that mylist is a class attribute accessible via C.mylist (like C.myint is) due to the assignment syntax, that is not the case. Instead, the @dataclass decorator strips out the assignment to this attribute, leading to an AttributeError upon access:

C.myint  # Ok: 0
C.mylist  # AttributeError: type object 'C' has no attribute 'mylist'

This can lead to confusion for newcomers to the library who may not expect this behavior. Furthermore, the typechecker needs to understand the semantics of dataclasses and know not to treat the above example as an assignment operation (which translates to additional complexity).

It makes more sense to move the information contained in field to an annotation:

@dataclass
class C:
    myint: int = 0
    mylist: Annotated[List[int], field(default_factory=list)]

# now, the AttributeError is more intuitive because there is no assignment operator
C.mylist  # AttributeError

# the constructor knows how to use the annotations to set the 'mylist' attribute 
c = C()
c.mylist  # []

The main benefit of writing annotations like this is that it provides a way for clients to gracefully degrade when they don't know what to do with the extra annotations (by just ignoring them). If you used a typechecker that didn't have any special handling for dataclasses and the field annotation, you would still be able to run checks as though the type were simply:

class C:
    myint: int = 0
    mylist: List[int]

lowering barriers to developing new types

Typically when adding a new type, we need to upstream that type to the typing module and change MyPy, PyCharm, Pyre, pytype, etc. This is particularly important when working on open-source code that makes use of our new types, seeing as the code would not be immediately transportable to other developers' tools without additional logic (this is a limitation of MyPy plugins, which allow for extending MyPy but would require a consumer of new typehints to be using MyPy and have the same plugin installed). As a result, there is a high cost to developing and trying out new types in a codebase. Ideally, we should be able to introduce new types in a manner that allows for graceful degradation when clients do not have a custom MyPy plugin, which would lower the barrier to development and ensure some degree of backward compatibility.

For example, suppose that we wanted to add support for tagged unions to Python. One way to accomplish this would be to annotate TypedDict in Python such that only one field is allowed to be set:

Currency = Annotated(
  TypedDict('Currency', {'dollars': float, 'pounds': float}, total=False),
  TaggedUnion,
)

This is a somewhat cumbersome syntax but it allows us to iterate on this proof-of-concept and have people with non-patched IDEs work in a codebase with tagged unions. We could easily test this proposal and iron out the kinks before trying to upstream tagged union to typing, MyPy, etc. Moreover, tools that do not have support for parsing the TaggedUnion annotation would still be able to treat Currency as a TypedDict, which is still a close approximation (slightly less strict).

Details of proposed changes to typing

Syntax

Annotated is parameterized with a type and an arbitrary list of Python values that represent the annotations. Here are the specific details of the syntax:

  • The first argument to Annotated must be a valid typing type
  • Multiple type annotations are supported (Annotated supports variadic arguments): Annotated[int, ValueRange(3, 10), ctype("char")]
  • When called with no extra arguments Annotated returns the underlying value: Annotated[int] == int
  • The order of the annotations is preserved and matters for equality checks: Annotated[int, ValueRange(3, 10), ctype("char")] != Annotated[int, ctype("char"), ValueRange(3, 10)]
  • Nested Annotated types are flattened, with metadata ordered starting with the innermost annotation: Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] == Annotated[int, ValueRange(3, 10), ctype("char")]
  • Duplicated annotations are not removed: Annotated[int, ValueRange(3, 10)] != Annotated[int, ValueRange(3, 10), ValueRange(3, 10)]
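
The ordering, flattening and duplicate rules above can be checked with a short sketch. It assumes typing.Annotated as available in Python 3.9+, which follows these rules; ValueRange and ctype are hypothetical metadata classes defined here only for illustration. The single-argument form is left out of the sketch.

```python
from dataclasses import dataclass
from typing import Annotated


# Hypothetical metadata classes; frozen dataclasses give value-based
# equality and hashing, which the comparisons below rely on.
@dataclass(frozen=True)
class ValueRange:
    lo: int
    hi: int


@dataclass(frozen=True)
class ctype:
    kind: str


ordered = Annotated[int, ValueRange(3, 10), ctype("char")]
swapped = Annotated[int, ctype("char"), ValueRange(3, 10)]
nested = Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")]
single = Annotated[int, ValueRange(3, 10)]
doubled = Annotated[int, ValueRange(3, 10), ValueRange(3, 10)]

order_matters = ordered != swapped      # metadata order is significant
nesting_flattens = nested == ordered    # innermost metadata comes first
duplicates_kept = single != doubled     # repeated metadata is preserved

print(order_matters, nesting_flattens, duplicates_kept)
```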

consuming annotations

Ultimately, how to interpret the annotations (if at all) is the responsibility of the tool or library encountering the Annotated type. Such a tool or library can scan through the annotations to determine if they are of interest (e.g., using isinstance).

Unknown annotations:
When a tool or a library does not support annotations or encounters an unknown annotation, it should just ignore it and treat the annotated type as the underlying type. For example, if we were to add an annotation that is not an instance of struct.ctype to the annotation for name (e.g., Annotated[str, 'foo', struct.ctype("<10s")]), the unpack method should ignore it.
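
As a rough sketch of what such a consumer could look like (assuming typing.Annotated, get_origin, get_args and get_type_hints as they exist in Python 3.9+; CType, split_hint and ctypes_of are hypothetical names, with CType standing in for the proposed struct.ctype): the consumer picks out the metadata it recognises with isinstance and skips everything else, while a tool that knows nothing about the metadata just sees the underlying types.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class CType:
    """Hypothetical stand-in for the proposed struct.ctype."""
    fmt: str


class Student:
    name: Annotated[bytes, "foo", CType("10s")]  # 'foo' is unknown metadata
    serialnum: Annotated[int, CType("H")]
    nickname: str                                # not annotated at all


def split_hint(hint):
    """Return (underlying_type, metadata_tuple) for any type hint."""
    if get_origin(hint) is Annotated:
        base, *meta = get_args(hint)
        return base, tuple(meta)
    return hint, ()


def ctypes_of(cls):
    """Collect the CType format of each field, ignoring other metadata."""
    found = {}
    for field, hint in get_type_hints(cls, include_extras=True).items():
        _, meta = split_hint(hint)
        known = [m for m in meta if isinstance(m, CType)]
        if known:
            found[field] = known[0].fmt
    return found


formats = ctypes_of(Student)
# Without include_extras, the metadata is stripped and only T remains.
plain = get_type_hints(Student)
print(formats, plain)
```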

Namespacing annotations:
We do not need namespaces for annotations since the class used by the annotations acts as a namespace.

Multiple annotations:
It's up to the tool consuming the annotations to decide whether the client is allowed to have several annotations on one type and how to merge those annotations.

Since the Annotated type allows you to put several annotations of the same (or different) type(s) on any node, the tools or libraries consuming those annotations are in charge of dealing with potential duplicates. For example, if you are doing value range analysis you might allow this:

T1 = Annotated[int, ValueRange(-10, 5)]
T2 = Annotated[T1, ValueRange(-20, 3)]

Flattening nested annotations, this translates to:

T2 = Annotated[int, ValueRange(-10, 5), ValueRange(-20, 3)]

An application consuming this type might choose to reduce these annotations via an intersection of the ranges, in which case T2 would be treated equivalently to Annotated[int, ValueRange(-10, 3)].

An alternative application might reduce these via a union, in which case T2 would be treated equivalently to Annotated[int, ValueRange(-20, 5)].

In this example, whether we reduce those annotations using union or intersection can be context-dependent (covariant vs. contravariant); this is why we have to preserve all of them and let the consumers decide how to merge them.

Other applications may decide to not support multiple annotations and throw an exception.
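
The two reductions described above can be sketched as follows, assuming typing.Annotated and get_args from Python 3.9+. ValueRange, ranges, intersect and union are hypothetical names, and "union" here means the smallest single range covering all of the given ranges.

```python
from dataclasses import dataclass
from typing import Annotated, get_args


@dataclass(frozen=True)
class ValueRange:
    """Hypothetical metadata describing an inclusive integer range."""
    lo: int
    hi: int


T1 = Annotated[int, ValueRange(-10, 5)]
T2 = Annotated[T1, ValueRange(-20, 3)]  # flattened to two ValueRange entries


def ranges(hint):
    # get_args returns (underlying_type, *metadata) for an Annotated hint.
    return [m for m in get_args(hint)[1:] if isinstance(m, ValueRange)]


def intersect(hint):
    rs = ranges(hint)
    return ValueRange(max(r.lo for r in rs), min(r.hi for r in rs))


def union(hint):
    rs = ranges(hint)
    return ValueRange(min(r.lo for r in rs), max(r.hi for r in rs))


print(intersect(T2), union(T2))
```

Both consumers read the same preserved metadata list; only the merge policy differs.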

related bugs

  • issue #482: Mixing typing and non-typing information in annotations has some discussion about this problem but none of the proposed solutions (using intersection types, passing dictionaries of annotations) seemed to garner enough steam. We hope this solution is non-intrusive and compelling enough to make it in the standard library.
Collaborator

ilevkivskyi commented Jan 3, 2019

I like this proposal. It will allow easy experimentation with type system features. I have two questions:

  • Why do we need to allow the single-argument form Annotated[int]?
  • Should we allow subscripting annotated types, for example currently one can write Vec = List[Tuple[T, T]] (where T is a type variable), and then Vec[int] will be equivalent to List[Tuple[int, int]]. Should we allow (and how) Vec = Annotated[List[Tuple[T, T]], MaxLen(10)]; v: Vec[int]?
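
For illustration, a small sketch of the generic-alias case from the second question. It assumes the behaviour of typing.Annotated in Python 3.9+, where an Annotated alias containing a type variable can be subscripted; MaxLen is a hypothetical metadata class.

```python
from dataclasses import dataclass
from typing import Annotated, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class MaxLen:
    """Hypothetical metadata giving an upper bound on length."""
    limit: int


# A generic alias whose metadata travels with it when T is substituted.
Vec = Annotated[List[Tuple[T, T]], MaxLen(10)]
IntVec = Vec[int]

expected = Annotated[List[Tuple[int, int]], MaxLen(10)]
print(IntVec == expected)
```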
Collaborator

ilevkivskyi commented Jan 3, 2019

Also an organizational question: do we need a (mini-)PEP for this? I would say yes (this post can be a starting point for a draft). @gvanrossum what do you think?


till-varoquaux commented Jan 3, 2019

@ilevkivskyi
  • We don't really need the single-argument form; I was just trying to be consistent with Union.
  • Yes, supporting generic aliases would be great and would give some of the Annotated types more of a first-class feel while we're experimenting.

I'm happy to turn this into a PEP if you think that this is the way forward.

Collaborator

ilevkivskyi commented Jan 4, 2019

We don't really need the single-argument form

OK, let's remove it.

I'm happy to turn this into a PEP if you think that this is the way forward.

I think this is the best way forward (in particular because it aims at improving cross-typechecker compatibility). If you start working on it, I would recommend skipping the dataclass example; it doesn't look very convincing TBH.


valtron commented Jan 6, 2019

If you haven't already considered this, I think Annotated[None, ...] should be allowed for cases where you want the type to be inferred, but still want to add metadata.

Another use case for this: Annotated[T, ClassVar, Final]; the current approach (e.g. ClassVar[T]) is kind of hacky since ClassVar/Final aren't really types, just specially-cased versions of Annotated.

Collaborator

ilevkivskyi commented Jan 6, 2019

None may be a valid type, so I would rather propose Annotated[..., Immutable] with literal ..., see also #276.


pfalcon commented Jan 9, 2019

I was referred here from #604. Indeed, this would cover a use case there too. But to me this looks a bit too verbose, the same concern as communicated in #482.

I like the idea of "PEP" in the sense of "more thorough document", though I'm not sure if it would be Python-as-a-language wide PEP, or just local PEP for "typing" project/module.

In either case, considering and discussing different alternatives is a must for PEP, and I'd like a list of other options considered and comments re: them.

Here's my 2 cents:

  1. Why instead Annotated[Type, Ann1, Ann2, ...] not have (Type, Ann1, Ann2, ...) ? By relaxing the grammar a bit, this would allow annotating entire functions by putting just comma-separated annotations in the return annotation position, e.g.:
def foo() -> NoReturn, NoRaise:  # Perhaps an infinite loop?

For function params/variables, parens would be required, but it's still less visual noise than Annotated, which is also a pretty long word, which will easily cause need for line wrapping when annotating e.g. a class method with more than a couple params.

  2. I find it interesting that in the description above, a typo with parens instead of square brackets is made:
Currency = Annotated(
  TypedDict('Currency', {'dollars': float, 'pounds': float}, total=False),
  TaggedUnion,
)

So, given that Annotated isn't really a typing annotation itself, but a kind of meta-annotation, perhaps blindly following the syntax of typing types isn't a requirement, and other alternatives can be considered. I'm not sure about implementation complications, but I imagine that by overriding __new__ and/or __call__ it's doable. Using call syntax would allow the use of keyword arguments, for example.

Let me clarify again that the above is just to list possible alternative solutions. I'm clearly in favor of p.1 ;-). And while using keyword args is always cute, this statement from the original description:

We do not need namespaces for annotations since the class used by the annotations acts as a namespace.

Indeed, we can't namespace keywords. So, effectively we're trading:

Annotated(T, very_readable_but_not_namespaced_keyword="10s")

with

Annotated[T, my_module.VeryReadableAnnotationName("10s")]

So, choosing between original proposal and p.2, I'm in favor of the original proposal: namespaces are important to avoid incompatibilities and mess.

But even more so I'm in favor of p.1, and would be keen to hear criticism of it.

Collaborator

ilevkivskyi commented Jan 10, 2019

Why instead Annotated[Type, Ann1, Ann2, ...] not have (Type, Ann1, Ann2, ...) ?

There are several reasons that I see:

  • Annotated types can appear in a nested position, which can cause confusion: Callable[[A, B], C] is too similar to Callable[[(A, B)], C], but the meaning would be very different.
  • With the (...) syntax users cannot create generic aliases, for example Vec = Annotated[List[Tuple[T, T]], MaxLen(10)]; Vec[int] will work, but the same will not work with (...) syntax.
  • This will be simpler for people who use annotations for runtime purposes (they will not need to expect a tuple in any position, we can use the same API as for other typing constructs).

pfalcon commented Jan 10, 2019

There are several reasons that I see:
...

I see, makes sense. The only thing then is to see if someone would be able to come up with something shorter than Annotated. Though fairly speaking one can just do A = Annotated or whatever, so that's covered too.

Collaborator

ilevkivskyi commented Jan 10, 2019

Though fairly speaking one can just do A = Annotated or whatever, [...]

Exactly.
