PEP: 646
Title: Variadic Generics
Author: Mark Mendoza <mendoza.mark.a@gmail.com>,
        Matthew Rahtz <mrahtz@google.com>,
        Vincent Siles <vsiles@fb.com>
Sponsor: Guido van Rossum <guido@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 16-Sep-2020
Python-Version: 3.10
Post-History: 07-Oct-2020

Abstract

PEP 484 introduced TypeVar, enabling creation of generics parameterised with a single type. In this PEP, we introduce TypeTupleVar, enabling parameterisation with an arbitrary number of types - that is, a variadic type variable, enabling variadic generics. This allows the type of array-like structures in numerical computing libraries such as NumPy and TensorFlow to be parameterised with the array shape, enabling static type checkers to catch shape-related bugs in code that uses these libraries.

Motivation

There are two main use-cases for variadic generics. [1]

The primary motivation is to enable typing of array shapes in numerical computing libraries such as NumPy and TensorFlow. This is the use-case much of the PEP will focus on.

Additionally, variadic generics allow us to concisely specify the type signature of map and zip.

We discuss each of these motivations below.

Array Shapes

In the context of numerical computation with libraries such as NumPy and TensorFlow, the shape of arguments is often just as important as the argument type. For example, consider the following function which converts a batch [2] of videos to grayscale:

def to_gray(videos: Tensor): ...

From the signature alone, it is not obvious what shape of array [3] we should pass for the videos argument. Possibilities include, for example,

batch × time × height × width × channels

and

time × batch × channels × height × width. [4]

Ideally, we should have some way of making the required shape clear in the signature itself. Multiple proposals [9] [10] [11] have suggested the use of the standard generics syntax for this purpose. We would write:

def to_gray(videos: Tensor[Time, Batch, Height, Width, Channels]): ...

However, note that arrays can be of arbitrary rank - Tensor as used above is generic in an arbitrary number of axes. One way around this would be to use a different Tensor class for each rank...

Axis1 = TypeVar('Axis1')
Axis2 = TypeVar('Axis2')

class Tensor1(Generic[Axis1]): ...

class Tensor2(Generic[Axis1, Axis2]): ...

...but this would be cumbersome, both for users (who would have to sprinkle 1s and 2s and so on throughout their code) and for the authors of tensor libraries (who would have to duplicate implementations throughout multiple classes).

Variadic generics are necessary for a Tensor that is generic in an arbitrary number of axes to be cleanly defined as a single class.

map and zip

PEP 612 [7] introduced ParamSpec, enabling parameter types of one callable to be forwarded to another callable. However, in many cases we actually wish to transform parameter types before using them elsewhere in the signature.

Consider, for example, the signature of map for a particular choice of function and iterables:

def func(x: int, y: str) -> float: ...
iter1: List[int]
iter2: List[str]

def map(func: Callable[[int, str], float],
        iter1: Iterable[int],
        iter2: Iterable[str]) -> Iterable[float]: ...

Note that the number of iterables passed to map depends on the supplied function, and that the types of those iterables must correspond to the types of the supplied function's arguments.

A similar example is zip:

iter1: List[int]
iter2: List[str]

def zip(iter1: Iterable[int],
        iter2: Iterable[str]) -> Iterable[Tuple[int, str]]: ...

Neither of these signatures can be specified in the general case using existing typing mechanisms. The signature of zip, for example, is currently specified [12] with a number of overloads.
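For example, typeshed's stub for zip declares one overload per arity, along the lines of this abbreviated sketch (_T1 and _T2 are ordinary TypeVars):

@overload
def zip(iter1: Iterable[_T1]) -> Iterator[Tuple[_T1]]: ...
@overload
def zip(iter1: Iterable[_T1],
        iter2: Iterable[_T2]) -> Iterator[Tuple[_T1, _T2]]: ...
# ...and so on, one overload per arity, up to a fixed maximum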

Specification

In order to support the above use-cases, we introduce:

  • TypeTupleVar, a TypeVar that acts as a placeholder not for a single type but for an arbitrary number of types.
  • A new syntax for parameterizing generic functions and classes using a type tuple variable.
  • Two new type operators, Expand and Map.

These are described in detail below.

Type Tuple Variables

In the same way that a normal type variable is a stand-in for a single type, a type tuple variable is a stand-in for an arbitrary number of types in a flat ordered list.

Type tuple variables are created with:

from typing import TypeTupleVar

Ts = TypeTupleVar('Ts')

A type tuple variable behaves in a similar way to a parameterized Tuple. For example, in a generic object instantiated with type parameters int and str, Ts behaves similarly to Tuple[int, str].
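As a minimal sketch of this correspondence, using the unexpanded form described later in Unexpanded Type Tuple Variables (pass_through is a hypothetical illustration):

def pass_through(x: Ts) -> Ts: ...

t = pass_through((1, 'a'))  # Ts is bound to Tuple[int, str]; t is Tuple[int, str]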

Parameterizing Types: Star Operator

One use of type tuple variables is to parameterize variadic types such as Tuple.

To differentiate type tuple variables from normal type variables, we introduce a new use for the star operator:

Tuple[*Ts]

The star operator here serves to 'expand' the type tuple into its component types. For example, in a generic object instantiated with Ts bound to int and str, Tuple[*Ts] would be equivalent to Tuple[int, str].

For consistency, the star operator can also be applied directly to a parameterised Tuple:

Types = Tuple[int, str, bool, float]
Tuple[*Types]  # Also valid

Parameterizing Types: Expand

Because the new use of the star operator requires a syntax change and is therefore incompatible with previous versions of Python, we also introduce the Expand type operator for use in existing versions of Python. Expand behaves identically to the star operator, but without requiring a syntax change. In any place you would normally write *Ts, you can also write Expand[Ts].
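For example, the following two annotations would be equivalent under this proposal (a sketch; Expand, like the star syntax, is introduced by this PEP):

from typing import Expand, Tuple, TypeTupleVar

Ts = TypeTupleVar('Ts')

def foo(x: Tuple[Expand[Ts]]) -> Tuple[Expand[Ts]]: ...
# Identical meaning, but requires the new star syntax:
# def foo(x: Tuple[*Ts]) -> Tuple[*Ts]: ...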

Parameterizing Function Signatures and Classes

Type tuple variables can be used anywhere a normal TypeVar can. For example, in class definitions, function signatures, and variable annotations:

Shape = TypeTupleVar('Shape')

class Tensor(Generic[*Shape]):

    def __init__(self, shape: Tuple[int, ...]):
        self.shape: Shape = shape

    def __abs__(self) -> Tensor[*Shape]: ...

    def __add__(self, other: Tensor[*Shape]) -> Tensor[*Shape]: ...

class Height: pass
class Width: pass
x: Tensor[Height, Width] = Tensor(shape=(640, 480))
x.shape     # Inferred type is Tuple[Height, Width]
y = abs(x)  # Tensor[Height, Width]
z = x + y   # Tensor[Height, Width]

Unexpanded Type Tuple Variables

Until now, we have always expanded type tuple variables. However, type tuple variables can also be used without being expanded. When used in this way, the type tuple variable behaves like a Tuple parameterised by the types that the type tuple variable is bound to. That is:

def foo(x: Tuple[*Ts]) -> Tuple[*Ts]: ...
# could also be written as
def foo(x: Ts) -> Ts: ...

Type tuple variables can also be used unexpanded in the context of generic classes. However, note that when used in this way, type parameters to the generic class must be explicitly enclosed in a Tuple.

class Foo(Generic[Ts]): ...

foo: Foo[Tuple[int, str]]

See Concatenating Multiple Type Tuple Variables below for why this is important.

*args as a Type Tuple Variable

PEP 484 [8] states that when a type annotation is provided for *args, every argument must be of the annotated type. That is, if we specify *args to be type int, then all arguments must be of type int. This limits our ability to specify the type signatures of functions that take heterogeneous argument types.
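For example, under PEP 484 (print_each is a hypothetical illustration):

def print_each(*args: int) -> None:
    # Inside the body, args has type Tuple[int, ...]
    for arg in args:
        print(arg)

print_each(1, 2, 3)  # Valid
print_each(1, 'a')   # Rejected by a type checker: all arguments must be int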

If *args is annotated as being an expanded type tuple variable, however, the types of the individual arguments become the types in the type tuple:

def args_to_tuple(*args: *Ts) -> Tuple[*Ts]: ...

args_to_tuple(1, 'a')  # Inferred type is Tuple[int, str]

Inside the body of args_to_tuple, the type of args is Tuple[*Ts] (with the actual argument types substituted for *Ts).
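A minimal sketch of an implementation therefore type-checks directly:

def args_to_tuple(*args: *Ts) -> Tuple[*Ts]:
    # args already has type Tuple[*Ts], matching the declared return type
    return args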

Note that, for consistency, the following is also valid syntactically:

def foo(*args: *Tuple[int, str]): ...

However, since it is a strange thing to do (why not just specify the arguments directly as arg1: int, arg2: str?), we recommend type checkers emit a warning when coming across such annotations.

Also note that when a type tuple variable is used in this way, it must be in conjunction with the star operator:

def foo(*args: Ts): ...  # NOT valid

Finally, note that a type tuple variable may not be used as the type of **kwargs. (We do not yet know of a use-case for this feature, so prefer to leave the ground fresh for a potential future PEP.)

def foo(**kwargs: *Ts): ...  # NOT valid

Map

To enable typing of functions such as map and zip, we introduce the Map type operator. Not to be confused with the existing typing.Mapping, Map is analogous to the builtin map, but for types:

from typing import Map

def args_to_tuples(*args: *Ts) -> Map[Tuple, Ts]: ...

args_to_tuples(1, 'a')  # Inferred type is Tuple[Tuple[int], Tuple[str]]

Map takes two operands. The first operand is a parameterizable type (or type alias [6]) such as Tuple or List. The second operand is a type tuple variable or a parameterized Tuple such as Tuple[int, str]. The result of Map is a Tuple, where the Nth type in the Tuple is the first operand parameterized by the Nth type in the second operand.
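For example, with a concrete parameterized Tuple as the second operand (a sketch of how Map would evaluate; IntStrLists is a hypothetical alias):

# The Nth type of the result is List parameterized by
# the Nth type of Tuple[int, str]:
IntStrLists = Map[List, Tuple[int, str]]  # Tuple[List[int], List[str]]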

Because Map returns a parameterized Tuple, it can be used anywhere that a type tuple variable would be. For example:

# Equivalent to 'arg1: List[T1], arg2: List[T2], ...'
def foo(*args: *Map[List, Ts]): ...

# Equivalent to '-> Tuple[List[T1], List[T2], ...]'
def bar(*args: *Ts) -> Map[List, Ts]: ...

bar()        # Inferred type is Tuple[()] (an empty tuple)
bar(1)       # Inferred type is Tuple[List[int]]
bar(1, 'a')  # Inferred type is Tuple[List[int], List[str]]

map and zip

Map allows us to specify the signature of map as:

ArgTs = TypeTupleVar('ArgTs')
ReturnT = TypeVar('ReturnT')

def map(func: Callable[[*ArgTs], ReturnT],
        *iterables: *Map[Iterable, ArgTs]) -> Iterable[ReturnT]: ...

def func(x: int, y: str) -> float: ...
iter1: List[int]
iter2: List[str]

# ArgTs is bound to Tuple[int, str]
# *Map[Iterable, ArgTs] expands to Iterable[int], Iterable[str]
# Therefore, iter1 must be of type Iterable[int],
#        and iter2 must be of type Iterable[str]
map(func, iter1, iter2)

Similarly, we can specify the signature of zip as:

def zip(*iterables: *Map[Iterable, ArgTs]) -> Iterable[Tuple[*ArgTs]]: ...

l1: List[int]
l2: List[str]
zip(l1, l2)  # Inferred type is Iterable[Tuple[int, str]]

Nesting

Because Map returns a parameterized Tuple, which is itself a valid second operand, the result of one Map can be used as the input to another Map:

Map[Tuple, Map[Tuple, Ts]]  # Valid!
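For example, with Ts bound to int and str, a sketch of the evaluation:

# Map[Tuple, Ts]              is Tuple[Tuple[int], Tuple[str]]
# Map[Tuple, Map[Tuple, Ts]]  is Tuple[Tuple[Tuple[int]], Tuple[Tuple[str]]]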

Accessing Individual Types

Map allows us to operate on types in a bulk fashion. For situations where we require access to each individual type, overloads can be used with individual TypeVar instances in place of the type tuple variable:

Shape = TypeTupleVar('Shape')
Axis1 = TypeVar('Axis1')
Axis2 = TypeVar('Axis2')
Axis3 = TypeVar('Axis3')

class Tensor(Generic[*Shape]): ...

@overload
class Tensor(Generic[Axis1, Axis2]):

    def transpose(self) -> Tensor[Axis2, Axis1]: ...

@overload
class Tensor(Generic[Axis1, Axis2, Axis3]):

    def transpose(self) -> Tensor[Axis3, Axis2, Axis1]: ...
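Under these overloads, transposing a two-axis tensor would behave as follows (a sketch, reusing the Height and Width axis classes from earlier):

x: Tensor[Height, Width]
x.transpose()  # Inferred type is Tensor[Width, Height]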

Concatenating Other Types to a Type Tuple Variable

If a type tuple variable appears with other types in the same type parameter list, the effect is to concatenate those types with the types in the type tuple variable:

Shape = TypeTupleVar('Shape')
class Batch: pass
class Height: pass
class Width: pass

class Tensor(Generic[*Shape]): ...

def add_batch(x: Tensor[*Shape]) -> Tensor[Batch, *Shape]: ...

x: Tensor[Height, Width]
add_batch(x)  # Inferred type is Tensor[Batch, Height, Width]

Type tuple variables can also be combined with regular TypeVar instances:

T1 = TypeVar('T1')
T2 = TypeVar('T2')

class Foo(Generic[T1, T2, *Ts]): ...

foo: Foo[int, str, bool, float]  # T1=int, T2=str, Ts=Tuple[bool, float]

Concatenating Multiple Type Tuple Variables

If multiple type tuple variables appear in a parameter list, in order to prevent ambiguity about which types would be bound to which type tuple variables, the type tuple variables must not be expanded:

Ts1 = TypeTupleVar('Ts1')
Ts2 = TypeTupleVar('Ts2')

# NOT allowed
class Bar(Generic[*Ts1, *Ts2]): ...
# How would we decide which types are bound to Ts1
# and which are bound to Ts2?
bar: Bar[int, str, bool]

# The right way
class Bar(Generic[Ts1, Ts2]): ...
bar: Bar[Tuple[int], Tuple[str, bool]]

Rationale and Rejected Ideas

Supporting Variadicity Through Aliases

As noted in the introduction, it is possible to avoid variadic generics by simply defining a separate class for each possible number of type parameters:

class Tensor1(Generic[Axis1]): ...
class Tensor2(Generic[Axis1, Axis2]): ...

However, this seems somewhat clumsy - it requires users to unnecessarily pepper their code with 1s, 2s, and so on for every rank they use.

Naming of Map

One downside to the name Map is that it might suggest a hash map. We considered a number of different options for the name of this operator.

  • ForEach. This is rather long, and we thought might imply a side-effect.
  • Transform. The meaning of this isn't obvious enough at first glance.
  • Apply. This clashes with apply, an older Python builtin that unpacked an iterable into function arguments before the star operator was introduced.

In the end, we decided that Map was good enough.

Naming of TypeTupleVar

TypeTupleVar began as ListVariadic, based on its naming in an early implementation in Pyre.

We then changed this to TypeVar(list=True), on the basis that a) it better emphasises the similarity to TypeVar, and b) the meaning of 'list' is more easily understood than the jargon of 'variadic'.

We finally settled on TypeTupleVar based on the justification that c) this emphasises the tuple-like behaviour, and d) type tuple variables are a sufficiently different kind of thing to regular type variables that we may later wish to support keyword arguments to its constructor that should not be supported by regular type variables (such as arbitrary_len [14]).

Accessing Individual Types Without Overloads

We chose to support access to individual types in the type tuple variable using overloads (see the Accessing Individual Types section). One alternative would have been to allow explicit access to arbitrary parts of the type tuple variable - for example, through indexing:

def foo(t: Tuple[*Ts]):
    x: Ts[1] = t[1]

We decided to omit this mechanism from this PEP because a) it adds complexity, b) we were not aware of any use-cases that need it, and c) if it turns out to be needed in the future, it can easily be added in a future PEP.

Integer Generics

Consider a function such as np.tile:

x = np.zeros((3,))      # A tensor of length 3
y = np.tile(x, reps=2)  # y is now length 6

Intuitively, we would specify the signature of such a function as:

@overload
def tile(A: Tensor[N], reps: Literal[2]) -> Tensor[2*N]: ...
# ...and other overloads for different values of `reps`

N is sort of like a type variable. However, type variables stand in for types, whereas here we want N to stand in for a particular value. N should be some sort of 'integer type variable'.

(Note that N could not be created as simply TypeVar('N', bound=int). This would state that N could stand for int or any subtype of int. For our signature above, we would instead need N to stand for a particular value of type int.)

We decided to omit integer type variables from this PEP, postponing them to a future PEP if necessary.

Integer Parameterization

The examples of this PEP have parameterised tensor types using the semantic meaning of each axis, e.g. Tensor[Batch, Time]. However, we may also wish to parameterize using the actual integer value of each part of the shape, such as Tensor[Literal[64], Literal[64]].

There are two aspects related to such integer parameterization that we decided to ignore in this PEP:

Examples of integer parameterization. Though it clearly is valid to parameterize with literal types, we wish to encourage the use of semantic labelling of tensor axes wherever possible: having each axis labelled serves as extra protection against mistakes when manipulating axes.

Syntactic sugar for integer parameterization. Writing Literal in full is cumbersome; ideally, we could write Tensor[64, 64] as syntactic sugar for Tensor[Literal[64], Literal[64]]. However, this would introduce an inconsistency: because of forward referencing, Tensor['Batch'] and Tensor[Literal['Batch']] mean different things. For this to work, we would have to stipulate that this sugar applies only to integers. We leave this discussion for a future PEP. (If you do wish to employ such types in your code currently, we recommend from typing import Literal as L, enabling the much shorter L[64].)
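For example, a sketch of this shorthand (Literal is available in typing as of Python 3.8; Tensor is the variadic class proposed in this PEP):

from typing import Literal as L

x: Tensor[L[64], L[64]]  # Equivalent to Tensor[Literal[64], Literal[64]]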

Checking the Number of Types in a Variadic Generic

Consider reduction operations, which behave as:

x = np.zeros((2, 3, 5))
reduce_sum(x, axis=0)    # Shape (3, 5)
reduce_sum(x, axis=1)    # Shape (2, 5)

One way to compactly specify the signature of these operations would be to write something like:

Shape = TypeTupleVar('Shape')

# Tensor of rank N goes in, tensor of rank N-1 comes out
def reduce_sum(x: Tensor[Shape[N]], axis: int) -> Tensor[Shape[N-1]]: ...

Shape[N] here states that the number of types in Shape is bound to N, where N is some object that we can perform arithmetic on.

Lacking an urgent use-case for this feature, we omit it from this PEP, leaving it to a future PEP if necessary.

(Note that reduction operations are only used as an example here. Reduction functions can in fact be typed without this feature, using overloads:

A = TypeVar('A')
B = TypeVar('B')

@overload
def reduce_sum(x: Tensor[A, B], axis: Literal[0]) -> Tensor[B]: ...

@overload
def reduce_sum(x: Tensor[A, B], axis: Literal[1]) -> Tensor[A]: ...

...

Although more verbose, typing reduction operations this way is superior to the approach above, since it preserves information about which axis has been removed.)

Backwards Compatibility

TODO

  • Tuple needs to be upgraded to support parameterization with a type tuple variable.

Reference Implementation

TODO

Footnotes

[1]A third potential use is in enabling higher-kinded types that take an arbitrary number of type operands, but we do not discuss this use here.
[2]'Batch' is machine learning parlance for 'a number of'.
[3]We use the term 'array' to refer to a matrix with an arbitrary number of dimensions. In NumPy, the corresponding class is the ndarray; in TensorFlow, the Tensor; and so on.
[4]If the shape begins with 'batch × time', then videos_batch[0][1] would select the second frame of the first video. If the shape begins with 'time × batch', then videos_batch[1][0] would select the same frame.
[5]In the case of **kwargs, we mean the Nth argument as it appears in the function definition, not the Nth keyword argument specified in the function call.
[6]For example, in asyncio [13], it is convenient to define a type alias _FutureT = Union[Future[_T], Generator[Any, None, _T], Awaitable[_T]]. We should also be able to apply Map to such aliases - e.g. Map[_FutureT, Ts].

References

[7]PEP 612, "Parameter Specification Variables": https://www.python.org/dev/peps/pep-0612
[8]PEP 484, "Type Hints": https://www.python.org/dev/peps/pep-0484
[9]Static typing of Python numeric stack: https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary-6ZQzTkgN6e0oXko8fEWwN
[10]Ideas for array shape typing in Python: https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY/edit
[11]Shape annotation syntax proposal: https://docs.google.com/document/d/1But-hjet8-djv519HEKvBN6Ik2lW3yu0ojZo6pG9osY/edit
[12]typeshed/builtins.pyi: https://github.com/python/typeshed/blob/27dfbf68aaffab4f1ded7dc1b96f6f82f536a09d/stdlib/2and3/builtins.pyi#L1710-L1733
[13]typeshed/asyncio/tasks.pyi: https://github.com/python/typeshed/blob/193c7cb93283ad4ca2a65df74c565e56bfe72b7e/stdlib/3/asyncio/tasks.pyi#L45-L154
[14]Discussion on Python typing-sig mailing list: https://mail.python.org/archives/list/typing-sig@python.org/thread/SQVTQYWIOI4TIO7NNBTFFWFMSMS2TA4J/

Acknowledgements

Thank you to Alfonso Castaño, Antoine Pitrou, Bas v.B., David Foster, Dimitris Vardoulakis, Guido van Rossum, Jia Chen, Lucio Fernandez-Arjona, Nikita Sobolev, Peilonrayz, Pradeep Kumar Srinivasan, Rebecca Chen, Sergei Lebedev and Vladimir Mikulik for helpful feedback and suggestions on drafts of this PEP.

Thank you especially to Pradeep for numerous key contributions, including pointing out that unexpanded type tuples allow for clean concatenation of multiple type tuples, and to Lucio, for suggesting the star syntax, which has made multiple aspects of this proposal much more concise and intuitive.

Resources

Discussions on variadic generics in Python started in 2016 with Issue 193 on the python/typing GitHub repository.

Inspired by this discussion, Ivan Levkivskyi made a concrete proposal at PyCon 2019, summarised in Type system improvements and Static typing of Python numeric stack.

Expanding on these ideas, Mark Mendoza and Vincent Siles gave a presentation on Variadic Type Variables for Decorators and Tensors at the 2019 Python Typing Summit.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.