PEP 646: Add some broader context #1904

Merged: 8 commits, Apr 18, 2021

221 changes: 220 additions & 1 deletion in pep-0646.rst

@@ -610,6 +610,16 @@ manipulation mechanisms. We plan to introduce these in a future PEP.)
Rationale and Rejected Ideas
============================

Shape Arithmetic
----------------

Considering the use case of array shapes in particular, note that as of
this PEP, it is not yet possible to describe arithmetic transformations
of array dimensions - for example,
``def repeat_each_element(x: Array[N]) -> Array[2*N]``. We consider
this out-of-scope for the current PEP, but plan to propose additional
mechanisms that *will* enable this in a future PEP.

Supporting Variadicity Through Aliases
--------------------------------------

@@ -743,11 +753,220 @@ is available in `cpython/23527`_. A preliminary version of the version
using the star operator, based on an early implementation of PEP 637,
is also available at `mrahtz/cpython/pep637+646`_.

Appendix: The Broader Landscape of Array Typing
===============================================

To give this PEP additional context for those particularly interested in the
array typing use-case, here we briefly discuss design considerations
for the venture of array typing in general.

Shaped Types vs Named Axes
--------------------------

A common source of bugs in machine learning programs is incorrect selection of
axes. For example, if we have an image stored in an array of shape 64×64×3,
we might wish to convert to black-and-white by computing the mean over the third
axis, ``mean(image, axis=2)``. Unfortunately, the simple typo ``axis=1`` is
hard to spot and will produce a result that means something completely different
(all while likely allowing the program to keep on running, resulting in a bug
that is serious but silent).
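
To make the failure mode concrete, here is a minimal sketch in NumPy (any
array library behaves similarly):

::

    import numpy as np

    image = np.zeros((64, 64, 3))    # height x width x channels

    grey = np.mean(image, axis=2)    # correct: shape (64, 64)
    oops = np.mean(image, axis=1)    # typo: shape (64, 3) - runs without error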

In response, some libraries have implemented so-called 'named tensors' (in this context,
'tensor' is synonymous with 'array'), in which
axes are referred to not by index but by label - e.g. ``mean(image, axis='channels')``.
While this ameliorates many problems, we still consider it insufficient for three
reasons:

* **Interface documentation** is still not possible with this approach. If a function should
  *only* be willing to take array arguments that have image-like shapes, this cannot be stipulated
  with named tensors.
* **Static checking** of shape correctness is still not possible.
* **Poor uptake**. Because each library must implement its own form of named tensors,
  the burden on library maintainers is onerous - and as a result, named tensors have not
  seen as widespread adoption as we might have hoped for.
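
To illustrate the named-axis idea itself, here is a toy sketch in plain NumPy;
real named-tensor implementations attach the labels to the array object, but
the lookup below captures the essence:

::

    import numpy as np

    def mean_over(array: np.ndarray, names: tuple, axis_name: str) -> np.ndarray:
        # Reduce over the axis whose label matches `axis_name`.
        return array.mean(axis=names.index(axis_name))

    image = np.zeros((64, 64, 3))
    grey = mean_over(image, ('height', 'width', 'channels'), 'channels')
    assert grey.shape == (64, 64)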

Can the 'named tensors' approach be combined with the approach we advocate for in
this PEP? We're not sure yet. One area of overlap is that in some contexts, we could do, say,
``image: Array[Height, Width, Channels]; mean(image, axis=Channels)``. Ideally,
we might write something like ``image: Array[Height=64, Width=64, Channels=3]`` -
but this won't be possible in the short term, due to the rejection of PEP 637.
In any case, our attitude towards this is mostly "Wait and see what happens before
taking any further steps".
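
As a rough sketch of that overlap, using this PEP's own ``TypeVarTuple``
machinery (``Array`` and ``mean`` here are hypothetical stand-ins, not part of
any real library):

::

    from typing import Generic
    from typing_extensions import TypeVarTuple, Unpack

    Shape = TypeVarTuple('Shape')

    class Array(Generic[Unpack[Shape]]): ...    # hypothetical array type

    class Height: ...
    class Width: ...
    class Channels: ...

    # Hypothetical reduction selecting the axis by its *type* rather than index
    def mean(x: Array[Unpack[Shape]], axis: type) -> Array: ...

    image: Array[Height, Width, Channels] = Array()
    grey = mean(image, axis=Channels)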

Named Axes vs Literal Shape Specification
-----------------------------------------

As we mentioned in `Summary Examples`_, named axes are not the only use of this PEP.
Instead of using axis names to parameterise array types, we could also parameterise
with the actual sizes, e.g. ``Array[Literal[64], Literal[64], Literal[3]]``. This approach
more naturally fits with the idea of shape arithmetic - perhaps in combination
with a type variable-like object for integers, such that we could
write ``Array[N] -> Array[2*N]``.
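
A sketch of the size-based style (again with a hypothetical ``Array`` class);
note that the arithmetic part, ``2*N``, is exactly what is not yet expressible:

::

    from typing import Generic, Literal
    from typing_extensions import TypeVarTuple, Unpack

    Shape = TypeVarTuple('Shape')

    class Array(Generic[Unpack[Shape]]): ...    # hypothetical array type

    def load_image() -> Array[Literal[64], Literal[64], Literal[3]]: ...

    # Not yet possible: arithmetic over sizes, e.g.
    # def repeat_each_element(x: Array[N]) -> Array[2*N]: ...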

As of writing, we're genuinely unsure which approach will be most fruitful long-term.
Here too, our attitude is "Take only the steps which we're confident are universally
beneficial (this PEP) and wait and see what happens from there before committing to
a specific path."

Meaning and Scope of Axis Names
-------------------------------

If we attach a name like ``Batch`` to a particular axis, what does that actually mean?
Is that name a placeholder for some actual value, like 32 - such that we're stipulating
that all arrays with an axis named ``Batch`` must have the same batch size?

And if so, what should the scope of such labels be? The local function? The module?
The whole program?

Or is the name ``Batch`` merely referring to the identity, the semantic meaning, of
the axis? Under this interpretation, different arrays with an axis named ``Batch`` could
have different sizes, and ``Batch`` would only serve as an indication of which axis
serves as the batch-like axis.

And if so, again, what should the scope of such labels be? Should we consider all axes
labelled ``Batch`` to have the same identity, even if in completely different modules?

We ourselves are unsure of what the right answer is. Below we explore two options
in the design space.

Names as Local Sizes
''''''''''''''''''''

One approach would be to adopt the first view with local scoping rules. For example, we could write:

::

    def matrix_matrix_multiply(x: Array[K, N], y: Array[N, M]) -> Array[K, M]: ...
    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

Here, all axes sharing the label ``N`` would be constrained to have the same size within
a given signature, but there would be no relationship between ``N`` in *different* signatures.
Such a local scoping rule would be important to avoid forcing, say, all functions within one
module using a label ``Batch`` to use the same batch size.
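
Continuing the example above, a checker following this rule would bind each
label afresh at every call site. A sketch of the intended behaviour:

::

    x: Array[Literal[3], Literal[4]]
    v: Array[Literal[4]]
    matrix_vector_multiply(x, v)    # accepted: N binds to 4 for this call

    w: Array[Literal[7]]
    matrix_vector_multiply(x, w)    # rejected: N cannot be both 4 and 7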

The disadvantage of this approach is that we have no ability to enforce shape properties across
different calls. For example, we can't address the problem mentioned in `Motivation`_: if
one function returns an array with leading dimensions 'Time × Batch', and another function
takes the same array assuming leading dimensions 'Batch × Time', we have no way of detecting this.
(Even allowing for broader scoping rules would not completely address the problem. For example,
we might have a special kind of label that referred to the same size throughout the whole program.
But then, what if ``Time`` and ``Batch`` were the same size?)
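
For instance, with purely size-based labels, the 'Time × Batch' versus
'Batch × Time' confusion becomes undetectable whenever the two sizes happen
to coincide:

::

    def make() -> Array[Literal[32], Literal[32]]: ...    # Time x Batch
    def use(x: Array[Literal[32], Literal[32]]): ...      # expects Batch x Time

    use(make())    # type-checks, but the axes are silently transposed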

The main advantage is that in some cases, axis sizes really are what we care about. This is true
both for simple linear algebra operations such as the matrix manipulations above and for more
complicated transformations such as convolutional layers in neural networks, where it would be of
great utility to the programmer to be able to inspect the array size after each layer using
static analysis.

Names as Semantic Identity
''''''''''''''''''''''''''

A second approach (the one that most of the examples in this PEP are based around)
would be to have names constrain axis *type* but not axis *size*.

This would enable us to solve the problem of enforcing shape properties across calls.
For example:

::

    # lib.py

    class Batch: pass
    class Time: pass

    def make_array() -> Array[Batch, Time]: ...

    # user.py

    from lib import Batch, Time

    # `Batch` and `Time` have the same identity as in `lib`,
    # so must take arrays as produced by `lib.make_array`
    def use_array(x: Array[Batch, Time]): ...

In many cases, this is the more important thing to verify; we care more about
which axis is which than what the specific size of each axis is.

It also does not preclude use cases where we wish to describe shape transformations
without knowing the semantic identity of the axes ahead of time. For example, we can write:

::

    K = TypeVar('K')
    N = TypeVar('N')

    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

We can then use this with:

::

    class Batch: pass
    class Values: pass

    batch_of_values: Array[Batch, Values]
    value_weights: Array[Values]
    matrix_vector_multiply(batch_of_values, value_weights)
    # Result is Array[Batch]

How Flexible is this PEP?
'''''''''''''''''''''''''

The approach described in the previous section is straightforwardly
compatible with this PEP. It is, essentially, standard parametric subtyping
as applied to arrays.

What might not be so obvious is that, using ``Literal``, we can *also* apply Python's
rules for parametric subtyping to the other case. For example:

::

    K = TypeVar('K')
    N = TypeVar('N')

    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

    a: Array[Literal[64], Literal[32]]
    b: Array[Literal[32]]
    matrix_vector_multiply(a, b)
    # Result is Array[Literal[64]]

However, note that the two approaches are mutually exclusive in user code. Users
can verify size or semantic type but not both.
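
Concretely, each annotation must commit to one style or the other:

::

    a: Array[Batch, Values]                  # identity checked, sizes unconstrained
    b: Array[Literal[64], Literal[32]]       # sizes checked, identity unconstrained
    # No way to express both at once - e.g. something like Array[Batch=64]
    # would have required the subscript syntax of the rejected PEP 637.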

As of this PEP, we are agnostic about which approach will provide the most benefit.
Since the features introduced in this PEP are compatible with both approaches, however,
we leave the door open.

Why Not Both?
'''''''''''''

Consider the following 'normal' code:

::

    def f(x: int): ...

Note that we have symbols for both the value of the thing (``x``) and the type of
the thing (``int``). Why can't we do the same with axes? For example, with an imaginary
syntax, we could write:

::

    def f(array: Array[TimeValue: TimeType]): ...

This would allow us to access the axis size (say, 32) through the symbol ``TimeValue``
*and* the type through the symbol ``TimeType``.

This might even be possible using existing syntax, through a second level of parameterisation:

::

    def f(array: Array[TimeValue[TimeType]]): ...

However, we leave exploration of this approach to the future.

Footnotes
==========


.. [#batch] 'Batch' is machine learning parlance for 'a number of'.

.. [#array] We use the term 'array' to refer to a matrix with an arbitrary