PEP 646: Add some broader context #1904

Merged 8 commits into pep-0646.rst, Apr 18, 2021 (135 additions, 1 deletion).

…manipulation mechanisms. We plan to introduce these in a future PEP.)
Rationale and Rejected Ideas
============================

Shape Arithmetic
----------------

Considering the use case of array shapes in particular, note that as of
this PEP, it is not yet possible to describe arithmetic transformations
of array dimensions - for example,
``def repeat_each_element(x: Array[N]) -> Array[2*N]``. We consider
this out-of-scope for the current PEP, but plan to propose additional
mechanisms that *will* enable this in a future PEP.
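A runnable sketch of the kind of function in question, with plain Python lists standing in for arrays (the helper is the ``repeat_each_element`` example from the text, not real library code):

```python
def repeat_each_element(x: list) -> list:
    # Each element of the input appears twice in the output, so
    # len(result) == 2 * len(x) -- a relationship between input and
    # output shapes that, as of this PEP, the type cannot express.
    return [value for value in x for _ in range(2)]

print(repeat_each_element([1, 2, 3]))  # [1, 1, 2, 2, 3, 3]
```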

Supporting Variadicity Through Aliases
--------------------------------------

is available in `cpython/23527`_. A preliminary version of the version
using the star operator, based on an early implementation of PEP 637,
is also available at `mrahtz/cpython/pep637+646`_.

Appendix: The Broader Landscape of Array Typing
===============================================

To give this PEP additional context for those particularly interested in the
array typing use-case, here we briefly discuss design considerations
for the venture of array typing in general.

Shaped Types vs Named Axes
--------------------------

A common source of bugs in machine learning programs is incorrect selection of
axes. For example, if we have an image stored in an array of shape 64×64×3,
we might wish to convert to black-and-white by computing the mean over the third
axis, ``mean(image, axis=2)``. Unfortunately, the simple typo ``axis=1`` is
hard to spot and will produce a result that means something completely different
(all while likely allowing the program to keep on running, resulting in a bug
that is serious but silent).
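The failure mode can be reproduced with plain nested lists standing in for a small image array (a minimal sketch; the helper names here are ours, not from any library):

```python
# A 2x2x3 'image': 2 rows, 2 columns, 3 channels per pixel.
image = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [10, 11, 12]]]

def mean_axis2(im):
    # Correct: average over channels, shape (H, W, C) -> (H, W).
    return [[sum(px) / len(px) for px in row] for row in im]

def mean_axis1(im):
    # The typo: average over width, shape (H, W, C) -> (H, C).
    return [[sum(px[c] for px in row) / len(row)
             for c in range(len(row[0]))]
            for row in im]

print(mean_axis2(image))  # [[2.0, 5.0], [8.0, 11.0]]
print(mean_axis1(image))  # [[2.5, 3.5, 4.5], [8.5, 9.5, 10.5]]
```

Both calls run without error, yet the second returns a differently shaped result with a completely different meaning: the silent bug described above.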

In response, some libraries have implemented so-called 'named tensors' (note:
in this context, 'tensor' is synonymous with 'array'), in which
axes are referred to not by index but by label - e.g. ``mean(image, axis='channels')``.
While this ameliorates many problems, we still consider it insufficient for three
reasons:

* **Interface documentation** is still not possible with this approach. If a function should
  accept *only* array arguments that have image-like shapes, this cannot be stipulated
  with named tensors.
* **Static checking** of shape correctness is still not possible.
* **Poor uptake**. Because each library must implement its own form of named tensors,
  the burden on library maintainers is onerous - and as a result, named tensors have not
  seen the widespread adoption we might have hoped for.

Can the 'named tensors' approach be combined with the approach we advocate for in
this PEP? We're not sure yet. One area of overlap is that in some contexts, we could do, say,
``image: Array[Height, Width, Channels]; mean(image, axis=Channels)``. Ideally,
we might write something like ``image: Array[Height=64, Width=64, Channels=3]`` -
but this won't be possible in the short term, due to the rejection of PEP 637.
In any case, our attitude towards this is mostly "Wait and see what happens before
taking any further steps".
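One way such an overlap could work at runtime is for axis classes to double as named-axis selectors. A speculative sketch (every name here is hypothetical; no library implements this):

```python
# Hypothetical axis classes, as in `Array[Height, Width, Channels]`.
class Height: pass
class Width: pass
class Channels: pass

# Hypothetical: the axis order of our image array, recorded alongside its data.
AXES = (Height, Width, Channels)

def axis_index(axis):
    # Resolve a semantic axis name to its positional index, so that a call
    # like mean(image, axis=Channels) could dispatch to positional axis 2.
    return AXES.index(axis)

print(axis_index(Channels))  # 2
```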

Named Axes vs Literal Shape Specification
-----------------------------------------

As we mentioned in `Summary Examples`_, named axes are not the only use of this PEP.
Instead of using axis names to parameterise array types, we could also parameterise
with the actual sizes, e.g. ``Array[Literal[64], Literal[64], Literal[3]]``. This approach
more naturally fits with the idea of shape arithmetic - perhaps in combination
with something like a type variable-like object but for integers, such that we could
write ``Array[N] -> Array[2*N]``.

As of writing, we're genuinely unsure which approach will be most fruitful long-term.
Here too, our attitude is "Take only the steps which we're confident are universally
beneficial (this PEP) and wait and see what happens from there before committing to
a specific path."
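For concreteness, the literal-shape flavour can at least be spelled out with today's ``typing`` module (a sketch only; ``Array`` is a hypothetical stand-in, and no current checker enforces the shapes):

```python
from typing import Generic, Literal, TypeVar

H = TypeVar("H")
W = TypeVar("W")
C = TypeVar("C")

class Array(Generic[H, W, C]):
    # Hypothetical stand-in for a shaped array type.
    pass

# Parameterise with concrete sizes rather than axis names.
def to_greyscale(
    im: Array[Literal[64], Literal[64], Literal[3]]
) -> Array[Literal[64], Literal[64], Literal[1]]:
    ...
```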

Meaning and Scope of Axis Names
-------------------------------

If we attach a name like ``Batch`` to a particular axis, what does that actually mean?
Is that name a placeholder for some actual value, like 32 - such that we're stipulating
that all arrays with an axis named ``Batch`` must have the same batch size?

And if so, what should the scope of such labels be? The local function? The module?
The whole program?

Or is the name ``Batch`` merely referring to the identity, the semantic meaning, of
the axis? Under this interpretation, different arrays with an axis named ``Batch`` could
have different sizes, and ``Batch`` would only serve as indication of which axis
served as the batch-like axis.

And if so, again, what should the scope of such labels be? Should we consider all axes labelled
as 'Batch' to have the same identity, even if in completely different modules?

We espouse the second view of axis names - names as indicative of semantic identity,
but *not* of actual size. To put it another way, we think that axis names should behave
like variable *types*, not variable *names*:

::

    # x can take any value, but the value should be int-like
    def foo(x: int):
        ...

    # x can have any shape, but the first axis should be batch-like
    def bar(x: Array[Batch]):
        ...

The justification is flexibility. Tying names to specific values is overly
restrictive. For example, it must be possible for a library of helper functions
to operate on different batch sizes (given, perhaps, different batch sizes for
train and test). We could define different names - ``Batch1``, ``Batch2`` and so
on - for different functions - but a) this would result in a huge amount of
boilerplate, and b) semantic identity is often the more important property
to verify in the first place ("Is the batch axis the first or second one?").

What about scoping? Here too, we believe that the standard rules of parametric
typing in Python already provide the right solution. Scoping is explicit, based
on the identity of the subtype in question:

::

    # module1.py

    class Batch: pass

    # `Batch` refers to the same thing in both of these,
    # because they both use the type `module1.Batch`.
    def foo(x: Array[Batch]): ...
    def bar() -> Array[Batch]: ...

    # module2.py

    class Batch: pass

    # This `Batch` refers to something different, because
    # it's `module2.Batch`, not `module1.Batch`.
    def baz() -> Array[Batch]: ...

    # module3.py

    from module1 import Batch

    # This array is compatible with `foo` from `module1`,
    # because it uses `module1.Batch`.
    def qux() -> Array[Batch]: ...
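This identity-based behaviour can be checked at runtime with today's generics machinery. A sketch in a single file (``Array`` is a hypothetical stand-in, and ``Batch1``/``Batch2`` play the roles of ``module1.Batch`` and ``module2.Batch``):

```python
from typing import Generic, TypeVar

Axis = TypeVar("Axis")

class Array(Generic[Axis]):
    # Hypothetical stand-in for an array type parameterised by axis names.
    pass

class Batch1: pass  # stands in for module1.Batch
class Batch2: pass  # stands in for module2.Batch

# Same class -> same parameterised type; a class that merely
# shares the *name* elsewhere -> a different type.
assert Array[Batch1] == Array[Batch1]
assert Array[Batch1] != Array[Batch2]
```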

Footnotes
==========


.. [#batch] 'Batch' is machine learning parlance for 'a number of'.

.. [#array] We use the term 'array' to refer to a matrix with an arbitrary