PEP 646: Add some broader context #1904

Merged 8 commits into pep-0646.rst, Apr 18, 2021 (135 additions, 1 deletion).

…manipulation mechanisms. We plan to introduce these in a future PEP.)
Rationale and Rejected Ideas
============================

Shape Arithmetic
----------------

Considering the use case of array shapes in particular, note that as of
this PEP, it is not yet possible to describe arithmetic transformations
of array dimensions - for example,
``def repeat_each_element(x: Array[N]) -> Array[2*N]``. We consider
this out-of-scope for the current PEP, but plan to propose additional
mechanisms that *will* enable this in a future PEP.
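A runnable sketch of the kind of function in question, with plain Python lists standing in for arrays (the helper is the ``repeat_each_element`` example from the text, not real library code):

```python
def repeat_each_element(x: list) -> list:
    # Each element of the input appears twice in the output, so
    # len(result) == 2 * len(x) -- a relationship between input and
    # output shapes that, as of this PEP, the type cannot express.
    return [value for value in x for _ in range(2)]

print(repeat_each_element([1, 2, 3]))  # [1, 1, 2, 2, 3, 3]
```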

Supporting Variadicity Through Aliases
--------------------------------------

is available in `cpython/23527`_. A preliminary version of the version
using the star operator, based on an early implementation of PEP 637,
is also available at `mrahtz/cpython/pep637+646`_.

Appendix: The Broader Landscape of Array Typing
===============================================

To give this PEP additional context for those particularly interested in the
array typing use-case, here we briefly discuss design considerations
for the venture of array typing in general.

Shaped Types vs Named Axes
--------------------------

A common source of bugs in machine learning programs is incorrect selection of
axes. For example, if we have an image stored in an array of shape 64×64×3,
we might wish to convert to black-and-white by computing the mean over the third
axis, ``mean(image, axis=2)``. Unfortunately, the simple typo ``axis=1`` is
hard to spot and will produce a result that means something completely different
(all while likely allowing the program to keep on running, resulting in a bug
that is serious but silent).
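The failure mode can be reproduced with plain nested lists standing in for a small image array (a minimal sketch; the helper names here are ours, not from any library):

```python
# A 2x2x3 'image': 2 rows, 2 columns, 3 channels per pixel.
image = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [10, 11, 12]]]

def mean_axis2(im):
    # Correct: average over channels, shape (H, W, C) -> (H, W).
    return [[sum(px) / len(px) for px in row] for row in im]

def mean_axis1(im):
    # The typo: average over width, shape (H, W, C) -> (H, C).
    return [[sum(px[c] for px in row) / len(row)
             for c in range(len(row[0]))]
            for row in im]

print(mean_axis2(image))  # [[2.0, 5.0], [8.0, 11.0]]
print(mean_axis1(image))  # [[2.5, 3.5, 4.5], [8.5, 9.5, 10.5]]
```

Both calls run without error, yet the second returns a differently shaped result with a completely different meaning: the silent bug described above.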

In response, some libraries have implemented so-called 'named tensors' (note:
in this context, 'tensor' is synonymous with 'array'), in which
axes are referred to not by index but by label - e.g. ``mean(image, axis='channels')``.
While this ameliorates many problems, we still consider it insufficient for three
reasons:

* **Interface documentation** is still not possible with this approach. If a function should
  accept *only* array arguments that have image-like shapes, this cannot be stipulated
  with named tensors.
* **Static checking** of shape correctness is still not possible.
* **Poor uptake**. Because each library must implement its own form of named tensors,
  the burden on library maintainers is onerous - and as a result, named tensors have not
  seen the widespread adoption we might have hoped for.

Can the 'named tensors' approach be combined with the approach we advocate for in
this PEP? We're not sure yet. One area of overlap is that in some contexts, we could do, say,
``image: Array[Height, Width, Channels]; mean(image, axis=Channels)``. Ideally,
we might write something like ``image: Array[Height=64, Width=64, Channels=3]`` -
but this won't be possible in the short term, due to the rejection of PEP 637.
In any case, our attitude towards this is mostly "Wait and see what happens before
taking any further steps".
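One way such an overlap could work at runtime is for axis classes to double as named-axis selectors. A speculative sketch (every name here is hypothetical; no library implements this):

```python
# Hypothetical axis classes, as in `Array[Height, Width, Channels]`.
class Height: pass
class Width: pass
class Channels: pass

# Hypothetical: the axis order of our image array, recorded alongside its data.
AXES = (Height, Width, Channels)

def axis_index(axis):
    # Resolve a semantic axis name to its positional index, so that a call
    # like mean(image, axis=Channels) could dispatch to positional axis 2.
    return AXES.index(axis)

print(axis_index(Channels))  # 2
```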

Named Axes vs Literal Shape Specification
-----------------------------------------

As we mentioned in `Summary Examples`_, named axes are not the only use of this PEP.
Instead of using axis names to parameterise array types, we could also parameterise
with the actual sizes, e.g. ``Array[Literal[64], Literal[64], Literal[3]]``. This approach
more naturally fits with the idea of shape arithmetic - perhaps in combination
with something like a type variable-like object but for integers, such that we could
write ``Array[N] -> Array[2*N]``.

As of writing, we're genuinely unsure which approach will be most fruitful long-term.
Here too, our attitude is "Take only the steps which we're confident are universally
beneficial (this PEP) and wait and see what happens from there before committing to
a specific path."
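For concreteness, the literal-shape flavour can at least be spelled out with today's ``typing`` module (a sketch only; ``Array`` is a hypothetical stand-in, and no current checker enforces the shapes):

```python
from typing import Generic, Literal, TypeVar

H = TypeVar("H")
W = TypeVar("W")
C = TypeVar("C")

class Array(Generic[H, W, C]):
    # Hypothetical stand-in for a shaped array type.
    pass

# Parameterise with concrete sizes rather than axis names.
def to_greyscale(
    im: Array[Literal[64], Literal[64], Literal[3]]
) -> Array[Literal[64], Literal[64], Literal[1]]:
    ...
```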

Meaning and Scope of Axis Names
-------------------------------

If we attach a name like ``Batch`` to a particular axis, what does that actually mean?
Is that name a placeholder for some actual value, like 32 - such that we're stipulating
that all arrays with an axis named ``Batch`` must have the same batch size?

And if so, what should the scope of such labels be? The local function? The module?
The whole program?

Or is the name ``Batch`` merely referring to the identity, the semantic meaning, of
the axis? Under this interpretation, different arrays with an axis named ``Batch`` could
have different sizes, and ``Batch`` would only serve as indication of which axis
served as the batch-like axis.

And if so, again, what should the scope of such labels be? Should we consider all axes labelled
as 'Batch' to have the same identity, even if in completely different modules?

We espouse the second view of axis names - names as indicative of semantic identity,
but *not* of actual size. To put it another way, we think that axis names should behave
like variable *types*, not variable *names*:

::

    # x can take any value, but the value should be int-like
    def foo(x: int):
        ...

    # x can have any shape, but the first axis should be batch-like
    def bar(x: Array[Batch]):
        ...

The justification is flexibility. Tying names to specific values is overly
restrictive. For example, it must be possible for a library of helper functions
to operate on different batch sizes (given, perhaps, different batch sizes for
train and test). We could define different names - ``Batch1``, ``Batch2`` and so
on - for different functions - but a) this would result in a huge amount of
boilerplate, and b) semantic identity is often the more important property
to verify in the first place ("Is the batch axis the first or second one?").

What about scoping? Here too, we believe that the standard rules of parametric
typing in Python already provide the right solution. Scoping is explicit, based
on the identity of the subtype in question:

::

    # module1.py

    class Batch: pass

    # `Batch` refers to the same thing in both of these,
    # because they both use the type `module1.Batch`.
    def foo(x: Array[Batch]): ...
    def bar() -> Array[Batch]: ...

    # module2.py

    class Batch: pass

    # This `Batch` refers to something different, because
    # it's `module2.Batch`, not `module1.Batch`.
    def baz() -> Array[Batch]: ...

    # module3.py

    from module1 import Batch

    # This array is compatible with `foo` from `module1`,
    # because it uses `module1.Batch`.
    def qux() -> Array[Batch]: ...
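This identity-based behaviour can be checked at runtime with today's generics machinery. A sketch in a single file (``Array`` is a hypothetical stand-in, and ``Batch1``/``Batch2`` play the roles of ``module1.Batch`` and ``module2.Batch``):

```python
from typing import Generic, TypeVar

Axis = TypeVar("Axis")

class Array(Generic[Axis]):
    # Hypothetical stand-in for an array type parameterised by axis names.
    pass

class Batch1: pass  # stands in for module1.Batch
class Batch2: pass  # stands in for module2.Batch

# Same class -> same parameterised type; a class that merely
# shares the *name* elsewhere -> a different type.
assert Array[Batch1] == Array[Batch1]
assert Array[Batch1] != Array[Batch2]
```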

Footnotes
==========


.. [#batch] 'Batch' is machine learning parlance for 'a number of'.

.. [#array] We use the term 'array' to refer to a matrix with an arbitrary