PEP 646: Add some broader context #1904

Merged: 8 commits, Apr 18, 2021

221 changes: 220 additions & 1 deletion in pep-0646.rst

@@ -610,6 +610,16 @@ manipulation mechanisms. We plan to introduce these in a future PEP.)
Rationale and Rejected Ideas
============================

Shape Arithmetic
----------------

Considering the use case of array shapes in particular, note that as of
this PEP, it is not yet possible to describe arithmetic transformations
of array dimensions - for example,
``def repeat_each_element(x: Array[N]) -> Array[2*N]``. We consider
this out-of-scope for the current PEP, but plan to propose additional
mechanisms that *will* enable this in a future PEP.

Supporting Variadicity Through Aliases
--------------------------------------

@@ -743,11 +753,220 @@ is available in `cpython/23527`_. A preliminary version of the version
using the star operator, based on an early implementation of PEP 637,
is also available at `mrahtz/cpython/pep637+646`_.

Appendix: The Broader Landscape of Array Typing
===============================================

To give this PEP additional context for those particularly interested in the
array typing use-case, here we briefly discuss design considerations
for the venture of array typing in general.

Shaped Types vs Named Axes
--------------------------

A common source of bugs in machine learning programs is incorrect selection of
axes. For example, if we have an image stored in an array of shape 64×64×3,
we might wish to convert to black-and-white by computing the mean over the third
axis, ``mean(image, axis=2)``. Unfortunately, the simple typo ``axis=1`` is
hard to spot and will produce a result that means something completely different
(all while likely allowing the program to keep on running, resulting in a bug
that is serious but silent).
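
To make the failure mode concrete, here is a minimal sketch in NumPy (any
array library behaves similarly):

::

    import numpy as np

    image = np.zeros((64, 64, 3))    # height x width x channels

    grey = np.mean(image, axis=2)    # correct: shape (64, 64)
    oops = np.mean(image, axis=1)    # typo: shape (64, 3) - runs without error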

In response, some libraries have implemented so-called 'named tensors' (in this context,
'tensor' is synonymous with 'array'), in which
axes are referred to not by index but by label - e.g. ``mean(image, axis='channels')``.
While this ameliorates many problems, we still consider it insufficient for three
reasons:

* **Interface documentation** is still not possible with this approach. If a function should
  *only* be willing to take array arguments that have image-like shapes, this cannot be stipulated
  with named tensors.
* **Static checking** of shape correctness is still not possible.
* **Poor uptake**. Because each library must implement its own form of named tensors,
  the burden on library maintainers is onerous - and as a result, named tensors have not
  seen as widespread adoption as we might have hoped for.
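
To illustrate the named-axis idea itself, here is a toy sketch in plain NumPy;
real named-tensor implementations attach the labels to the array object, but
the lookup below captures the essence:

::

    import numpy as np

    def mean_over(array: np.ndarray, names: tuple, axis_name: str) -> np.ndarray:
        # Reduce over the axis whose label matches `axis_name`.
        return array.mean(axis=names.index(axis_name))

    image = np.zeros((64, 64, 3))
    grey = mean_over(image, ('height', 'width', 'channels'), 'channels')
    assert grey.shape == (64, 64)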

Can the 'named tensors' approach be combined with the approach we advocate for in
this PEP? We're not sure yet. One area of overlap is that in some contexts, we could do, say,
``image: Array[Height, Width, Channels]; mean(image, axis=Channels)``. Ideally,
we might write something like ``image: Array[Height=64, Width=64, Channels=3]`` -
but this won't be possible in the short term, due to the rejection of PEP 637.
In any case, our attitude towards this is mostly "Wait and see what happens before
taking any further steps".
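
As a rough sketch of that overlap, using this PEP's own ``TypeVarTuple``
machinery (``Array`` and ``mean`` here are hypothetical stand-ins, not part of
any real library):

::

    from typing import Generic
    from typing_extensions import TypeVarTuple, Unpack

    Shape = TypeVarTuple('Shape')

    class Array(Generic[Unpack[Shape]]): ...    # hypothetical array type

    class Height: ...
    class Width: ...
    class Channels: ...

    # Hypothetical reduction selecting the axis by its *type* rather than index
    def mean(x: Array[Unpack[Shape]], axis: type) -> Array: ...

    image: Array[Height, Width, Channels] = Array()
    grey = mean(image, axis=Channels)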

Named Axes vs Literal Shape Specification
-----------------------------------------

As we mentioned in `Summary Examples`_, named axes are not the only use of this PEP.
Instead of using axis names to parameterise array types, we could also parameterise
with the actual sizes, e.g. ``Array[Literal[64], Literal[64], Literal[3]]``. This approach
more naturally fits with the idea of shape arithmetic - perhaps in combination
with a type variable-like object for integers, such that we could
write ``Array[N] -> Array[2*N]``.
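
A sketch of the size-based style (again with a hypothetical ``Array`` class);
note that the arithmetic part, ``2*N``, is exactly what is not yet expressible:

::

    from typing import Generic, Literal
    from typing_extensions import TypeVarTuple, Unpack

    Shape = TypeVarTuple('Shape')

    class Array(Generic[Unpack[Shape]]): ...    # hypothetical array type

    def load_image() -> Array[Literal[64], Literal[64], Literal[3]]: ...

    # Not yet possible: arithmetic over sizes, e.g.
    # def repeat_each_element(x: Array[N]) -> Array[2*N]: ...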

As of writing, we're genuinely unsure which approach will be most fruitful long-term.
Here too, our attitude is "Take only the steps which we're confident are universally
beneficial (this PEP) and wait and see what happens from there before committing to
a specific path."

Meaning and Scope of Axis Names
-------------------------------

If we attach a name like ``Batch`` to a particular axis, what does that actually mean?
Is that name a placeholder for some actual value, like 32 - such that we're stipulating
that all arrays with an axis named ``Batch`` must have the same batch size?

And if so, what should the scope of such labels be? The local function? The module?
The whole program?

Or is the name ``Batch`` merely referring to the identity, the semantic meaning, of
the axis? Under this interpretation, different arrays with an axis named ``Batch`` could
have different sizes, and ``Batch`` would only serve as an indication of which axis
serves as the batch-like axis.

And if so, again, what should the scope of such labels be? Should we consider all axes
labelled ``Batch`` to have the same identity, even if in completely different modules?

We ourselves are unsure of what the right answer is. Below we explore two options
in the design space.

Names as Local Sizes
''''''''''''''''''''

One approach would be to adopt the first view with local scoping rules. For example, we could write:

::

    def matrix_matrix_multiply(x: Array[K, N], y: Array[N, M]) -> Array[K, M]: ...
    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

Here, all axes sharing the label ``N`` would be constrained to have the same size within
a given signature, but there would be no relationship between ``N`` in *different* signatures.
Such a local scoping rule would be important to avoid forcing, say, all functions within one
module using a label ``Batch`` to use the same batch size.
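
Continuing the example above, a checker following this rule would bind each
label afresh at every call site. A sketch of the intended behaviour:

::

    x: Array[Literal[3], Literal[4]]
    v: Array[Literal[4]]
    matrix_vector_multiply(x, v)    # accepted: N binds to 4 for this call

    w: Array[Literal[7]]
    matrix_vector_multiply(x, w)    # rejected: N cannot be both 4 and 7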

The disadvantage of this approach is that we have no ability to enforce shape properties across
different calls. For example, we can't address the problem mentioned in `Motivation`_: if
one function returns an array with leading dimensions 'Time × Batch', and another function
takes the same array assuming leading dimensions 'Batch × Time', we have no way of detecting this.
(Even allowing for broader scoping rules would not completely address the problem. For example,
we might have a special kind of label that referred to the same size throughout the whole program.
But then, what if ``Time`` and ``Batch`` were the same size?)
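
For instance, with purely size-based labels, the 'Time × Batch' versus
'Batch × Time' confusion becomes undetectable whenever the two sizes happen
to coincide:

::

    def make() -> Array[Literal[32], Literal[32]]: ...    # Time x Batch
    def use(x: Array[Literal[32], Literal[32]]): ...      # expects Batch x Time

    use(make())    # type-checks, but the axes are silently transposed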

The main advantage is that in some cases, axis sizes really are what we care about. This is true
both for simple linear algebra operations such as the matrix manipulations above and for more
complicated transformations such as convolutional layers in neural networks, where it would be of
great utility to the programmer to be able to inspect the array size after each layer using
static analysis.

Names as Semantic Identity
''''''''''''''''''''''''''

A second approach (the one that most of the examples in this PEP are based around)
would be to have names constrain axis *type* but not axis *size*.

This would enable us to solve the problem of enforcing shape properties across calls.
For example:

::

    # lib.py

    class Batch: pass
    class Time: pass

    def make_array() -> Array[Batch, Time]: ...

    # user.py

    from lib import Batch, Time

    # `Batch` and `Time` have the same identity as in `lib`,
    # so must take arrays as produced by `lib.make_array`
    def use_array(x: Array[Batch, Time]): ...

In many cases, this is the more important thing to verify; we care more about
which axis is which than what the specific size of each axis is.

It also does not preclude use cases where we wish to describe shape transformations
without knowing the semantic identity of the axes ahead of time. For example, we can write:

::

    K = TypeVar('K')
    N = TypeVar('N')

    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

We can then use this with:

::

    class Batch: pass
    class Values: pass

    batch_of_values: Array[Batch, Values]
    value_weights: Array[Values]
    matrix_vector_multiply(batch_of_values, value_weights)
    # Result is Array[Batch]

How Flexible is this PEP?
'''''''''''''''''''''''''

The approach described in the previous section is straightforwardly
compatible with this PEP. It is, essentially, standard parametric subtyping
as applied to arrays.

What might not be so obvious is that, using ``Literal``, we can *also* apply Python's
rules for parametric subtyping to the other case. For example:

::

    K = TypeVar('K')
    N = TypeVar('N')

    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

    a: Array[Literal[64], Literal[32]]
    b: Array[Literal[32]]
    matrix_vector_multiply(a, b)
    # Result is Array[Literal[64]]

However, note that the two approaches are mutually exclusive in user code. Users
can verify size or semantic type but not both.
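
Concretely, each annotation must commit to one style or the other:

::

    a: Array[Batch, Values]                  # identity checked, sizes unconstrained
    b: Array[Literal[64], Literal[32]]       # sizes checked, identity unconstrained
    # No way to express both at once - e.g. something like Array[Batch=64]
    # would have required the subscript syntax of the rejected PEP 637.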

As of this PEP, we are agnostic about which approach will provide the most benefit.
Since the features introduced in this PEP are compatible with both approaches, however,
we leave the door open.

Why Not Both?
'''''''''''''

Consider the following 'normal' code:

::

    def f(x: int): ...

Note that we have symbols for both the value of the thing (``x``) and the type of
the thing (``int``). Why can't we do the same with axes? For example, with an imaginary
syntax, we could write:

::

    def f(array: Array[TimeValue: TimeType]): ...

This would allow us to access the axis size (say, 32) through the symbol ``TimeValue``
*and* the type through the symbol ``TimeType``.

This might even be possible using existing syntax, through a second level of parameterisation:

::

    def f(array: Array[TimeValue[TimeType]]): ...

However, we leave exploration of this approach to the future.

Footnotes
==========


.. [#batch] 'Batch' is machine learning parlance for 'a number of'.

.. [#array] We use the term 'array' to refer to a matrix with an arbitrary