Skip to content

New PEP: Reduce-Map Comprehensions (and the "last" builtin) #609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
136 changes: 136 additions & 0 deletions pep-9999.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
=========
Reduce-Map Comprehensions (and the "last" builtin)
=========

Abstract
--------

This PEP proposes the addition of new "Reduce-Map" comprehension. The Reduce-Map comprehension would perform a reduction while keeping intermediate elements. We also propose a built-in function "last", which, combined with a Reduce-Map comprehension, would allow us to replace the :code:`reduce` builtin with a call to get the last element of a Reduce-Map generator.

Rationale
--------

One current awkwardness of Python comes about when you want to perform a chain of operations and save all intermediate processing steps. Let's use the common example of computing an Exponential Moving Average.

.. code-block:: python

def exponential_moving_average(signal: Iterable[float], decay: float, initial_value: float=0.):
running_average = []
average = initial_value
for xt in signal:
average = (1-decay)*average + decay*xt
running_average.append(average)
return running_average

signal = [math.sin(i*0.01) + random.normalvariate(0, 0.1) for i in range(1000)]
smooth_signal = exponential_moving_average(signal, decay=0.05)

A more savvy Pythonista may rewrite this as a generator:

.. code-block:: python

def exponential_moving_average(signal: Iterable[float], decay: float, initial_value: float=0.):
average = initial_value
for xt in signal:
average = (1-decay)*average + decay*xt
yield average

signal = [math.sin(i*0.01) + random.normalvariate(0, 0.1) for i in range(1000)]
smooth_signal = list(exponential_moving_average(signal, decay=0.05))

Still, this seems like an awful lot of lines to express such a simple thing.


Proposal
--------

**Reduce-Map Comprehensions**

We propose the syntax:

.. code-block:: python

smooth_signal = [average = (1-decay)*average + decay*x for x in signal from average=0.]

- Like in a reduce, the :code:`average` variable would be updated and carried forward through iterations.
- The :code:`from` keyword, normally reserved for imports, would be repurposed here to specify an initial value.
- If :code:`average` were not specified in the initial conditions, it could (1) be taken from the scope, or (2) raise a :code:`NameError`.

**The "last" builtin**

Because it may be a common case to only want the last element in this expression (as it is when you use reduce), we propose an additional :code:`last` builtin function to handle this nicely. For example, if you just wanted the last element of the moving average, you could go:

.. code-block:: python

last_smooth_element = last(average = (1-decay)*average + decay*x for x in signal from average=0.)

When used on a ReduceMap generator, the :code:`last` builtin would

- Take the final value produced by the iterable (if the iterable yields any items) OR
- Take the initial value (defined in the :code:`from ...` block) otherwise.

When used on a normal generator (or a Reduce-Map generator without a :code:`from ...` initializer), :code:`last` would behave like:

.. code-block:: python

def last_on_normal_generator(generator):
"""The proposed `last` builtin, as it would behave on a non-ReduceMap generator"""
x = next(generator)
for x in generator:
pass
return x

Like :code:`next`, this would throw a :code:`StopIteration` if given an empty generator with no :code:`from ...` initializer.


Extension: Specifying the value to keep
--------

It may be the case that there are variables in the loop that you want to carry forward through the reduction, but that you do not want in the result. An example that comes to mind is running a Recurrent Neural Network (RNN). In an RNN, we have an update function:

.. code-block:: python

def rnn_step(input_data, last_hidden_state):
"""
:param Array[n_samples,n_input_dim] input_data: The input data at the current time step
:param Array[n_samples,n_hidden_dim] last_hidden_state: The hidden state from the last time step
:return Tuple[Array[n_samples,n_output_dim], Array[n_samples,n_hidden_dim]]: The output and hidden state.
"""
... # Compute update here
return output_data, new_hidden_state


Using the aformentioned Reduce-Map comprehension, we may run this as:

.. code-block:: python

output_and_hidden_generator = (current_output, hidden = rnn_step(current_input, hidden) for current_input in input_timeseries from hidden=np.zeros((n_samples, n_hidden_dim)))


Which would return a generator of 2-tuples.

However, we generally do not want to keep the hidden values (they are just use to carry forward internal state). The proposed extension is to enable the optional syntax:

.. code-block:: python

output_generator = (current_output, hidden = rnn_step(current_input, hidden) -> current_output for current_input in input_timeseries from hidden=np.zeros((n_samples, n_hidden_dim)))

Where the :code:`-> current_output` at the end signifies that we only want to keep the :code:`current_output` at each iteration in the result.

This saves us from the somewhat less readable alternatives of

- :code:`output_timeseries, _ = zip(*output_and_hidden_generator)` (which also prevents garbage collection of hidden states)
- :code:`output_timeseries = list(oh[0] for oh in output_and_hidden_generator)`

And allows the persion building the generator to control which variables should be internal, rather than the user of the generator.

Just as before, if we're only interested in the last output, we can use:

.. code-block:: python

final_output = last(current_output, hidden = rnn_step(current_input, hidden) -> current_output for current_input in input_timeseries from hidden=np.zeros((n_samples, n_hidden_dim)))

Conclusion
-----------

I think there are many situations where the ReduceMap comprehension would be useful. This construct is easily readable, would save many lines of code, and as a bonus would allow us to replace the :code:`reduce` builtin with a more readable, Pythonic comprehension.