
Implement new Loop and Scan operators #191

Draft · ricardoV94 wants to merge 6 commits into main from looping

Conversation

ricardoV94 (Member) commented Jan 10, 2023

Related to #189

This PR implements a new low-level Loop Op which can be easily transpiled to Numba (the Python perform method takes 9 lines, yay to not having to support C in the future).

It also implements a new higher-level Scan Op which returns as outputs the last states + intermediate states of a looping operation. This Op cannot be directly evaluated, and must be rewritten as a Loop Op in the Python/Numba backends. For the JAX backend it's probably fine to transpile directly from this representation into a lax.scan, as the signatures are pretty much identical. That was not done in this PR.

The reason for the two types of outputs is that they are useful in different contexts. Final states are sometimes all one needs, whereas intermediate states are generally needed for backpropagation (not implemented yet). This allows us to choose which one (or both) of the outputs we want during compilation, without having to do complicated graph analysis.
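As a rough illustration of why both kinds of outputs are convenient (the scan helper, its import path, and its argument names below are hypothetical placeholders for the user-facing API discussed further down, not the PR's final interface):

import pytensor
import pytensor.tensor as pt
# Hypothetical import path and signature for the user-facing helper
from pytensor.loop.basic import scan

x0 = pt.scalar("x0")
# Run the update x -> x * 2 for 5 steps starting from x0 (argument names are made up)
last_states, traces = scan(lambda x: x * 2, init_states=[x0], n_steps=5)

# If only the final state is requested, a rewrite can drop the trace and lower to a bare Loop
f_last = pytensor.function([x0], last_states[0])
# If intermediate states are requested (e.g. for backpropagation), the trace is kept
f_trace = pytensor.function([x0], traces[0])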

The existing save_mem_new_scan is used to convert a general scan into a loop that only returns the last computed state. It's... pretty complicated (although it also covers cases where more than one but fewer than all steps are requested; OTOH it can't handle while loops #178):

def save_mem_new_scan(fgraph, node):

Taking that as a reference, I would say the new conversion rewrite from Scan to Loop is much, much simpler. Most of it is boilerplate code for defining the right trace inputs and the new FunctionGraph.


Both Ops expect a FunctionGraph as input. This should probably be created by a user-facing helper that accepts a callable, like scan does now. That was not done at first, as I wanted to discuss the general design. Done.
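For context, the kind of inner FunctionGraph both Ops expect can be built with standard PyTensor machinery; a minimal sketch (generic PyTensor code, not the PR's helper):

import pytensor.tensor as pt
from pytensor.graph.fg import FunctionGraph

# Inner graph for one update step: state -> state + 1
x = pt.scalar("x")
update_fg = FunctionGraph(inputs=[x], outputs=[x + 1], clone=True)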

Design issues

1. The current implementation of Loop assumes there are as many states as outputs of the inner function. This does not make sense for mapping or "filling" operations, such as filling a tensor with random values. In one of the tests I had to create a dummy x input to accommodate this restriction. Should we use NoneConst to represent outputs that don't feed into the next state? I think there is something similar being done with the old Scan, where the outputs_info must explicitly be None in these cases.

2. Scan and Loop can now take random types as inputs (scan can't return them as a sequence). This makes random seeding much more explicit compared to the old Scan, which was based on default updates of shared variables. However, it highlights the awkwardness of the random API when we want to access the next random state (see the sketch after this list). Should we perhaps add a return_rng_update to __call__, so that it doesn't hide the next rng state output?

3. Do we want to be able to represent empty Loop / Sequences? If so, how should we go about that? IfElse is one option, but perhaps it would be nice to represent it in the same Loop Op?

4. What do we want to do in terms of inplacing optimizations?
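To make point 2 concrete, this is roughly how the next rng state has to be dug out of a RandomVariable node with the current PyTensor random API (standard API, independent of this PR):

import numpy as np
import pytensor
from pytensor.tensor.random.basic import normal

rng = pytensor.shared(np.random.default_rng(123), name="rng")
x = normal(rng=rng)            # __call__ returns only the draw...
next_rng = x.owner.outputs[0]  # ...the updated rng is hidden among the node's outputs

A return_rng_update flag on __call__ would make that second output explicit instead of requiring the .owner.outputs dance.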

TODO

If people are on board with the approach:

  • Implement Numba dispatch
  • Implement JAX dispatch
  • Implement L_op and R_op
  • Implement friendly user facing functions
  • Decide on which meta-parameters to preserve (mode, truncate_gradient, reverse and so on)
  • Add rewrite that replaces trace[-1] by the first set of outputs (final state). That way we can keep the old API, while retaining the benefit of doing while Scans without tracing when it's not needed.

assert input_state.type == output_state.type


class Loop(Op):
ricardoV94 (Member Author) commented:

TODO: Add mixin HasInnerGraph so that we can see the inner graph in debug_print

ricardoV94 force-pushed the looping branch 9 times, most recently from 76a9b4c to f2a2c03 on January 11, 2023
aseyboldt (Member) commented Jan 11, 2023

> The current implementation of Loop assumes there are as many states as outputs of the inner function. This does not make sense for mapping or "filling" operations such as filling a tensor with random values. In one of the tests I had to create a dummy x input to accommodate this restriction. Should we use NoneConst to represent outputs that don't feed into the next state? I think there is something similar being done with the old Scan where the outputs_info must explicitly be None in these cases.

Wouldn't a fill loop look something like this?

state = (pt.scalar(0), pt.empty(shape, dtype), rng)
def update(idx, values, rng):
    value, rng = rng.normal()  # not exactly the api...
    values = pt.set_subtensor(values[idx], value)
    return (idx + 1, values, rng, idx < maxval)

(and very much need inplace rewrites for good performance...)

> Scan and Loop can now take random types as inputs (scan can't return them as a sequence). This makes random seeding much more explicit compared to the old Scan, which was based on default updates of shared variables. However it highlights the awkwardness of the random API when we want to access the next random state. Should we perhaps add a return_rng_update to __call__, so that it doesn't hide the next rng state output?

Good question...
Don't know either :-)

> Do we want to be able to represent empty Loop / Sequences? If so, how should we go about that? IfElse is one option, but perhaps it would be nice to represent it in the same Loop Op?

I think one rewrite that gets easier with the if-else-do-while approach would be loop-invariant code motion. Let's say we have a loop like

x = bigarray...
if not_empty:
    val = 0
    do:
        val = (val + x.sum()) ** 2
    while val < 10

# rewrite to
x = bigarray...
if not_empty:
    val = 0
    x_sum = x.sum()
    do:
        val = (val + x_sum) ** 2
    while val < 10

we could move x.sum() out of the loop. But with a while loop we can't as easily, because we only want to do x.sum() if the loop is not empty, and where would we then put that computation?

> What do we want to do in terms of inplacing optimizations?

Well, I guess we really need those :-)
I'm thinking it might be worth it to copy the initial state, and then donate the state to the inner function? And I guess we need to make sure rewrites are actually running on inner graphs as well...

ricardoV94 (Member Author) commented Jan 12, 2023

> we could move x.sum() out of the loop. But with a while loop we can't as easily, because we only want to do x.sum() if the loop is not empty, and where would we then put that computation?

Why can't we move it even if it's empty? Sum works fine. Are you worried about Ops that we know will fail with empty inputs?

About the filling Ops, yeah I don't see it as a problem anymore. Just felt awkward to create the dummy input when translating from scan to loop. I am okay with it now

aseyboldt (Member) commented:

That would change the behavior. If we move it out and don't prevent it from being executed, things could fail, for instance if there's an assert somewhere or some other error happens during its evaluation. Also, it could be potentially very costly (let's say "solve an ODE").

(somehow I accidentally edited your comment instead of writing a new one, no clue how, but fixed now)

ricardoV94 (Member Author) commented Jan 12, 2023

In my last commit, sequences are demoted from special citizens to just another constant input in the ScanOp. The user-facing helper creates the right graph with indexing, which is passed to the user-provided function.

I have reverted converting the constant inputs to dummies before calling the user function, which allows the example in the Jacobian documentation to work, including the one that didn't work before (because both are now equivalent under the hood :))

https://pytensor.readthedocs.io/en/latest/tutorial/gradients.html#computing-the-jacobian

I reverted too much; I still need to pass dummy inputs as the state variables, since it doesn't make sense for the user function to introspect the graph beyond the initial state (it's only valid for the initial state).
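For reference, the scan-based Jacobian example from that tutorial looks roughly like this (reproduced from memory, so details may differ slightly from the docs):

import pytensor
import pytensor.tensor as pt

x = pt.dvector("x")
y = x ** 2
# Jacobian of y with respect to x, one row per output element
J, updates = pytensor.scan(
    lambda i, y, x: pt.grad(y[i], x),
    sequences=pt.arange(y.shape[0]),
    non_sequences=[y, x],
)
f = pytensor.function([x], J, updates=updates)
print(f([4.0, 4.0]))  # [[8. 0.]
                      #  [0. 8.]]

The example that previously failed is presumably the variant that closes over y and x instead of passing them as non_sequences; the point above is that both spellings now build the same graph under the hood.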

ricardoV94 force-pushed the looping branch 2 times, most recently from 7bcd42c to 6c953b3 on January 13, 2023
return last_states[1:], traces[1:]


def map(
A maintainer (Member) commented:

What about subclassing Scan into

  • Map(Scan)
  • Reduce(Scan)
  • Filter(Scan)

It would be easier to dispatch to optimized implementations.

ricardoV94 (Member Author) replied:

We can do that later; not convinced we need that yet.

if init_state is None:
    # next_state may reference idx. We replace that by the initial value,
    # so that the shape of the dummy init state does not depend on it.
    [next_state] = clone_replace(
ferrine (Member) commented Jan 13, 2023:

Why not graph_replace or using memo for FunctionGraph(memo={symbolic_idx: idx}) (here)?

ricardoV94 (Member Author) replied:

Why is that better?

ricardoV94 (Member Author) commented:

Added a simple JAX dispatcher; it works in the few examples I tried.

# explicitly triggers the optimization of the inner graphs of Scan?
update_fg = op.update_fg.clone()
rewriter = get_mode("JAX").optimizer
rewriter(update_fg)
ricardoV94 (Member Author) commented:

This gives an annoying Supervisor Feature missing warning... gotta clean that up


print(max_iters)
states, traces = jax.lax.scan(
    scan_fn, init=list(states), xs=None, length=max_iters
ricardoV94 (Member Author) commented:

Todo: Check we are not missing performance by not having explicit sequences.

Todo: When there are multiple sequences, PyTensor defines n_steps as the shortest sequence. JAX should be able to handle this, but if not, we could consider not allowing sequences/n_steps with different lengths in the PyTensor scan.

Then we could pass a single shape as n_steps after asserting they are the same?

ricardoV94 (Member Author) commented Jan 16, 2023

I just found out about TypedLists in PyTensor. That should allow us to trace any type of Variables, including RandomTypes 🤯

Pushed a couple of commits that rely on this.
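A minimal sketch of the idea (the import paths and the append helper are assumptions, mirroring the old Theano typed_list module):

import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.typed_list.type import TypedListType
from pytensor.typed_list.basic import append

# A symbolic list whose elements are all float64 vectors; the same construction
# with a random-generator element type is what would let a trace carry rng state.
vector_list = TypedListType(pt.dvector)("vector_list")
new_vector = pt.dvector("new_vector")
appended = append(vector_list, new_vector)

f = pytensor.function([vector_list, new_vector], appended)
print(f([np.zeros(3)], np.ones(3)))  # [array([0., 0., 0.]), array([1., 1., 1.])]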

ricardoV94 force-pushed the looping branch 5 times, most recently from 5f15c5e to 32b4fb4 on January 20, 2023
ricardoV94 and others added 6 commits on January 20, 2023 (Co-authored-by: Adrian Seyboldt <adrian.seyboldt@gmail.com>)

From one commit message: This was not possible prior to the use of TypedListType for non-TensorVariable sequences, as it would otherwise not be possible to represent indexing of the last sequence state, which is needed e.g. for shared random generator updates.
codecov-commenter commented:

Codecov Report

Merging #191 (5bc7070) into main (958cd14) will increase coverage by 0.06%.
The diff coverage is 89.11%.

@@            Coverage Diff             @@
##             main     #191      +/-   ##
==========================================
+ Coverage   80.03%   80.09%   +0.06%     
==========================================
  Files         170      173       +3     
  Lines       45086    45435     +349     
  Branches     9603     9694      +91     
==========================================
+ Hits        36085    36392     +307     
- Misses       6789     6818      +29     
- Partials     2212     2225      +13     
Impacted Files Coverage Δ
pytensor/compile/mode.py 84.47% <ø> (ø)
pytensor/loop/basic.py 81.44% <81.44%> (ø)
pytensor/loop/op.py 90.29% <90.29%> (ø)
pytensor/link/jax/dispatch/__init__.py 100.00% <100.00%> (ø)
pytensor/link/jax/dispatch/loop.py 100.00% <100.00%> (ø)
pytensor/link/utils.py 60.30% <100.00%> (+0.12%) ⬆️
pytensor/typed_list/basic.py 89.27% <100.00%> (+0.38%) ⬆️
pytensor/link/jax/dispatch/extra_ops.py 74.62% <0.00%> (-20.90%) ⬇️
pytensor/link/jax/dispatch/shape.py 80.76% <0.00%> (-7.70%) ⬇️
pytensor/link/jax/dispatch/basic.py 79.03% <0.00%> (-4.84%) ⬇️
... and 11 more

ricardoV94 (Member Author) commented Oct 23, 2023

This Discourse thread is a great reminder of several Scan design issues that are fixed here: https://discourse.pymc.io/t/hitting-a-weird-error-to-do-with-rngs-in-scan-in-a-custom-function-inside-a-potential/13151/15

Namely:

  • Going to the root to find missing non-sequences (instead of using truncated_graph_inputs)
  • Gradient only works by indexing non-sequences
  • Scans are very difficult to manipulate!!!
