Add support for buffer donation (input/output aliasing) #1733

Closed

hawkinsp opened this issue Nov 21, 2019 · 21 comments
Labels
CPU: Issues related to the CPU compiler/runtime · enhancement: New feature or request · P2 (eventual): This ought to be addressed, but has no schedule at the moment. (Assignee optional) · XLA

Comments

@hawkinsp
Member

Currently JAX cannot reuse input buffers to a computation for outputs. This means that for a typical neural network training step, we require enough space to store 2 copies of the weights simultaneously in memory.

XLA supports input/output aliasing, which would allow JAX to tell XLA that it may reuse the input weight buffers for output weights, but we haven't yet enabled it from JAX.

There are two basic ways we could try to use XLA's support:
a) opportunistically, i.e., if we detect that the reference count of a buffer is 1 at execution time, we could allow XLA to reuse it. This is somewhat problematic in that it's pretty hard to tell whether the reference count is truly 1 during the Execute() call, because the caller may hold references.

One way around this might be to distinguish between (i) the Python Execute() call and (ii) the time that execution actually takes place, by which point the Python references may have been dropped.

b) explicitly. Here the user would provide an argument to jit or pmap, something like:

@partial(jit, donated_argnums=(1,2,7))
def f(...):
  ...

This would be a promise by the user that they are done with the buffers passed in certain argument positions, and that either (i) the called computation may reuse them for outputs, or (ii) they will be freed.

The explicit option seems the best place to start, since it is also the simplest to implement.
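
For concreteness, a minimal sketch of what the explicit option could look like for a training step over a pytree of parameters (the donate_argnums spelling is the one jit eventually adopted; the update rule, shapes, and learning rate are made up for illustration):

from functools import partial
import jax
import jax.numpy as jnp

@partial(jax.jit, donate_argnums=(0,))  # promise: the old params buffers may be consumed
def update(params, grads):
  # XLA may alias the donated input buffers with the output buffers, so only
  # one copy of the weights needs to be resident in device memory at a time.
  return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)

params = {"w": jnp.ones((1024, 1024)), "b": jnp.zeros((1024,))}
grads = {"w": jnp.full((1024, 1024), 0.01), "b": jnp.full((1024,), 0.01)}
params = update(params, grads)  # the arrays originally passed as params are now invalid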

@romanngg
Contributor

Related: #1273

@hawkinsp hawkinsp added the enhancement New feature or request label Nov 21, 2019
@afrozenator

/sub

@skye
Collaborator

skye commented Nov 21, 2019

Should we add donated_argnums to the call site instead of to jit? That'd make it clearer when a DeviceArray is no longer safe to use.

@skye
Collaborator

skye commented Nov 22, 2019

Or my dream API: jit(f)(jax.move(x))
This maybe isn't a great option though since it'd only make sense at jit/pmap boundaries, but there's nothing stopping you from calling jax.move anywhere.

@hawkinsp
Member Author

I think we can make something similar to that dream happen.

One minor modification, perhaps:
jit(f)(x.mark_for_donation())
?

I am imagining that DeviceValue.mark_for_donation() sets a flag on the buffer that says the next computation that receives it as input consumes it. I am not 100% happy with that as an API but it would do the job.

@mattjj
Member

mattjj commented Nov 22, 2019

We should think through what the common use case looks like. I have a slight aesthetic bias towards jax.move(x) (which could just return a wrapper object, with logic similar to what Peter said). I suspect we'll want to be able to tree-map this function easily. It's also worth thinking about whether there are any Python refcount games worth playing (for example, whether jax.move(x) could verify that there aren't any other references to x), or whether they're all too tricky.

@hawkinsp
Member Author

I'd argue that jax.move (or whatever we call it) should not be in the business of looking at reference counts. The whole point of the API is that it is completely explicit. This does not preclude the existence of a more magical and implicit way to reclaim buffers, but the explicit API should be predictable and not try to play tricks with reference counts.

Even if there are many outstanding references, jax.move should still take ownership of the buffer. Another way to think about it is move deletes the original DeviceArray object (just as if you called .delete()) and returns you a new object with the same backing buffer, tagged in some way so that the jit dispatch logic knows the computation may consume it. One way to do this would be to return a new subclass of DeviceValue (say, DonatedDeviceArray) that is not a DeviceArray (so it can't be confused with one), but is known to the dispatch logic.
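
A hypothetical sketch of that wrapper idea, purely to make the intended semantics concrete (move, DonatedDeviceArray, and the device_buffer handling below are illustrative, not an existing JAX API):

class DonatedDeviceArray:
  """Holds a backing buffer that the next computation may consume.

  Deliberately not a DeviceArray, so it cannot be mistaken for an ordinary
  array; only the jit/pmap dispatch logic knows what to do with it.
  """
  def __init__(self, device_buffer):
    self.device_buffer = device_buffer

def move(x):
  # Take ownership of x's buffer: detach it from the original DeviceArray
  # (so x behaves as if x.delete() had been called) and return it wrapped
  # in a type the dispatch logic recognizes as donated.
  buffer = x.device_buffer
  x.device_buffer = None  # the original object is now invalid
  return DonatedDeviceArray(buffer)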

@j-towns
Contributor

j-towns commented Nov 22, 2019

This would be useful for fastar. A note on the API: is it possible that most use cases are in the form of a fixed-point iteration? Gradient descent and the fastar use case are. We could provide a high-level API to donation like

def in_place_fixed_point_iteration(fun, n_iter, x):
  x = x.copy()  # Protect the input array
  for _ in range(n_iter):
    x = fun(jax.move(x))
  return x

and even make move private if we think that in_place_fixed_point_iteration covers all likely use cases.
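
A hedged usage sketch of that helper, phrasing plain gradient descent as the fixed-point pattern (the toy loss, learning rate, and initial weights are made up, and jax.move is still the proposed, not-yet-existing API):

import jax
import jax.numpy as jnp

def loss(w):
  return jnp.sum(w ** 2)  # toy objective, just for illustration

def gd_step(w):
  return w - 0.1 * jax.grad(loss)(w)

w0 = jnp.ones((1024,))
w_final = in_place_fixed_point_iteration(jax.jit(gd_step), n_iter=100, x=w0)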

@mattjj
Member

mattjj commented Nov 22, 2019

@hawkinsp I wasn't suggesting anything less explicit. The suggestion (which was tangential to my main point of +1'ing jax.move) was an explicit API with error checking. That is, it's not that jax.move only causes buffer donation when it can; rather, it's that it always causes buffer donation (i.e. the explicit version we all want) but may also be able to catch errors (if we decide that having an alias to x when calling jax.move(x) is actually an error).

@mattjj
Member

mattjj commented Nov 22, 2019

That said, we probably don't want it to be an error in that case, since it might be common to write something like

x = ...
y = f(jax.move(x))
... # more stuff, x still in scope!

To do any meaningful checking against reference counts, the user code would have to look more like

x = jax.move(x)
y = f(x)
...

EDIT: one more variant:

x = jax.move(x)  # basically like x = [x]
y = f(x.donate())  # basically like x.pop()

But that just seems annoying.

So ignore my suggestion! But do note that it wasn't arguing for an implicit API :)

@mattjj
Member

mattjj commented Nov 22, 2019

@j-towns The fixed-point iteration pattern is probably the most common one, and in that case we have even more information available (not just that these buffers are being donated, but also that it'd be smart to identify the input buffers with the output buffers in a particular way). But in_place_fixed_point_iteration seems too restrictive as an API (basically back to "framework-controlled training loops").

I think the ideal is a more flexible API, like jax.move or x.mark_for_donation, that lets user code implement in_place_fixed_point_iteration with optimal buffer efficiency.

@mattjj
Member

mattjj commented Nov 22, 2019

I like @hawkinsp's thoughts here:

Even if there are many outstanding references, jax.move should still take ownership of the buffer. Another way to think about it is move deletes the original DeviceArray object (just as if you called .delete()) and returns you a new object with the same backing buffer, tagged in some way so that the jit dispatch logic knows the computation may consume it. One way to do this would be to return a new subclass of DeviceValue (say, DonatedDeviceArray) that is not a DeviceArray (so it can't be confused with one), but is known to the dispatch logic.

@dm-jrae

dm-jrae commented Dec 30, 2019

Several other colleagues at DM and I are pretty excited about some jax.move variant being implemented, as the 2x memory overhead for model params + optimizer state is quite significant for us. Has there been any further discussion on this feature?

@skye
Collaborator

skye commented Jan 2, 2020

I think at this point someone just needs to do it! I can give it a shot next week (or post here if something else comes up :)).

@ibab
Contributor

ibab commented Feb 7, 2020

@skye: Did you have a chance to look into buffer donation? If the jax team is busy with other things right now, we might be able to help with this one.

@girving

girving commented Feb 7, 2020

@ibab: @tomhennigan is already on this.

@skye
Collaborator

skye commented Feb 7, 2020

And yes, I didn't get a chance to look at this after all. I forgot to ping here, sorry!

@tomhennigan
Member

FYI this was fixed for TPU in #2936, and the XLA team is in the process of supporting this on GPU right now.

@wiep

wiep commented Dec 4, 2020

My current understanding is that buffer donation works on TPUs and GPUs (since jax 0.1.73) but not on CPUs. Are there plans to support CPUs as well?

@sudhakarsingh27 sudhakarsingh27 added NVIDIA GPU Issues specific to NVIDIA GPUs P2 (eventual) This ought to be addressed, but has no schedule at the moment. (Assignee optional) labels Aug 10, 2022
@hawkinsp hawkinsp added XLA CPU Issues related to the CPU compiler/runtime and removed NVIDIA GPU Issues specific to NVIDIA GPUs labels Aug 12, 2022
@hawkinsp
Member Author

This issue still applies, but only on CPU.

@hawkinsp
Member Author

This issue is long fixed!
