
jnp.split: use dynamic rather than static slices for speed #13096

Closed
wants to merge 1 commit into from

Conversation

@jakevdp (Collaborator) commented Nov 3, 2022

Fixes #12999. See also #9445

This is very similar to the issue & fix in #12219

main branch:

In [1]: import jax.numpy as jnp 
   ...: x = jnp.ones(5000) 
   ...: %time _ = jnp.split(x, jnp.arange(0, 5000, 10))                                                                   
CPU times: user 4.77 s, sys: 118 ms, total: 4.89 s
Wall time: 4.96 s

this branch:

In [1]: import jax.numpy as jnp 
   ...: x = jnp.ones(5000) 
   ...: %time _ = jnp.split(x, jnp.arange(0, 5000, 10))                                                                   
CPU times: user 342 ms, sys: 8.52 ms, total: 351 ms
Wall time: 360 ms

@jakevdp jakevdp requested a review from froystig November 3, 2022 19:51
@jakevdp jakevdp self-assigned this Nov 3, 2022
@froystig (Member) left a comment:

Do we want benchmarks here, like those you added in #12219?

Does this change affect performance for a small number of splits?

@jakevdp (Collaborator, Author) commented Nov 4, 2022

Good question. For repeated calls after the initial (compiled) run, we get something like this:

main branch:

In [2]: %timeit jax.block_until_ready(jnp.split(x, jnp.arange(0, 5000, 10)))                                          
40.5 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

this branch:

In [2]: %timeit jax.block_until_ready(jnp.split(x, jnp.arange(0, 5000, 10)))                                          
116 ms ± 4.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So a slowdown of ~3x on repeated calls is the tradeoff for saving ~150x on the initial call.

What do you think?

-    return [lax.slice(ary, _subval(starts, axis, start), _subval(ends, axis, end))
+    # Use dynamic rather than static slice to prevent slow execution of large
+    # number of splits; see https://github.com/google/jax/issues/12999
+    return [lax.dynamic_slice(ary, _subval(starts, axis, start), _subval(sizes, axis, end - start))
@jakevdp (Collaborator, Author) commented Nov 4, 2022
Another option here: we could replace this with

return [ary[int(start): int(end)] for start, end in zip(split_indices[:-1], split_indices[1:])]

Due to #12219, this would now use dynamic rather than static slices and result in the same performance characteristics. I kind of like the idea of delegating the performance question to existing code. What do you think?
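As a sketch of what that alternative looks like in isolation (`split_via_getitem` is a hypothetical name for illustration, not the actual `jnp.split` internals, and the boundary handling here is simplified):

```python
import jax.numpy as jnp

def split_via_getitem(ary, indices):
    # Hypothetical sketch of the suggested alternative: build the full list
    # of split boundaries, then slice with ordinary indexing. After #12219,
    # repeated __getitem__ calls like this lower to dynamic slices.
    split_indices = [0, *map(int, indices), ary.shape[0]]
    return [ary[start:end]
            for start, end in zip(split_indices[:-1], split_indices[1:])]

x = jnp.arange(10)
parts = split_via_getitem(x, [3, 7])
# parts have lengths 3, 4, and 3
```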

@froystig (Member) commented:

From conversation with @hawkinsp: we may want to profile in order to understand why the dynamic slice approach is 3x slower, since it isn't clear that it ought to be (?)

@KeAWang commented Dec 8, 2022

Why is dynamic_slice faster than static slice in the first place?

@jakevdp (Collaborator, Author) commented Dec 8, 2022

Why is dynamic_slice faster than static slice in the first place?

For a single call, dynamic_slice is actually slower than static slice: it does not specialize its compiled code on the index values, so XLA must perform some logic at runtime to handle the start index. Static slice, by contrast, is specialized on its static start indices, so every call with different indices incurs a small overhead in XLA at compile time. Accumulated over thousands of calls, that per-call compile-time overhead makes the static version slower overall, while dynamic_slice pays no such cost because it is compiled once for all index values.
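A minimal illustration of the distinction (my own example, not code from this PR): lax.slice bakes its start/stop indices into the traced program as compile-time constants, while lax.dynamic_slice takes the start index as a runtime value and specializes only on the slice size.

```python
import jax.numpy as jnp
from jax import lax

x = jnp.arange(10.0)

# Static slice: start=2 and stop=5 are compile-time constants, so each
# distinct index pair produces a distinct specialized slice operation.
a = lax.slice(x, start_indices=(2,), limit_indices=(5,))

# Dynamic slice: the start index is an ordinary runtime value; only the
# slice size (3,) must be static, so one compiled op serves every start.
b = lax.dynamic_slice(x, start_indices=(jnp.array(2),), slice_sizes=(3,))

# Both slices contain the elements [2., 3., 4.]
```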

@KeAWang commented Dec 8, 2022

Thanks! That explains the performance issues I've been trying to debug with slice vs dynamic_slice vs numpy slice.

@jakevdp (Collaborator, Author) commented Nov 3, 2023

I can no longer reproduce these performance issues, despite jnp.split still lowering to static slices.

@jakevdp jakevdp closed this Nov 3, 2023
Successfully merging this pull request may close these issues.

jnp.split() has long compile times for large numbers of splits