Closed
Description
When running the following code on an EXLA backend with a CUDA GPU, through the livebook:0.14.5-cuda12 image:
key = Nx.Random.key(1)
input = Nx.iota({1_000_000})
{output, _new_key} = Nx.Random.shuffle(key, input)
output
The result looks like this:
#Nx.Tensor<
s32[1000000]
EXLA.Backend<cuda:0, 0.477854050.2664562754.2456>
[163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, 163918, ...]
>
The bug also occurs for smaller tensors, but less frequently. The deciding factor is the size of the axis used for the shuffle; the other dimensions of the tensor do not seem to be relevant. It seems to start happening at an axis size of around 100,000 and happens every time once the axis size reaches about 1,000,000.
Moreover, about half the time the execution never completes and the Livebook runtime has to be restarted.
This has been observed on two different machines, one with an RTX 4090 graphics card and one with a GTX 1070 Ti. The bug did not occur when testing on the CPU of the same machines.
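To check a given axis size without inspecting the output by eye, something like the following sketch may help. It assumes Nx is already installed in the notebook and that the tensor is 1-D. `ShuffleCheck`, `permutation?/2` and `check/2` are hypothetical names for illustration, not part of Nx. The check sorts both tensors and compares them element by element, so a correct shuffle should return true and an output like the one above should return false.

```elixir
defmodule ShuffleCheck do
  # Hypothetical helper module, not part of Nx or EXLA.

  # Returns true when `output` contains exactly the same elements as `input`,
  # in any order. Both are assumed to be 1-D tensors of the same size.
  def permutation?(input, output) do
    input
    |> Nx.sort()
    |> Nx.equal(Nx.sort(output))
    |> Nx.all()
    |> Nx.to_number()
    |> Kernel.==(1)
  end

  # Shuffles Nx.iota({size}) with the given seed on the current default
  # backend and reports whether the result is a valid permutation.
  def check(size, seed \\ 1) do
    key = Nx.Random.key(seed)
    input = Nx.iota({size})
    {output, _new_key} = Nx.Random.shuffle(key, input)
    permutation?(input, output)
  end
end
```

For example, `ShuffleCheck.check(1_000_000)` runs the same shuffle as in the report. On an affected GPU the call may still hang instead of returning, as described above.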