NNlibCUDA: Heisenbug in conv! with nonzero beta #37

Describe the bug

When using the beta keyword of NNlib.conv! on a CuArray, there are rare, apparently non-deterministic, absurd results. With alpha and beta, conv! follows the usual BLAS-like convention y = alpha * conv(x, w) + beta * y, so a nonzero beta accumulates the convolution into the existing contents of y.

To reproduce

Run the following on a fresh Julia session:
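A minimal reproduction along these lines (the shapes, values and keyword arguments here are assumptions, not the original snippet):

using NNlib, CUDA, NNlibCUDA

x = ones(Float32, 1, 1, 1)      # input; for these values conv(x, w) == [1.0]
w = ones(Float32, 1, 1, 1)      # kernel
cdims = NNlib.DenseConvDims(x, w)

y_cpu = zeros(Float32, 1, 1, 1)
NNlib.conv!(y_cpu, x, w, cdims; alpha=1f0, beta=1f0)          # CPU reference: 1*1 + 1*0

y_gpu = CUDA.zeros(Float32, 1, 1, 1)
NNlib.conv!(y_gpu, cu(x), cu(w), cdims; alpha=1f0, beta=1f0)  # same call on the GPU, first call in the session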
The outputs:

y_cpu = Float32[1.0]
y_gpu = Float32[2.0]

If I run it again, y_gpu gives the correct result. From some fuzz testing, it seems that at least the first conv! operation for given array sizes goes wrong; I think it is not only the first operation that goes wrong, but the first operation reliably goes wrong. I am using the current master of CUDA.jl.

Comments

I can replicate it.
The first call to conv! is special because it will reliably trigger an algorithm search. What happens if you go a level lower and call the CUDA.jl functions directly?
This happens:

using CUDA, NNlibCUDA, NNlib

x = CUDA.ones(1, 1, 1)
w = CUDA.ones(1, 1, 1)
y = CUDA.zeros(1, 1, 1)
cdims = NNlib.DenseConvDims(x, w)
d, x, _ = NNlibCUDA.cudnnConvolutionDescriptorAndPaddedInput(cdims, x)
CUDA.CUDNN.cudnnConvolutionForward!(y, w, x, d; alpha=1f0, beta=1f0, z=y)  # first call: gives [2.0]

y = CUDA.zeros(1, 1, 1)
CUDA.CUDNN.cudnnConvolutionForward!(y, w, x, d; alpha=1f0, beta=1f0, z=y)  # second call: gives [1.0], as expected

The first result is consistent with the algorithm search having already written the convolution result into y, so that the real call then computes 1*1 + 1*1 == 2. I suspect that the crux is here: instead of y, a similar array should be allocated and used in the algorithm search.
I would expect that …
We have to differentiate between the actual convolution and the algorithm search. The convolution needs a zero-initialized output buffer, which it alters. The algorithm search also needs an output buffer for its benchmark, but after the search finishes, the values in that buffer are arbitrary. If you use the same buffer for both, the algorithm search leaves arbitrary values in it, and the convolution then accumulates into those, producing garbage. As you correctly said, the algorithm search is skipped in subsequent calls, which makes this a semi-heisenbug. Using this branch https://github.com/maxfreu/CUDA.jl/tree/conv-algosearch I get zero errors in the fuzzer.
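A sketch of that fix, with a hypothetical stand-in for the algorithm-search call (search_conv_algorithm is illustrative, not a real API; the actual change lives in CUDA.jl's cuDNN algorithm-search code on the branch above):

# Hypothetical sketch: benchmark into a throwaway buffer so the caller's
# output y survives the algorithm search untouched.
function conv_algo_search_fixed(y, w, x, d)
    y_scratch = similar(y)                            # same shape and eltype as y; contents don't matter
    algo = search_conv_algorithm(y_scratch, w, x, d)  # hypothetical search call; clobbers y_scratch only
    return algo                                       # y still holds the values beta will accumulate into
end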
That's what I was missing.
Hmm, I wouldn't have expected the search to cause OOMs, as the buffer should be freed right after the search. Apart from the input, weight and output tensors, the search also needs a "workspace", the size of which is calculated here. It already seems to be quite small, but I haven't thought it through; maybe it is not small enough? Anyway, it should be possible to allocate the scratch output only when beta != 0, provided the assumption holds that this is the only case in which y is accumulated into. Should I mark the PR as draft?
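The conditional allocation could then look like this (same hypothetical names as above; when beta == 0 the search may clobber y, because the convolution overwrites it anyway):

# Hypothetical: pay for the scratch buffer only when y's contents must survive.
y_search = beta == 0 ? y : similar(y)
algo = search_conv_algorithm(y_search, w, x, d)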
See JuliaGPU/CUDA.jl#736