
Since new version: Flux throws an error for train! / update!, even on the quick start problem #2358

Closed
dorn-gerhard opened this issue Dec 3, 2023 · 5 comments

@dorn-gerhard

I updated to Flux 0.14.6 (and also CUDA 5.1.1) and now get an illegal memory access error (see below) when using train! or update!.
machine: Windows Laptop
GPU: NVIDIA GeForce GTX 1050
CUDA Version: 12.0
Julia Version: 1.9.4
packages in environment:

st
Status `C:\Users\Gerhard\.julia\environments\GPU\Project.toml`
  [052768ef] CUDA v5.1.1
  [587475ba] Flux v0.14.6
  [92933f4c] ProgressMeter v1.9.0
  [02a925ec] cuDNN v1.2.1
  [10745b16] Statistics v1.9.0

I am aware of the breaking changes in the Flux and CUDA APIs, so I tested the Flux quick start example and got the following error:

ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
 [2] isdone
   @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\stream.jl:111 [inlined]
 [3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
   @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:79
 [4] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:171
 [5] device_synchronize()
   @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:169
 [6] top-level scope
   @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\src\initialization.jl:210

caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\libcuda.jl:27
  [2] nonblocking_synchronize(val::CuContext)
    @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:163
  [3] device_synchronize(; blocking::Bool, spin::Bool)
    @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:174
  [4] device_synchronize
    @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\synchronization.jl:169 [inlined]
  [5] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\module.jl:40
  [6] CuModule
    @ C:\Users\Gerhard\.julia\packages\CUDA\YIj5X\lib\cudadrv\module.jl:23 [inlined]
  [7] link(job::GPUCompiler.CompilerJob, compiled::WARNING: Error while freeing DeviceBuffer(256 bytes at 0x0000000204607c00):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))

Downgrading to Flux v0.14.5 resolved the problem.
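For anyone hitting the same error, the downgrade can be done and held in place with the package manager. This is a minimal sketch, assuming the shared environment named "GPU" shown in the status output above and the standard Pkg.jl functions `Pkg.activate`, `Pkg.add`, `Pkg.pin`, `Pkg.status` and `Pkg.free`; it is a workaround, not a fix.

```julia
using Pkg

# Activate the shared environment from the report (~/.julia/environments/GPU).
# Assumption: the affected environment is this shared one; adjust as needed.
Pkg.activate("GPU"; shared = true)

# Install exactly the last Flux version reported to work in this thread.
Pkg.add(name = "Flux", version = "0.14.5")

# Pin it so that a later Pkg.update() does not move Flux back to 0.14.6+.
Pkg.pin("Flux")

# Show the resolved Flux version to confirm the downgrade took effect.
Pkg.status("Flux")

# Once a fixed CUDA.jl release is available, release the pin and update:
# Pkg.free("Flux"); Pkg.update()
```

Pinning (rather than only adding the old version) matters because an unpinned package is upgraded again by the next `Pkg.update()`.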

Any hints as to what I am doing wrong with the new version?

@ToucheSir
Member

Can you try running the MWE in FluxML/Zygote.jl#1473 and see if you get a similar error? This looks like a problem on the CUDA.jl side, but to help them we should reduce it as much as possible.

@dorn-gerhard
Author

Thank you for the hint. You are right: with CUDA@5.1.1 the MWE triggers the following error:

ERROR: WARNING: Error while freeing DeviceBuffer(400 bytes at 0x0000000205200a00):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))

Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)

@henry2004y

I have the same issue. With CUDA.jl v5.1.1 and Flux v0.14.7 on Win11 using Julia v1.10-rc3, all the GPU demos fail with the error "an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)".

@mcabbott added the cuda label Dec 31, 2023
@ToucheSir
Member

Per a comment in FluxML/Zygote.jl#1473, the fix hasn't landed in a tagged version of CUDA.jl yet.

@CarloLucibello
Member

This should be solved.
