Scalar indexing with CUDA #176

Closed
vpuri3 opened this issue Jul 11, 2023 · 10 comments · Fixed by #177
@vpuri3
Contributor

vpuri3 commented Jul 11, 2023

MWE (copied from test/cuda.jl)

julia> using Tullio, CUDA

julia> CUDA.allowscalar(false)

julia> A, B, C = CUDA.rand(2,2,2), CUDA.rand(2,2), CUDA.rand(2,2,2);

julia> @tullio A[k,i,a] = tanh(B[i,a] + C[k,i,a])                                                      
ERROR: Scalar indexing is disallowed.                                                                  
Invocation of getindex resulted in scalar indexing of a GPU array.                                     
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.                               
If you did intend to index this array, annotate the caller with @allowscalar.                          
Stacktrace:                                                                                            
 [1] error(s::String)                                                                                  
   @ Base ./error.jl:35                                                                                
 [2] assertscalar(op::String)                                                                          
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
 [3] getindex(::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::Int64, ::Int64)                          
   @ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:9
 [4] 𝒜𝒸𝓉!                                                                                              
   @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1036 [inlined]
 [5] 𝒜𝒸𝓉!                                                                                              
   @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1041 [inlined]                                        
 [6] threader(fun!::var"#𝒜𝒸𝓉!#7", ::Type{CuArray{Float32, N, CUDA.Mem.DeviceBuffer} where N}, Z::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, As::Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}}, Is::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}, Js::Tuple{}, redfun::Function, block::Int64, keep::Nothing)
   @ Tullio ~/.julia/packages/Tullio/NGyNM/src/eval.jl:104
 [7] top-level scope
   @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1004
 [8] top-level scope
   @ ~/.julia/packages/CUDA/tVtYo/src/initialization.jl:185
(MyPkg) pkg> st CUDA
Project GeometryLearning v0.0.1
Status `/xxx/MyPkg/Project.toml`
  [052768ef] CUDA v4.4.0
(MyPkg) pkg> st Tullio
Project GeometryLearning v0.0.1
Status `/xxx/MyPkg/Project.toml`
  [bc48ee85] Tullio v0.3.5

julia> CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.1
NVIDIA driver 525.60.13, originally for CUDA 12.0

CUDA libraries: 
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+525.60.13

Julia packages: 
- CUDA: 4.4.0
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

Environment:
- JULIA_CUDA_MEMORY_POOL: none

1 device:
  0: Tesla V100-SXM2-32GB (sm_70, 31.429 GiB / 32.000 GiB available)
julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 40 × Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 40 virtual cores
Environment:
  JULIA_PKG_DEVDIR = /ocean/projects/eng170006p/vpuri1
  JULIA_CUDA_MEMORY_POOL = none
  JULIA_DEPOT_PATH = /jet/home/vpuri1/.julia
  JULIA_NUM_PRECOMPILE_TASKS = 40
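
For this particular expression, the same result can also be written as a plain broadcast, which does not go through element-by-element getindex. The following is a minimal sketch and not part of the original report: it uses ordinary CPU arrays so that it runs without a GPU, on the assumption that arrays created with CUDA.rand would take the same broadcast path.

```julia
# Sketch only: plain Arrays stand in for the CuArrays from the MWE (assumption).
B = rand(Float32, 2, 2)
C = rand(Float32, 2, 2, 2)
A = similar(C)

# Give B a leading singleton dimension so that B[i,a] lines up with C[k,i,a].
A .= tanh.(reshape(B, 1, size(B)...) .+ C)

# Reference loop spelling out what the @tullio expression means.
A_ref = similar(C)
for a in axes(C, 3), i in axes(C, 2), k in axes(C, 1)
    A_ref[k, i, a] = tanh(B[i, a] + C[k, i, a])
end
```

This only covers expressions that map cleanly onto broadcasting; it is not a general substitute for @tullio.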
@vpuri3 vpuri3 mentioned this issue Jul 11, 2023
@mcabbott
Owner

Tullio always needs some companion packages loaded before it will write a CUDA-specific kernel. However, even with them loaded, this still fails right now:

julia> using CUDA, CUDAKernels, KernelAbstractions

julia> @tullio A[k,i,a] = tanh(B[i,a] + C[k,i,a])   
ERROR: MethodError: no method matching length(::Nothing)
Stacktrace:
  [1] #s597#122
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:18 [inlined]
  [2] var"#s597#122"(f::Any, tt::Any, ::Any, job::Any)
    @ GPUCompiler ./none:0
  [3] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:599
  [4] cached_compilation(cache::Dict{…}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:71
  [5] cufunction(f::var"#gpu_##🇨🇺#225#6", tt::Type{…}; name::Nothing, always_inline::Bool, kwargs::Base.Pairs{…})
    @ CUDA ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:306
  [6] macro expansion
    @ CUDAKernels ~/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:102 [inlined]
  [7] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, dependencies::CUDAKernels.CudaEvent, workgroupsize::Nothing, progress::Function)
    @ CUDAKernels ~/.julia/packages/CUDAKernels/3IKLV/src/CUDAKernels.jl:283
  [8] 𝒜𝒸𝓉!
    @ Main ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1172 [inlined]
  [9] 𝒜𝒸𝓉!
    @ Main ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1169 [inlined]
 [10] threader(fun!::var"#𝒜𝒸𝓉!#4", ::Type{…}, Z::CuArray{…}, As::Tuple{…}, Is::Tuple{…}, Js::Tuple{}, redfun::Function, block::Int64, keep::Nothing)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/eval.jl:104
 [11] top-level scope
    @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1004
 [12] top-level scope
    @ ~/.julia/packages/CUDA/ZdCxS/src/initialization.jl:155
Some type information was truncated. Use `show(err)` to see complete types.

(@v1.10) pkg> st Tullio CUDA CUDAKernels KernelAbstraction
Status `~/.julia/environments/v1.10/Project.toml`
⌅ [052768ef] CUDA v4.0.1
  [72cfdca4] CUDAKernels v0.4.7
  [bc48ee85] Tullio v0.3.5
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

@mcabbott mcabbott added the GPU label Jul 12, 2023
@vpuri3
Contributor Author

vpuri3 commented Jul 12, 2023

Any idea what could be causing this?

@mcabbott
Owner

It's just #172 I think. The versions loaded here don't work together, sadly.

@vpuri3
Contributor Author

vpuri3 commented Jul 12, 2023

As in Tullio with CUDA v4? Do you know what change in v4 is causing this?

This is critical to the work I am doing right now. Maybe I can help in fixing this issue?

@mcabbott
Owner

Sorry, the issue I meant to link to was this one: #168 (comment)

I played with it a little today. It can't be that hard to figure out what ought to replace dependencies=Event(CUDADevice()), so maybe it won't be hard to fix.
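
For context, here is a rough sketch of what a kernel launch looks like in the newer KernelAbstractions API, assuming KernelAbstractions 0.9 or later, where the dependencies/Event mechanism is gone and synchronization is an explicit call on the backend. The kernel name tanh_add! is hypothetical, this is not the code Tullio generates, and it runs on the CPU backend so that no GPU is needed.

```julia
using KernelAbstractions

# Hypothetical hand-written kernel for A[k,i,a] = tanh(B[i,a] + C[k,i,a]).
@kernel function tanh_add!(A, @Const(B), @Const(C))
    k, i, a = @index(Global, NTuple)
    A[k, i, a] = tanh(B[i, a] + C[k, i, a])
end

B = rand(Float32, 2, 2)
C = rand(Float32, 2, 2, 2)
A = similar(C)

# The backend is taken from the array; for a plain Array this is CPU().
backend = get_backend(A)
kernel! = tanh_add!(backend)

# No dependencies keyword here: launch, then synchronize on the backend.
kernel!(A, B, C; ndrange = size(A))
KernelAbstractions.synchronize(backend)
```

With CUDA.jl loaded and CuArray inputs, get_backend would be expected to return the CUDA backend instead, though that path is not exercised in this sketch.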

@vpuri3
Contributor Author

vpuri3 commented Jul 13, 2023

Thanks @mcabbott, I'm out sick for a few days, so I can possibly look at this over the weekend, unless someone more familiar with the package wants to take a crack at it.

@adrhill

adrhill commented Sep 8, 2023

I'm running into the same issue.

CUDA version info
CUDA runtime 12.1, artifact installation
CUDA driver 12.2
NVIDIA driver 535.54.3

CUDA libraries:
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+535.54.3

Julia packages:
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.10.0-beta2
- LLVM: 15.0.7
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3090 (sm_86, 23.685 GiB / 24.000 GiB available)

@vpuri3
Contributor Author

vpuri3 commented Sep 8, 2023

@adrhill this branch (https://github.com/vpuri3/Tullio.jl/tree/total) has been working with GPU + Zygote for me. It has everything from #178, #177 in it.

@vpuri3
Contributor Author

vpuri3 commented Sep 8, 2023

@mcabbott can you review those two PRs?

@vpuri3
Contributor Author

vpuri3 commented Sep 26, 2023

@mcabbott ping
