Commit

Merge pull request #69 from omlins/amdgpu
Add support for AMDGPU
omlins committed Dec 12, 2022
2 parents b985f8f + abb7e75 commit 598a6bb
Showing 27 changed files with 603 additions and 213 deletions.
2 changes: 2 additions & 0 deletions Project.toml
@@ -4,13 +4,15 @@ authors = ["Samuel Omlin", "Ludovic Räss"]
version = "0.6.0"

[deps]
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
CellArrays = "d35fcfd7-7af4-4c67-b1aa-d78070614af4"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"

[compat]
AMDGPU = "0.4"
CellArrays = "0.1"
CUDA = "3.12"
MacroTools = "0.5"
55 changes: 43 additions & 12 deletions src/ParallelKernel/Data.jl
@@ -11,12 +11,12 @@ The type of numbers used by @zeros, @ones, @rand and @fill and in all array type
--------------------------------------------------------------------------------
Data.Array{ndims}
Expands to `Data.Array{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.Array` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads and CUDA.CuArray or CUDA.CuDeviceArray for CUDA; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required).
Expands to `Data.Array{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.Array` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads, CUDA.CuArray or CUDA.CuDeviceArray for CUDA and AMDGPU.ROCArray or AMDGPU.ROCDeviceArray for AMDGPU; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray and AMDGPU.ROCArray automatically to CUDA.CuDeviceArray and AMDGPU.ROCDeviceArray in kernels when required).
--------------------------------------------------------------------------------
Data.CellArray{ndims}
Expands to `Data.CellArray{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.CellArray` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads and CuCellArray or CuDeviceCellArray for CUDA; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required).
Expands to `Data.CellArray{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.CellArray` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads, CuCellArray or CuDeviceCellArray for CUDA and ROCCellArray or ROCDeviceCellArray for AMDGPU; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CellArray automatically to DeviceCellArray when required).
--------------------------------------------------------------------------------
Data.Cell{S}
@@ -27,18 +27,18 @@ Expands to `Union{StaticArrays.SArray{S, numbertype}, StaticArrays.FieldArray{S,
!!! note "Advanced"
Data.DeviceArray{ndims}
Expands to `Data.DeviceArray{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.DeviceArray` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads and CUDA.CuDeviceArray for CUDA).
Expands to `Data.DeviceArray{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.DeviceArray` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads, CUDA.CuDeviceArray for CUDA and AMDGPU.ROCDeviceArray for AMDGPU).
!!! warning
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required.
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray and AMDGPU.ROCArray automatically to CUDA.CuDeviceArray and AMDGPU.ROCDeviceArray in kernels when required.
--------------------------------------------------------------------------------
Data.DeviceCellArray{ndims}
Expands to `Data.DeviceCellArray{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.DeviceCellArray` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads and CuDeviceCellArray for CUDA).
Expands to `Data.DeviceCellArray{numbertype, ndims}`, where `numbertype` is the datatype selected with [`@init_parallel_kernel`](@ref) and the datatype `Data.DeviceCellArray` is chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads, CuDeviceCellArray for CUDA and ROCDeviceCellArray for AMDGPU).
!!! warning
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required.
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray and AMDGPU.ROCArray automatically to CUDA.CuDeviceArray and AMDGPU.ROCDeviceArray in kernels when required.
"""

const DATA_DOC_NUMBERTYPE_NONE = """
@@ -49,12 +49,12 @@ The module Data is created in the module where `@init_parallel_kernel` is called
--------------------------------------------------------------------------------
Data.Array{numbertype, ndims}
The datatype `Data.Array` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads and CUDA.CuArray or CUDA.CuDeviceArray for CUDA; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required).
The datatype `Data.Array` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads, CUDA.CuArray or CUDA.CuDeviceArray for CUDA and AMDGPU.ROCArray or AMDGPU.ROCDeviceArray for AMDGPU; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray and AMDGPU.ROCArray automatically to CUDA.CuDeviceArray and AMDGPU.ROCDeviceArray in kernels when required).
--------------------------------------------------------------------------------
Data.CellArray{numbertype, ndims}
The datatype `Data.CellArray` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads and CuCellArray or CuDeviceCellArray for CUDA; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required).
The datatype `Data.CellArray` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads, CuCellArray or CuDeviceCellArray for CUDA and ROCCellArray or ROCDeviceCellArray for AMDGPU; [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CellArray automatically to DeviceCellArray in kernels when required).
--------------------------------------------------------------------------------
Data.Cell{numbertype, S}
@@ -65,18 +65,18 @@ Expands to `Union{StaticArrays.SArray{S, numbertype}, StaticArrays.FieldArray{S,
!!! note "Advanced"
Data.DeviceArray{numbertype, ndims}
The datatype `Data.DeviceArray` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads and CUDA.CuDeviceArray for CUDA).
The datatype `Data.DeviceArray` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (Array for Threads, CUDA.CuDeviceArray for CUDA and AMDGPU.ROCDeviceArray for AMDGPU).
!!! warning
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required.
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray and AMDGPU.ROCArray automatically to CUDA.CuDeviceArray and AMDGPU.ROCDeviceArray in kernels when required.
--------------------------------------------------------------------------------
Data.DeviceCellArray{numbertype, ndims}
The datatype `Data.DeviceCellArray` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads and CuDeviceCellArray for CUDA).
The datatype `Data.DeviceCellArray` is automatically chosen to be compatible with the package for parallelization selected with [`@init_parallel_kernel`](@ref) (CPUCellArray for Threads, CuDeviceCellArray for CUDA and ROCDeviceCellArray for AMDGPU).
!!! warning
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray automatically to CUDA.CuDeviceArray in kernels when required.
This datatype is not intended for explicit manual usage. [`@parallel`](@ref) and [`@parallel_indices`](@ref) convert CUDA.CuArray and AMDGPU.ROCArray automatically to CUDA.CuDeviceArray and AMDGPU.ROCDeviceArray in kernels when required.
"""

function Data_cuda(numbertype::DataType)
@@ -110,6 +110,37 @@ function Data_cuda(numbertype::DataType)
end
end

function Data_amdgpu(numbertype::DataType)
if numbertype == NUMBERTYPE_NONE
:(baremodule Data # NOTE: there cannot be any newline before 'module Data' or it will create a begin end block and the module creation will fail.
import AMDGPU, CellArrays, StaticArrays
Array{T, N} = AMDGPU.ROCArray{T, N}
DeviceArray{T, N} = AMDGPU.ROCDeviceArray{T, N}
Cell{T, S} = Union{StaticArrays.SArray{S, T}, StaticArrays.FieldArray{S, T}}
DeviceCell{T, S} = Union{StaticArrays.SArray{S, T}, StaticArrays.FieldArray{S, T}}
CellArray{T_elem, N, B} = CellArrays.ROCCellArray{<:Cell{T_elem},N,B,T_elem}
DeviceCellArray{T_elem, N, B} = CellArrays.CellArray{<:DeviceCell{T_elem},N,B,<:AMDGPU.ROCDeviceArray{T_elem,CellArrays._N}}
end)
else
:(baremodule Data # NOTE: there cannot be any newline before 'module Data' or it will create a begin end block and the module creation will fail.
import AMDGPU, CellArrays, StaticArrays
Number = $numbertype
Array{N} = AMDGPU.ROCArray{$numbertype, N}
DeviceArray{N} = AMDGPU.ROCDeviceArray{$numbertype, N}
Cell{S} = Union{StaticArrays.SArray{S, $numbertype}, StaticArrays.FieldArray{S, $numbertype}}
DeviceCell{S} = Union{StaticArrays.SArray{S, $numbertype}, StaticArrays.FieldArray{S, $numbertype}}
CellArray{N, B} = CellArrays.ROCCellArray{<:Cell,N,B,$numbertype}
DeviceCellArray{N, B} = CellArrays.CellArray{<:DeviceCell,N,B,<:AMDGPU.ROCDeviceArray{$numbertype,CellArrays._N}}
TArray{T, N} = AMDGPU.ROCArray{T, N}
DeviceTArray{T, N} = AMDGPU.ROCDeviceArray{T, N}
TCell{T, S} = Union{StaticArrays.SArray{S, T}, StaticArrays.FieldArray{S, T}}
DeviceTCell{T, S} = Union{StaticArrays.SArray{S, T}, StaticArrays.FieldArray{S, T}}
TCellArray{T_elem, N, B} = CellArrays.ROCCellArray{<:TCell{T_elem},N,B,T_elem}
DeviceTCellArray{T_elem, N, B} = CellArrays.CellArray{<:DeviceTCell{T_elem},N,B,<:AMDGPU.ROCDeviceArray{T_elem,CellArrays._N}}
end)
end
end
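
The pattern used by `Data_amdgpu` above (return a quoted `baremodule` whose type aliases are bound to the backend's array types) can be tried without a GPU. The following is only a minimal sketch under stated assumptions: `data_sketch` and `DataSketch` are hypothetical names that are not part of the package, and `Base.Array` stands in for `AMDGPU.ROCArray`, so that the code runs on any machine.

```julia
# Hypothetical stand-in for Data_amdgpu: returns the expression of a module
# whose aliases point at Base.Array instead of AMDGPU.ROCArray.
# `data_sketch` and `DataSketch` are illustrative names only.
function data_sketch(numbertype::DataType)
    :(baremodule DataSketch # the module keyword stays on the same line as ':('
        import Base
        Array{N}     = Base.Array{$numbertype, N}  # element type fixed when the module is created
        TArray{T, N} = Base.Array{T, N}            # element type left as a parameter
    end)
end

# A module expression has to be evaluated at top level.
Core.eval(@__MODULE__, data_sketch(Float32))

A = zeros(Float32, 4, 4)
println(A isa DataSketch.Array{2})            # true: Array{2} stands for Array{Float32, 2}
println(A isa DataSketch.TArray{Float64, 2})  # false: the element type differs
```

Interpolating `$numbertype` into the quoted module is what lets a one-parameter alias such as `Array{N}` carry the number type chosen at initialization, while the `T`-prefixed alias keeps the element type open.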

function Data_threads(numbertype::DataType)
if numbertype == NUMBERTYPE_NONE
:(baremodule Data # NOTE: there cannot be any newline before 'module Data' or it will create a begin end block and the module creation will fail.