Guidelines for converting a CUDA (CuPy) kernel to an OpenCL (ClPy) kernel

vorj edited this page Nov 1, 2018 · 5 revisions

Thread, Grid, Block -> Work Group, Work Item

See https://www.sharcnet.ca/help/index.php/Porting_CUDA_to_OpenCL .

threadIdx.{x,y,z} -> get_local_id({0, 1, 2})

blockDim.{x,y,z} -> get_local_size({0, 1, 2})

blockIdx.{x,y,z} -> get_group_id({0, 1, 2})
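
The three mappings above are mechanical, so they can be applied with a simple textual substitution. The following is a minimal sketch, not part of CuPy or ClPy: `translate_builtins` is a hypothetical helper name, and it works on raw text without parsing C, so matches inside comments or string literals are rewritten too.

```python
import re

# Mapping taken from the list above: CUDA built-in -> OpenCL function.
_BUILTINS = {
    "threadIdx": "get_local_id",
    "blockDim": "get_local_size",
    "blockIdx": "get_group_id",
}
# CUDA component name -> OpenCL dimension index.
_AXES = {"x": 0, "y": 1, "z": 2}

_PATTERN = re.compile(r"\b(threadIdx|blockDim|blockIdx)\s*\.\s*([xyz])\b")


def translate_builtins(cuda_source):
    """Replace CUDA index built-ins with the OpenCL calls listed above.

    Hypothetical helper for illustration only.
    """
    def repl(match):
        return "{}({})".format(_BUILTINS[match.group(1)], _AXES[match.group(2)])

    return _PATTERN.sub(repl, cuda_source)


print(translate_builtins("int i = blockIdx.x * blockDim.x + threadIdx.x;"))
```

In a real kernel the translated expression is usually replaced by `get_global_id(0)`, which yields the same value when the launch uses no global offset.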


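The concepts of thread, block, and grid (CUDA) and work item and work group (OpenCL) differ, most visibly in how a launch is sized: CUDA's gridsize counts blocks, whereas OpenCL's global_work_size counts work items in total.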

To launch a total of 1024 threads in groups of 32 in 1D:

CUDA: blocksize = (32, 1, 1), gridsize = (32, 1, 1)
OpenCL: global_work_size = (1024, 1, 1), local_work_size = (32, 1, 1)
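
The OpenCL sizes in this example come from multiplying the CUDA grid size by the block size in each dimension. A minimal sketch of that arithmetic follows; `cuda_to_opencl_launch` is a hypothetical helper name, not a CuPy or ClPy function.

```python
def cuda_to_opencl_launch(gridsize, blocksize):
    """Convert a CUDA launch configuration to OpenCL work sizes.

    Hypothetical helper for illustration only.
    gridsize:  number of blocks per dimension (CUDA).
    blocksize: number of threads per block per dimension (CUDA).
    Returns (global_work_size, local_work_size) for OpenCL.
    """
    if len(gridsize) != len(blocksize):
        raise ValueError("gridsize and blocksize must have the same rank")
    # A work group has the same shape as a CUDA block.
    local_work_size = tuple(blocksize)
    # The global size counts work items, not groups: blocks * threads per block.
    global_work_size = tuple(g * b for g, b in zip(gridsize, blocksize))
    return global_work_size, local_work_size


global_ws, local_ws = cuda_to_opencl_launch((32, 1, 1), (32, 1, 1))
print(global_ws, local_ws)
```

Because the global size is a product of the two CUDA sizes, it is always a multiple of the local size in every dimension.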

Threads

__syncthreads()
-> barrier(CLK_LOCAL_MEM_FENCE)

Changes Related to CArray

If ultima is applied, these changes are not necessary.

Function Arguments

CArray<T, N> arr
-> __global T* arr, CArray_N arr_info
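
The argument rewrite above splits one CArray parameter into a data pointer and an info struct. As a minimal sketch of the same rule as a textual substitution (`translate_carray_args` is a hypothetical helper name, and the `<name>_info` suffix is taken from the `arr_info` example above rather than from any documented convention):

```python
import re

# Matches a parameter such as "CArray<float, 2> arr".
_CARRAY_ARG = re.compile(r"\bCArray<\s*(\w+)\s*,\s*(\d+)\s*>\s+(\w+)")


def translate_carray_args(signature):
    """Rewrite CArray<T, N> parameters as '__global T* name, CArray_N name_info'.

    Hypothetical helper for illustration only; it handles only simple
    type names (a single identifier for T).
    """
    return _CARRAY_ARG.sub(r"__global \1* \3, CArray_\2 \3_info", signature)


print(translate_carray_args("CArray<float, 2> arr"))
```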

Size Acquisition

arr.size()
-> arr_info.size_

Access by Index

arr[I]
-> arr[get_CArrayIndexI_N(&arr_info, I)/sizeof(<type of arr[0]>)]
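
The division by sizeof suggests that get_CArrayIndexI_N returns an offset in bytes, which must be converted to an element index before it is used on a typed pointer. The sketch below illustrates that conversion under a stated assumption: the byte offset is computed from a row-major flat index using the array's shape and byte strides. That is a guess at the behavior, not ClPy's confirmed implementation, and `byte_offset` and `element_index` are hypothetical names.

```python
def byte_offset(shape, strides, i):
    """Byte offset of row-major flat index i, given shape and byte strides.

    Assumed model of get_CArrayIndexI_N; not taken from ClPy's source.
    """
    offset = 0
    # Peel off coordinates from the last dimension to the first.
    for dim in reversed(range(len(shape))):
        offset += strides[dim] * (i % shape[dim])
        i //= shape[dim]
    return offset


def element_index(shape, strides, i, itemsize):
    """Index into a typed pointer: the byte offset divided by the element size."""
    return byte_offset(shape, strides, i) // itemsize


# Contiguous 2x3 array of 4-byte elements: flat index 4 maps to element 4.
print(element_index((2, 3), (12, 4), 4, 4))
# Transposed view (shape 3x2) of the same buffer: flat index 1 maps to element 3.
print(element_index((3, 2), (4, 12), 1, 4))
```

For a contiguous array the element index equals the flat index; the two differ only for non-contiguous views such as the transposed case.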