A Julia wrapper for RCCL (the ROCm Communication Collectives Library). The bindings to the C functions were generated with Clang.jl. The API is very similar to that of NCCL.jl, and the package passes the same set of tests as NCCL.jl.
A couple of implementation details differ between RCCL.jl and NCCL.jl because of differences between AMDGPU.jl and CUDA.jl; specifically:
CUDA.jl exports a `CUstream` type, which is simply an alias of `cuStream_t`, which then gets wrapped in a `CuStream` struct. `CUstream` can be passed to all NCCL C functions since it is basically a C type. AMDGPU.jl exposes a Julia struct called `HIPStream`, which contains a handle to a `hipStream_t` C type (which itself is not exported). Thus, when `@ccall`ing RCCL functions, a simple `Ptr{Cvoid}` is passed. This should change if somehow a `hipStream_t` is exposed by AMDGPU.jl; CUDA.jl has a