This issue defines the integration of GPU support in MPICH. Currently we enable GPU-to-GPU buffer transfers through the MPI interface via the UCX netmod in the CH4 device (the actual transport is provided by UCX). However, this works only for NVIDIA GPUs and contiguous datatypes. Non-contiguous datatype buffers are packed in host memory and then transferred, which involves a device-to-host memory copy (sketched below). Additionally, collective reductions are not supported, and data on intermediate nodes has to be staged to host memory to perform the operations.
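To make the cost of the current fallback concrete, here is a minimal sketch of the host-staging path described above, assuming a staging copy of the full datatype extent followed by a host-side pack; the function name and buffer handling are illustrative, not MPICH internals:

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Illustrative sketch: sending a non-contiguous datatype from a GPU buffer
 * by first staging the whole datatype extent in host memory (the
 * device-to-host copy mentioned above), then packing and sending on the host. */
void send_noncontig_from_gpu(const void *gpu_buf, int count, MPI_Datatype dt,
                             int dest, int tag, MPI_Comm comm)
{
    MPI_Aint lb, extent;
    MPI_Type_get_extent(dt, &lb, &extent);

    /* Device-to-host staging copy of the full extent. */
    void *host_stage = malloc(count * extent);
    cudaMemcpy(host_stage, gpu_buf, count * extent, cudaMemcpyDeviceToHost);

    /* Pack on the host, then send the contiguous packed bytes. */
    int pack_size, pos = 0;
    MPI_Pack_size(count, dt, comm, &pack_size);
    void *packed = malloc(pack_size);
    MPI_Pack(host_stage, count, dt, packed, pack_size, &pos, comm);
    MPI_Send(packed, pos, MPI_PACKED, dest, tag, comm);

    free(packed);
    free(host_stage);
}
```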
The following features will be needed to enable full GPU support:
- Contiguous data movement for inter-node communication using GPU Direct 3.0 (GPUDirect RDMA) for pinned buffers (likely no changes needed in MPICH).
- Contiguous data movement for inter-node communication using GPU Direct 3.0 (GPUDirect RDMA) for unified memory buffers; either raise an error or perform an explicit memcpy (see the pointer-classification sketch after this list). (Ticket: GPU-aware contiguous data movement optimization #3583)
- [Generic fallback] Non-contiguous data movement support through contiguous copies, i.e. updates to `MPIR_Localcopy` (see the contiguous-copy sketch below). (Ticket: Enable GPU-aware generic fallback #3582)
- Non-contiguous data movement support through vector-copy mechanisms (e.g., `cudaMemcpy2D`/`cudaMemcpy3D`; see the sketch below).
- Non-contiguous data movement support through kernel offload (see the pack-kernel sketch below).
- [Generic fallback] Reduction operation support through CPU computation, where the CPU fetches the GPU data and performs the operation (see the reduction sketch below). (Ticket: Enable GPU-aware generic fallback #3582)
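For the unified-memory item above, the transport first needs to know what kind of pointer it was handed before choosing a path. A minimal sketch of such a check, assuming CUDA 10+ where `cudaPointerAttributes` exposes a `type` field; the `classify_buffer` name and the enum are made up for illustration:

```c
#include <cuda_runtime.h>

typedef enum { BUF_HOST, BUF_DEVICE, BUF_MANAGED } buf_kind_t;

/* Classify a user pointer so the transport can pick a path:
 * direct RDMA for device memory, error-or-memcpy for managed memory. */
buf_kind_t classify_buffer(const void *ptr)
{
    struct cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
        cudaGetLastError();  /* clear the error; treat as plain host memory */
        return BUF_HOST;
    }
    switch (attr.type) {
        case cudaMemoryTypeDevice:  return BUF_DEVICE;
        case cudaMemoryTypeManaged: return BUF_MANAGED;
        default:                    return BUF_HOST;
    }
}
```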
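For the `MPIR_Localcopy` item, the generic fallback boils down to a contiguous copy routine that tolerates device pointers. A sketch under the assumption that unified virtual addressing is available, so `cudaMemcpyDefault` can infer the copy direction (`localcopy_contig` is a hypothetical helper, not the real `MPIR_Localcopy`):

```c
#include <cuda_runtime.h>
#include <string.h>

/* Hypothetical contiguous-copy helper: with UVA, cudaMemcpyDefault works for
 * any host/device pointer pair, so non-contiguous datatypes can then be
 * handled as a loop of such contiguous copies. */
void localcopy_contig(void *dst, const void *src, size_t bytes, int maybe_gpu)
{
    if (maybe_gpu)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDefault);
    else
        memcpy(dst, src, bytes);
}
```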
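For the vector-copy item, a strided layout such as an `MPI_Type_vector` (blocklen bytes every stride bytes, nblocks times) maps onto a single `cudaMemcpy2D` instead of one copy per block. A minimal sketch with illustrative parameter names:

```c
#include <cuda_runtime.h>

/* Gather a strided device layout into a contiguous device buffer with one
 * call: dpitch == blocklen makes the destination dense, spitch == stride
 * walks the source in extent-sized steps. */
void pack_vector_d2d(void *contig_dst, const void *strided_src,
                     size_t blocklen, size_t stride, size_t nblocks)
{
    cudaMemcpy2D(contig_dst, blocklen, strided_src, stride,
                 blocklen, nblocks, cudaMemcpyDeviceToDevice);
}
```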
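For the kernel-offload item, packing can instead run entirely on the device: each thread moves one byte of the strided layout into the packed buffer, so no host staging is needed. A sketch of such a pack kernel; the names and the byte-granularity choice are illustrative:

```cuda
/* One thread per packed byte: thread i finds its (block, offset) position in
 * the strided source and writes it densely into dst. */
__global__ void pack_vector_kernel(char *dst, const char *src,
                                   size_t blocklen, size_t stride,
                                   size_t nblocks)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < blocklen * nblocks) {
        size_t block = i / blocklen;
        size_t off   = i % blocklen;
        dst[i] = src[block * stride + off];
    }
}
```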
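Finally, for the CPU-computation reduction fallback, a sketch of the staging pattern: both device operands are copied to the host, reduced there with `MPI_Reduce_local`, and the result copied back. This assumes a contiguous datatype whose size equals its extent; the function name is illustrative:

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Reduce two device buffers on the CPU: stage both to host, apply the MPI
 * reduction op there, and copy the result back to the device. */
void reduce_gpu_on_cpu(const void *gpu_in, void *gpu_inout,
                       int count, MPI_Datatype dt, MPI_Op op)
{
    int tsize;
    MPI_Type_size(dt, &tsize);
    size_t bytes = (size_t)count * tsize;

    void *h_in = malloc(bytes);
    void *h_inout = malloc(bytes);
    cudaMemcpy(h_in, gpu_in, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_inout, gpu_inout, bytes, cudaMemcpyDeviceToHost);

    MPI_Reduce_local(h_in, h_inout, count, dt, op);  /* arithmetic on the CPU */

    cudaMemcpy(gpu_inout, h_inout, bytes, cudaMemcpyHostToDevice);
    free(h_in);
    free(h_inout);
}
```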
@gcongiu I am trying to note the FY19Q2 progress. Could you please give me an update, or point me to the correct PR I need to look at for the progress of this issue?