This concerns the halo swap for the distributions which currently occurs in lb_data.c via lb_halo(). There are currently two implementations: an internal "host" implementation and a "target" implementation via halo_swap.c.
The intention is to replace the "target" implementation with one using GPU-aware MPI in a similar fashion to that seen in field.c. This will allow the removal of the code in halo_swap.c.
This will also provide a more effective halo swap for device applications.
Background
The mechanism for the halo swap is:
1. Pack contiguous send buffers with the relevant data for transfers between up to 26 nearest neighbours.
2. [Conditional] Transfer send buffers from device to host.
3. Undertake message passing between send and recv buffers.
4. [Conditional] Transfer recv buffers from host to device.
5. Unpack recv buffers to the appropriate location in memory.
Steps 1 and 5 involve an OpenMP section (host) or kernel (device) to pack/unpack the messages.
Steps 2 and 4 are not required for host execution, and are not required on the device if GPU-aware MPI is available.
The field.c device implementation is via a graph; this handles steps 2 and 4 either by explicit copy or via GPU-aware MPI.
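As a rough illustration of steps 1, 3 and 5, the following is a minimal single-process sketch in one dimension with periodic boundaries. It is not the lb_data.c or field.c code: the names `halo_pack`, `halo_unpack`, `halo_swap_1d` and the sizes `NLOCAL` and `NVEL` are hypothetical, the halo width is fixed at one site, and a `memcpy` stands in for the message passing of step 3. In a real run the exchange would be a pair of MPI calls, and steps 2 and 4 would surround it when GPU-aware MPI is not available.

```c
#include <assert.h>
#include <string.h>

#define NLOCAL 4   /* interior sites (hypothetical size) */
#define NVEL   3   /* values stored per site (hypothetical) */
#define NSITES (NLOCAL + 2)   /* interior plus one halo site at each end */

/* Layout assumed here: f[site*NVEL + p], with halo sites at 0 and NLOCAL + 1 */

/* Step 1: copy one edge site into a contiguous send buffer */
static void halo_pack(const double *f, int site, double *buf) {
  for (int p = 0; p < NVEL; p++) buf[p] = f[site*NVEL + p];
}

/* Step 5: copy a contiguous recv buffer into one halo site */
static void halo_unpack(double *f, int site, const double *buf) {
  for (int p = 0; p < NVEL; p++) f[site*NVEL + p] = buf[p];
}

/* Periodic halo swap on a single process; both neighbours are ourselves */
void halo_swap_1d(double *f) {
  double send_lo[NVEL], send_hi[NVEL];
  double recv_lo[NVEL], recv_hi[NVEL];

  assert(f != NULL);

  /* Step 1: pack the first and last interior sites */
  halo_pack(f, 1, send_lo);
  halo_pack(f, NLOCAL, send_hi);

  /* Step 3: stand-in for message passing. Data sent in the low
   * direction arrives in the neighbour's high halo, and vice versa. */
  memcpy(recv_hi, send_lo, sizeof(send_lo));
  memcpy(recv_lo, send_hi, sizeof(send_hi));

  /* Step 5: unpack into the halo sites */
  halo_unpack(f, 0, recv_lo);
  halo_unpack(f, NLOCAL + 1, recv_hi);
}
```

After a call to `halo_swap_1d()`, the low halo site holds a copy of the last interior site and the high halo site holds a copy of the first interior site.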
"Relevant data" includes the possibility of a "reduced" swap, in which only those elements of the distribution with a velocity component in the direction of the halo exchange are considered. Otherwise it is a full swap with all the elements of the distribution.
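To make the reduced swap concrete, here is a small sketch of how the elements to send might be selected for a given exchange direction. It assumes a D3Q19 velocity set with an illustrative ordering (the ordering used in the actual code may differ), and it assumes one reading of the criterion: an element is sent only if its velocity matches the exchange direction in every non-zero component of that direction. The function name `reduced_swap_list` is hypothetical.

```c
#include <assert.h>

#define NVEL 19

/* D3Q19 velocity set; the ordering here is illustrative only */
static const int cv[NVEL][3] = {
  { 0, 0, 0},
  { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
  { 1, 1, 0}, { 1,-1, 0}, {-1, 1, 0}, {-1,-1, 0},
  { 1, 0, 1}, { 1, 0,-1}, {-1, 0, 1}, {-1, 0,-1},
  { 0, 1, 1}, { 0, 1,-1}, { 0,-1, 1}, { 0,-1,-1}
};

/* For exchange direction d (components in {-1, 0, +1}, not all zero),
 * fill list[] with the indices p of the velocities that cross into that
 * neighbour, and return how many there are. */
int reduced_swap_list(const int d[3], int list[NVEL]) {
  int n = 0;

  assert(d[0] != 0 || d[1] != 0 || d[2] != 0);

  for (int p = 0; p < NVEL; p++) {
    int wanted = 1;
    for (int ia = 0; ia < 3; ia++) {
      /* Reject velocities which disagree with a non-zero component of d */
      if (d[ia] != 0 && cv[p][ia] != d[ia]) wanted = 0;
    }
    if (wanted) list[n++] = p;
  }

  return n;
}
```

Under these assumptions a face exchange such as d = (1, 0, 0) needs 5 of the 19 elements and an edge exchange such as d = (1, 1, 0) needs 1, whereas a full swap sends all 19 in every direction.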
What's needed
An effective implementation in lb_data.c.
It may be appropriate to have some input file switches to control the exact mechanism, e.g., whether or not GPU-aware MPI is used. This would also allow, e.g., host-device transfers to be used to provide an effective host halo swap in the case that a device is present.
We should consider the case for extended velocity sets with components of magnitude greater than one, although this may favour a different implementation.
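One possible shape for the input switch mentioned above is sketched below. Everything here is hypothetical: the enum `halo_mode_t`, the function names, and the option strings "host", "device_staged" and "device_gpu_aware" are invented for illustration and are not existing input keys. The sketch only shows how a single switch could determine whether steps 2 and 4 of the mechanism are needed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical halo swap mechanisms selectable from the input file */
typedef enum {
  HALO_HOST,              /* pack/unpack on host; plain MPI */
  HALO_DEVICE_STAGED,     /* pack/unpack on device; copy buffers via host */
  HALO_DEVICE_GPU_AWARE   /* pack/unpack on device; MPI on device buffers */
} halo_mode_t;

/* Translate a (hypothetical) input value; returns 0 on success, -1 if
 * the string is not recognised, in which case *mode is left unchanged. */
int halo_mode_from_string(const char *s, halo_mode_t *mode) {
  assert(s != NULL && mode != NULL);

  if (strcmp(s, "host") == 0) { *mode = HALO_HOST; return 0; }
  if (strcmp(s, "device_staged") == 0) { *mode = HALO_DEVICE_STAGED; return 0; }
  if (strcmp(s, "device_gpu_aware") == 0) { *mode = HALO_DEVICE_GPU_AWARE; return 0; }

  return -1;
}

/* Steps 2 and 4 (device-host buffer copies) are only needed when the
 * buffers are packed on the device but MPI cannot read device memory. */
int halo_mode_needs_copy(halo_mode_t mode) {
  return (mode == HALO_DEVICE_STAGED);
}
```

Keeping the choice in one enum would let the pack, exchange and unpack stages share a single code path, with the copy steps enabled or skipped according to `halo_mode_needs_copy()`.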
Distribution halo swap