Distribution halo swap #316

Open
kevinstratford opened this issue Jul 26, 2024 · 0 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Distribution halo swap

This concerns the halo swap for the distributions which currently occurs in lb_data.c via lb_halo(). There are currently two implementations: an internal "host" implementation and a "target" implementation via halo_swap.c.

The intention is to replace the "target" implementation with one using GPU-aware MPI, in a similar fashion to that seen in field.c. This will allow the removal of the code in halo_swap.c.

This will also provide a more effective halo swap for device applications.

Background

The mechanism for the halo swap is:

  1. Pack contiguous send buffers with the relevant data for transfer to up to 26 nearest neighbours.
  2. [Conditional] Transfer send buffers from device to host.
  3. Undertake message passing between send and recv buffers.
  4. [Conditional] Transfer recv buffers to device.
  5. Unpack receive buffers to appropriate location in memory.
  • Steps 1 and 5 involve an OpenMP section (host) or a kernel (device) to pack/unpack the messages.
  • Steps 2 and 4 are not required for host execution, and are not required if GPU-aware MPI is available.
  • The field.c device implementation is via a graph; this handles steps 2 and 4 either by explicit copy or via GPU-aware MPI.
  • "Relevant data" includes the possibility of a "reduced" swap, in which only those elements of the distribution with a component in the direction of the halo exchange are considered. Otherwise it's a full swap with all the elements of the distribution.
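
The packing in step 1, the unpacking in step 5, and the choice between a "reduced" and a full swap can be sketched in C as follows. This is a minimal illustration, not the code in lb_data.c: the D3Q19 ordering in `cv`, the `f[site*NVEL + p]` storage layout, and the names `halo_directions()`, `halo_select()`, `halo_pack()` and `halo_unpack()` are all assumptions made for the example. It also assumes that, in a reduced swap, a velocity is sent only if it matches every non-zero component of the exchange direction.

```c
#include <stddef.h>

enum { NVEL = 19 };   /* D3Q19, chosen only for illustration */

/* Illustrative velocity set; the ordering is an assumption. */
static const int cv[NVEL][3] = {
  { 0, 0, 0},
  { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
  { 1, 1, 0}, { 1,-1, 0}, {-1, 1, 0}, {-1,-1, 0},
  { 1, 0, 1}, { 1, 0,-1}, {-1, 0, 1}, {-1, 0,-1},
  { 0, 1, 1}, { 0, 1,-1}, { 0,-1, 1}, { 0,-1,-1}
};

/* Enumerate the nearest-neighbour directions; returns the count (26). */
int halo_directions(int dirs[26][3]) {
  int n = 0;
  for (int dx = -1; dx <= 1; dx++) {
    for (int dy = -1; dy <= 1; dy++) {
      for (int dz = -1; dz <= 1; dz++) {
        if (dx == 0 && dy == 0 && dz == 0) continue;
        dirs[n][0] = dx; dirs[n][1] = dy; dirs[n][2] = dz;
        n++;
      }
    }
  }
  return n;
}

/* List the velocities to send in direction d. A full swap keeps all of
 * them; a reduced swap keeps those matching each non-zero component of d. */
int halo_select(const int d[3], int reduced, int plist[NVEL]) {
  int n = 0;
  for (int p = 0; p < NVEL; p++) {
    int keep = 1;
    if (reduced) {
      for (int a = 0; a < 3; a++) {
        if (d[a] != 0 && cv[p][a] != d[a]) keep = 0;
      }
    }
    if (keep) plist[n++] = p;
  }
  return n;
}

/* Step 1: gather the selected elements into a contiguous send buffer. */
size_t halo_pack(const double *f, const int *site, size_t nsite,
                 const int *plist, int np, double *buf) {
  size_t n = 0;
  for (size_t i = 0; i < nsite; i++) {
    for (int k = 0; k < np; k++) {
      buf[n++] = f[(size_t) site[i]*NVEL + plist[k]];
    }
  }
  return n;
}

/* Step 5: scatter a receive buffer back, in the same order as the pack. */
size_t halo_unpack(double *f, const int *site, size_t nsite,
                   const int *plist, int np, const double *buf) {
  size_t n = 0;
  for (size_t i = 0; i < nsite; i++) {
    for (int k = 0; k < np; k++) {
      f[(size_t) site[i]*NVEL + plist[k]] = buf[n++];
    }
  }
  return n;
}
```

Under these assumptions a reduced face message in D3Q19 carries 5 of the 19 elements per site, an edge message carries 1, and a corner message carries none, so the reduced swap also reduces the number of non-empty messages. The loops over sites are what would become the OpenMP section or device kernel.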

What's needed

  • An effective implementation in lb_data.c.
  • It may be appropriate to have some input file switches to control the exact mechanism, e.g., whether or not to use GPU-aware MPI. This would allow, e.g., host-device transfers to be used to provide an effective host halo swap when a device is present.
  • We should consider the case of extended velocity sets with components of magnitude greater than one, although this may favour a different implementation.
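
As a starting point for discussion, the selection logic behind such switches, and the halo depth implied by an extended velocity set, might look like the sketch below. Everything here is hypothetical: the `halo_opts_t` fields do not correspond to any existing input file keys, and `halo_mode()`, `halo_needs_copy()` and `halo_width()` are names invented for the example. The velocity set is passed as a flat array of `3*nvel` integers.

```c
#include <stdlib.h>

/* Hypothetical run-time options; these are not existing input keys. */
typedef struct {
  int on_device;       /* distributions live in device memory */
  int gpu_aware_mpi;   /* MPI library accepts device pointers */
  int force_staging;   /* user switch: always stage via host buffers */
} halo_opts_t;

typedef enum {
  HALO_HOST,           /* host data, host buffers */
  HALO_DEVICE_STAGED,  /* device data, explicit device<->host copies */
  HALO_DEVICE_DIRECT   /* device data, buffers passed straight to MPI */
} halo_mode_t;

/* Choose the mechanism from the options. */
halo_mode_t halo_mode(const halo_opts_t *opts) {
  if (!opts->on_device) return HALO_HOST;
  if (opts->gpu_aware_mpi && !opts->force_staging) return HALO_DEVICE_DIRECT;
  return HALO_DEVICE_STAGED;
}

/* Steps 2 and 4 (buffer copies) are needed only in the staged case. */
int halo_needs_copy(halo_mode_t mode) {
  return mode == HALO_DEVICE_STAGED;
}

/* Halo depth needed for one propagation step: the largest magnitude of
 * any velocity component. cv is a flat array of 3*nvel integers. */
int halo_width(const int *cv, int nvel) {
  int w = 0;
  for (int i = 0; i < 3*nvel; i++) {
    int c = abs(cv[i]);
    if (c > w) w = c;
  }
  return w;
}
```

A velocity set containing, say, (0, -2, 0) would give a halo width of 2, which is where the pack/unpack index ranges would need to change.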