
Add Support for MPI Parallel Execution #31

Closed
2 of 8 tasks
christopherwharrop-noaa opened this issue Jan 22, 2021 · 2 comments
Labels
enhancement New feature or request


@christopherwharrop-noaa (Collaborator)

Issue checklist

  • I have read the README and ref/README, including the Troubleshooting Guide.
  • I have reviewed existing issues to ensure this has not already been reported.

Is your feature request related to a problem? Please describe.

The existing code supports parallelism only via OpenMP threading. This limits the kernel's utility for evaluating the performance tradeoffs of different implementations. A proper evaluation must include analysis of multi-node as well as single-node parallel performance.

Describe the solution you'd like

A full-featured MPI capability is needed. This will include:

  • Ability to run with any number of MPI tasks between 1 and N
  • Ability to run with MPI+OpenMP threads such that each MPI rank spawns a configurable number of OpenMP threads
  • A customizable processor grid for distribution of MPI ranks (e.g. 4x4, 2x8, 1x16)
  • A default processor grid that maximizes the "square-ness" of the grid to minimize communication (see the sketch after this list)
  • Automatic domain decomposition of data onto the chosen processor grid
  • All options configurable at runtime in the input namelist
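
A rough sketch of how the default near-square grid could be chosen is below, assuming a 2-D decomposition; MPI_Dims_create already produces the most balanced factorization of the task count. The names used here (dims, comm_cart, etc.) are illustrative, not existing kernel variables.

```fortran
! Sketch only: pick a default near-square 2-D processor grid for ntasks
! ranks and build a Cartesian communicator on it. Names are illustrative.
program default_grid
  use mpi
  implicit none
  integer :: ierr, ntasks, rank
  integer :: dims(2), coords(2)
  logical :: periods(2)
  integer :: comm_cart

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, ntasks, ierr)

  ! MPI_Dims_create fills dims with the most balanced factorization of
  ! ntasks, e.g. 16 -> 4x4, 8 -> 4x2, 7 -> 7x1.
  dims = 0
  call MPI_Dims_create(ntasks, 2, dims, ierr)

  ! Non-periodic Cartesian communicator; each rank can then look up its
  ! tile coordinates for the domain decomposition.
  periods = .false.
  call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, .true., comm_cart, ierr)
  call MPI_Comm_rank(comm_cart, rank, ierr)
  call MPI_Cart_coords(comm_cart, rank, 2, coords, ierr)

  call MPI_Finalize(ierr)
end program default_grid
```

A runtime namelist option requesting a specific layout (e.g. 2x8) could simply overwrite dims before the MPI_Cart_create call.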

Describe alternatives you've considered

MPI is a ubiquitous and standard means of parallelizing across nodes of a supercomputer. While there may be other ways of achieving that, having MPI parallelism is necessary for establishing a baseline of performance against which other implementations should be compared.

@christopherwharrop-noaa (Collaborator, Author)

Adding MPI will be a large change. It would be easier to evaluate and make progress if the work could be broken down into smaller pieces. It isn't clear to me yet what those pieces should be, so we can discuss here. To start, I'm going to throw out an initial breakdown for us to discuss and refine.

  1. Implement halos for the existing code. This would simply extend the dimensions of existing arrays without adding any parallelism. The code would not run in parallel, and would loop over the same indices as it does in serial, but the arrays would be properly dimensioned for halos.
  2. Implement calculation of the domain decomposition for a default processor grid (as square as possible) for an arbitrary number of MPI ranks. This would be a routine that computes the local indices and allocates arrays for each tile. Allocations and loop indices would be adjusted as needed. Still no parallelism, but tests would be added to verify that the decomposition works.
  3. Add MPI_Init() and implement a halo exchange. This would be a routine for exchanging the data in the halo (see the sketch below). Add a test to verify that the halo exchange works for multiple numbers of MPI ranks, and tests to show that running with different numbers of MPI ranks produces the same results.
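
For step 3, the sketch below shows what a minimal halo exchange along one grid dimension might look like, assuming a one-cell halo, the default real kind, and a Cartesian communicator like the one sketched above. The routine and variable names (exchange_halo_y, phi, is/ie/js/je) are hypothetical, not existing kernel code.

```fortran
! Sketch only: exchange the halo rows of a 2-D tile along the second (j)
! grid dimension. Rows are contiguous in Fortran's column-major layout,
! so plain MPI_REAL counts can be used without derived datatypes.
subroutine exchange_halo_y(comm_cart, phi, is, ie, js, je)
  use mpi
  implicit none
  integer, intent(in)    :: comm_cart, is, ie, js, je
  real,    intent(inout) :: phi(is-1:ie+1, js-1:je+1)
  integer :: south, north, ierr, nx
  integer :: status(MPI_STATUS_SIZE)

  nx = ie - is + 3   ! row length, including the two halo columns

  ! Neighbors along dimension 1; MPI_PROC_NULL at physical boundaries
  ! makes the calls below no-ops there.
  call MPI_Cart_shift(comm_cart, 1, 1, south, north, ierr)

  ! Send the northernmost interior row north, fill the southern halo row.
  call MPI_Sendrecv(phi(is-1, je),   nx, MPI_REAL, north, 1, &
                    phi(is-1, js-1), nx, MPI_REAL, south, 1, &
                    comm_cart, status, ierr)
  ! Send the southernmost interior row south, fill the northern halo row.
  call MPI_Sendrecv(phi(is-1, js),   nx, MPI_REAL, south, 2, &
                    phi(is-1, je+1), nx, MPI_REAL, north, 2, &
                    comm_cart, status, ierr)
end subroutine exchange_halo_y
```

The j-direction exchange is shown because those rows are contiguous; an exchange in the other direction would typically use an MPI derived datatype (e.g. a vector type) for the strided columns.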

This is just a starting point. Please comment/suggest adjustments as needed. Breaking parallelization down into smaller pieces may prove quite difficult. However, the smaller we can make the steps toward the final goal, the easier it will be to implement, review, and merge each step along the way.

@christopherwharrop-noaa (Collaborator, Author)

A decision was made not to pursue a full MPI parallelization. Instead, a simulation of parallel execution (see #35) is provided by running N copies of the serial kernel with MPI, including simulating the work of a halo exchange operation among those copies. Since no further work on MPI will be pursued, this issue is being closed.
