Example of usage with MPI parallelism #29

Closed

ashwinvis opened this issue Mar 13, 2023 · 6 comments

@ashwinvis

In the article you state that

> In particular, MPI is used to divide time and depth

but I did not find an example demonstrating this in the Tutorials. We only see OpenMP being used, and SLURM_NTASKS is always set to 1. Would it be possible to construct a simple example that shows MPI parallelism? This is needed to check off the following item from openjournals/joss-reviews#4277 (comment):

> Functionality: Have the functional claims of the software been confirmed?

@bastorer bastorer self-assigned this Mar 14, 2023
@bastorer
Collaborator

It hadn't even occurred to me that I didn't have any MPI tutorials. Thanks for pointing that out! I'll fix that this week :-)

bastorer added a commit that referenced this issue Mar 24, 2023
@bastorer
Collaborator

Took a little longer than planned, sorry, but an MPI tutorial is now included. For consistency, it follows the 'Low Resolution' spherical case almost exactly, but with multiple depth levels so that multiple MPI ranks can be used. On the recommended number of processors, it runs in a couple of minutes.

@ashwinvis
Author

40 processors is a bit much, so I would have to get access to a supercomputer to test it. Would it be possible to reduce the requirements such that even with 8 processors you get something in a few minutes?

@bastorer
Collaborator

I've updated ABOUT_TUTORIAL.md to include a note with instructions on how to adjust the tutorial (changing a single number in the `generate_data_sphere.py` script) so that it can run on fewer processors in a reasonable time.

The default setting is 24 processors running for ~10 minutes.

Does that seem reasonable?

Reducing the MPI requirement

24 processors is a fairly heavy requirement if you are not running on a computing cluster. You can simply run on fewer processors (most efficiently when the number of processors divides evenly into 48, the number of vertical levels), but at the cost of increased runtime.

To reduce the processor count without increasing the runtime, you can decrease the number of vertical levels proportionately. E.g. you can reduce the vertical levels to 12 in order to run on 6 processors in a similar amount of time.

To adjust the number of vertical levels, edit line 13 of `generate_data_sphere.py`, which reads `Nlon, Nlat, Ndepth = int(360//2), int(180//2), 48`. The last number, 48, specifies the number of vertical levels.

When running the code, you can use any number of MPI ranks up to the number of vertical levels, but the most efficient use of processors occurs when the number of MPI ranks divides evenly into the number of vertical levels.
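To make the efficiency point concrete, here is a minimal, hypothetical mpi4py sketch (my own illustration, not the tutorial's actual decomposition code) of how `Ndepth` vertical levels would be split across MPI ranks; the split is only even, and hence most efficient, when the rank count divides the level count:

```python
# Illustrative sketch only (hypothetical, not the tutorial's actual code):
# distribute Ndepth vertical levels across MPI ranks, mimicking the
# depth decomposition described above.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

Ndepth = 48  # number of vertical levels (line 13 of generate_data_sphere.py)

# Each rank takes a contiguous chunk of levels; when the division is
# uneven, the first (Ndepth % size) ranks carry one extra level.
base, extra = divmod(Ndepth, size)
my_count = base + (1 if rank < extra else 0)
my_start = rank * base + min(rank, extra)
my_levels = range(my_start, my_start + my_count)

print(f"rank {rank}/{size} handles levels {list(my_levels)}")
```

Running this with e.g. `mpirun -n 6 python sketch.py` gives every rank exactly 8 of the 48 levels; with `-n 10`, eight ranks get 5 levels and two get 4, so the five-level ranks set the overall runtime and some efficiency is lost.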

@bastorer
Collaborator

bastorer commented Apr 6, 2023

For something that runs in ~5 minutes on 8 processors, setting the number of vertical levels to 8 (the last number on line 13 of `generate_data_sphere.py`; change 48 to 8) should do the trick.
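Concretely, combining this with the default line quoted above, the edit would be:

```python
# Line 13 of generate_data_sphere.py: reduce the vertical levels from the
# default 48 to 8 so the tutorial runs on 8 processors in ~5 minutes.
Nlon, Nlat, Ndepth = int(360//2), int(180//2), 8
```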

@ashwinvis
Author

Managed to run with 8 vertical levels and 4 processors on a laptop in ~30 minutes.
