Allow "non-competitive" initialization of sub-apps #24264

Closed
aprilnovak opened this issue May 3, 2023 · 0 comments · Fixed by #24265
Labels: C: Framework · T: task (An enhancement to the software.)

Reason

Cardinal runs NekRS simulations within MOOSE. If a given run contains more than one NekRS simulation (for example, when multiple Nek sub-apps are provided with the MultiApp/positions parameter, or when running a StochasticTools simulation with num_rows > 1), MultiApp::createApps tries to create all N NekRS sub-apps concurrently.

However, NekRS is not a MOOSE-based application, and it has special requirements for running more than one NekRS case concurrently. Among other setup work, NekRS JIT-compiles its GPU kernels and writes temporary files to a .cache/ directory in the same folder as your input files.

cardinal-opt -i nek.i

# first thing this does is JIT compile and write files into a .cache/ directory

This JIT compilation MUST happen in a "quiet" environment. If a MultiApp tries to spawn 5 different NekRS sub-apps at once, they clobber each other and overwrite the contents of .cache/, causing catastrophic failures.

We are integrating NekRS with the Stochastic Tools module. If we want to run a stochastic simulation with 500 samples, this means that we either:

  • launch our job with 500 MPI ranks and set min_procs_per_app = 500 so that only a single Nek sub-app is created. This is not feasible for most scenarios: the number of stochastic samples is typically quite high, so this imposes artificial constraints on the number of processes in a parallel job (with tangible costs in queue time). It could also hurt performance: if each individual Nek simulation is small, we don't want to force it to run on many processes.
  • OR, add logic that "pauses" all non-rank-0 MPI processes while the very first Nek sub-app is JIT-compiled, and then resumes parallel instantiation.

Design

Add an option to MultiApp that changes how the sub-apps are created: the first app on rank 0 is constructed "in quiet" while all other ranks wait. Only after that do the other ranks resume initializing their apps.

I think this is the simplest way to accomplish this, but am open to suggestions.
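A minimal sketch of the proposed creation order, assuming std::thread workers as stand-ins for MPI ranks (in MOOSE itself the wait would presumably be an MPI barrier such as MPI_Barrier). The names createApp and createAppsNonCompetitive are hypothetical and are not MOOSE's API. Rank 0 builds its first app alone, then releases the other ranks through a shared future; after that, every rank builds its remaining apps concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Thread-safe log recording the order in which apps are started and finished.
struct EventLog
{
  std::mutex m;
  std::vector<std::string> events;
  void add(const std::string & e)
  {
    std::lock_guard<std::mutex> lock(m);
    events.push_back(e);
  }
};

// Hypothetical stand-in for constructing one sub-app. For a NekRS case, this
// is where JIT compilation would write into the shared .cache/ directory.
void createApp(int rank, int app, EventLog & log)
{
  log.add("start app " + std::to_string(app) + " on rank " + std::to_string(rank));
  log.add("done app " + std::to_string(app));
}

// Hypothetical "non-competitive" creation: each thread plays one rank and owns a
// list of app indices. Rank 0 builds its first app alone; all other ranks wait
// for that, then every rank builds its remaining apps concurrently.
std::vector<std::string> createAppsNonCompetitive(const std::vector<std::vector<int>> & apps_per_rank)
{
  assert(!apps_per_rank.empty() && "rank 0 must exist to release the other ranks");
  EventLog log;
  std::promise<void> first_app_built;
  std::shared_future<void> first_done = first_app_built.get_future().share();

  std::vector<std::thread> ranks;
  for (int rank = 0; rank < static_cast<int>(apps_per_rank.size()); ++rank)
    // Each thread gets its own copy of the shared_future, as recommended for
    // concurrent waits on the same shared state.
    ranks.emplace_back([&, rank, first_done] {
      const std::vector<int> & apps = apps_per_rank[rank];
      std::size_t next = 0;
      if (rank == 0)
      {
        // Quiet phase: no other rank is building an app yet.
        if (!apps.empty())
          createApp(rank, apps[next++], log);
        first_app_built.set_value(); // release the waiting ranks
      }
      else
        first_done.wait(); // plays the role of the barrier on non-zero ranks

      // Concurrent phase: the first app is done, so the rest proceed in parallel.
      for (; next < apps.size(); ++next)
        createApp(rank, apps[next], log);
    });

  for (std::thread & t : ranks)
    t.join();
  return log.events;
}
```

For example, createAppsNonCompetitive({{0, 1}, {2, 3}, {4}}) models three ranks and five apps. The first two log entries are always the start and finish of app 0 on rank 0; the order of the remaining entries can vary from run to run, which is the point of letting them proceed in parallel.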

Impact

New feature. Necessary for NEAMS milestone.
