
more automated support for creating+using distributed/split meshes #10458

Closed
rwcarlsen opened this issue Jan 4, 2018 · 6 comments
Labels: C: Framework, P: normal, T: task
rwcarlsen (Contributor) commented Jan 4, 2018

Rationale

Provide convenient workflows for working with distributed meshes in MOOSE. Currently, too many manual and poorly documented steps are required to use distributed meshes with MOOSE properly.

Description

We want users to be able to:

mpiexec -ppn <low number> foo-opt -i input.i --pre-split --splits='500 1000'
mpiexec foo-opt -i input.i --use-split

For cases where the input file references a file-based mesh, we want the mesh splits to go in the same directory as the original mesh file. For generated-mesh cases, the user will need to specify a split destination directory via a CLI flag; otherwise MOOSE should error out. In both cases the user can manually specify the split destination directory, which overrides the file-mesh-based directory if one exists.
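The directory-selection rules above could be sketched roughly as follows. This is illustrative only: the helper function and the idea of a dedicated destination flag are hypothetical, not actual MOOSE API.

```shell
#!/bin/sh
# Sketch of the proposed split-destination logic (hypothetical helper):
#   1. an explicitly specified directory always wins;
#   2. otherwise, a file-based mesh splits into the mesh file's directory;
#   3. otherwise (generated mesh, no directory given), it is an error.
resolve_split_dir() {
    user_dir="$1"   # value of a hypothetical destination CLI flag ("" if unset)
    mesh_file="$2"  # path to the file-based mesh ("" for a generated mesh)
    if [ -n "$user_dir" ]; then
        echo "$user_dir"
    elif [ -n "$mesh_file" ]; then
        dirname "$mesh_file"
    else
        echo "error: generated mesh requires an explicit split directory" >&2
        return 1
    fi
}
```

The key design point is precedence: an explicit user choice beats the file-based default, and the generated-mesh case has no sensible default, so it must fail loudly.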

Impact

This will require adding some new CLI API in MOOSE and possibly some new API in libmesh to support it. It is purely new functionality and shouldn't affect anything otherwise.

permcody (Member) commented Jan 4, 2018

While we are working on this, we'll want to make sure we support subdirectory creation and usage as well. Specifically, rather than creating splits in a single flat directory where the mesh is located, we should create separate subdirectories for each split.

@permcody added the labels C: Framework, P: normal, and T: task on Jan 4, 2018
YaqiWang (Contributor) commented Jan 4, 2018

This is interesting to me. Can we do this with just one parameter?

mpirun -np 100 foo-opt -i input.i Mesh/pre_split=true

Adding --mesh-only on the command line would just do the split; the number of processors would be passed to the app by mpirun.

permcody (Member) commented Jan 4, 2018

Can we do this with just one parameter?

... Mesh/pre_split=true

Well, with your syntax above you are missing the point. We are intentionally not designing this around input-file syntax. For this workflow we really are trying to avoid having to modify the input file between splitting and running. I've slightly extended the command Robert put in his original description to show how you could potentially split and run with a single PBS job submission. These CLI flags would change the behavior of MOOSE fundamentally, not just add or change an input file parameter.

If you already had splits on the filesystem you could indeed just run with the split by submitting an input file with a normal looking Mesh block and just telling MOOSE to use the split instead (no input file modification necessary to switch over).

permcody (Member) commented Jan 4, 2018

Oh, I realize I left off one really important detail about why you will almost always need two separate commands. Note the -ppn <low number> argument I added. Let me explain this for you and anyone else looking at this ticket.

Let's assume you've submitted a job where you've requested several large "chunks" (usually whole cluster nodes). You'll be able to take advantage of all of the processing cores during the simulation, but not during the split. The reason is the large memory requirement of reading the whole mesh on every processor: even if you intend to run in distributed mode, you may run out of memory before the program reaches the point where it can discard unneeded elements.

The -ppn <n> argument overrides the MPI behavior where the number (and order) of ranks comes from the MPI hosts file (normally created automatically by your queuing software). If you give it a low value, say 4, you'll limit MPI to spawning just 4 ranks on a given node, lowering your overall processor usage. That's actually ideal for the splitting: you can use some amount of parallelism to perform the splitting task (and thus have much more memory per rank), followed by your real run, which will occupy each whole chunk.
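The single-submission pattern described above might look something like this in a PBS job script. This is only a sketch: the flag spellings --pre-split, --splits, --use-split, and -ppn are taken from the commands earlier in this thread (they were proposals at the time, not shipped flags), and the node/core counts are made up for illustration.

```shell
#!/bin/bash
#PBS -l select=25:ncpus=36   # hypothetical allocation: 25 nodes x 36 cores = 900 ranks

# Step 1: split on only a few ranks per node, so each splitting rank has
# enough memory to hold the whole mesh while it partitions it.
mpiexec -ppn 4 foo-opt -i input.i --pre-split --splits='900'

# Step 2: the real run occupies every core; each rank reads only its piece
# of the pre-split mesh, so the per-rank memory spike never happens.
mpiexec foo-opt -i input.i --use-split
```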

Now remember, you can always just split ahead of time on a few nodes, or maybe even on a different high-memory system if you have one available.

YaqiWang (Contributor) commented Jan 5, 2018

Suppose we have a mesh file. If we just say Mesh/parallel_type=DISTRIBUTED, currently every processor will load the mesh, partition it, and then remove the remote elements, if I understand correctly. This is typically not an issue because the memory overhead of loading the entire mesh file is later dwarfed by the memory used for other things like the solution vector, Jacobian, etc. That compensation shrinks as the number of processors grows, because those other memory costs scale well. Beyond some number of processors (probably problem-dependent, but we could check a simple Poisson equation to get an idea of what it is), the memory spike caused by mesh loading will exceed the total memory usage after everything is set up. At that point we need to distribute the mesh differently by doing the pre-splitting.

What I am proposing is to make the splitting transparent to users. The entire calculation would have two steps, but users would not see them. In the first step, possibly only one processor per compute node would be active and do the split. Once the mesh is split, the parts would be passed to the other processors. All of this would be controlled by Mesh/pre_split=true, a parameter that essentially controls how the mesh is loaded and distributed; the default would remain loading on all processors. This requires properly selecting the partitioning processors among all of them (I am not sure whether MPI_Get_processor_name can be used for this) and fairly complicated communication control. Note that the first step does not have to write the split mesh to files. With --mesh-only, the split mesh would be written out. If users already have a pre-split mesh, the file type will be different, so the mesh would be loaded directly.

friedmud (Contributor) commented Jan 8, 2018

Firstly: if your mesh is so large that it would cause you to run out of memory... you will most likely want to actually PRE-split your mesh: i.e., split many times with a dedicated job for doing that, and store the mesh files away so that you can run the calculation using those files many times.

Cody's example of having this all in one MPI job is just that: an example. We don't see it actually being used this way often.

Secondly: you can't do this all in one job. There is no way to know how much memory each MPI process is allowed to use. While many people do often run using "whole nodes", it is becoming more common to run in "scatter" mode, where MPI processes are scattered across the cluster. In that configuration an individual process may only have access to 1/36th of the total memory of a node, and loading even a medium-sized mesh could cause it to run out of memory.

The way we're going to encourage people to use this capability is to run a "splitting" job that uses, say, 4 nodes and splits for all the numbers of procs you plan on running on. In addition: the splits should get stored onto the high-speed filesystem to make starting jobs even faster.

After that you simply run your job like normal... but you pass the --use-split option.
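The recommended workflow above, one dedicated splitting job followed by ordinary production runs, could be sketched like this. Again, the flags --pre-split, --splits, --use-split, and -ppn are the proposed spellings from this thread, and the counts and the scratch path are invented for illustration.

```shell
#!/bin/bash
# Dedicated splitting job (sketch): run once on a few nodes, splitting for
# every processor count you expect to use, with the resulting files kept on
# the high-speed filesystem (here an assumed /scratch working directory).
cd /scratch/$USER/my_case
mpiexec -ppn 4 foo-opt -i input.i --pre-split --splits='500 1000 2000'

# Later, each production job runs like normal, with one extra flag; the
# input file's Mesh block is unchanged between splitting and running.
mpiexec foo-opt -i input.i --use-split
```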

@rwcarlsen rwcarlsen self-assigned this Jan 15, 2018
rwcarlsen added a commit to rwcarlsen/moose that referenced this issue Jan 26, 2018
Fixes idaholab#10458.  Note that the mesh splitting won't work right until we
get another libmesh update in :-(
rwcarlsen added a commit to rwcarlsen/moose that referenced this issue Jan 30, 2018
This required adding the ability to specify a "final" task to run in the
action warehouse.  The setFinalTask function can be called by anybody to
make any task the last one to run.

Cleans up a special case in preparation for work on idaholab#10458.