
more automated support for creating+using distributed/split meshes #10458

Closed
rwcarlsen opened this issue Jan 4, 2018 · 6 comments
Labels: C: Framework, P: normal, T: task
rwcarlsen (Contributor) commented Jan 4, 2018

Rationale

Provide convenient workflows for working with distributed meshes in MOOSE. Currently, too many manual and poorly documented steps are required to use distributed meshes with MOOSE properly.

Description

We want users to be able to:

mpiexec -ppn <low number> foo-opt -i input.i --pre-split --splits='500 1000'
mpiexec foo-opt -i input.i --use-split

For cases where the input file references a file-based mesh, we want the mesh splits to go in the same directory as the original mesh file. For generated-mesh cases, the user will need to specify a split destination directory via a CLI flag; otherwise MOOSE should error out. In both cases the user can manually specify the split destination directory, which overrides the file-mesh-based directory if one exists.
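The directory-selection rules above could be sketched roughly as follows. This is illustrative only: the helper function and the idea of a dedicated destination flag are hypothetical, not actual MOOSE API.

```shell
#!/bin/sh
# Sketch of the proposed split-destination logic (hypothetical helper):
#   1. an explicitly specified directory always wins;
#   2. otherwise, a file-based mesh splits into the mesh file's directory;
#   3. otherwise (generated mesh, no directory given), it is an error.
resolve_split_dir() {
    user_dir="$1"   # value of a hypothetical destination CLI flag ("" if unset)
    mesh_file="$2"  # path to the file-based mesh ("" for a generated mesh)
    if [ -n "$user_dir" ]; then
        echo "$user_dir"
    elif [ -n "$mesh_file" ]; then
        dirname "$mesh_file"
    else
        echo "error: generated mesh requires an explicit split directory" >&2
        return 1
    fi
}
```

The key design point is precedence: an explicit user choice beats the file-based default, and the generated-mesh case has no sensible default, so it must fail loudly.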

Impact

This will require adding some new CLI API in MOOSE and possibly some new API in libmesh to support it. It is purely new functionality and shouldn't affect anything otherwise.

permcody (Member) commented Jan 4, 2018

While we are working on this, we'll want to make sure we support subdirectory creation and usage as well. Specifically, rather than creating splits in a single flat directory where the mesh is located, we should create separate subdirectories for each split.

@permcody added the labels C: Framework, P: normal, and T: task on Jan 4, 2018
YaqiWang (Contributor) commented Jan 4, 2018

This is interesting to me. Can we do this with just one parameter?

mpirun -np 100 foo-opt -i input.i Mesh/pre_split=true

Adding --mesh-only on the command line would just do the split; the number of processors would be passed to the app by mpirun.

permcody (Member) commented Jan 4, 2018

Can we do this with just one parameter?

... Mesh/pre_split=true

Well, with your syntax above you are missing the point. We are intentionally not designing this around input-file syntax. For this workflow we really are trying to avoid having to modify the input file between splitting and running. I've slightly extended the command Robert put in his original description to show how you could potentially split and run with a single PBS job submission. These CLI flags would change the behavior of MOOSE fundamentally, not just add or change an input file parameter.

If you already had splits on the filesystem you could indeed just run with the split by submitting an input file with a normal looking Mesh block and just telling MOOSE to use the split instead (no input file modification necessary to switch over).

permcody (Member) commented Jan 4, 2018

Oh, I realize I left off one really important detail about why you will almost always need two separate commands. Note the -ppn <low number> argument I added. Let me explain this for you and anyone else looking at this ticket.

Let's assume you've submitted a job where you've requested several large "chunks" (usually whole cluster nodes). You'll be able to take advantage of all of the processing cores during the simulation, but not during the split. The reason is the large memory requirement of reading the whole mesh on every processor: even if you intend to run in distributed mode, you may run out of memory before the program reaches the point where it can discard unneeded elements.

The -ppn <n> argument overrides the MPI behavior where the number (and order) of ranks comes from the MPI hosts file (normally created automatically by your queuing software). If you give it a low value, say 4, you'll limit MPI to spawning just 4 ranks on a given node, lowering your overall processor usage. That's actually ideal for the splitting: you can use some amount of parallelism to perform the splitting task (and thus have much more memory per rank), followed by your real run, which will occupy each whole chunk.
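The single-submission pattern described above might look something like this in a PBS job script. This is only a sketch: the flag spellings --pre-split, --splits, --use-split, and -ppn are taken from the commands earlier in this thread (they were proposals at the time, not shipped flags), and the node/core counts are made up for illustration.

```shell
#!/bin/bash
#PBS -l select=25:ncpus=36   # hypothetical allocation: 25 nodes x 36 cores = 900 ranks

# Step 1: split on only a few ranks per node, so each splitting rank has
# enough memory to hold the whole mesh while it partitions it.
mpiexec -ppn 4 foo-opt -i input.i --pre-split --splits='900'

# Step 2: the real run occupies every core; each rank reads only its piece
# of the pre-split mesh, so the per-rank memory spike never happens.
mpiexec foo-opt -i input.i --use-split
```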

Now remember, you can always just split ahead of time on a few nodes, or maybe even on a different high-memory system if you have one available.

YaqiWang (Contributor) commented Jan 5, 2018

Suppose we have a mesh file. If we just say Mesh/parallel_type=DISTRIBUTED, currently every processor will load the mesh, partition it, and then remove the remote elements, if I understand correctly. This is typically not an issue because the memory overhead of loading the entire mesh file is later dwarfed by the memory used for other things like the solution vector, Jacobian, etc. That compensation shrinks as the number of processors grows, because those other memory costs scale well. Beyond some number of processors (probably problem-dependent, but we could check a simple Poisson equation to get an idea of what it is), the memory spike caused by mesh loading will exceed the total memory usage after everything is set up. At that point we need to distribute the mesh differently by doing the pre-splitting.

What I am proposing is to make the splitting transparent to users. The entire calculation would have two steps, but users would not see them. In the first step, possibly only one processor per compute node would be active and do the split. Once the mesh is split, the parts would be passed to the other processors. All of this would be controlled by Mesh/pre_split=true, a parameter that essentially controls how the mesh is loaded and distributed; the default would remain loading on all processors. This requires properly selecting the partitioning processors among all of them (I am not sure whether MPI_Get_processor_name can be used for this) and fairly complicated communication control. Note that the first step does not have to write the split mesh to files. With --mesh-only, the split mesh would be written out. If users already have a pre-split mesh, the file type will be different, so the mesh would be loaded directly.

friedmud (Contributor) commented Jan 8, 2018

Firstly: if your mesh is so large that it would cause you to run out of memory... you will most likely want to actually PRE-split your mesh: i.e., split many times with a dedicated job for doing that, and store the mesh files away so that you can run the calculation using those files many times.

Cody's example of having this all in one MPI job is just that: an example. We don't see it actually being used this way often.

Secondly: you can't do this all in one job. There is no way to know how much memory each MPI process is allowed to use. While many people do often run using "whole nodes", it is becoming more common to run in "scatter" mode, where MPI processes are scattered across the cluster. In that configuration an individual process may only have access to 1/36th of the total memory of a node, and loading even a medium-sized mesh could cause it to run out of memory.

The way we're going to encourage people to use this capability is to run a "splitting" job that uses, say, 4 nodes and splits for all the numbers of procs you plan on running on. In addition: the splits should get stored onto the high-speed filesystem to make starting jobs even faster.

After that you simply run your job like normal... but you pass the --use-split option.
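The recommended workflow above, one dedicated splitting job followed by ordinary production runs, could be sketched like this. Again, the flags --pre-split, --splits, --use-split, and -ppn are the proposed spellings from this thread, and the counts and the scratch path are invented for illustration.

```shell
#!/bin/bash
# Dedicated splitting job (sketch): run once on a few nodes, splitting for
# every processor count you expect to use, with the resulting files kept on
# the high-speed filesystem (here an assumed /scratch working directory).
cd /scratch/$USER/my_case
mpiexec -ppn 4 foo-opt -i input.i --pre-split --splits='500 1000 2000'

# Later, each production job runs like normal, with one extra flag; the
# input file's Mesh block is unchanged between splitting and running.
mpiexec foo-opt -i input.i --use-split
```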

@rwcarlsen rwcarlsen self-assigned this Jan 15, 2018
rwcarlsen added a commit to rwcarlsen/moose that referenced this issue Jan 26, 2018
Fixes idaholab#10458.  Note that the mesh splitting won't work right until we
get another libmesh update in :-(
rwcarlsen added a commit to rwcarlsen/moose that referenced this issue Jan 30, 2018
This required adding the ability to specify a "final" task to run in the
action warehouse.  The setFinalTask function can be called by anybody to
make any task the last one to run.

Cleans up a special case in preparation for work on idaholab#10458.