more automated support for creating+using distributed/split meshes #10458
Comments
While we are working on this, we'll want to make sure we support subdirectory creation and usage as well. Specifically, rather than creating splits in a single flat directory alongside the mesh, we should create a separate subdirectory for each split.
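A minimal sketch of what that per-split layout could look like. The file and directory names here (`cube.e`, the `*.split/<n>` convention) are assumptions for illustration, not an established MOOSE convention:

```shell
# Hypothetical layout: group split files into one subdirectory per
# processor count, next to the source mesh file, instead of a flat pile.
mesh=mesh_dir/cube.e
mkdir -p mesh_dir && touch "$mesh"
for n in 4 16 64; do
  mkdir -p "${mesh}.split/${n}"   # one subdirectory per split size
done
ls "${mesh}.split"                # lists the per-count subdirectories
```

With this shape, adding a split for a new processor count never collides with existing splits, and stale splits can be removed by deleting a single directory.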
This is interesting to me. Can we do this with just one parameter?
Well, with your above syntax you are missing the point: we are intentionally not designing this around input file syntax. For this workflow we really are trying to avoid having to modify the input file between splitting and running. I've slightly extended the command Robert put in his original description to show how you could potentially split and run with a single PBS job submission. These CLI flags would change the behavior of MOOSE fundamentally, not just add or change an input file parameter. If you already had splits on the filesystem, you could indeed just run with them by submitting an input file with a normal-looking Mesh block and telling MOOSE to use the split instead (no input file modification necessary to switch over).
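A rough sketch of what such a single PBS submission might look like. The flag names (`--split-mesh`, `--use-split`), the application name `myapp-opt`, and the node counts are illustrative assumptions, not the exact command from the original description:

```shell
#!/bin/bash
#PBS -l select=8:ncpus=36:mpiprocs=36
#PBS -l walltime=4:00:00
cd "$PBS_O_WORKDIR"

# Step 1: split on a small fraction of the allocation (few ranks, so each
# rank has enough memory to hold the whole mesh while splitting).
mpiexec -n 8 ./myapp-opt -i input.i --split-mesh 288

# Step 2: run the simulation on the full allocation, consuming the splits.
# Note the Mesh block in input.i is untouched between the two steps; only
# the CLI flags change MOOSE's behavior.
mpiexec -n 288 ./myapp-opt -i input.i --use-split
```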
Oh, I realize I left off one really important detail about why you will almost always need two separate commands. Let's assume you've submitted a job where you've requested several large "chunks" (usually whole cluster nodes). You'll be able to take advantage of all of the processing cores during the simulation, but not during the split. The reason is the large memory requirement of reading in the whole mesh on every processor. Even if you intend to run in distributed mode, you may run out of memory before the program reaches the point where it can discard unneeded elements. Now remember, you can always split ahead of time on a few nodes, or maybe even on a different high-memory system if you have one available.
Suppose we have a mesh file: if we just say
Firstly: if your mesh is so large that it would cause you to run out of memory, you will most likely want to actually PRE-split your mesh: i.e., split many times in a dedicated job for doing that, and store the split files away so that you can run the calculation using those files many times. Cody's example of having this all in one MPI job is just that: an example. We don't see it actually being used this way often.

Secondly: you can't do this all in one job. There is no way to know how much memory each MPI process is allowed to use. While many people do often run using "whole nodes", it is becoming more common to run in "scatter" mode, where MPI processes are scattered across the cluster. In that configuration an individual process would only have access to 1/36th of the total memory of a node, and loading even a medium-sized mesh could cause it to run out of memory.

The way we're going to encourage people to use this capability is to run a "splitting" job that uses, say, 4 nodes and splits for all the numbers of procs you plan on running with. In addition, the splits should get stored on the high-speed filesystem to make starting jobs even faster. After that you simply run your job like normal... but you pass the
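The encouraged pre-splitting workflow above could be sketched as a dedicated job like the following. All specifics here are assumptions for illustration: the flag names (`--split-mesh`, `--split-file`), the application name, the scratch path, and the processor counts:

```shell
#!/bin/bash
#PBS -l select=4:ncpus=36:mpiprocs=36
#PBS -l walltime=2:00:00
cd "$PBS_O_WORKDIR"

# Dedicated splitting job: produce splits for every processor count we
# expect to run with, stored on the high-speed filesystem so later
# simulation jobs start faster. 4 whole nodes give each rank plenty of
# memory to read the full mesh.
FAST_FS=/scratch/$USER/splits    # assumed high-speed filesystem path
mkdir -p "$FAST_FS"
for n in 144 288 576 1152; do
  mpiexec -n 144 ./myapp-opt -i input.i \
    --split-mesh "$n" --split-file "$FAST_FS/cube_${n}"
done
```

The simulation jobs then never read the original mesh at all; each one just points at the pre-made split matching its processor count.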
Fixes idaholab#10458. Note that the mesh splitting won't work right until we get another libmesh update in :-(
This required adding the ability to specify a "final" task to run in the action warehouse. The setFinalTask function can be called by anybody to make any task the last one to run. Cleans up a special case in preparation for work on idaholab#10458.
Rationale
Provide convenient workflows for working with distributed meshes in MOOSE. Currently, too many manual and poorly documented steps are required to use distributed meshes properly.
Description
We want users to be able to:
For cases where the input file references a file-based mesh, we want the mesh splits to go in the same directory as the original mesh file. For generated-mesh cases, the user will need to specify a split destination directory via a CLI flag; otherwise MOOSE should error out. In both cases the user can manually specify the split destination directory, which overrides the file-mesh-based directory if one exists.
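The destination-directory rules above can be sketched as a small decision function. The function name and argument conventions here are hypothetical, not MOOSE API; only the precedence (explicit override, then mesh-file directory, then error) comes from the description:

```shell
#!/bin/sh
# split_dest OVERRIDE MESHFILE
#   OVERRIDE: value of the hypothetical CLI flag, may be empty
#   MESHFILE: path of the file-based mesh, empty for a generated mesh
split_dest() {
  override=$1
  meshfile=$2
  if [ -n "$override" ]; then
    # 1. An explicit CLI override always wins.
    printf '%s\n' "$override"
  elif [ -n "$meshfile" ]; then
    # 2. A file-based mesh splits next to the mesh file.
    dirname "$meshfile"
  else
    # 3. Generated mesh with no override: error out.
    echo "error: generated mesh requires an explicit split directory" >&2
    return 1
  fi
}

split_dest ""           "meshes/cube.e"            # -> meshes
split_dest "/scratch/s" "meshes/cube.e"            # -> /scratch/s
split_dest ""           "" || echo "refused" >&2   # errors as required
```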
Impact
This will require adding some new CLI API to MOOSE and possibly some new API to libmesh to support it. It is just new functionality, so it shouldn't affect anything otherwise.