-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement convenience workflow for using distributed mesh #10623
Conversation
Job Documentation on 59b8479 wanted to post the following: View the site here This comment will be updated on new commits. |
framework/src/base/MooseApp.C
Outdated
// splits s with whitespace and commas as delimiters and converts the split elements to integers | ||
// returned in a vector. | ||
std::vector<unsigned int> | ||
splitstr(const std::string & s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How convenient... I'll switch.
framework/src/base/MooseApp.C
Outdated
@@ -562,20 +600,45 @@ MooseApp::runInputFile() | |||
|
|||
_action_warehouse.executeAllActions(); | |||
|
|||
if (isParamValid("mesh_only")) | |||
if (getParam<std::string>("split_mesh") != "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend that you break out this new logic into a subroutine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you probably don't want to run all Actions when doing the splits. Building the equation system might be really expensive for instance. A much better design would be to run the subset of Actions needed to build the mesh and then do the splits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll piggy back on mesh-only logic.
framework/src/base/MooseApp.C
Outdated
"", | ||
"comma-separated list of numbers of chunks to split the mesh into"); | ||
|
||
params.addCommandLineParam<std::string>("split_file", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this parameter should be pluralized? split_files
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but the user only passes a single file name as the argument to that flag - so it is a bit weird no matter how you do the pluralization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm ok with split_file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In MOOSE we call these file_bases. Perhaps that would be a better name (more in line with what users see in the Outputs
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but there is actually a file that gets created that is named exactly what the user types in there. As currently implemented, it is left to the user to type in the ".cpr" extension. This was done to make is easy for tab completion to work if the user is trying to use an already split mesh. I think it would be confusing to call it [bla]-base
if we expect the user to type in a file extension.
* Returns the name of the mesh file read to produce this mesh if any or an empty string | ||
* otherwise. | ||
*/ | ||
virtual std::string getFileName() const { return ""; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable new API to add, but I know we have several other mesh formats that read files spread all over (there maybe another in the framework, there are several in the modules, etc). You should probably take a look and make them all override this new interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look. Not sure what to do about the meshes that read multiple files though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done.
* Removes any file extension from the fiven string s (i.e. any ".[extension]" suffix of s) and | ||
* returns the result. | ||
*/ | ||
std::string stripExtension(const std::string & s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't strip extensions - it splits file name from prefix dirpaths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the method called stripExtensions doesn't actually strip extensions 😛 ?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - the function you linked (splitFileName
) doesn't strip extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good - just a few things.
framework/src/base/MooseApp.C
Outdated
// splits s with whitespace and commas as delimiters and converts the split elements to integers | ||
// returned in a vector. | ||
std::vector<unsigned int> | ||
splitstr(const std::string & s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have MooseUtils::tokenize()
that does this.
framework/src/base/MooseApp.C
Outdated
@@ -562,20 +600,45 @@ MooseApp::runInputFile() | |||
|
|||
_action_warehouse.executeAllActions(); | |||
|
|||
if (isParamValid("mesh_only")) | |||
if (getParam<std::string>("split_mesh") != "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not isParamValid()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That always returns true for parameters that have a default. I suppose split_mesh doesn't need a default though - I'll take a look.
framework/src/base/MooseApp.C
Outdated
auto & mesh = _action_warehouse.mesh(); | ||
|
||
if (mesh->getFileName() == "" && getParam<std::string>("split_file") == "") | ||
mooseError("must specify an output mesh file name (--split-file) when splitting " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
framework/src/base/MooseApp.C
Outdated
processor_id_type n = splits[i]; | ||
Moose::out << "splitting " << n << " ways..." << std::endl; | ||
|
||
auto cpr = libMesh::split_mesh(*mesh, n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the reset to 1 partition happen inside split_mesh()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: maybe the libMesh interface should really just take the vector of splits... and do this loop internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I thought about that, but this allows the callers of the API more flexibility on how they want to handle the checkpoint object - they can switch it into ascii mode, etc. without having to pass a bajillion options into the splitting API. The goal was to make the libmesh splitting API do as close to one thing as possible and still preserve flexibility for callers.
framework/src/base/MooseApp.C
Outdated
for (std::size_t i = 0; i < splits.size(); i++) | ||
{ | ||
processor_id_type n = splits[i]; | ||
Moose::out << "splitting " << n << " ways..." << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize s
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good.
if (_app.parameters().get<bool>("use_split")) | ||
{ | ||
auto split_file = _app.parameters().get<std::string>("split_file"); | ||
if (_type != "FileMesh") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think this is right. We have several other FileMesh derived types. I know you haven't instantiated the object here yet so using a virtual method or some other normal trick doesn't work but we might be able to make this all more robust with the old validParams trick (embedding a "base" type in validParams).
auto mesh = _app.actionWarehouse().mesh(); | ||
|
||
if (mesh->getFileName() == "" && _app.parameters().get<std::string>("split_file") == "") | ||
mooseError("must specify an output mesh file name (--split-file) when splitting " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this isn't in the style guide but, we typically write sentences (start with a capital letter) in all of our error messaging throughout MOOSE - please change.
|
||
auto splitstr = _app.parameters().get<std::string>("split_mesh"); | ||
std::vector<unsigned int> splits; | ||
bool success = MooseUtils::tokenizeAndConvert(splitstr, splits, ", "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Moose::out << "splitting " << n << " ways..." << std::endl; | ||
|
||
auto cpr = libMesh::split_mesh(*mesh, n); | ||
Moose::out << " * writing " << cpr->current_processor_ids().size() << " files per process..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: OK, but we really should revisit the whole console stream interface so that we can use it more consistently. I know you can't here because your messages would be buffered.
for (std::size_t i = 0; i < splits.size(); i++) | ||
{ | ||
processor_id_type n = splits[i]; | ||
Moose::out << "splitting " << n << " ways..." << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More lower case messages coming out here. We might need to have a discussion on this.
framework/src/base/MooseApp.C
Outdated
{ | ||
_syntax.registerTaskName("split_mesh", true); | ||
_syntax.addDependency("split_mesh", "setup_mesh_complete"); | ||
_action_warehouse.setFinalTask("split_mesh"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
framework/src/base/MooseApp.C
Outdated
@@ -527,6 +542,12 @@ MooseApp::setupOptions() | |||
_syntax.addDependency("mesh_only", "setup_mesh_complete"); | |||
_action_warehouse.setFinalTask("mesh_only"); | |||
} | |||
else if (getParam<std::string>("split_mesh") != "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does isParamValid()
not work here and in other usages of this new parameter?
* Removes any file extension from the fiven string s (i.e. any ".[extension]" suffix of s) and | ||
* returns the result. | ||
*/ | ||
std::string stripExtension(const std::string & s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This is necessary because checkpoint files will be in different locations and named differently depending on mesh splitting and processor number. The naive file check we did before won't work. So for now we skip the check for cpr/cpa files. This is necessary for an upcoming libmesh PR that is necessary for the upcoming idaholab#10623 moose PR.
000ecc5
to
f46b090
Compare
82279e7
to
874c694
Compare
This should be ready for another look now that the new libmesh stuff is in. |
new_pars.set<MeshFileName>("file") = split_file; | ||
new_pars.set<MooseApp *>("_moose_app") = _moose_object_pars.get<MooseApp *>("_moose_app"); | ||
new_pars.set<MooseEnum>("parallel_type") = _moose_object_pars.get<MooseEnum>("parallel_type"); | ||
new_pars.set<MooseEnum>("parallel_type") = "distributed"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct, two sets in a row for the same parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops
{ | ||
if (split_file == "") | ||
mooseError("cannot use split mesh with non-file mesh without specifying --split-file on " | ||
"command line"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is OK, it would be cool if we could use the virtual function but that might be a little tricky before the object is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For generated mesh, we don't know the name of the pre-split files. There is no way we could divine it. The autosplit generated mesh and run in one go is a separate thing that will have to be implemented later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with @rwcarlsen here - this is the right thing to do.
@@ -152,6 +153,34 @@ SetupMeshAction::setupMesh(MooseMesh * mesh) | |||
void | |||
SetupMeshAction::act() | |||
{ | |||
// switch non-file meshes to be a file-mesh if using a pre-split mesh configuration. | |||
if (_app.parameters().get<bool>("use_split")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to run this logic twice (or more). You only need to run it during the "setup_mesh" task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you are saying here. Could you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the code below this if statement (e.g. if _current_task == ...
). This Action gets fired multiple time to perform different tasks. You only need to run this new logic for the setup task, not the init task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, it's not about efficiency or anything, just getting it right logically so we don't look back at this later and wonder why this block runs for every task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I see - good catch. It is easy to forget that multiple tasks can use/run the same action.
874c694
to
db412d7
Compare
Just waiting for a review update from @friedmud. |
Grabbing this now... |
b0184a9
to
f4f20de
Compare
Fixes idaholab#10458. Note that the mesh splitting won't work right until we get another libmesh update in :-(
f4f20de
to
add095b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add .md
documentation on how to use this.
When you didn't specify mpiexec -n 4 ../../../moose_test-opt -i simple_diffusion.i --split-mesh "2,4"
*** ERROR ***
Output mesh file name must be specified (with --split-file) when splitting non-file-based meshes
*** ERROR ***
Output mesh file name must be specified (with --split-file) when splitting non-file-based meshes
*** ERROR ***
Output mesh file name must be specified (with --split-file) when splitting non-file-based meshes
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
*** ERROR ***
Output mesh file name must be specified (with --split-file) when splitting non-file-based meshes
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1 If you're running on thousands of procs that is going to suck. Let's just output that error on proc 0. |
When we specify a filename using |
So I've often thought about what to do about parallel mooseErrors. It's actually a really big pain because we have mooseErrors sprinkled all over the code and more often than not, they are in common code paths so this probably isn't isolated. How many times have you submitted an input file with a simple problem only to get a thousand lines of mooseErrors from every rank? Outputting on processor zero is actually reasonable when we know it's on a common code path. Maybe we can give this some thought and be smart about when errors come out? I guess for now for this very common and specific error we can add one more |
Actually: we should reject the filename if you don't put It turns out we can't read it!
|
@permcody what about just having a |
Or - I guess, just an argument to |
The problem with that is that it relies on the developer using it correctly which might not always be as easy as it seems. You run the risk of flying right through an error condition if for some reason processor zero doesn't encounter the error but some of the other ranks do, but are told to just carry on. I can think of two other alternatives that might be better (might not). We could use the libMesh "parallel_only" macro that does an assert to make sure the error is in a parallel region. The problem with that, is the assert goes away in your big parallel run when you actually need it to error and you still miss the error in the worst case. Another alternative would be to still call abort on the non-zero ranks but not print? The downside to that is that on a really big job you run the risk of just terminating with no error message what-so-ever which would probably be way worse than seeing a thousand of the same message. I'm not sure I like any alternative. |
cpr->binary() = true; | ||
auto fname = MooseUtils::stripExtension(mesh->getFileName()) + ".cpr"; | ||
if (mesh->getFileName() == "") | ||
fname = _app.parameters().get<std::string>("split_file"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few ways to go here. Line 52 could be before line 50... but that would mean that running the same command-line twice might not work.
Probably better just to error if the split_file
does not have a .cpr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually - let's add the .cpr
here... and then make sure it gets added properly when we're reading in too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that it is cpa or the user renamed it to something random though. I think if the user uses the --split-file
flag we should just use that wholesale - this especially makes sense with tab completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually doesn't work though - have you tried it?
If the user names it anything other than .cpr
it doesn't work. I think we should make sure that .cpr
is on the end (if the user passes it just leave it alone - if they don't then append it... for both writing and reading)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the check for the extension when writing.
Option #3: Just get in the habit of using separate file outputs for big jobs or ignoring your error stream instead of combining it into your output stream. This is what I've done recently because I almost never want to see errors. |
@permcody : well I guess it's not something to solve here |
Recommendations for where (what file(s)) to put docs? This isn't really a system/object so it can't really follow the "docs go in the file corresponding to this class/object" pattern. |
Add a short section here: http://mooseframework.org/moose/documentation/systems/Mesh/index.html Then link to a longer explanation. |
Alright - I pushed up some docs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation is 👍
But we need to get the name extension thing right...
cpr->binary() = true; | ||
auto fname = MooseUtils::stripExtension(mesh->getFileName()) + ".cpr"; | ||
if (mesh->getFileName() == "") | ||
fname = _app.parameters().get<std::string>("split_file"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually doesn't work though - have you tried it?
If the user names it anything other than .cpr
it doesn't work. I think we should make sure that .cpr
is on the end (if the user passes it just leave it alone - if they don't then append it... for both writing and reading)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add the .cpr
for both writing and reading or you lose symmetry in the commands:
[gastdr][~/projects/moose/test/tests/kernels/simple_diffusion]> ../../../moose_test-opt -i simple_diffusion.i --split-mesh 4 --split-file junk
Splitting 4 ways...
- writing 4 files per process...
[gastdr][~/projects/moose/test/tests/kernels/simple_diffusion]> ls
gold junk.cpr simple_diffusion.i simple_diffusion_out.e speedtests tests
[gastdr][~/projects/moose/test/tests/kernels/simple_diffusion]> mpiexec -n 4 ../../../moose_test-opt -i simple_diffusion.i --use-split --split-file junk
*** ERROR ***
cannot locate mesh file 'junk'
d802689
to
59b8479
Compare
Done. |
I'll merge this when the testing is complete! Thanks for working on this... it's really slick! |
follow up to idaholab#10623
follow up to idaholab#10623
follow up to idaholab#10623
follow up to idaholab#10623
A few parts of this implementation feel a bit hacky - suggestions on improving it are certainly welcome.
Fixes #10458. Note that the mesh splitting won't work right until we
get another libmesh update in :-(