Add RankMap #12630
703cac9 to 94ea949
@@ -0,0 +1,65 @@
#include "RankMap.h"
Header
framework/include/utils/RankMap.h (outdated)
*/
const std::vector<processor_id_type> & ranks(unsigned int hardware_id) const
{
  return _hardware_id_to_ranks.at(hardware_id);
Are you sure you want to use at()? It throws and doesn't terminate MPI "well" in parallel. If you intend for this to be a MOOSE utility, it should probably be changed to provide a better error message.
I don't think this will be widely used outside of the MOOSE team so it's fine if you want to leave it.
I'm ok with it throwing... but you can tell me if you'd rather have me change it.
It has to be .at() or some find() thing, though, because the function is const.
Right, we've switched a few of these to this:
auto item = map.find(key);
if (item == map.end())
  mooseError("Item not found");
return item->second; // the mapped value (item->first is the key)
If you ever hit one of these at() calls in parallel, it can be infuriating. If a non-zero rank gets there first, your application will just quit and you won't get any errors in the error log, nothing...
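A minimal, self-contained sketch of the find()-based lookup suggested above, written as a const member function. Since mooseError() needs the MOOSE framework, it uses a hypothetical reportErrorAndAbort() helper in its place; RankMapSketch, the hash-map member, and the local processor_id_type alias are likewise illustrative stand-ins, not the real RankMap API.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for libMesh's processor_id_type (assumption: any unsigned type works here).
using processor_id_type = unsigned int;

// Hypothetical replacement for mooseError(): print a message and abort.
[[noreturn]] inline void reportErrorAndAbort(const char * msg)
{
  std::fprintf(stderr, "%s\n", msg);
  std::abort();
}

class RankMapSketch
{
public:
  explicit RankMapSketch(std::unordered_map<unsigned int, std::vector<processor_id_type>> map)
    : _hardware_id_to_ranks(std::move(map))
  {
  }

  // find() is usable on a const container (operator[] is not), and it lets us
  // report a readable error instead of throwing std::out_of_range like at().
  const std::vector<processor_id_type> & ranks(unsigned int hardware_id) const
  {
    auto it = _hardware_id_to_ranks.find(hardware_id);
    if (it == _hardware_id_to_ranks.end())
      reportErrorAndAbort("RankMap: unknown hardware_id");
    return it->second;
  }

private:
  std::unordered_map<unsigned int, std::vector<processor_id_type>> _hardware_id_to_ranks;
};
```

With a map of {0: [0, 1], 1: [2, 3]}, ranks(1) returns [2, 3], while ranks(7) prints the message and aborts instead of throwing.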
Sure - I'll change it 👍
{
  current_id = next_id++;

  world_rank_to_hardware_id.emplace_hint(it, world_rank, current_id);
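The hint-based insertion above can be sketched standalone as follows. It assumes, as the constructor comment in this PR says, that every process on the same node ends up with the same world_rank, and it hands out contiguous hardware IDs in order of first appearance. The function name and the local processor_id_type alias are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Stand-in for libMesh's processor_id_type.
using processor_id_type = unsigned int;

// world_ranks[pid] is the shared world_rank reported by every process on pid's node.
// Returns rank -> hardware ID, with IDs numbered 0, 1, 2, ... by first appearance.
std::vector<unsigned int> assignHardwareIds(const std::vector<processor_id_type> & world_ranks)
{
  std::map<processor_id_type, unsigned int> world_rank_to_hardware_id;
  std::vector<unsigned int> rank_to_hardware_id(world_ranks.size());
  unsigned int next_id = 0;

  for (std::size_t pid = 0; pid < world_ranks.size(); ++pid)
  {
    const processor_id_type world_rank = world_ranks[pid];

    // lower_bound yields either the existing entry or the right insertion point,
    // so a miss can be inserted with emplace_hint without a second search.
    auto it = world_rank_to_hardware_id.lower_bound(world_rank);
    unsigned int current_id;
    if (it != world_rank_to_hardware_id.end() && it->first == world_rank)
      current_id = it->second;
    else
    {
      current_id = next_id++;
      world_rank_to_hardware_id.emplace_hint(it, world_rank, current_id);
    }
    rank_to_hardware_id[pid] = current_id;
  }
  return rank_to_hardware_id;
}
```

For example, world_ranks {0, 0, 2, 2, 4} maps to hardware IDs {0, 0, 1, 1, 2}.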
👍
CSV differ errors out with
c458789 to ecd632a
Job Documentation on 86c1dd8 wanted to post the following: View the site here. This comment will be updated on new commits.
@@ -47,18 +47,24 @@ validParams<WorkBalance>()
WorkBalance::WorkBalance(const InputParameters & parameters)
  : GeneralVectorPostprocessor(parameters),
    _system(getParam<MooseEnum>("system")),
    _rank_map(_app.rankMap()),
Is there no way to design this with composition or inheritance? I'd prefer not to have to add a VPP right in the core of MOOSE when most people probably won't use it.
Perhaps you could make a unique_ptr to RankMap instead, and build it the first time it's requested through the accessor? That would avoid having that object active for every simulation on the planet.
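A minimal sketch of the lazy-construction idea, using a hypothetical AppSketch class and a RankMapStub that only counts how often it is built; neither is the real MooseApp or RankMap. One caveat: if building the real RankMap involves a communicator split (a collective MPI operation), every rank must reach the accessor at the same point in the run, or the split will hang.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for RankMap; counts constructions so laziness is observable.
struct RankMapStub
{
  static inline int constructed = 0; // C++17 inline static member
  RankMapStub() { ++constructed; }
};

class AppSketch
{
public:
  // Build on first request; later calls return the same object.
  // Not thread-safe: assumes one thread per rank calls this.
  const RankMapStub & rankMap()
  {
    if (!_rank_map)
      _rank_map = std::make_unique<RankMapStub>();
    return *_rank_map;
  }

private:
  // Stays empty for simulations that never ask for the rank map.
  std::unique_ptr<RankMapStub> _rank_map;
};
```

A simulation that never calls rankMap() pays only for an empty pointer.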
ecd632a to cdf6404
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing due to 30 days of inactivity. See http://mooseframework.org/moose/framework_development/patch_to_code.html
cc06377 to 021d92b
…titioning and fix a bug in MemoryUsageReporter closes idaholab#12629
A few minor comments and a few things that probably should be cleaned up. Thanks.
// Each process on the same node will end up with the same world_rank
processor_id_type world_rank = 0;

#ifdef LIBMESH_HAVE_MPI
Splitting is supported in the libMesh communicator object. You can drop this ifdef and the raw MPI here.
OK - I opened a ticket so we can eventually pull this back out:
libMesh/libmesh#2128
MPI_Comm_split_type is not in libMesh.
framework/src/utils/RankMap.C (outdated)
// Assign a contiguous unique numerical id to each shared memory group
unsigned int next_id = 0;

for (auto pid = beginIndex(world_ranks); pid < world_ranks.size(); pid++)
beginIndex -> MooseIndex
👍
framework/include/utils/RankMap.h (outdated)
PerfID _construct_timer;

/// Map of hardware_id -> ranks on that node
std::map<unsigned int, std::vector<processor_id_type>> _hardware_id_to_ranks;
Possibly an unordered_map here?
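For reference, building the hardware_id -> ranks table with an unordered_map could look like the sketch below. The function name and the local processor_id_type alias are illustrative, not part of the PR. Ranks are appended in ascending order, so each per-node list comes out sorted.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Stand-in for libMesh's processor_id_type.
using processor_id_type = unsigned int;

// rank_to_hardware_id[pid] is the hardware ID of the node hosting rank pid.
std::unordered_map<unsigned int, std::vector<processor_id_type>>
groupRanksByHardwareId(const std::vector<unsigned int> & rank_to_hardware_id)
{
  std::unordered_map<unsigned int, std::vector<processor_id_type>> hardware_id_to_ranks;
  for (processor_id_type pid = 0; pid < rank_to_hardware_id.size(); ++pid)
    // operator[] default-constructs an empty list the first time a node is seen
    hardware_id_to_ranks[rank_to_hardware_id[pid]].push_back(pid);
  return hardware_id_to_ranks;
}
```

Because the hardware IDs are contiguous from 0, a std::vector of rank lists indexed by hardware ID would be another option and avoids hashing altogether.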
👍
framework/include/utils/RankMap.h (outdated)
* Returns the "hardware ID" (a unique ID given to each physical compute node in the job)
 * for a given processor ID (rank)
 */
unsigned int hardwareID(processor_id_type pid) const { return _rank_to_hardware_id[pid]; }
You probably should have a mooseAssert() - yes I realize that it will always be sized by num_processors, but if somebody passes something stupid like a DOF number to it... Just sayin' 😉
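A sketch of the suggested guard, using the standard assert() as a stand-in for mooseAssert(); both are typically compiled out of optimized builds, so the check costs nothing there. The class name and local processor_id_type alias are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for libMesh's processor_id_type.
using processor_id_type = unsigned int;

class RankToHardwareSketch
{
public:
  explicit RankToHardwareSketch(std::vector<unsigned int> rank_to_hardware_id)
    : _rank_to_hardware_id(std::move(rank_to_hardware_id))
  {
  }

  unsigned int hardwareID(processor_id_type pid) const
  {
    // Catches callers passing something that is not a rank (e.g. a DOF number)
    // in debug builds; disappears when NDEBUG is defined.
    assert(pid < _rank_to_hardware_id.size() && "pid out of range");
    return _rank_to_hardware_id[pid];
  }

private:
  std::vector<unsigned int> _rank_to_hardware_id;
};
```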
}

// Free up the memory
MPI_Comm_free(&shmem_raw_comm);
This can go too if you use the communicator wrapper.
framework/include/utils/RankMap.h (outdated)
const std::vector<unsigned int> & rankHardwareIds() const { return _rank_to_hardware_id; }

protected:
PerfID _construct_timer;
I've been making these const. May as well, they never change.
👍
We should update the documentation to recommend this.
Added a new utility object, RankMap, and used it to fix a bug in MemoryUsage and add new capability to WorkBalance. Also added a HardwareIDAux that can paint the assignment of elements to compute nodes in the cluster into a field.
Here's an example using WorkBalance and VectorPostprocessorVisualizationAux to look at the amount of internode communication using two different partitioners:
And an example of using HardwareIDAux to diagnose how MPI process placement on the nodes affects METIS and a Hierarchical partitioner (for better descriptions of both of these cases, look at the documentation added here).
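To make the "internode communication" idea concrete, the sketch below maps each element's owning rank to a hardware ID (the quantity HardwareIDAux paints) and counts element faces whose two sides live on different nodes. This face count is only a rough proxy chosen for illustration; it is not how WorkBalance measures communication, and the inputs (element owners, face-neighbor pairs) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for libMesh's processor_id_type.
using processor_id_type = unsigned int;

// Translate each element's owning rank into the hardware ID of that rank's node.
std::vector<unsigned int> elementHardwareIds(const std::vector<processor_id_type> & elem_owner,
                                             const std::vector<unsigned int> & rank_to_hardware_id)
{
  std::vector<unsigned int> result;
  result.reserve(elem_owner.size());
  for (processor_id_type owner : elem_owner)
  {
    assert(owner < rank_to_hardware_id.size() && "owner is not a valid rank");
    result.push_back(rank_to_hardware_id[owner]);
  }
  return result;
}

// Count faces shared by two elements on different nodes: data across these
// faces must travel over the network rather than through shared memory.
std::size_t countInternodeFaces(const std::vector<std::pair<std::size_t, std::size_t>> & face_neighbors,
                                const std::vector<unsigned int> & elem_hardware_id)
{
  std::size_t count = 0;
  for (const auto & [a, b] : face_neighbors)
    if (elem_hardware_id[a] != elem_hardware_id[b])
      ++count;
  return count;
}
```

With four elements in a row, two ranks per node, and owners {0, 1, 2, 3}, only one face crosses nodes; reordering the owners to {0, 2, 1, 3} makes all three faces internode, which is the kind of placement effect the HardwareIDAux field makes visible.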
I also think this object will be useful in helping with placement of apps within a MultiApp in the future...
closes #12629