
Add RankMap #12630

Merged
merged 2 commits into next from rank_map_12629 on May 17, 2019
Conversation

friedmud
Contributor

Added a new utility object, RankMap, and used it to fix a bug in MemoryUsage and to add some new capability to WorkBalance. Also added a HardwareIDAux that can paint the assignment of elements to compute nodes in the cluster into a field.

Here's an example using WorkBalance and VectorPostprocessorVisualizationAux to look at the amount of internode communication using two different partitioners:

[screenshot: internode communication compared for two partitioners, visualized with WorkBalance and VectorPostprocessorVisualizationAux]

And here is an example of using HardwareIDAux to diagnose how MPI process placement on the nodes affects METIS and a hierarchical partitioner (for better descriptions of both of these cases, see the documentation added here).

[screenshot: HardwareIDAux showing MPI process placement under METIS vs. a hierarchical partitioner]

I also think this object will be useful in helping with placement of apps within a MultiApp in the future...
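For readers skimming the diff below, here is a minimal sketch of how the RankMap accessors might be used from another MOOSE object. The calls to _app.rankMap(), hardwareID() and ranks() come from this PR; processor_id(), Moose::out, and the loop body are illustrative assumptions.

// Sketch only: assumes a MooseObject-like context where _app and processor_id() are available.
const RankMap & rank_map = _app.rankMap();

// Which physical compute node does my rank live on?
unsigned int my_hardware_id = rank_map.hardwareID(processor_id());

// Which ranks share that node with me?
for (processor_id_type rank : rank_map.ranks(my_hardware_id))
  Moose::out << "Rank " << rank << " is on node " << my_hardware_id << std::endl;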

closes #12629

@friedmud friedmud force-pushed the rank_map_12629 branch 3 times, most recently from 703cac9 to 94ea949 Compare December 27, 2018 21:00
@@ -0,0 +1,65 @@
#include "RankMap.h"
Member

Header

*/
const std::vector<processor_id_type> & ranks(unsigned int hardware_id) const
{
return _hardware_id_to_ranks.at(hardware_id);
Member

Are you sure you want to use at()? It throws and doesn't terminate MPI "well" in parallel. If you intend for this to be a MOOSE utility, it probably should be changed to provide a better error message.

Member

I don't think this will be widely used outside of the MOOSE team so it's fine if you want to leave it.

Contributor Author

I'm ok with it throwing... but you can tell me if you'd rather have me change it.

It has to be .at() or some find() thing though because the function is const.

Member

Right, we've switched a few of these to this:

auto item = map.find(key);
if (item == map.end())
  mooseError("Item not found");
return item->second;

If you ever hit one of these at() calls in parallel it can be infuriating. If a non-zero rank gets there first, your application will just quit and you won't get any errors in the error log, nothing...

Contributor Author

Sure - I'll change it 👍

{
current_id = next_id++;

world_rank_to_hardware_id.emplace_hint(it, world_rank, current_id);
Member

👍

@dschwen
Member

dschwen commented Jan 10, 2019

CSV differ errors out with KeyError: 'partition_harwdware_id_surface_area'. Did you forget to update a gold file?

@friedmud friedmud force-pushed the rank_map_12629 branch 2 times, most recently from c458789 to ecd632a Compare January 10, 2019 00:42
@moosebuild
Contributor

moosebuild commented Jan 10, 2019

Job Documentation on 86c1dd8 wanted to post the following:

View the site here

This comment will be updated on new commits.

@@ -47,18 +47,24 @@ validParams<WorkBalance>()
WorkBalance::WorkBalance(const InputParameters & parameters)
: GeneralVectorPostprocessor(parameters),
_system(getParam<MooseEnum>("system")),
_rank_map(_app.rankMap()),
Member

Is there no way to design this with composition or inheritance? I'd prefer not to have to add a VPP right in the core of MOOSE when most people probably won't use it.

Member

Perhaps you could make a unique_ptr to RankMap instead. Then when it's requested through the accessor you can build it? That would avoid having that object active for every simulation on the planet.
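A rough sketch of that lazy-construction idea, with hypothetical member and constructor arguments (the real MooseApp interface may differ):

// Header (hypothetical): built only on first request.
// std::unique_ptr<RankMap> _rank_map;

const RankMap &
MooseApp::rankMap()
{
  if (!_rank_map)
    _rank_map = std::make_unique<RankMap>(_comm, _perf_graph); // constructor args illustrative
  return *_rank_map;
}

That would keep the object out of every simulation that never asks for it.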

@stale

stale bot commented Mar 3, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale PRs that have reached or exceeded 90 days with no activity label Mar 3, 2019
@stale

stale bot commented Mar 4, 2019

Closing due to 30 days of inactivity. See http://mooseframework.org/moose/framework_development/patch_to_code.html

@stale stale bot closed this Mar 4, 2019
@friedmud friedmud reopened this May 8, 2019
@stale stale bot removed the stale PRs that have reached or exceeded 90 days with no activity label May 8, 2019
@friedmud friedmud force-pushed the rank_map_12629 branch 5 times, most recently from cc06377 to 021d92b Compare May 8, 2019 19:59
…titioning and fix a bug in MemoryUsageReporter closes idaholab#12629
Member

@permcody permcody left a comment

A few minor comments and a few things that probably should be cleaned up. Thanks

// Each process on the same node will end up with the same world_rank
processor_id_type world_rank = 0;

#ifdef LIBMESH_HAVE_MPI
Member

Splitting is supported in the libMesh communicator object. You can drop this ifdef and the raw MPI here.

Member

OK - I opened a ticket so we can eventually pull this back out:
libMesh/libmesh#2128

Contributor Author

MPI_Comm_split_type is not in libMesh
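For context, the raw-MPI node detection under discussion presumably boils down to something like this minimal sketch (not the PR's exact code):

#include <mpi.h>

// Split MPI_COMM_WORLD into one sub-communicator per shared-memory node.
MPI_Comm shmem_raw_comm;
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &shmem_raw_comm);

// Every rank on the same node ends up in the same sub-communicator; those groups
// can then be numbered consecutively to produce the "hardware IDs".
int node_local_rank = 0;
MPI_Comm_rank(shmem_raw_comm, &node_local_rank);

MPI_Comm_free(&shmem_raw_comm);

Until libMesh/libmesh#2128 is resolved, the ifdef guard around the raw MPI calls has to stay.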

// Assign a contiguous unique numerical id to each shared memory group
unsigned int next_id = 0;

for (auto pid = beginIndex(world_ranks); pid < world_ranks.size(); pid++)
Member

beginIndex -> MooseIndex

Contributor Author

👍
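If I remember the macro correctly, the rewritten loop would read roughly as follows; this assumes MooseIndex(container) expands to an index type matching the container's size type.

// Index type derived from the container, replacing the beginIndex helper.
for (MooseIndex(world_ranks) pid = 0; pid < world_ranks.size(); pid++)
{
  // ... assign hardware IDs exactly as in the original loop body ...
}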

PerfID _construct_timer;

/// Map of hardware_id -> ranks on that node
std::map<unsigned int, std::vector<processor_id_type>> _hardware_id_to_ranks;
Member

possible unordered_map here?

Contributor Author

👍
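For reference, the switched declaration would simply be the following (same key and value types, hash-based lookup instead of ordered traversal):

#include <unordered_map>

/// Map of hardware_id -> ranks on that node
std::unordered_map<unsigned int, std::vector<processor_id_type>> _hardware_id_to_ranks;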

* Returns the "hardware ID" (a unique ID given to each physical compute node in the job)
* for a given processor ID (rank)
*/
unsigned int hardwareID(processor_id_type pid) const { return _rank_to_hardware_id[pid]; }
Member

You probably should have a mooseAssert() - yes I realize that it will always be sized by num_processors, but if somebody passes something stupid like a DOF number to it... Just sayin' 😉
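Something like the following is presumably what's being asked for; a sketch only, with the exact assertion message left to the author:

unsigned int hardwareID(processor_id_type pid) const
{
  // Guard against nonsense inputs (e.g. a DOF number); the vector is sized by the number of processors.
  mooseAssert(pid < _rank_to_hardware_id.size(), "PID out of range in RankMap::hardwareID()");
  return _rank_to_hardware_id[pid];
}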

}

// Free up the memory
MPI_Comm_free(&shmem_raw_comm);
Member

This can go too if you use the communicator wrapper.

framework/src/utils/RankMap.C (2 resolved review threads)
const std::vector<unsigned int> & rankHardwareIds() const { return _rank_to_hardware_id; }

protected:
PerfID _construct_timer;
Member

I've been making these const. May as well, they never change.

Contributor Author

👍

We should update the documentation to recommend this.
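For anyone following along, the const version would look roughly like this, assuming the timer is registered through PerfGraphInterface in the constructor's initializer list (the section name and level are guesses):

// Header: the PerfID never changes after registration, so it can be const.
const PerfID _construct_timer;

// Source: initialize in the member-initializer list.
RankMap::RankMap(/* ... */)
  : /* ..., */ _construct_timer(registerTimedSection("construct", 2))
{
}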

friedmud added a commit to friedmud/moose that referenced this pull request May 9, 2019
@permcody permcody merged commit 1e51abf into idaholab:next May 17, 2019
Successfully merging this pull request may close: Issue in MemoryUsageReporter.
4 participants