Issue in MemoryUsageReporter #12629
Labels
C: Framework
P: minor (A defect that does not affect the accuracy of results.)
T: defect (An anomaly, which is anything that deviates from expectations.)
Comments
friedmud added a commit to friedmud/moose that referenced this issue on Dec 27, 2018: …titioning and fix a big in MemoryUsageReporter closes idaholab#12629
Merged
friedmud added the C: Framework, P: minor, and T: defect labels on Dec 27, 2018.
Good catch!
friedmud added further commits to friedmud/moose that referenced this issue on Jan 10, 2019, Feb 1, 2019, May 8, 2019, May 9, 2019, and Jun 8, 2019 (most with the same commit message as above).
Rationale
Shared memory group hardware_id assignment might not be right in the case where ranks aren't perfectly monotonic across the nodes.
Description
MemoryUsageReporter::sharedMemoryRanksBySplitCommunicator() is close, but not quite right. The part where it loops through the "contiguous" ranks and assigns them hardware IDs assumes that the ranks on one node are contiguous! That might not actually be true. For instance, it's totally possible to do this:
In this case, sharedMemoryRanksBySplitCommunicator() would report 3 hardware IDs.
Now, you might think that's completely dumb, but one common thing that happens with mpiexec is "striping", where processes are spread across the nodes by placing one on each node in turn. That can happen if you do something like:
In that case you'll get this node:rank mapping:
That would also throw off sharedMemoryRanksBySplitCommunicator().
The fix here is really just to keep track of the "world_ranks" and the ID each has been assigned, and, each time the value changes, check whether you've already seen this rank.
Actually, to fix this I might make a new utility that creates node->rank and rank->node mappings (I need both for my current work, which is why I was looking at what was done here). Then I'll make MemoryUsageReporter use the new utility.
Impact
Correct hardware ID reporting.
Fix a small bug that probably doesn't affect anyone.
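The node->rank and rank->node utility proposed in the Description could look roughly like the following. This is a hypothetical sketch only: NodeRankMaps and buildNodeRankMaps are illustrative names, not an existing MOOSE API, and the input is assumed to be the same kind of per-rank node key discussed above, already gathered onto one rank.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical result type holding both directions of the mapping.
struct NodeRankMaps
{
  std::vector<int> rank_to_node;               // rank_to_node[r] = node ID of world rank r
  std::vector<std::vector<int>> node_to_ranks; // node_to_ranks[n] = world ranks on node n, ascending
};

// Build both mappings in one pass. Node IDs are dense (0, 1, 2, ...) and are
// assigned in order of each node's first appearance in world-rank order, so
// non-contiguous (e.g. striped) layouts are handled correctly.
NodeRankMaps
buildNodeRankMaps(const std::vector<int> & node_key)
{
  NodeRankMaps maps;
  maps.rank_to_node.resize(node_key.size());
  std::map<int, int> key_to_node; // node key -> dense node ID
  for (std::size_t r = 0; r < node_key.size(); ++r)
  {
    auto it = key_to_node.find(node_key[r]);
    if (it == key_to_node.end())
    {
      // First rank seen on this node: give the node the next dense ID.
      it = key_to_node.emplace(node_key[r], static_cast<int>(maps.node_to_ranks.size())).first;
      maps.node_to_ranks.emplace_back();
    }
    maps.rank_to_node[r] = it->second;
    maps.node_to_ranks[it->second].push_back(static_cast<int>(r));
  }
  return maps;
}
```

For the striped layout node_key = {0, 1, 0, 1}, rank_to_node is {0, 1, 0, 1} and node_to_ranks is {{0, 2}, {1, 3}}; the number of hardware IDs is then simply node_to_ranks.size().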