Issue in MemoryUsageReporter #12629

Closed
friedmud opened this issue Dec 26, 2018 · 1 comment · Fixed by #12630
Labels
C: Framework P: minor A defect that does not affect the accuracy of results. T: defect An anomaly, which is anything that deviates from expectations.

Comments

@friedmud
Contributor

Rationale

Shared memory group hardware_id assignment might not be right in the case where ranks aren't perfectly monotonic across the nodes.

Description

MemoryUsageReporter::sharedMemoryRanksBySplitCommunicator() is close - but not quite right. The part where it loops through the "contiguous" ranks and assigns them... assumes that ranks on one node are contiguous! That might not actually be true.

For instance... it's totally possible to do this:

mpiexec -hosts host0,host1,host1,host0 ...

In this case sharedMemoryRanksBySplitCommunicator() would report 3 hardware IDs, even though there are only 2 hosts.
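To make the miscount concrete, here is a minimal sketch of the "contiguous ranks" assumption. It is not the actual MOOSE code: `countContiguousGroups` and `countDistinctHosts` are hypothetical helpers, and the sketch assumes mpiexec places one rank per entry of the `-hosts` list, in order.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: counts hardware IDs the way a "contiguous ranks" loop
// would, starting a new ID whenever the host differs from the previous rank's.
std::size_t
countContiguousGroups(const std::vector<std::string> & host_of_rank)
{
  std::size_t groups = 0;
  for (std::size_t rank = 0; rank < host_of_rank.size(); ++rank)
    if (rank == 0 || host_of_rank[rank] != host_of_rank[rank - 1])
      ++groups;
  return groups;
}

// Hypothetical helper: counts the hosts that are actually distinct.
std::size_t
countDistinctHosts(const std::vector<std::string> & host_of_rank)
{
  return std::set<std::string>(host_of_rank.begin(), host_of_rank.end()).size();
}
```

Feeding in `{"host0", "host1", "host1", "host0"}` gives three contiguous groups but only two distinct hosts, which is the mismatch described above.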

Now - you might think that's completely dumb... but one common thing that happens with mpiexec is "striping"... where processes are spread across the nodes by adding one to each node in turn. That can happen if you do something like:

mpiexec -hosts host0,host1,host2 -n 6 ...

In that case you'll get this node:rank mapping:

0: 0,3
1: 1,4
2: 2,5

Which would also throw off sharedMemoryRanksBySplitCommunicator().
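The striped layout can be reproduced with a few lines. This is only a model of round-robin placement (rank `r` on node `r % num_nodes`), not what any particular mpiexec is guaranteed to do; `stripedRanksByNode` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Round-robin ("striped") placement: rank r lands on node r % num_nodes.
// Returns, for each node index, the list of ranks placed on it.
std::vector<std::vector<std::size_t>>
stripedRanksByNode(std::size_t num_nodes, std::size_t num_ranks)
{
  std::vector<std::vector<std::size_t>> ranks_on_node(num_nodes);
  if (num_nodes == 0)
    return ranks_on_node; // nothing to place ranks on

  for (std::size_t rank = 0; rank < num_ranks; ++rank)
    ranks_on_node[rank % num_nodes].push_back(rank);
  return ranks_on_node;
}
```

With 3 nodes and 6 ranks this yields the 0: 0,3 / 1: 1,4 / 2: 2,5 mapping. No two consecutive ranks share a node, so a loop that starts a new group every time the node changes would see six groups instead of three.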

The fix here is really just to keep track of the "world_ranks" and what ID they've been assigned and, each time it changes, check whether you've already seen that rank before handing out a new ID.
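A rough sketch of the "have I seen this before?" idea follows. As a simplifying assumption it keys on host names rather than on the "world_ranks" mentioned above, and `assignHardwareIDs` is a hypothetical function, not something in MOOSE.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assigns one ID per distinct host, in order of first appearance, so ranks on
// the same host share an ID even when they are not contiguous.
std::vector<std::size_t>
assignHardwareIDs(const std::vector<std::string> & host_of_rank)
{
  std::map<std::string, std::size_t> id_of_host; // hosts seen so far
  std::vector<std::size_t> id_of_rank;
  id_of_rank.reserve(host_of_rank.size());

  for (const auto & host : host_of_rank)
  {
    // emplace() only inserts when the host is new; the candidate ID is the
    // number of hosts seen so far. An already-seen host keeps its old ID.
    const auto it = id_of_host.emplace(host, id_of_host.size()).first;
    id_of_rank.push_back(it->second);
  }
  return id_of_rank;
}
```

For `{"host0", "host1", "host1", "host0"}` this gives IDs `0, 1, 1, 0`, so only two hardware IDs are reported.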

Actually - to fix this I might make a new utility that creates node->rank and rank->node mappings (I need both for my current work - which is why I was looking at what was done here). Then I'll make MemoryUsageReporter use the new utility.
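One possible shape for such a utility is sketched below. `RankNodeMaps` and `buildRankNodeMaps` are made-up names for illustration, and the input is assumed to be a node ID per rank that has already been computed somehow.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical container holding both directions of the mapping.
struct RankNodeMaps
{
  std::vector<std::size_t> node_of_rank;                         // rank -> node
  std::map<std::size_t, std::vector<std::size_t>> ranks_on_node; // node -> ranks
};

// Builds both mappings from a per-rank list of node IDs.
RankNodeMaps
buildRankNodeMaps(const std::vector<std::size_t> & node_of_rank)
{
  RankNodeMaps maps;
  maps.node_of_rank = node_of_rank;
  for (std::size_t rank = 0; rank < node_of_rank.size(); ++rank)
    maps.ranks_on_node[node_of_rank[rank]].push_back(rank);
  return maps;
}
```

A reporter could then take its hardware ID count from `ranks_on_node.size()` instead of walking ranks in order.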

Impact

Correct hardware ID reporting.

Fix a small bug that probably doesn't affect anyone.

friedmud added a commit to friedmud/moose that referenced this issue Dec 27, 2018
…titioning and fix a big in MemoryUsageReporter closes idaholab#12629
@dschwen
Member

dschwen commented Jan 7, 2019

Good catch!
