Bug - Fix issue that the root mpi rank may not be the first in the hostfile #270

Merged
cp5555 merged 7 commits into release/0.4 from v-yutjiang/mpi-launch on Dec 24, 2021

Conversation

yukirora
Contributor

Description
Launch MPI on the host that comes first when the hostfile entries are sorted, so that the root MPI rank is the first host in the hostfile.
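
The selection described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the helper `first_sorted_host` and the sample hostfile are hypothetical, and the sketch assumes an INI-style hostfile with one host per line, optionally followed by per-host variables.

```python
def first_sorted_host(hostfile_text):
    """Return the host name that sorts first in a hostfile (hypothetical helper)."""
    hosts = []
    for line in hostfile_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and INI-style group headers such as [all].
        if not line or line.startswith('#') or line.startswith('['):
            continue
        # Keep only the host name; drop any per-host variables after whitespace.
        hosts.append(line.split()[0])
    if not hosts:
        raise ValueError('hostfile contains no hosts')
    # Plain lexicographic sort; the head host is the first entry.
    return sorted(hosts)[0]


# Assumed sample hostfile, listed out of order on purpose.
hostfile = """
[all]
node-2
node-0 ansible_port=22
node-1
"""
head = first_sorted_host(hostfile)
print(head)  # node-0
```

Note that the sort here is lexicographic, so `node-10` would sort before `node-2`; whether the runner relies on the same ordering is not stated in this PR description.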

@yukirora yukirora added the bug Something isn't working label Dec 20, 2021
@yukirora yukirora requested a review from a team as a code owner December 20, 2021 14:03
@ghost

ghost commented Dec 20, 2021

CLA assistant check
All CLA requirements met.

@codecov

codecov bot commented Dec 20, 2021

Codecov Report

Merging #270 (62c4d1e) into release/0.4 (bcf6ea3) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@               Coverage Diff               @@
##           release/0.4     #270      +/-   ##
===============================================
+ Coverage        87.80%   87.82%   +0.01%     
===============================================
  Files               75       75              
  Lines             4364     4368       +4     
===============================================
+ Hits              3832     3836       +4     
  Misses             532      532              
| Flag | Coverage Δ |
|---|---|
| cpu-unit-test | 72.05% <100.00%> (+0.02%) ⬆️ |
| cuda-unit-test | 87.77% <100.00%> (+0.01%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| superbench/runner/ansible.py | 98.07% <100.00%> (+0.16%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@cp5555 cp5555 requested a review from abuccts December 20, 2021 23:34
@cp5555 cp5555 added the runner SuperBench Runner label Dec 20, 2021
Resolved review threads (outdated):

* superbench/runner/ansible.py (2 threads)
* tests/runner/test_ansible.py (5 threads)
yukirora and others added 2 commits December 24, 2021 13:35
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
@cp5555 cp5555 enabled auto-merge (squash) December 24, 2021 05:52
@cp5555 cp5555 merged commit cb77286 into release/0.4 Dec 24, 2021
@cp5555 cp5555 deleted the v-yutjiang/mpi-launch branch December 24, 2021 06:15
abuccts pushed a commit that referenced this pull request Dec 29, 2021
…stfile (#270)

**Description**
Launch mpi on the sorted first host in the hostfile.
abuccts added a commit that referenced this pull request Dec 30, 2021
__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
Labels
bug Something isn't working runner SuperBench Runner

3 participants