Bug - Fix torch.distributed command for single node #201

Merged: cp5555 merged 1 commit into release/0.3 from xiongyf/fix-torch-1node on Sep 17, 2021

abuccts (Member) commented Sep 16, 2021

Fix torch.distributed command for single node.
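
The diff itself is not reproduced in this conversation, so the following is only a hypothetical sketch of how a runner might assemble a `torch.distributed.launch` command differently for one node versus several. The function name `build_torch_distributed_command`, its parameters, and the `$NNODES`, `$NODE_RANK`, `$MASTER_ADDR`, and `$MASTER_PORT` environment variable names are assumptions for illustration, not code taken from `superbench/runner/runner.py`. The idea is that on a single node the launcher defaults (`nnodes=1`, `node_rank=0`, local master address) may be enough, so the rendezvous flags can be left out.

```python
def build_torch_distributed_command(exec_command: str, proc_num: int, node_num: int) -> str:
    """Assemble a torch.distributed.launch command line (hypothetical helper).

    Args:
        exec_command: the benchmark command to run under the launcher.
        proc_num: number of processes to start on each node.
        node_num: number of nodes taking part in the run.
    """
    launcher = (
        'python3 -m torch.distributed.launch'
        f' --use_env --no_python --nproc_per_node={proc_num}'
    )
    if node_num == 1:
        # Single node: omit the rendezvous flags and rely on launcher defaults.
        return f'{launcher} {exec_command}'
    # Multiple nodes: the rendezvous settings are assumed to be provided
    # through environment variables on every node (names are assumptions).
    rendezvous = (
        '--nnodes=$NNODES --node_rank=$NODE_RANK'
        ' --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT'
    )
    return f'{launcher} {rendezvous} {exec_command}'


if __name__ == '__main__':
    print(build_torch_distributed_command('sb exec', proc_num=8, node_num=1))
    print(build_torch_distributed_command('sb exec', proc_num=8, node_num=2))
```

With this shape, a single-node run does not depend on any of the multi-node environment variables being set.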

abuccts added the `bug` label (Something isn't working) Sep 16, 2021
abuccts requested a review from a team as a code owner Sep 16, 2021 08:03
codecov bot commented Sep 16, 2021

Codecov Report

Merging #201 (61cacb3) into release/0.3 (f91f97b) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@             Coverage Diff              @@
##           release/0.3     #201   +/-   ##
============================================
  Coverage        88.78%   88.79%           
============================================
  Files               58       58           
  Lines             2818     2819    +1     
============================================
+ Hits              2502     2503    +1     
  Misses             316      316           

| Flag | Coverage Δ |
|---|---|
| cpu-unit-test | 74.48% <100.00%> (+<0.01%) ⬆️ |
| cuda-unit-test | 88.72% <100.00%> (+<0.01%) ⬆️ |


| Impacted Files | Coverage Δ |
|---|---|
| superbench/runner/runner.py | 86.08% <100.00%> (+0.12%) ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f91f97b...61cacb3.

cp5555 added the `runner` label (SuperBench Runner) Sep 17, 2021
cp5555 merged commit 890ce65 into release/0.3 Sep 17, 2021
cp5555 deleted the xiongyf/fix-torch-1node branch Sep 17, 2021 05:08
abuccts added a commit that referenced this pull request Sep 24, 2021
Fix `torch.distributed` command for single node.
cp5555 pushed a commit that referenced this pull request Sep 26, 2021
**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>