Release - SuperBench v0.3.0 #212
Merged
…l-bw of hpe config (#190) **Description** Fix the incorrect `operations` parameter of rccl-bw in the HPE MI100 config. **Major Revision** - Rename `operations` to `operation`
**Description** Revise 'docker run' in sb deploy because the base image runs its endpoint/cmd under /root. **Major Revision** - Define bash as the endpoint in 'docker run'
…nchmarks (#198) **Description** Add a barrier before 'destroy_process_group' to fix a bug seen on rocm4.x when running bert_models. When one model benchmark runs multiple models, some processes had not yet finished with the previous process group while others failed to initialize the new process group for the next model. **Major Revision** - Add a barrier before 'destroy_process_group'.
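The ordering this fix describes can be sketched as follows. This is only an illustration of the barrier-then-destroy pattern, not the SuperBench code: `teardown_process_group` and `RecordingDist` are hypothetical names, and the sketch assumes the object passed in exposes `barrier()` and `destroy_process_group()` the way `torch.distributed` does.

```python
class RecordingDist:
    """Hypothetical stand-in for torch.distributed that records call order."""

    def __init__(self):
        self.calls = []

    def barrier(self):
        self.calls.append('barrier')

    def destroy_process_group(self):
        self.calls.append('destroy_process_group')


def teardown_process_group(dist):
    """Synchronize all ranks, then destroy the process group.

    `dist` is assumed to expose barrier() and destroy_process_group(),
    as torch.distributed does.
    """
    # Wait until every rank is done with the current group, so that no rank
    # starts initializing the next group while others still use this one.
    dist.barrier()
    dist.destroy_process_group()


# Demonstrate the call order with the stand-in object.
fake = RecordingDist()
teardown_process_group(fake)
print(fake.calls)
```

In a real multi-process run, `torch.distributed` itself would presumably be passed in place of the stand-in, once per model inside the benchmark.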
**Description** Restore the rocblas build logic and drop support for building rocblas in the rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 base image. **Major Revision** - Restore the rocblas build logic; remove the GPU target limit and other resource limits for rocm4.0.
**Description** Fix a bug in building hipBusBandwidth. **Major Revision** - Remove the check for 'hip/samples/1_Utils/hipBusBandwidth/CMakeLists.txt', which the Docker build failed to enter - Add sb_micro_path for rocm_bandwidthTest
Add ROCm image build in GitHub Actions.
…196) **Description** 1. Call `enable_language(CUDA)` before using `CMAKE_CUDA_COMPILER_VERSION`. 2. Use `cmake --install` to install the target, which calls `cmake -P cmake_install.cmake`, instead of `make Makefile`, to avoid the error `make: *** No rule to make target 'install'. Stop.`
Integrate system info for node, add `sb node info` command.
Fix `torch.distributed` command for single node.
Push Docker images in GitHub Action.
**Description** Fix function naming issue in system info. **Major Revision** - Fix function naming issue in system info - Save to JSON file - Add timeout for subprocess.run - Revise error handling to print the exception message
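The three mechanics listed in this commit (a timeout on `subprocess.run`, error handling that prints the exception message, and saving to a JSON file) can be sketched as below. This is a minimal illustration under assumptions, not the SuperBench implementation: `run_command`, `save_info`, the timeout values, and the output file name are all hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_command(cmd, timeout=10):
    """Run a command and return its stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
        return result.stdout.strip()
    except (subprocess.SubprocessError, OSError) as e:
        # TimeoutExpired and CalledProcessError are SubprocessError subclasses;
        # OSError covers a missing executable. Print the message and move on.
        print(f'Command {cmd} failed: {e}')
        return None


def save_info(info, path):
    """Write the collected info dictionary to a JSON file."""
    with open(path, 'w') as f:
        json.dump(info, f, indent=2)


# Hypothetical probes: one that succeeds and one that exceeds its timeout.
info = {
    'python': run_command([sys.executable, '-c', 'print("ok")']),
    'slow': run_command([sys.executable, '-c', 'import time; time.sleep(5)'], timeout=0.5),
}
out_path = Path(tempfile.gettempdir()) / 'node_info_example.json'
save_info(info, out_path)
```

Returning `None` for a failed probe keeps one slow or missing tool from aborting the whole collection, at the cost of the caller having to handle missing fields.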
…verlap (#204) **Description** Fix a bug in the error message of communication-computation-overlap. **Major Revision** - Remove a nonexistent variable
Fix bug in build image for push event. **Major Revision** - Fix bug in build image for push event when `github.base_ref` is not set. **Minor Revision** - Unify `[` and `[[` usage.
…iguration examples (#203) **Description** This commit fixes wrong parameters for gpu-sm-copy-bw call in configuration examples.
Update GitHub Action VM, fix pipeline hanging.
**Description** Update benchmarks in the configuration files for single-node validation of SuperBench v0.3. **Major Revision** - Fix nccl-bw parameter bugs for single-node validation in the configs - Update amd_mi100_hpe.yaml, amd_mi100_z53.yaml, and azure_ndv4.yaml with the new benchmarks - Fix the wrong GPU visible prefix
#210) **Description** Update the rccl-test git submodule to dc1ad48, which fixes the division-by-zero bug. **Major Revision** - Update the rccl-test git submodule to dc1ad48
Codecov Report
@@            Coverage Diff            @@
##             main    #212     +/-   ##
=========================================
+ Coverage   88.52%  88.72%  +0.19%
=========================================
  Files          57      58      +1
  Lines        2807    2821     +14
=========================================
+ Hits         2485    2503     +18
+ Misses        322     318      -4
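The percentages in the report follow from the hit and line counts. The short check below recomputes them; it assumes coverage is hits divided by lines and that the displayed values are truncated, not rounded, to two decimals. The truncation is an inference from these numbers, not documented Codecov behaviour, and `truncated_percent` is a hypothetical helper.

```python
import math


def truncated_percent(numerator, denominator):
    """Return numerator/denominator as a percentage truncated to two decimals."""
    return math.floor(numerator / denominator * 10000) / 100


# Counts taken from the coverage report above.
base_hits, base_lines = 2485, 2807
head_hits, head_lines = 2503, 2821

base_cov = truncated_percent(base_hits, base_lines)
head_cov = truncated_percent(head_hits, head_lines)

# The delta is computed from the unrounded ratios, then truncated.
raw_delta = (head_hits / head_lines - base_hits / base_lines) * 100
delta = math.floor(raw_delta * 100) / 100

# Misses are the lines that were not hit.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

print(base_cov, head_cov, delta, base_misses, head_misses)
```

Under rounding, the base coverage would read 88.53% and the delta +0.20%, which is why truncation is assumed here.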
cp5555 approved these changes on Sep 24, 2021
guoshzhao approved these changes on Sep 24, 2021
yukirora approved these changes on Sep 24, 2021
Description
Cherry-pick bug fixes from v0.3.0 to main.
Major Revisions
Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>