Release - SuperBench v0.3.0 #212
Commits on Sep 24, 2021
- Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)
  Commit: fbfb58f
  **Description** Fix the incorrect `operations` parameter of rccl-bw in the HPE MI100 config.
  **Major Revision**
  - `operations` -> `operation`
- Bug - Revise 'docker run' in sb deploy (#195)
  Commit: 3553381
  **Description** Revise 'docker run' in sb deploy because the base image runs its endpoint/cmd under /root.
  **Major Revision**
  - Define endpoint bash when running 'docker run'.
- Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
  Commit: c9fb724
  **Description** Add a barrier before 'destroy_process_group'. When one model benchmark runs multiple models, some processes have not yet finished with the previous process group while others fail to initialize the new process group for the next model. The bug was seen on rocm4.x when running bert_models.
  **Major Revision**
  - Add barrier before 'destroy_process_group'.
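The ordering described in #198 can be sketched as follows. This is only an illustration, not the code from the commit: `teardown_process_group` is a hypothetical helper name, and the `dist` argument is assumed to be `torch.distributed` (passed in so the sketch does not import PyTorch itself). It relies only on `is_available`, `is_initialized`, `barrier`, and `destroy_process_group`.

```python
def teardown_process_group(dist):
    """Wait for every rank, then destroy the process group.

    `dist` is assumed to be the `torch.distributed` module (or an object
    exposing the same four functions). The helper name is hypothetical.
    """
    if not (dist.is_available() and dist.is_initialized()):
        # Nothing to tear down in a non-distributed run.
        return False
    # Block until all ranks have finished the current model, so that no rank
    # starts initializing the next process group while others still use this one.
    dist.barrier()
    dist.destroy_process_group()
    return True


# Usage (assumes PyTorch is installed and a process group was initialized):
#   import torch.distributed as dist
#   teardown_process_group(dist)
```

Without the barrier, a fast rank could destroy its group and move on to the next model while a slow rank is still inside a collective on the old group.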
- Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
  Commit: 6da800f
  **Description** Restore the rocblas build logic to cancel support for building rocblas in the rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 base image.
  **Major Revision**
  - Restore the rocblas build logic; remove the GPU target limit and other resource limits for rocm4.0.
- Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
  Commit: 2c281ba
  **Description** Fix a bug in building hipBusBandwidth.
  **Major Revision**
  - Remove the check for 'hip/samples/1_Utils/hipBusBandwidth/CMakeLists.txt', because the build failed to enter this check when building the Docker image.
  - Add sb_micro_path for rocm_bandwidthTest.
- CI/CD - Add ROCm image build in GitHub Actions (#194)
  Commit: 3b9edee
  Add ROCm image build in GitHub Actions.
- Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
  Commit: 2cf1331
  **Description**
  1. Call `enable_language(CUDA)` before using `CMAKE_CUDA_COMPILER_VERSION`.
  2. Use `cmake --install` to install the target, which calls `cmake -P cmake_install.cmake`, instead of `make Makefile`, to avoid the error `make: *** No rule to make target 'install'. Stop.`
- CLI - Integrate system info for node (#199)
  Commit: 11c0ba3
  Integrate system info for node, add `sb node info` command.
- Bug - Fix torch.distributed command for single node (#201)
  Commit: e3266da
  Fix `torch.distributed` command for single node.
- CI/CD - Push images in GitHub Action (#202)
  Commit: 2c2cad0
  Push Docker images in GitHub Action.
- Tool: Fix bug - Fix function naming issue in system info (#200)
  Commit: c6f76ce
  **Description** Fix function naming issue in system info.
  **Major Revision**
  - Fix function naming issue in system info.
  - Save the result to a JSON file.
  - Add a timeout for `subprocess.run`.
  - Revise error handling to print the exception message.
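The last three revisions in #200 (JSON output, a timeout on `subprocess.run`, and printing the exception message) follow a common pattern that might look like the sketch below. The function names `run_command` and `save_info`, the default timeout, and the output format are assumptions made for illustration; they are not taken from the SuperBench source.

```python
import json
import subprocess
import sys


def run_command(cmd, timeout=5):
    """Run `cmd` (a list of arguments) and return its stdout, or None on failure.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
        return result.stdout.strip()
    except (subprocess.SubprocessError, OSError) as exc:
        # Covers timeouts, non-zero exit codes, and missing executables.
        # Print the exception message instead of failing silently.
        print(f'Command {cmd!r} failed: {exc}', file=sys.stderr)
        return None


def save_info(info, path):
    """Write the collected information to a JSON file."""
    with open(path, 'w') as f:
        json.dump(info, f, indent=2)
```

A timeout keeps one hanging system tool from blocking the whole collection run; returning `None` lets the caller record the field as missing and continue.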
- Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
  Commit: 43da0dd
  **Description** Fix a bug in the error message of communication-computation-overlap.
  **Major Revision**
  - Remove a non-existent variable.
- CI/CD - Fix bug in build image for push event (#205)
  Commit: b5349ef
  Fix bug in build image for push event.
  **Major Revision**
  - Fix bug in build image for push event when `github.base_ref` is not set.
  **Minor Revision**
  - Unify `[` and `[[` usage.
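The `github.base_ref` issue in #205 arises because GitHub Actions only sets the base ref for pull-request events, so it is empty on a push. The actual fix lives in the workflow's shell steps and is not shown here; the Python sketch below only illustrates the fallback idea. `resolve_base_ref` and the `main` default are hypothetical, while `GITHUB_BASE_REF` is the environment variable GitHub Actions provides.

```python
import os


def resolve_base_ref(env=None, default="main"):
    """Return the PR base branch, or `default` when it is unset or empty.

    Hypothetical helper; `default="main"` is an assumed fallback branch.
    """
    env = os.environ if env is None else env
    # On push events GITHUB_BASE_REF is missing or an empty string.
    base_ref = env.get("GITHUB_BASE_REF", "").strip()
    return base_ref or default
```

Treating "unset" and "empty" the same way avoids comparing against an empty string later in the pipeline.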
- Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
  Commit: 42465b0
  **Description** Fix wrong parameters for the gpu-sm-copy-bw call in the configuration examples.
- CI/CD - Update GitHub Action VM (#211)
  Commit: 031be6a
  Update GitHub Action VM, fix pipeline hanging.
- Benchmarks: Update - Update benchmarks in configuration file (#208)
  Commit: ddb0fd2
  **Description** Update benchmarks in configuration files for single node validation of SuperBench v0.3.
  **Major Revision**
  - Fix nccl-bw parameter bugs for single node validation in the configs.
  - Update new benchmarks in amd_mi100_hpe.yaml, amd_mi100_z53.yaml, and azure_ndv4.yaml.
  - Fix the wrong GPU visible prefix.
- Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
  Commit: d6cc73a
  **Description** Update the rccl-test git submodule to dc1ad48, which fixes a division-by-zero bug.
  **Major Revision**
  - Update rccl-test git submodule to dc1ad48.
- Commit: b875c44