@polarG polarG commented May 21, 2025

In the current implementation, the CPU‑stream benchmark code renames the binary before the microbench base class can verify its existence, causing the default‐binary check to fail.

This PR adds a “default” binary—built with the standard compile parameters—so that the base class can always find and validate it. Once the default binary is in place, the CPU‑stream code will rename it as needed and re‑check its presence before running the benchmark.

This PR also enables the CPU stream benchmark in the default settings.
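
For reviewers, here is a minimal sketch of the intended order of checks. It is not the actual SuperBench code: the class names, method names, and the binary names `streamDefault` and `stream_<variant>` are hypothetical placeholders. The sketch also copies the default binary instead of renaming it, so that the default stays in place for repeated runs; that is a simplification made for this illustration.

```python
import os
import shutil
import stat
import tempfile


class MicroBenchmarkBase:
    """Hypothetical stand-in for the microbenchmark base class."""

    def __init__(self, bin_dir, bin_name):
        self.bin_dir = bin_dir
        self.bin_name = bin_name

    def binary_exists(self, name=None):
        # A binary counts as present if it is a regular, executable file.
        path = os.path.join(self.bin_dir, name or self.bin_name)
        return os.path.isfile(path) and os.access(path, os.X_OK)

    def preprocess(self):
        # The base class only knows the default binary name.
        return self.binary_exists()


class CpuStreamSketch(MicroBenchmarkBase):
    """Hypothetical CPU-stream benchmark that needs a variant-specific binary."""

    DEFAULT_BIN = 'streamDefault'  # assumed name, for illustration only

    def __init__(self, bin_dir, variant):
        super().__init__(bin_dir, self.DEFAULT_BIN)
        self.variant = variant

    def preprocess(self):
        # 1. Let the base class validate the default binary first.
        if not super().preprocess():
            return False
        # 2. Only then derive the variant-specific binary from the default.
        target = f'stream_{self.variant}'
        shutil.copy2(
            os.path.join(self.bin_dir, self.bin_name),
            os.path.join(self.bin_dir, target),
        )
        # 3. Re-check that the derived binary is present before running.
        if not self.binary_exists(target):
            return False
        self.bin_name = target
        return True


def make_fake_binary(bin_dir, name):
    """Create a tiny executable script that stands in for a compiled binary."""
    path = os.path.join(bin_dir, name)
    with open(path, 'w') as f:
        f.write('#!/bin/sh\nexit 0\n')
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def demo():
    """Run the check once without and once with the default binary."""
    bin_dir = tempfile.mkdtemp()
    before = CpuStreamSketch(bin_dir, 'zen3').preprocess()
    make_fake_binary(bin_dir, CpuStreamSketch.DEFAULT_BIN)
    bench = CpuStreamSketch(bin_dir, 'zen3')
    after = bench.preprocess()
    return before, after, bench.bin_name
```

Calling `demo()` exercises both paths: the check fails while the default binary is missing, and succeeds once it exists, with the benchmark then pointing at the derived binary.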

@polarG polarG requested a review from a team as a code owner May 21, 2025 17:31
codecov bot commented May 27, 2025

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.07%. Comparing base (431bf19) to head (2311870).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...chmarks/micro_benchmarks/cpu_stream_performance.py | 60.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (60.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #712   +/-   ##
=======================================
  Coverage   86.06%   86.07%           
=======================================
  Files          99       99           
  Lines        7211     7209    -2     
=======================================
- Hits         6206     6205    -1     
+ Misses       1005     1004    -1     
| Flag | Coverage Δ |
|---|---|
| cpu-python3.10-unit-test | 71.77% <60.00%> (+<0.01%) ⬆️ |
| cpu-python3.12-unit-test | 71.77% <60.00%> (+<0.01%) ⬆️ |
| cpu-python3.7-unit-test | 71.35% <60.00%> (+<0.01%) ⬆️ |
| cuda-unit-test | 83.51% <60.00%> (+<0.01%) ⬆️ |

@polarG polarG enabled auto-merge (squash) May 28, 2025 17:21
@polarG polarG added the benchmarks (SuperBench Benchmarks) and micro-benchmarks (Micro Benchmark Test for SuperBench Benchmarks) labels May 28, 2025
@polarG polarG requested review from abuccts and guoshzhao June 4, 2025 23:44
@abuccts abuccts left a comment

pls fix the build

@polarG polarG merged commit 991c005 into main Jun 14, 2025
21 of 23 checks passed
@polarG polarG deleted the hongtao/cpu-stream-revise branch June 14, 2025 08:27
@guoshzhao guoshzhao mentioned this pull request Jul 2, 2025
40 tasks
polarG added a commit that referenced this pull request Aug 11, 2025
Description

Add release note for v0.12.0

# Main Features
## SuperBench Improvement
1. - [x] Update Image Build Pipeline (#659)
2. - [x] Add support for arm64 build (#660)
3. - [x] Upgrade dependency versions in pipeline (#671)
4. - [x] Fix installation and lint issues (#684)
5. - [x] Update Flake8 repo (#683)
6. - [x] Init latest python support. (#687)
7. - [x] Add image build on arm64 arch (#690)
8. - [x] Enhancement of ignoring errors for import pkg_resources (#692)
9. - [x] Update label in the ROCm image build (#693)
10. - [x] Support cuda12.8 for Blackwell arch (#682)
11. - [x] Merge multi-arch image (#696)
12. - [x] Update OS of runner to the latest. (#702)
13. - [x] cuda arch flag for cublaslt (#701)


## Micro-benchmark Improvement
1. - [x] Bug Fix - Fix numa error on grace cpu in gpu-copy (#658)
2. - [x] Dependency - Bump onnxruntime-gpu version from 1.10.0 to 1.12.0 (#663)
3. - [x] Benchmarks: micro benchmarks - add general CPU bandwidth and latency benchmark (#662)
4. - [x] Benchmarks: micro benchmarks - add nvbandwidth build and benchmark (#665 and #669)
5. - [x] Fix stderr message in gpu-copy benchmark (#673)
6. - [x] Add arch support for 10.0 in gemm-flops (#680)
7. - [x] Fix tensorrt-inference parsing (#674)
8. - [x] nvbandwidth benchmark need to handle N/A value (#675)
9. - [x] Avoid Unintended nvbandwidth Function Calls in All Benchmarks (#685)
10. - [x] Add GPU Stream Micro Benchmark (#697)
11. - [x] Cuda arch flag for cublaslt (#701)
12. - [x] Support autotuning in cublaslt gemm (#706)
13. - [x] Add FP4 GEMM FLOPS support for cublaslt_gemm benchmark (#711)
14. - [x] CPU Stream Benchmark Revise (#712)
15. - [x] Add cuda12.9 docker image (#716)
16. - [x] Add Grace CPU support for CPU Stream (#719)


## Model Benchmark Improvement
1. - [x] Add LLaMA-2 Models (#668)
2. - [x] Fix typos in documentation and code files (#686)
3. - [x] Add Mixture of Experts Model (#679) 
4. - [ ] Add DeepSeek Training Benchmark
5. - [x] Add DeepSeek Inference Benchmark (AMD GPU) (#713)


## Documentation
1. - [x] Update CODEOWNERS (#670)
2. - [x] Update CODEOWNERS (#718)

## Result Analysis
1. - [x] Enhance logging information for diagnosis rule op baseline errors. (#689)
polarG added a commit that referenced this pull request Aug 12, 2025