
Add ROCm 6.4, 7.0 and 7.2 support #810

Closed
arsdragonfly wants to merge 21 commits into microsoft:main from arsdragonfly:arsdragonfly/rocm-refresh


Conversation

@arsdragonfly

No description provided.

Ubuntu and others added 18 commits April 9, 2026 05:07
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
fix
Co-authored-by: Copilot <copilot@github.com>
# Conflicts:
#	superbench/benchmarks/micro_benchmarks/gpu_stream/CMakeLists.txt
Co-authored-by: Copilot <copilot@github.com>
Copilot AI review requested due to automatic review settings May 1, 2026 08:07
@arsdragonfly arsdragonfly requested a review from a team as a code owner May 1, 2026 08:07
Contributor

Copilot AI left a comment


Pull request overview

Adds ROCm 6.4 / 7.0 / 7.2 support across SuperBench micro-benchmarks and container images, including making parsing/build steps resilient to ROCm toolchain/output changes.

Changes:

  • Make hipblaslt-bench parsing robust to evolving output schemas by using header-based column lookup, and extend unit tests to cover the newer format.
  • Enable gpu-stream on ROCm by switching memory-clock queries from NVML to rocm_smi and updating the CMake build flow to support HIP/hipify.
  • Introduce new ROCm 6.4 / 7.0 / 7.2 Dockerfiles and a standalone CMake build script for hipblaslt-bench on ROCm 7.2.
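
The header-based column lookup described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual parser: the function name `parse_row_by_header` and both sample headers are hypothetical, and the real hipblaslt-bench output may use different column names.

```python
def parse_row_by_header(header_line, value_line, wanted):
    """Return {column name: value} for the wanted columns.

    Columns are located by name in the header, so added or reordered
    columns in a newer output format do not shift the result.
    """
    header = [name.strip() for name in header_line.split(',')]
    values = [value.strip() for value in value_line.split(',')]
    if len(header) != len(values):
        raise ValueError('header and value rows have different lengths')
    index = {name: position for position, name in enumerate(header)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise ValueError('missing columns: ' + ', '.join(missing))
    return {name: values[index[name]] for name in wanted}


# Hypothetical "old" and "new" schemas; the new one adds and moves columns.
old_header = 'transA,transB,m,n,k,hipblaslt-Gflops,us'
old_values = 'N,N,4096,4096,4096,1234.5,111.3'
new_header = 'transA,transB,grouped_gemm,m,n,k,a_type,hipblaslt-Gflops,hipblaslt-GB/s,us'
new_values = 'N,N,0,4096,4096,4096,f16_r,1234.5,678.9,111.3'

wanted = ('m', 'n', 'k', 'hipblaslt-Gflops', 'us')
print(parse_row_by_header(old_header, old_values, wanted))
print(parse_row_by_header(new_header, new_values, wanted))
```

Both calls yield the same mapping even though the columns sit at different positions, which is the property a fixed-index parser lacks.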

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| third_party/Makefile | Injects <cassert> into HIP’s hipBusBandwidth sample to satisfy newer ROCm clang behavior. |
| tests/benchmarks/micro_benchmarks/test_hipblaslt_function.py | Adds a positive test for the newer hipblaslt-bench CSV schema. |
| superbench/benchmarks/micro_benchmarks/hipblaslt_function.py | Updates result parsing to map columns by header name instead of fixed indices. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream_utils.hpp | Switches NVML include to rocm_smi when building under HIP. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/gpu_stream.cu | Adds ROCm SMI-based memory clock querying for ROCm builds. |
| superbench/benchmarks/micro_benchmarks/gpu_stream/CMakeLists.txt | Adds a CUDA vs ROCm build split; hipifies sources and links rocm_smi on ROCm. |
| superbench/benchmarks/micro_benchmarks/gpu_stream.py | Registers gpu-stream for ROCm in the benchmark registry. |
| dockerfile/rocm7.2.x.dockerfile | New ROCm 7.2 image; builds hipblaslt-bench via standalone CMake to avoid upstream 7.2 build issues. |
| dockerfile/rocm7.0.x.dockerfile | New ROCm 7.0 image with updated RCCL/hipBLASLt build flow and TE install. |
| dockerfile/rocm6.4.x.dockerfile | New ROCm 6.4 image with hipBLASLt build adjustments and related environment setup. |
| dockerfile/rocm6.2.x.dockerfile | Updates Intel MLC download URL/version. |
| dockerfile/etc/hipblaslt-bench-standalone.cmake | Adds standalone CMakeLists content to build only hipblaslt-bench against system hipBLASLt. |
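
The `<cassert>` injection listed for third_party/Makefile amounts to prepending an include directive to a source file. The sketch below shows one idempotent way to do that. It is an illustration only: the PR performs this step in a Makefile, whose exact command is not shown here, and the helper name `ensure_include` is hypothetical.

```python
def ensure_include(source_text, header='cassert'):
    """Prepend `#include <header>` unless the source already contains it."""
    directive = f'#include <{header}>'
    # Skip the edit when the directive is already present, so repeated
    # builds do not stack duplicate includes at the top of the file.
    if any(line.strip() == directive for line in source_text.splitlines()):
        return source_text
    return directive + '\n' + source_text


sample = '#include <hip/hip_runtime.h>\nint main() { assert(1 == 1); return 0; }\n'
patched = ensure_include(sample)
print(patched)
```

Applying the helper a second time returns the text unchanged, which matters when the patch step runs on every build.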


Comment on lines +8 to +13
# Place this file at the root of an upstream hipBLASLt source tree
# (e.g. cp this to /path/to/hipBLASLt/CMakeLists-bench.txt) and invoke:
#
# cmake -B build -S /path/to/hipBLASLt -P /path/to/this/file
#
# Or use it as the top-level CMakeLists.txt by overwriting it.

Copilot AI May 1, 2026


The usage instructions are incorrect: cmake -P runs CMake in script mode and will not configure/generate a build from a project()/targets file. Either instruct users to copy this file as the top-level CMakeLists.txt and run a normal cmake -S ... -B ..., or provide a separate script-mode (-P) driver that configures a build directory via cmake -S ... -B ....
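
The first option in the comment above (copy the file in as the top-level CMakeLists.txt, then configure normally) could look roughly like this. It is a sketch under stated assumptions: the helper name `stage_and_build_commands` and all paths are hypothetical, and the function only stages the file and returns the command lines rather than running CMake.

```python
import shutil
from pathlib import Path


def stage_and_build_commands(standalone_file, source_dir, build_dir):
    """Stage the standalone file as the top-level CMakeLists.txt.

    Returns the configure and build command lines for a normal
    (non-script-mode) CMake invocation. Note that this overwrites any
    CMakeLists.txt already present in source_dir.
    """
    source_dir = Path(source_dir)
    shutil.copyfile(standalone_file, source_dir / 'CMakeLists.txt')
    # Project mode: -S names the source tree, -B the build tree. No -P here,
    # because -P runs the file as a script and generates no build system.
    configure = ['cmake', '-S', str(source_dir), '-B', str(build_dir)]
    build = ['cmake', '--build', str(build_dir)]
    return configure, build
```

Each returned list can be passed to `subprocess.run(..., check=True)` on a machine where CMake and the hipBLASLt sources are available.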

Comment thread superbench/benchmarks/micro_benchmarks/hipblaslt_function.py Outdated
Co-authored-by: Copilot <copilot@github.com>
@polarG
Contributor

polarG commented May 1, 2026

@arsdragonfly Thanks for your contribution. Could we break this PR down into three separate PRs, each covering one ROCm version? Thanks!

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 4, 2026 19:31
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.



./bootstrap --prefix=/usr --no-system-curl --parallel=16 && \
make -j ${NUM_MAKE_JOBS} && \
make install && \
rm -rf /tmp/cmake-${required_version}* \


BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.CUDA)
BenchmarkRegistry.register_benchmark('gpu-stream', GpuStreamBenchmark, platform=Platform.ROCM)
Comment on lines +199 to +202
# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
Comment on lines +206 to +208
RUN python3 -m pip install onnxscript && \
git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
Comment on lines +8 to +13
# Place this file at the root of an upstream hipBLASLt source tree
# (e.g. cp this to /path/to/hipBLASLt/CMakeLists-bench.txt) and invoke:
#
# cmake -B build -S /path/to/hipBLASLt -P /path/to/this/file
#
# Or use it as the top-level CMakeLists.txt by overwriting it.
Comment on lines +199 to +205
# Install TransformerEngine — ROCm 7.0 has hip_fp4.h and gfx950 support,
# so we can use the latest dev branch with full CK fused attention.
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git && \
cd TransformerEngine && \
NVTE_FRAMEWORK=pytorch \
NVTE_FUSED_ATTN_CK=0 \
NVTE_FUSED_ATTN_AOTRITON=1 \
This was referenced May 5, 2026


3 participants