[ROCm] Add register spilling detection support AMD #34812

amd-songpiao · 2025-12-04T12:08:34Z

✨ New Feature

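Added register spilling detection support for the ROCm (AMD GPU) backend: when a compiled kernel spills registers, the backend logs the SGPR and VGPR spill counts and the private segment size.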
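For readers unfamiliar with the check, here is a rough sketch of the kind of decision involved. It is not the code in this PR. It assumes the kernel metadata has already been decoded into a plain dict that uses the AMDGPU code object metadata keys `.sgpr_spill_count`, `.vgpr_spill_count`, and `.private_segment_fixed_size`; the function name `describe_spilling`, the `Kernel:` label, and the rule "report only when a spill count is nonzero" are assumptions made for illustration.

```python
from typing import Any, List, Mapping


def describe_spilling(kernel: Mapping[str, Any]) -> List[str]:
    """Returns report lines if the kernel spills registers, else an empty list.

    `kernel` is assumed to be one decoded entry of the code object's kernel
    metadata. Missing keys are treated as zero (an assumption of this sketch).
    """
    sgpr = int(kernel.get(".sgpr_spill_count", 0))
    vgpr = int(kernel.get(".vgpr_spill_count", 0))
    if sgpr == 0 and vgpr == 0:
        return []  # nothing spilled, nothing to report
    private_bytes = int(kernel.get(".private_segment_fixed_size", 0))
    return [
        f"Kernel: {kernel.get('.name', '<unknown>')}",
        f"SGPR spill count: {sgpr}",
        f"VGPR spill count: {vgpr}",
        f"Private segment size: {private_bytes} bytes",
    ]


# Hypothetical metadata entry using the numbers from the test log in this PR.
example = {
    ".name": "triton_softmax_consts",
    ".sgpr_spill_count": 0,
    ".vgpr_spill_count": 194,
    ".private_segment_fixed_size": 780,
}
for line in describe_spilling(example):
    print(line)
```

Treating the spill counts, rather than the private segment size, as the trigger is a design choice of this sketch: a kernel can use scratch memory for reasons other than spilling.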

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
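If you want to collect these reports from a longer log, a small script along the following lines could work. It is only a sketch based on the log format shown above; `SpillReport` and `parse_spill_reports` are hypothetical names, and the parser assumes every report starts with the `REGISTER SPILLING DETECTED` banner and ends with a line made up only of `=` characters.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Captures the message part of a log line emitted from amdgpu_backend.cc.
_PAYLOAD = re.compile(r"amdgpu_backend\.cc:\d+\]\s*(.*)$")


@dataclass
class SpillReport:
    module: str = ""
    sgpr_spills: int = 0
    vgpr_spills: int = 0
    private_segment_bytes: int = 0


def parse_spill_reports(log_text: str) -> List[SpillReport]:
    """Collects one SpillReport per 'REGISTER SPILLING DETECTED' banner."""
    reports: List[SpillReport] = []
    current: Optional[SpillReport] = None
    for line in log_text.splitlines():
        match = _PAYLOAD.search(line)
        if not match:
            continue  # not a line from amdgpu_backend.cc
        payload = match.group(1).strip()
        if "REGISTER SPILLING DETECTED" in payload:
            current = SpillReport()
            reports.append(current)
        elif current is None:
            continue  # ignore lines outside a report
        elif payload.startswith("Module:"):
            current.module = payload.split(":", 1)[1].strip()
        elif payload.startswith("SGPR spill count:"):
            current.sgpr_spills = int(payload.split(":", 1)[1])
        elif payload.startswith("VGPR spill count:"):
            current.vgpr_spills = int(payload.split(":", 1)[1])
        elif payload.startswith("Private segment size:"):
            current.private_segment_bytes = int(payload.split(":", 1)[1].split()[0])
        elif payload and set(payload) == {"="}:
            current = None  # the closing rule ends the report
    return reports


SAMPLE = """\
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
"""

for report in parse_spill_reports(SAMPLE):
    print(report)
```

The parser matches on the message text instead of the source line numbers (447 to 453), since those numbers are likely to shift as the file changes.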

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?


xla/backends/gpu/codegen/fusion_emitter.cc

xla/service/gpu/transforms/triton_fusion_numerics_verifier_test.cc


@xla-rotation

Imported from GitHub PR #34812

Copybara import of the project:

-- ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj
-- d74ae83 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream
-- 07ed74d by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bc by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1f.

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd 11e83bc
PiperOrigin-RevId: 840635718


akuegel · 2025-12-08T12:36:43Z

It turns out this is tricky to merge, as it would break our internal ROCm build. We currently don't have an amd_comgr target available. It might take a while until we figure out what needs to be done. I guess the code lives in https://github.com/ROCm/llvm-project/tree/amd-staging/amd ?

alekstheod · 2025-12-08T13:02:37Z

third_party/gpus/rocm/BUILD.tpl

    name = "amd_comgr",
    hdrs = glob(["%{rocm_root}/include/amd_comgr/**"]),
-    data = glob([
-        "%{rocm_root}/lib/libamd_comgr_loader.so*",


This will break the hermetic build; we can't just remove it.

Please restore these libs.

amd-songpiao · 2025-12-08T14:29:26Z

> It turns out this is tricky to merge as it would break our internal ROCM build. We currently don't have amd_comgr target available. It might take a while until we have figured out what needs to be done. I guess the code lives in https://github.com/ROCm/llvm-project/tree/amd-staging/amd ?

Yes, the code is there, but please make sure you install the comgr version that corresponds to your installed ROCm.
For example, the output below shows the comgr library shipped with ROCm 7.1.1.

dpkg --list | grep comgr
ii  comgr                                     3.0.0.70101-34~22.04                    amd64        Library to provide support functions for ROCm code objects.
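To check this on a build machine, something like the sketch below could map the package version to a ROCm release. It assumes the version string has the form `<comgr version>.<encoded ROCm version>-<build>` and that the encoded field is `major * 10000 + minor * 100 + patch`, which is consistent with `70101` and ROCm 7.1.1 above but is not stated anywhere in this thread. The function name `rocm_version_from_comgr` is hypothetical.

```python
import re
from typing import Optional, Tuple


def rocm_version_from_comgr(dpkg_line: str) -> Optional[Tuple[int, int, int]]:
    """Extracts (major, minor, patch) of ROCm from a `dpkg --list` comgr row.

    Returns None if the row is not for the comgr package or the version
    does not match the assumed layout.
    """
    fields = dpkg_line.split()
    if len(fields) < 3 or fields[1] != "comgr":
        return None
    match = re.match(r"^\d+\.\d+\.\d+\.(\d+)", fields[2])
    if not match:
        return None
    encoded = int(match.group(1))
    # Assumed encoding: major * 10000 + minor * 100 + patch.
    return encoded // 10000, (encoded // 100) % 100, encoded % 100


row = "ii  comgr  3.0.0.70101-34~22.04  amd64  Library to provide support functions for ROCm code objects."
print(rocm_version_from_comgr(row))
```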

alekstheod · 2025-12-08T15:58:09Z

I pushed a proper fix.

@xla-rotation

Imported from GitHub PR #34812

Copybara import of the project:

-- ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj
-- d74ae83 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream
-- 07ed74d by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bc by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1f.
-- a9db720 by Alexandros Theodoridis <atheodor@amd.com>: Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd a9db720
PiperOrigin-RevId: 840635718

@xla-rotation

Imported from GitHub PR openxla/xla#34812 ✨ New Feature Added register spilling detection support. 🧪 Execution Test ./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters ``` I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed. 
[ OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms) [----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (8019 ms total) [ PASSED ] 1 test. ``` This PR is on top of another bugfix PR (openxla/xla#34806). @xla-rotation could you review my PR, please? Copybara import of the project: -- ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel -- fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file -- f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check -- 8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia -- 22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco -- b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj -- d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream -- 07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata -- 11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2. 
-- a9db72054bff1bdc31c250b4ae1f45bcc3193e4c by Alexandros Theodoridis <atheodor@amd.com>: Fix hermetic build Merging this change closes #34812 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34812 from ROCm:add_register_spilling_detection_support_amd a9db72054bff1bdc31c250b4ae1f45bcc3193e4c PiperOrigin-RevId: 840635718

@xla-rotation

Imported from GitHub PR #34812 ✨ New Feature Added register spilling detection support. 🧪 Execution Test ./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters ``` I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed. 
[ OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms) [----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (8019 ms total) [ PASSED ] 1 test. ``` This PR is on top of another bugfix PR (#34806). @xla-rotation could you review my PR, please? Copybara import of the project: -- ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel -- fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file -- f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check -- 8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia -- 22ef808 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco -- b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj -- d74ae83 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream -- 07ed74d by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata -- 11e83bc by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1f. -- a9db720 by Alexandros Theodoridis <atheodor@amd.com>: Fix hermetic build Merging this change closes #34812 FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd a9db720 PiperOrigin-RevId: 840635718

@xla-rotation

Imported from GitHub PR openxla/xla#34812 ✨ New Feature Added register spilling detection support. 🧪 Execution Test ./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters ``` I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed. 
[ OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms) [----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (8019 ms total) [ PASSED ] 1 test. ``` This PR is on top of another bugfix PR (openxla/xla#34806). @xla-rotation could you review my PR, please? Copybara import of the project: -- ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel -- fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file -- f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check -- 8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia -- 22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco -- b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj -- d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream -- 07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata -- 11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2. 
-- a9db72054bff1bdc31c250b4ae1f45bcc3193e4c by Alexandros Theodoridis <atheodor@amd.com>: Fix hermetic build Merging this change closes #34812 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34812 from ROCm:add_register_spilling_detection_support_amd a9db72054bff1bdc31c250b4ae1f45bcc3193e4c PiperOrigin-RevId: 840635718

@xla-rotation

Imported from GitHub PR #34812 ✨ New Feature Added register spilling detection support. 🧪 Execution Test ./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters ``` I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ====== I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0 I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194 I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ======================================== I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed. 
[ OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms) [----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (8019 ms total) [ PASSED ] 1 test. ``` This PR is on top of another bugfix PR (#34806). @xla-rotation could you review my PR, please? Copybara import of the project: -- ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel -- fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file -- f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check -- 8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia -- 22ef808 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco -- b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj -- d74ae83 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream -- 07ed74d by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata -- 11e83bc by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1f. -- a9db720 by Alexandros Theodoridis <atheodor@amd.com>: Fix hermetic build Merging this change closes #34812 FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd a9db720 PiperOrigin-RevId: 840635718

@xla-rotation

Imported from GitHub PR openxla/xla#34812

✨ New Feature

Added register spilling detection support.

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (openxla/xla#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

-- ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>: pass through is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading metadata of HSACO file using llvm-readobj
-- d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>: adapted function calls as is_autotuning_compilation is removed in upstream
-- 07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2.
-- a9db72054bff1bdc31c250b4ae1f45bcc3193e4c by Alexandros Theodoridis <atheodor@amd.com>: Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34812 from ROCm:add_register_spilling_detection_support_amd a9db72054bff1bdc31c250b4ae1f45bcc3193e4c
PiperOrigin-RevId: 840635718

@xla-rotation

Imported from GitHub PR #34812

✨ New Feature

Added register spilling detection support.

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

-- ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808 by Songlin Piao <Songlin.Piao@amd.com>: pass through is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading metadata of HSACO file using llvm-readobj
-- d74ae83 by Songlin Piao <Songlin.Piao@amd.com>: adapted function calls as is_autotuning_compilation is removed in upstream
-- 07ed74d by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bc by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1f.

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd 11e83bc
PiperOrigin-RevId: 841699936

@xla-rotation

Imported from GitHub PR openxla/xla#34812

✨ New Feature

Added register spilling detection support.

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (openxla/xla#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

-- ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>: pass through is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading metadata of HSACO file using llvm-readobj
-- d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>: adapted function calls as is_autotuning_compilation is removed in upstream
-- 07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2.

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34812 from ROCm:add_register_spilling_detection_support_amd 11e83bcb502ee341ddf7db9044b05b4b757ca5e9
PiperOrigin-RevId: 841699936

@xla-rotation

Imported from GitHub PR openxla/xla#34812

✨ New Feature

Added register spilling detection support.

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (openxla/xla#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

-- ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>: pass through is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading metadata of HSACO file using llvm-readobj
-- d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>: adapted function calls as is_autotuning_compilation is removed in upstream
-- 07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2.

Merging this change closes #34812

PiperOrigin-RevId: 842183737

Imported from GitHub PR openxla/xla#35077

📝 Summary of Changes
Move comgr into the data directory, fixing hermetic build

🎯 Justification
Hermetic build with rocm config is broken due to invalid dependency management.
Invalid merge of this PR: openxla/xla#34812

🚀 Kind of Contribution
Please remove what does not apply: 🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
66c08137948a92ac98ccaa1785ce27ebd4c489ca by Alexandros Theodoridis <alexandros.theodoridis@amd.com>:

Fix hermetic build rocm

Merging this change closes #35077

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#35077 from ROCm:fix_hermetic_build_on_rocm 66c08137948a92ac98ccaa1785ce27ebd4c489ca
PiperOrigin-RevId: 842635791

Imported from GitHub PR #35077

📝 Summary of Changes
Move comgr into the data directory, fixing hermetic build

🎯 Justification
Hermetic build with rocm config is broken due to invalid dependency management.
Invalid merge of this PR: #34812

🚀 Kind of Contribution
Please remove what does not apply: 🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
66c0813 by Alexandros Theodoridis <alexandros.theodoridis@amd.com>:

Fix hermetic build rocm

Merging this change closes #35077

COPYBARA_INTEGRATE_REVIEW=#35077 from ROCm:fix_hermetic_build_on_rocm 66c0813
PiperOrigin-RevId: 842648079

Imported from GitHub PR openxla/xla#35077

📝 Summary of Changes
Move comgr into the data directory, fixing hermetic build

🎯 Justification
Hermetic build with rocm config is broken due to invalid dependency management.
Invalid merge of this PR: openxla/xla#34812

🚀 Kind of Contribution
Please remove what does not apply: 🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
66c08137948a92ac98ccaa1785ce27ebd4c489ca by Alexandros Theodoridis <alexandros.theodoridis@amd.com>:

Fix hermetic build rocm

Merging this change closes #35077

PiperOrigin-RevId: 842648079

Imported from GitHub PR openxla/xla#35077

📝 Summary of Changes
Move comgr into the data directory, fixing hermetic build

🎯 Justification
Hermetic build with rocm config is broken due to invalid dependency management.
Invalid merge of this PR: openxla/xla#34812

🚀 Kind of Contribution
Please remove what does not apply: 🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
66c08137948a92ac98ccaa1785ce27ebd4c489ca by Alexandros Theodoridis <alexandros.theodoridis@amd.com>:

Fix hermetic build rocm

Merging this change closes #35077

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#35077 from ROCm:fix_hermetic_build_on_rocm 66c08137948a92ac98ccaa1785ce27ebd4c489ca
PiperOrigin-RevId: 842648917

amd-songpiao added 8 commits December 4, 2025 02:36

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

ebd6e1f

register spilling by disassembling object file

fafc7f1

added time measurement to the spilling check

f6b86f6

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

8e5ea84

pass though is_autotuning_compilation flag to the function CompileToHsaco

22ef808

implementation of register spilling by reading meta data of hasco file using llvm-readobj

b1d5e97

adapted functiona calls as is_autotuning_compilation is removed in upstream

d74ae83

utilize amd code object manager library for parsing HSACO metadata

07ed74d

amd-songpiao mentioned this pull request Dec 4, 2025

Add register spilling detection AMD v0.8.0 ROCm/xla#464

Merged

i-chaochen requested a review from akuegel December 4, 2025 15:41

akuegel reviewed Dec 5, 2025

xla/backends/gpu/codegen/fusion_emitter.cc (outdated, resolved)

xla/service/gpu/transforms/triton_fusion_numerics_verifier_test.cc (resolved)

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

11e83bc

This reverts commit ebd6e1f.

akuegel approved these changes Dec 5, 2025

copybara-service bot mentioned this pull request Dec 5, 2025

PR #34812: [ROCm] Add register spilling detection support AMD #34883

Closed

copybara-service bot mentioned this pull request Dec 5, 2025

PR #34812: [ROCm] Add register spilling detection support AMD tensorflow/tensorflow#105720

Draft

alekstheod reviewed Dec 8, 2025

Fix hermetic build

a9db720

alekstheod force-pushed the add_register_spilling_detection_support_amd branch from 52a6636 to a9db720 Compare December 8, 2025 15:59

akuegel approved these changes Dec 9, 2025

copybara-service bot mentioned this pull request Dec 9, 2025

PR #34812: [ROCm] Add register spilling detection support AMD #35028

Merged

copybara-service bot mentioned this pull request Dec 9, 2025

PR #34812: [ROCm] Add register spilling detection support AMD tensorflow/tensorflow#105915

Merged

copybara-service bot closed this in 3eeabaa Dec 9, 2025

alekstheod mentioned this pull request Dec 10, 2025

[ROCm] Fix hermetic build rocm #35077

Closed

copybara-service bot mentioned this pull request Dec 10, 2025

PR #35077: [ROCm] Fix hermetic build rocm #35084

Merged

copybara-service bot mentioned this pull request Dec 10, 2025

PR #35077: [ROCm] Fix hermetic build rocm tensorflow/tensorflow#105989

Merged

copybara-service bot mentioned this pull request Dec 10, 2025

PR #35077: [ROCm] Fix hermetic build rocm #35088

Closed

copybara-service bot mentioned this pull request Dec 10, 2025

PR #35077: [ROCm] Fix hermetic build rocm tensorflow/tensorflow#105995

Draft

amd-songpiao commented Dec 4, 2025

akuegel commented Dec 8, 2025

alekstheod Dec 8, 2025

alekstheod Dec 8, 2025

amd-songpiao commented Dec 8, 2025

alekstheod commented Dec 8, 2025

3 participants
