@amd-songpiao
Contributor

✨ New Feature

Added register spilling detection support.

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
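
The report printed above has a regular shape, so spill reports can be pulled out of a test log mechanically. The following is a minimal sketch, assuming the log wording stays exactly as shown in this output; `parse_spill_reports` and its field names are hypothetical helpers for illustration and are not part of XLA.

```python
import re

# Patterns for the report lines shown in the log above. The wording is copied
# from that output and is an assumption about what the backend prints.
_FIELDS = {
    "module": re.compile(r"Module: (\S+)"),
    "sgpr_spills": re.compile(r"SGPR spill count: (\d+)"),
    "vgpr_spills": re.compile(r"VGPR spill count: (\d+)"),
    "private_segment_bytes": re.compile(r"Private segment size: (\d+) bytes"),
}
_HEADER = "REGISTER SPILLING DETECTED"
_FOOTER = "=" * 10  # the closing line is a long run of '=' characters


def parse_spill_reports(log_text):
    """Return one dict per spilling report found in log_text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        if _HEADER in line:
            # A header line opens a new report.
            current = {}
            reports.append(current)
            continue
        if current is None:
            continue
        if line.rstrip().endswith(_FOOTER):
            # The footer line closes the report.
            current = None
            continue
        for name, pattern in _FIELDS.items():
            match = pattern.search(line)
            if match:
                value = match.group(1)
                current[name] = value if name == "module" else int(value)
    return reports
```

Feeding the test output above to `parse_spill_reports` would yield two reports, one per compilation of `triton_softmax_consts`, which makes it easy to flag kernels whose `vgpr_spills` is nonzero.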

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?

copybara-service bot pushed a commit that referenced this pull request Dec 5, 2025
Imported from GitHub PR #34812

Copybara import of the project:

--
ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808 by Songlin Piao <Songlin.Piao@amd.com>:

pass though is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling by reading meta data of hasco file using llvm-readobj

--
d74ae83 by Songlin Piao <Songlin.Piao@amd.com>:

adapted functiona calls as is_autotuning_compilation is removed in upstream

--
07ed74d by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bc by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1f.

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd 11e83bc
PiperOrigin-RevId: 840635718
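
The commit list above ends with spill detection based on HSACO metadata instead of disassembly. Below is a minimal sketch of such a check, assuming the per-kernel metadata has already been decoded into a dictionary. The key names `.sgpr_spill_count`, `.vgpr_spill_count`, and `.private_segment_fixed_size` follow the AMDGPU code object metadata described in LLVM's AMDGPU usage guide; `summarize_spilling` is a hypothetical helper and not the code in this PR.

```python
def summarize_spilling(kernel_metadata):
    """Summarize register spilling from one kernel's decoded metadata map.

    kernel_metadata is assumed to be a plain dict keyed by the AMDGPU code
    object metadata names; missing keys are treated as zero.
    """
    sgpr = int(kernel_metadata.get(".sgpr_spill_count", 0))
    vgpr = int(kernel_metadata.get(".vgpr_spill_count", 0))
    private_bytes = int(kernel_metadata.get(".private_segment_fixed_size", 0))
    return {
        "sgpr_spills": sgpr,
        "vgpr_spills": vgpr,
        "private_segment_bytes": private_bytes,
        # Only the spill counters decide the verdict: a nonzero private
        # segment can also come from ordinary stack usage.
        "spills": sgpr > 0 or vgpr > 0,
    }
```

With the values from the test log (0 SGPR spills, 194 VGPR spills, 780 private segment bytes), this sketch would report `spills` as true.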
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 5, 2025
Imported from GitHub PR openxla/xla#34812

@akuegel
Member

akuegel commented Dec 8, 2025

It turns out this is tricky to merge, as it would break our internal ROCm build. We currently don't have an amd_comgr target available. It might take a while until we have figured out what needs to be done. I guess the code lives in https://github.com/ROCm/llvm-project/tree/amd-staging/amd ?

name = "amd_comgr",
hdrs = glob(["%{rocm_root}/include/amd_comgr/**"]),
data = glob([
"%{rocm_root}/lib/libamd_comgr_loader.so*",
Contributor

This will break the hermetic build, we can't just remove it.

Contributor

Please revert these libs back.

@amd-songpiao
Contributor Author

> It turns out this is tricky to merge, as it would break our internal ROCm build. We currently don't have an amd_comgr target available. It might take a while until we have figured out what needs to be done. I guess the code lives in https://github.com/ROCm/llvm-project/tree/amd-staging/amd ?

Yes, the code is there, but please make sure you install the comgr version that corresponds to your installed ROCm.
For example, the output below shows the comgr library shipped with ROCm 7.1.1.

dpkg --list | grep comgr
ii  comgr                                     3.0.0.70101-34~22.04                    amd64        Library to provide support functions for ROCm code objects.
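
To check that an installed comgr package matches a ROCm release, the release number can be read off the package version. This is a minimal sketch, assuming the last dotted component of the version (here `70101`) encodes the ROCm release as major * 10000 + minor * 100 + patch, which is consistent with the 7.1.1 example above; `rocm_release_from_package_version` is a hypothetical helper.

```python
def rocm_release_from_package_version(package_version):
    """Decode the ROCm release from a version such as '3.0.0.70101-34~22.04'.

    Assumes the last dotted component before the '-' encodes the release as
    major * 10000 + minor * 100 + patch.
    """
    upstream = package_version.split("-", 1)[0]   # e.g. '3.0.0.70101'
    encoded = int(upstream.rsplit(".", 1)[1])     # e.g. 70101
    return (encoded // 10000, (encoded // 100) % 100, encoded % 100)
```

For the package listed above, `rocm_release_from_package_version("3.0.0.70101-34~22.04")` returns `(7, 1, 1)`.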

@alekstheod
Contributor

I did push a proper fix

@alekstheod force-pushed the add_register_spilling_detection_support_amd branch from 52a6636 to a9db720 on December 8, 2025 15:59
copybara-service bot pushed a commit that referenced this pull request Dec 9, 2025
Imported from GitHub PR #34812

--
a9db720 by Alexandros Theodoridis <atheodor@amd.com>:

Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd a9db720
PiperOrigin-RevId: 840635718
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 9, 2025
Imported from GitHub PR openxla/xla#34812

copybara-service bot pushed a commit that referenced this pull request Dec 9, 2025
Imported from GitHub PR #34812

✨ New Feature

Added register spilling detection support.

πŸ§ͺ Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

--
ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808 by Songlin Piao <Songlin.Piao@amd.com>:

pass though is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling by reading meta data of hasco file using llvm-readobj

--
d74ae83 by Songlin Piao <Songlin.Piao@amd.com>:

adapted functiona calls as is_autotuning_compilation is removed in upstream

--
07ed74d by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bc by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1f.

--
a9db720 by Alexandros Theodoridis <atheodor@amd.com>:

Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd a9db720
PiperOrigin-RevId: 840635718
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 9, 2025
Imported from GitHub PR openxla/xla#34812

✨ New Feature

Added register spilling detection support.

πŸ§ͺ Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (openxla/xla#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

--
ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>:

pass though is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling by reading meta data of hasco file using llvm-readobj

--
d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>:

adapted functiona calls as is_autotuning_compilation is removed in upstream

--
07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2.

--
a9db72054bff1bdc31c250b4ae1f45bcc3193e4c by Alexandros Theodoridis <atheodor@amd.com>:

Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34812 from ROCm:add_register_spilling_detection_support_amd a9db72054bff1bdc31c250b4ae1f45bcc3193e4c
PiperOrigin-RevId: 840635718
copybara-service bot pushed a commit that referenced this pull request Dec 9, 2025
Imported from GitHub PR #34812

✨ New Feature

Added register spilling detection support.

πŸ§ͺ Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

--
ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808 by Songlin Piao <Songlin.Piao@amd.com>:

pass though is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling by reading meta data of hasco file using llvm-readobj

--
d74ae83 by Songlin Piao <Songlin.Piao@amd.com>:

adapted functiona calls as is_autotuning_compilation is removed in upstream

--
07ed74d by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bc by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1f.

--
a9db720 by Alexandros Theodoridis <atheodor@amd.com>:

Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd a9db720
PiperOrigin-RevId: 840635718
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 9, 2025
Imported from GitHub PR openxla/xla#34812

✨ New Feature

Added register spilling detection support.

πŸ§ͺ Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```
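
The three values in the log above (SGPR spill count, VGPR spill count, private segment size) are enough to decide whether a kernel spilled. The following is a minimal sketch of such a check, not the code from this PR: `SpillInfo`, `HasRegisterSpilling`, and `FormatSpillReport` are hypothetical names, and the rule that any non-zero spill counter counts as spilling is an assumption.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical container for the three values printed in the log.
struct SpillInfo {
  uint32_t sgpr_spill_count;      // scalar registers spilled
  uint32_t vgpr_spill_count;      // vector registers spilled
  uint32_t private_segment_size;  // private (scratch) memory, in bytes
};

// Assumption: a kernel is treated as spilling when either counter is non-zero.
bool HasRegisterSpilling(const SpillInfo& info) {
  return info.sgpr_spill_count > 0 || info.vgpr_spill_count > 0;
}

// Builds a report with the same fields as the log lines above.
std::string FormatSpillReport(const std::string& module_name,
                              const SpillInfo& info) {
  std::ostringstream out;
  out << "Module: " << module_name << "\n"
      << "SGPR spill count: " << info.sgpr_spill_count << "\n"
      << "VGPR spill count: " << info.vgpr_spill_count << "\n"
      << "Private segment size: " << info.private_segment_size << " bytes\n";
  return out.str();
}
```

With the values from the log (`SpillInfo{0, 194, 780}`), `HasRegisterSpilling` returns true because the VGPR spill count is non-zero.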

This PR is on top of another bugfix PR (openxla/xla#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

--
ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>:

pass through the is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling detection by reading metadata of the HSACO file using llvm-readobj

--
d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>:

adapted function calls as is_autotuning_compilation is removed in upstream

--
07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2.

--
a9db72054bff1bdc31c250b4ae1f45bcc3193e4c by Alexandros Theodoridis <atheodor@amd.com>:

Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34812 from ROCm:add_register_spilling_detection_support_amd a9db72054bff1bdc31c250b4ae1f45bcc3193e4c
PiperOrigin-RevId: 840635718
copybara-service bot pushed a commit that referenced this pull request Dec 9, 2025
Imported from GitHub PR #34812

✨ New Feature

Added register spilling detection support.

πŸ§ͺ Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

--
ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808 by Songlin Piao <Songlin.Piao@amd.com>:

pass through the is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling detection by reading metadata of the HSACO file using llvm-readobj

--
d74ae83 by Songlin Piao <Songlin.Piao@amd.com>:

adapted function calls as is_autotuning_compilation is removed in upstream

--
07ed74d by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bc by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1f.

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd 11e83bc
PiperOrigin-RevId: 841699936
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 9, 2025
Imported from GitHub PR openxla/xla#34812

✨ New Feature

Added register spilling detection support.

πŸ§ͺ Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```

This PR is on top of another bugfix PR (openxla/xla#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

--
ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>:

pass through the is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling detection by reading metadata of the HSACO file using llvm-readobj

--
d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>:

adapted function calls as is_autotuning_compilation is removed in upstream

--
07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2.

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#34812 from ROCm:add_register_spilling_detection_support_amd 11e83bcb502ee341ddf7db9044b05b4b757ca5e9
PiperOrigin-RevId: 841699936
@copybara-service copybara-service bot closed this in 3eeabaa Dec 9, 2025
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 9, 2025
Imported from GitHub PR openxla/xla#34812

✨ New Feature

Added register spilling detection support.

πŸ§ͺ Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[       OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[  PASSED  ] 1 test.
```
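For scripting around this output (for example, flagging spilling kernels in CI), the report lines can be scraped from the log. The following is a minimal sketch and not part of this PR: it assumes the message format shown in the log above stays stable, and `parse_spill_reports` and the field names are hypothetical.

```python
import re
from typing import Dict, List, Union

# Message formats are copied from the log excerpt above. All names below are
# hypothetical helpers for illustration; they are not part of XLA.
BANNER = "REGISTER SPILLING DETECTED"
END_OF_REPORT = re.compile(r"\] =+\s*$")
FIELDS = {
    "module": re.compile(r"\] Module: (\S+)"),
    "sgpr_spills": re.compile(r"\] SGPR spill count: (\d+)"),
    "vgpr_spills": re.compile(r"\] VGPR spill count: (\d+)"),
    "private_segment_bytes": re.compile(r"\] Private segment size: (\d+) bytes"),
}

Report = Dict[str, Union[str, int]]


def parse_spill_reports(log_text: str) -> List[Report]:
    """Collect one dict per spill report found in glog-style output."""
    reports: List[Report] = []
    current = None
    for line in log_text.splitlines():
        if BANNER in line:
            # A banner line opens a new report.
            current = {}
            reports.append(current)
        elif current is not None and END_OF_REPORT.search(line):
            # A line of '=' characters closes the current report.
            current = None
        elif current is not None:
            for name, pattern in FIELDS.items():
                match = pattern.search(line)
                if match:
                    value = match.group(1)
                    current[name] = value if name == "module" else int(value)
    return reports
```

Feeding the captured test output to `parse_spill_reports` yields one dictionary per "REGISTER SPILLING DETECTED" block, so a script can, for instance, fail when `vgpr_spills` exceeds a chosen budget.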

This PR is on top of another bugfix PR (openxla/xla#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

--
ebd6e1fa03033bc9f6913351323fce26e1a8e4d2 by Songlin Piao <Songlin.Piao@amd.com>:

replace the manual calling convention fix with AnnotateFunctionAsGpuKernel

--
fafc7f1f6ad5a47204a32d433eab2bc5ec44dbd3 by Songlin Piao <Songlin.Piao@amd.com>:

register spilling by disassembling object file

--
f6b86f6fc96fd3398608c0078233db2efa74fce7 by Songlin Piao <Songlin.Piao@amd.com>:

added time measurement to the spilling check

--
8e5ea8455fc730b73b3768cbdde07079c8c53c29 by Songlin Piao <Songlin.Piao@amd.com>:

adapt the num_warps so that the hlo could be compiled on both amd and nvidia

--
22ef808416e6d339356c3a901ce1f5d03a396a60 by Songlin Piao <Songlin.Piao@amd.com>:

pass through is_autotuning_compilation flag to the function CompileToHsaco

--
b1d5e976c8051332ca1fc45e5f3b91fcd15a3da8 by Songlin Piao <Songlin.Piao@amd.com>:

implementation of register spilling by reading metadata of HSACO file using llvm-readobj

--
d74ae83731a0a56a7285c1ac57689678d21e42d4 by Songlin Piao <Songlin.Piao@amd.com>:

adapted function calls as is_autotuning_compilation is removed in upstream

--
07ed74d49361fb1945092cac459a3bb70262265b by Songlin Piao <Songlin.Piao@amd.com>:

utilize amd code object manager library for parsing HSACO metadata

--
11e83bcb502ee341ddf7db9044b05b4b757ca5e9 by Songlin Piao <Songlin.Piao@amd.com>:

Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel"

This reverts commit ebd6e1fa03033bc9f6913351323fce26e1a8e4d2.

Merging this change closes #34812

PiperOrigin-RevId: 842183737
copybara-service bot pushed a commit that referenced this pull request Dec 10, 2025
Imported from GitHub PR #35077

📝 Summary of Changes
Move comgr into the data directory, fixing hermetic build

🎯 Justification
Hermetic build with rocm config is broken due to invalid
dependency management. Invalid merge of this PR: #34812

🚀 Kind of Contribution
Please remove what does not apply: 🐛 Bug Fix

📊 Benchmark (for Performance Improvements)
Not relevant

🧪 Unit Tests:
Not relevant

🧪 Execution Tests:
Not relevant

Copybara import of the project:

--
66c0813 by Alexandros Theodoridis <alexandros.theodoridis@amd.com>:

Fix hermetic build rocm

Merging this change closes #35077

FUTURE_COPYBARA_INTEGRATE_REVIEW=#35077 from ROCm:fix_hermetic_build_on_rocm 66c0813
PiperOrigin-RevId: 842635791
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 10, 2025
Imported from GitHub PR openxla/xla#35077
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#35077 from ROCm:fix_hermetic_build_on_rocm 66c08137948a92ac98ccaa1785ce27ebd4c489ca
PiperOrigin-RevId: 842635791
copybara-service bot pushed a commit that referenced this pull request Dec 10, 2025
Imported from GitHub PR #35077
COPYBARA_INTEGRATE_REVIEW=#35077 from ROCm:fix_hermetic_build_on_rocm 66c0813
PiperOrigin-RevId: 842648079
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 10, 2025
Imported from GitHub PR openxla/xla#35077
PiperOrigin-RevId: 842648079
copybara-service bot pushed a commit that referenced this pull request Dec 10, 2025
Imported from GitHub PR #35077
FUTURE_COPYBARA_INTEGRATE_REVIEW=#35077 from ROCm:fix_hermetic_build_on_rocm 66c0813
PiperOrigin-RevId: 842648917
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 10, 2025
Imported from GitHub PR openxla/xla#35077
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#35077 from ROCm:fix_hermetic_build_on_rocm 66c08137948a92ac98ccaa1785ce27ebd4c489ca
PiperOrigin-RevId: 842648917