[ROCm] Add register spilling detection support AMD #34812
Imported from GitHub PR #34812

✨ New Feature

Added register spilling detection support.

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[ OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[ PASSED ] 1 test.
```

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

-- ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj
-- d74ae83 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream
-- 07ed74d by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bc by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1f.

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd 11e83bc
PiperOrigin-RevId: 840635718
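For readers who want to check such logs automatically, here is a minimal sketch in Python. It is not the PR's implementation, which is C++ in amdgpu_backend.cc and reads HSACO metadata. It only parses log lines in the format shown in the execution test and reports the spill counts per module. The `SpillReport` type, the `parse_spill_reports` helper, and the rule that any nonzero SGPR or VGPR spill count means "spilling" are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpillReport:
    """One 'REGISTER SPILLING DETECTED' block (hypothetical helper type)."""
    module: str
    sgpr_spills: int
    vgpr_spills: int
    private_segment_bytes: int

    @property
    def spills(self) -> bool:
        # Assumption: a nonzero count of either kind counts as spilling.
        return self.sgpr_spills > 0 or self.vgpr_spills > 0


# Patterns for the four fields that appear in the log excerpt.
_FIELDS = {
    "module": re.compile(r"Module: (\S+)"),
    "sgpr": re.compile(r"SGPR spill count: (\d+)"),
    "vgpr": re.compile(r"VGPR spill count: (\d+)"),
    "private": re.compile(r"Private segment size: (\d+) bytes"),
}


def parse_spill_reports(log_text: str) -> List[SpillReport]:
    """Collects one SpillReport per complete block found in log_text."""
    reports: List[SpillReport] = []
    current: Dict[str, str] = {}
    for line in log_text.splitlines():
        for key, pattern in _FIELDS.items():
            match = pattern.search(line)
            if match:
                current[key] = match.group(1)
        # A block is complete once all four fields have been seen.
        if len(current) == len(_FIELDS):
            reports.append(
                SpillReport(
                    module=current["module"],
                    sgpr_spills=int(current["sgpr"]),
                    vgpr_spills=int(current["vgpr"]),
                    private_segment_bytes=int(current["private"]),
                )
            )
            current = {}
    return reports


# Lines copied from the execution test output above.
SAMPLE_LOG = """\
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
"""

if __name__ == "__main__":
    for report in parse_spill_reports(SAMPLE_LOG):
        print(report, "spills:", report.spills)
```

Calling `parse_spill_reports` on the full test log would yield two reports, since the warning is emitted twice for the same module.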
It turns out this is tricky to merge, as it would break our internal ROCm build. We currently don't have an amd_comgr target available, and it might take a while to figure out what needs to be done. I guess the code lives in https://github.com/ROCm/llvm-project/tree/amd-staging/amd?
name = "amd_comgr",
hdrs = glob(["%{rocm_root}/include/amd_comgr/**"]),
data = glob([
    "%{rocm_root}/lib/libamd_comgr_loader.so*",
This will break the hermetic build; we can't just remove it.
Please revert these libs back.
Yes, the code is there, but please make sure you install the comgr version that corresponds to the installed ROCm.

I pushed a proper fix.
Force-pushed from 52a6636 to a9db720.
Imported from GitHub PR #34812

✨ New Feature

Added register spilling detection support.

🧪 Execution Test

./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters

```
I0000 00:00:1764849271.079538 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.079561 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.079565 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.079569 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.079572 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.079574 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.079576 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.390972 2923925 amdgpu_backend.cc:447] ====== REGISTER SPILLING DETECTED ======
I0000 00:00:1764849271.390996 2923925 amdgpu_backend.cc:448] Module: triton_softmax_consts
I0000 00:00:1764849271.391000 2923925 amdgpu_backend.cc:449] SGPR spill count: 0
I0000 00:00:1764849271.391005 2923925 amdgpu_backend.cc:450] VGPR spill count: 194
I0000 00:00:1764849271.391007 2923925 amdgpu_backend.cc:451] Private segment size: 780 bytes
I0000 00:00:1764849271.391009 2923925 amdgpu_backend.cc:452] Performance may be degraded due to register pressure
I0000 00:00:1764849271.391012 2923925 amdgpu_backend.cc:453] ========================================
I0000 00:00:1764849271.397868 2923925 tfrt_gpu_client.cc:197] TfrtGpuClient destroyed.
[ OK ] TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters (8019 ms)
[----------] 1 test from TritonFusionNumericsVerifierTest (8019 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8019 ms total)
[ PASSED ] 1 test.
```

This PR is on top of another bugfix PR (#34806).

@xla-rotation could you review my PR, please?

Copybara import of the project:

-- ebd6e1f by Songlin Piao <Songlin.Piao@amd.com>: replace the manual calling convention fix with AnnotateFunctionAsGpuKernel
-- fafc7f1 by Songlin Piao <Songlin.Piao@amd.com>: register spilling by disassembling object file
-- f6b86f6 by Songlin Piao <Songlin.Piao@amd.com>: added time measurement to the spilling check
-- 8e5ea84 by Songlin Piao <Songlin.Piao@amd.com>: adapt the num_warps so that the hlo could be compiled on both amd and nvidia
-- 22ef808 by Songlin Piao <Songlin.Piao@amd.com>: pass though is_autotuning_compilation flag to the function CompileToHsaco
-- b1d5e97 by Songlin Piao <Songlin.Piao@amd.com>: implementation of register spilling by reading meta data of hasco file using llvm-readobj
-- d74ae83 by Songlin Piao <Songlin.Piao@amd.com>: adapted functiona calls as is_autotuning_compilation is removed in upstream
-- 07ed74d by Songlin Piao <Songlin.Piao@amd.com>: utilize amd code object manager library for parsing HSACO metadata
-- 11e83bc by Songlin Piao <Songlin.Piao@amd.com>: Revert "replace the manual calling convention fix with AnnotateFunctionAsGpuKernel" This reverts commit ebd6e1f.
-- a9db720 by Alexandros Theodoridis <atheodor@amd.com>: Fix hermetic build

Merging this change closes #34812

FUTURE_COPYBARA_INTEGRATE_REVIEW=#34812 from ROCm:add_register_spilling_detection_support_amd a9db720
PiperOrigin-RevId: 840635718
Imported from GitHub PR #35077
📝 Summary of Changes
Move comgr into the data directory, fixing hermetic build
🎯 Justification
Hermetic build with rocm config is broken due to invalid dependency management. Invalid merge of this PR: #34812
🚀 Kind of Contribution
Please remove what does not apply: 🐛 Bug Fix
📊 Benchmark (for Performance Improvements)
Not relevant
🧪 Unit Tests:
Not relevant
🧪 Execution Tests:
Not relevant
Copybara import of the project:
-- 66c0813 by Alexandros Theodoridis <alexandros.theodoridis@amd.com>: Fix hermetic build rocm
Merging this change closes #35077
FUTURE_COPYBARA_INTEGRATE_REVIEW=#35077 from ROCm:fix_hermetic_build_on_rocm 66c0813
PiperOrigin-RevId: 842635791
✨ New Feature
Added register spilling detection support.
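As a rough illustration of what the check amounts to, here is a minimal sketch of deciding whether a kernel spills, given the three values that appear in the test log (SGPR spill count, VGPR spill count, private segment size). The commit list says the final version reads these from HSACO metadata through the AMD code object manager library; that parsing step is not shown. `KernelSpillInfo`, `HasRegisterSpilling` and `SpillSummary` are hypothetical names for illustration only, not the identifiers used in `amdgpu_backend.cc`.

```cpp
#include <cstdint>
#include <string>

// Hypothetical per-kernel record. The fields mirror the values printed in
// the test log; they are not the actual XLA data structures.
struct KernelSpillInfo {
  std::string module_name;
  uint32_t sgpr_spill_count = 0;
  uint32_t vgpr_spill_count = 0;
  uint32_t private_segment_size_bytes = 0;
};

// Assumption: a kernel counts as spilling when either spill counter is
// non-zero. The private segment size is reported but not used as a signal
// here, since scratch memory can also hold non-spill data.
bool HasRegisterSpilling(const KernelSpillInfo& info) {
  return info.sgpr_spill_count > 0 || info.vgpr_spill_count > 0;
}

// Builds a one-line summary suitable for logging.
std::string SpillSummary(const KernelSpillInfo& info) {
  return "Module: " + info.module_name +
         ", SGPR spill count: " + std::to_string(info.sgpr_spill_count) +
         ", VGPR spill count: " + std::to_string(info.vgpr_spill_count) +
         ", private segment size: " +
         std::to_string(info.private_segment_size_bytes) + " bytes";
}
```

With the values from the log (`0`, `194`, `780`), `HasRegisterSpilling` returns true because of the VGPR count alone. Whether the real implementation also treats a non-zero private segment as a spill indicator is not stated in this PR description.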
🧪 Execution Test
./bazel-7.4.1-linux-x86_64 build //xla/service/gpu/transforms:triton_fusion_numerics_verifier_test
bazel-bin/xla/service/gpu/transforms/triton_fusion_numerics_verifier_test_amdgpu_any --gtest_filter=TritonFusionNumericsVerifierTest.CompilationSucceedsEvenIfKernelWillSpillRegisters
This PR is on top of another bugfix PR (#34806).
@xla-rotation could you review my PR, please?