Support cublasLt Fp8 Approx Gelu epilogue fusion. #7751

wenscarl · 2023-12-13T19:09:34Z

Because fast accumulation is enabled in the forward pass, the cublasLt FP8 GEMM with an approximate GELU epilogue can run as a single fused kernel. On H100, for [8192, 4096] x [4096, 16384] + GELU, the fused kernel is slightly faster than a cublasLt matmul followed by an XLA-generated GELU kernel:

Execution time for matmul using cublasLt and gelu (XLA): 1.28ms
Execution time for matmul_gelu using cublasLt: 1.25ms
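To make the fused computation concrete, here is a minimal stdlib-only Python reference of what "matmul + approximate GELU" produces numerically. It assumes the tanh form of approximate GELU (the form `jax.nn.gelu(..., approximate=True)` uses) and ignores FP8 quantization, scaling factors, and fast-accumulation precision effects entirely; `gelu_tanh`, `matmul`, and `matmul_gelu` are hypothetical names for illustration, not XLA or cuBLASLt APIs.

```python
import math


def gelu_tanh(x):
    """Tanh-approximate GELU: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def matmul(a, b):
    """Plain (m x k) @ (k x n) product of nested lists."""
    k = len(b)
    n = len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def matmul_gelu(a, b):
    """Unfused reference: GEMM first, then GELU applied elementwise to its output.

    A fused epilogue applies the same elementwise function inside the GEMM
    kernel, so the intermediate product is never written out and re-read by a
    second kernel. Values match up to the precision of the real kernel.
    """
    return [[gelu_tanh(v) for v in row] for row in matmul(a, b)]
```

The saving from fusion comes from skipping that second pass over the output. For the timings above, the fused kernel saves 0.03 ms, about 2.3% of the unfused 1.28 ms.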

reedwm · 2023-12-14T02:37:35Z

xla/service/gpu/tests/gemm_rewrite_test.cc

+    ENTRY test {
+      x = f8e4m3fn[16,32] parameter(0)
+      y = f8e4m3fn[32,16] parameter(1)
+      x_f32 = bf16[16,32] convert(x)


This should be named x_bf16, and similarly for y. The same applies to the other test.
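The review point is that a convert result should be named after its target type: `x_f32` actually holds bf16 data. One way to keep names and types from drifting apart is to derive the name from the dtype when generating test HLO. The helpers below are a hypothetical Python sketch, not part of XLA's test utilities; the `y` convert line and its bf16[32,16] shape are assumed from the parameter shape, since the excerpt only shows the `x` convert.

```python
def converted_name(param, dtype):
    """Name a convert result after its target dtype, e.g. ("x", "bf16") -> "x_bf16"."""
    return f"{param}_{dtype}"


def convert_line(param, dtype, dims):
    """Emit one HLO convert instruction whose name matches its result type."""
    shape = ",".join(str(d) for d in dims)
    return f"{converted_name(param, dtype)} = {dtype}[{shape}] convert({param})"


def fp8_prologue(x_dims, y_dims, compute_dtype="bf16"):
    """Parameter and convert lines shaped like the start of the quoted test."""
    x_shape = ",".join(str(d) for d in x_dims)
    y_shape = ",".join(str(d) for d in y_dims)
    lines = [
        f"x = f8e4m3fn[{x_shape}] parameter(0)",
        f"y = f8e4m3fn[{y_shape}] parameter(1)",
        convert_line("x", compute_dtype, x_dims),
        convert_line("y", compute_dtype, y_dims),
    ]
    return "\n".join(lines)


print(fp8_prologue((16, 32), (32, 16)))
```

Because the name is computed from `compute_dtype`, switching the test to another compute type renames `x_bf16` and `y_bf16` automatically instead of leaving a stale suffix.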

Imported from GitHub PR openxla/xla#7751

Due to fast accumulation being turned on in the forward mode, the cublasLt fp8 gemm with gelu epilogue can efficiently operate with a fused kernel. Compared against the XLA-generated gelu kernel on H100, the performance demonstrates some improvement for size of [8192, 4096] x [4096, 16384] + gelu:

Execution time for matmul using cublasLt and gelu (XLA): 1.28ms
Execution time for matmul_gelu using cublasLt: 1.25ms

Copybara import of the project:

--
e8abce3b41f68cae1bb625cdecd5885413a0781d by Shu Wang <shuw@nvidia.com>:

Support cublasLt Fp8 Approx Gelu epilogue fusion.

--
818127cf582af7ceba014d88bdf027857fc8f0e5 by shuw <shuw@nvidia.com>:

Remove F32 check

--
5ce3108a9bc8459e20456d23a3ae493ef7a6a387 by shuw <shuw@nvidia.com>:

Improve based on review #1

Merging this change closes #7751

PiperOrigin-RevId: 591236441

Support cublasLt Fp8 Approx Gelu epilogue fusion.

e8abce3

github-actions bot added the kokoro:force-run Forces CI to rerun label Dec 13, 2023

github-actions bot assigned kamaljeeti and xla-rotation Dec 13, 2023

kokoro-team removed the kokoro:force-run Forces CI to rerun label Dec 13, 2023

reedwm self-requested a review December 13, 2023 20:22

Remove F32 check

818127c

github-actions bot added the kokoro:force-run Forces CI to rerun label Dec 14, 2023

kokoro-team removed the kokoro:force-run Forces CI to rerun label Dec 14, 2023

reedwm requested changes Dec 14, 2023


Improve based on review #1

5ce3108

wenscarl requested a review from reedwm December 14, 2023 03:16

github-actions bot added the kokoro:force-run Forces CI to rerun label Dec 14, 2023

kokoro-team removed the kokoro:force-run Forces CI to rerun label Dec 14, 2023

reedwm approved these changes Dec 14, 2023


copybara-service bot closed this in 2724718 Dec 15, 2023
