[Draft] Add lowering and Triton kernel template for fp8 scaled_mm dynamic tensorwise/rowwise #125381

yangsiyu007 · 2024-05-02T06:11:50Z

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

…ed_mm

pytorch-bot · 2024-05-02T06:11:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125381


❌ 7 New Failures, 18 Unrelated Failures

As of commit 321b1cf with merge base 4f29103:

NEW FAILURES - The following jobs have failed:

inductor / rocm6.0-py3.8-inductor / test (inductor, 1, 1, linux.rocm.gpu.2) (gh)
test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group_cpu_offload_eager
inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (aot_eager_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
hf_BigBird
inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (dynamic_aot_eager_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
hf_BigBird
inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (dynamo_eager_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
hf_BigBird
Lint / lintrunner-noclang / linux-job (gh)
>>> Lint for torch/_inductor/kernel/mm_scaled_bias.py:
pull / linux-focal-cuda11.8-py3.10-gcc9 / test (distributed, 3, 3, linux.8xlarge.nvidia.gpu) (gh)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
pull / linux-focal-py3.12-clang10 / test (dynamo, 3, 3, linux.2xlarge) (gh)
functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_good

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 1, 5, linux.4xlarge.nvidia.gpu) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 1, 5, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #119231 but the issue was closed recently and a rebase is needed to make it pass)
inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unwrap_storage_didnt_work_repro_cuda
pull / linux-focal-py3.11-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-py3.11-clang10 / test (default, 1, 3, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-py3.11-clang10 / test (dynamo, 2, 3, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-py3.11-clang10 / test (dynamo, 3, 3, linux.2xlarge) (gh) (disabled by #125568 but the issue was closed recently and a rebase is needed to make it pass)
functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_non_view
pull / linux-focal-py3.12-clang10 / test (default, 1, 3, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-py3.12-clang10 / test (dynamo, 2, 3, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-py3.8-clang10 / test (crossref, 1, 2, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-py3.8-clang10 / test (default, 1, 3, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-focal-py3.8-clang10 / test (dynamo, 2, 3, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-jammy-py3.10-clang15-asan / test (default, 1, 6, linux.4xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface
pull / linux-jammy-py3.8-gcc11 / test (default, 1, 3, linux.2xlarge) (gh) (disabled by #104012 but the issue was closed recently and a rebase is needed to make it pass)
test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2024-05-02T06:11:54Z

❌ The email address for the commit (afdf7b3, d16eec2, 812749d, 321b1cf, eabf5a6, 5ba9ebb) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view the commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please submit a support request ticket.

Siyu Yang added 3 commits April 24, 2024 22:22

Add lowering for _scaled_mm

812749d

Add Triton kernel template for scaled_mm and handle multi output

afdf7b3

[Non working] Add scaling and bias to Triton kernel template for scaled_mm

5ba9ebb

pytorch-bot bot added the ciflow/inductor and module: inductor labels on May 2, 2024

Add scaling to Triton kernel template; no bias

eabf5a6

Inductor codegen for the Triton template has issues when unused tensor inputs are passed, so for now only the necessary inputs are passed in: x, w, x_inverse_scale, and w_inverse_scale.
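For readers unfamiliar with the four inputs named above, here is a minimal pure-Python sketch of the arithmetic a scaled matmul performs with dynamic tensorwise and rowwise scaling. It is not the PR's Triton template or the Inductor lowering. The helper names (`rowwise_inverse_scales`, `tensorwise_inverse_scale`, `scaled_mm_reference`) are hypothetical, and the sketch assumes the inverse scales are multiplied into the accumulator after the matmul. It only clamps to the e4m3 range and does not simulate fp8 rounding.

```python
# Illustrative reference only; names and structure are assumptions, not the PR's code.
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def matmul(a, b):
    """Plain (M x K) @ (K x N) matmul on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rowwise_inverse_scales(rows):
    """One inverse scale per row: amax(row) / FP8_MAX (floored to avoid /0)."""
    return [max(max(abs(v) for v in row) / FP8_E4M3_MAX, 1e-12) for row in rows]


def tensorwise_inverse_scale(rows):
    """A single inverse scale for the whole tensor: amax(tensor) / FP8_MAX."""
    return max(max(abs(v) for row in rows for v in row) / FP8_E4M3_MAX, 1e-12)


def quantize_rows(rows, inv_scales):
    """Scale each row into the fp8 range. Rounding to the fp8 grid is NOT simulated."""
    return [[v / s for v in row] for row, s in zip(rows, inv_scales)]


def scaled_mm_reference(xq, wq, x_inverse_scale, w_inverse_scale):
    """out[i][j] = (xq @ wq)[i][j] * x_inverse_scale[i] * w_inverse_scale[j]."""
    acc = matmul(xq, wq)
    return [[acc[i][j] * x_inverse_scale[i] * w_inverse_scale[j]
             for j in range(len(acc[0]))] for i in range(len(acc))]


x = [[1.0, -2.0], [0.5, 4.0]]    # shape (M, K)
w = [[3.0, 0.25], [-1.0, 2.0]]   # shape (K, N)
reference = matmul(x, w)

# Rowwise: one scale per row of x and one per output column of w.
x_inv = rowwise_inverse_scales(x)
w_inv = rowwise_inverse_scales(transpose(w))
xq = quantize_rows(x, x_inv)
wq = transpose(quantize_rows(transpose(w), w_inv))
out_rowwise = scaled_mm_reference(xq, wq, x_inv, w_inv)

# Tensorwise: the same scalar broadcast across every row / column.
x_inv_t = [tensorwise_inverse_scale(x)] * len(x)
w_inv_t = [tensorwise_inverse_scale(w)] * len(w[0])
xq_t = quantize_rows(x, x_inv_t)
wq_t = transpose(quantize_rows(transpose(w), w_inv_t))
out_tensorwise = scaled_mm_reference(xq_t, wq_t, x_inv_t, w_inv_t)
```

Because rounding is not simulated, both outputs match the unscaled matmul up to floating-point error. With real fp8 inputs the two schemes would differ in quantization error, which is the motivation for comparing them.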

yangsiyu007 changed the title ~~[Not working yet, draft] Add lowering and Triton kernel template for fp8 scaled_mm dynamic tensorwise/rowwise~~ [Draft] Add lowering and Triton kernel template for fp8 scaled_mm dynamic tensorwise/rowwise May 7, 2024

Siyu Yang added 2 commits May 17, 2024 11:34

Benchmark Triton rowwise and tensorwise, no bias

d16eec2

Demo'ing optional tensor input node issue with bias

321b1cf
