[Inductor][ROCm] Composable Kernel backend for Inductor #125453

tenpercent · 2024-05-03T01:56:29Z

This PR adds an alternative backend for Inductor that includes Composable Kernel (CK) Universal GEMM instances in the autotune instance selection.

The implementation is heavily influenced by the series of PRs that added the CUTLASS backend (#106991). The main differences are:
(1) customizing the compiler for the ROCm platform;
(2) customizing template code generation for Composable Kernel Universal GEMM instances.

We provide config tuning knobs for balancing the time spent compiling instance sources against the chance of finding the best instance.
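As a rough illustration of the trade-off these knobs control, the sketch below caps how many candidate instances get compiled and benchmarked. Everything in it is hypothetical: `select_instance`, `max_profiling_configs`, and the timing table are illustrative names and made-up numbers, not the knob names or API introduced by this PR.

```python
from typing import Callable, Optional, Sequence


def select_instance(
    candidates: Sequence[str],
    benchmark: Callable[[str], float],
    max_profiling_configs: Optional[int] = None,
) -> str:
    """Return the fastest candidate among those we are willing to profile.

    A smaller cap means fewer instances to compile and benchmark, at the
    risk of never trying the instance that would have been fastest.
    """
    if not candidates:
        raise ValueError("no candidate instances")
    # None means "profile everything"; otherwise keep only the first N.
    if max_profiling_configs is None:
        pool = list(candidates)
    else:
        pool = list(candidates[:max_profiling_configs])
    if not pool:
        raise ValueError("max_profiling_configs must be at least 1")
    return min(pool, key=benchmark)


# Made-up timings (ms) standing in for real benchmark runs.
fake_timings = {"inst_a": 1.9, "inst_b": 1.4, "inst_c": 0.8}

# Profiling only two instances is cheaper but misses inst_c.
quick = select_instance(list(fake_timings), fake_timings.__getitem__, 2)
# Profiling all instances costs more but finds the fastest one.
full = select_instance(list(fake_timings), fake_timings.__getitem__)
```

With a cap of 2 the search never sees the third instance, which is the cost of the shorter compile time.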

Testing

Install the ck library

pip install git+https://github.com/rocm/composable_kernel@develop

Run the test

TORCH_LOGS=+torch._inductor \
pytest --capture=tee-sys test/inductor/test_ck_backend.py

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Test Plan: pytest -rpfs test/inductor/test_cudacodecache.py

tenpercent · 2024-06-20T18:40:43Z

@pytorchbot drci

tenpercent · 2024-06-25T20:52:19Z

@pytorchbot merge

pytorchmergebot · 2024-06-25T20:53:58Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…#130576)

Add functional support for torch.addmm with CK backend. See also #125453

# Implementation details

1. It turns out we can use the same template for addmm and matmul; essentially, matmul is addmm with an empty bias.
2. The Python generator in CK was updated to generate the shared cpp template. The pip package can be installed with `pip install git+https://github.com/rocm/composable_kernel@add-addmm`; the branch will be merged into `develop` after this PR lands, to avoid breaking the current matmul.

# Testing

`pytest test/inductor/test_ck_backend.py -k addmm`

Pull Request resolved: #130576
Approved by: https://github.com/chenyang78
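The observation that matmul is addmm with an empty bias can be sketched in plain Python. This is only an illustration of the shared code path, not the CK template: the `addmm` and `matmul` helpers below are hypothetical, operate on nested lists, and follow the `beta * bias + alpha * (a @ b)` convention of `torch.addmm`.

```python
from typing import List, Optional

Matrix = List[List[float]]


def addmm(
    bias: Optional[Matrix],
    a: Matrix,
    b: Matrix,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Matrix:
    """Compute beta * bias + alpha * (a @ b); bias=None means no bias term."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = alpha * sum(a[i][p] * b[p][j] for p in range(k))
            if bias is not None:
                acc += beta * bias[i][j]
            row.append(acc)
        out.append(row)
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """matmul goes through the same code path, with the bias left empty."""
    return addmm(None, a, b)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
```

Because the bias is optional, a single implementation (or, in the PR's case, a single generated template) can serve both operations.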

… autotune (#133285)

This PR enables dynamic shapes for the CK backend for gemm max autotune (see #125453). This is achieved by removing the hardcoded problem sizes from the template body and passing them as parameters instead. We handle passing the problem sizes for the kernel call as well as for the benchmark call.

# Testing

`pytest test/inductor/test_ck_backend.py [-k dynamic]`

Pull Request resolved: #133285
Approved by: https://github.com/ColinPeppler
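The idea of moving problem sizes out of the template body and into parameters can be sketched as follows. The two templates and the `run_gemm` call inside them are hypothetical stand-ins, not the generated CK code; the point is only that the "dynamic" source text no longer changes with the shape.

```python
from string import Template
from typing import Tuple

# Hypothetical "static" template: problem sizes are baked into the source,
# so every new shape produces (and compiles) a different kernel.
STATIC_TEMPLATE = Template(
    "void gemm_${M}x${N}x${K}(const float* a, const float* b, float* c) {\n"
    "    run_gemm(a, b, c, ${M}, ${N}, ${K});\n"
    "}\n"
)

# Hypothetical "dynamic" template: problem sizes are runtime parameters,
# so one compiled kernel can serve many shapes.
DYNAMIC_SOURCE = (
    "void gemm(const float* a, const float* b, float* c,\n"
    "          int M, int N, int K) {\n"
    "    run_gemm(a, b, c, M, N, K);\n"
    "}\n"
)


def render_static(m: int, n: int, k: int) -> str:
    """Render a source string specialized to one problem size."""
    return STATIC_TEMPLATE.substitute(M=m, N=n, K=k)


def render_dynamic(m: int, n: int, k: int) -> Tuple[str, Tuple[int, int, int]]:
    """Return (source, call_args); only the call arguments depend on shape."""
    return DYNAMIC_SOURCE, (m, n, k)


src_small, args_small = render_dynamic(64, 64, 64)
src_large, args_large = render_dynamic(128, 64, 64)
```

In this sketch the same `call_args` tuple would be passed both when invoking the kernel and when benchmarking it, mirroring the two call sites mentioned in the commit message.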

The MakeArgument signature was changed in ROCm/composable_kernel#1453, which added a splitK argument to the universal gemm templates used to codegen addmm and matmul (part of the series started at #125453).

# Testing

`pytest test/inductor/test_ck_backend.py`

Pull Request resolved: #134483
Approved by: https://github.com/ColinPeppler

tenpercent added 30 commits March 14, 2024 23:12

add composable_kernel submodule

6a29592

add rocm config

ed79d45

prototype changes in cuda codecache to compile with hipcc

61adbb6

cleanup cuda_compile_command

06228c5

Test Plan: pytest -rpfs test/inductor/test_cudacodecache.py

unhardcode rocm home

a1aec51

Merge branch 'pytorch:main' into ck-inductor

293696c

try creating base CKTemplate class

2f2bccc

add an initial approximation for gemm template

de41be3

add gemm template body

d3e4358

try adding one instance

08b6de0

move ck branch pointer to universal gemm wip

9169d31

fix submodule name

a8b1f91

fix submodule path

f0d5c3b

once again init submodule

9dcdab7

add ck to cutlass backend test

bcafa44

fix missing args to op instance init

e2b1db4

type fixes for python

f0c9a9f

fix ck includes

05629da

fix includes, type declarations, global constants

da6bd98

get past linker errors

0f4a3c9

amend above

f6ffa8f

fix config for new ck dir

a451f86

update ck to develop branch

00853ef

get to a segfault when running mm op

2fd82d3

prettify ck instance params

3e1a5d6

fix mm shape in the test

49f7244

add gemm argument check

f453355

make it past autotuning; fail in scheduling

d032ab9

cleanup headers and globals

fd6d6d3

unhardcode gemm parameters

a9bbdd5

tenpercent added 4 commits June 21, 2024 16:43

Merge branch 'main' into ck-inductor

6fcdd7c

Merge branch 'main' into ck-inductor

b939f77

Merge branch 'main' into ck-inductor

aa0e7c9

Merge branch 'main' into ck-inductor

1ba368c

pytorchmergebot added the merging label Jun 25, 2024

pytorchmergebot closed this in 79959d7 Jun 25, 2024

pytorchmergebot added Merged and removed merging labels Jun 25, 2024

tenpercent mentioned this pull request Jul 2, 2024

[ROCm] Install Composable Kernel for ROCm CI #129979

Closed

tenpercent mentioned this pull request Jul 15, 2024

[ROCm][CK][Inductor] Enable addmm for CK backend to gemm max autotune #130576

Closed

tenpercent mentioned this pull request Aug 13, 2024

[ROCm][CK][Inductor] enable dynamic shapes for CK backend to gemm max autotune #133285

Closed

tenpercent mentioned this pull request Aug 26, 2024

[ROCm][Inductor][CK] Fix codegen after ck signature change #134483

Closed

This was referenced Oct 7, 2024

[ROCm][AOTI] add CK backend #135641

Closed

[ROCm][Inductor][CK] FP8 gemm #136337

Closed

This was referenced Oct 21, 2024

[Inductor][ROCm][CK] add CK grouped conv2d fwd kernels to ROCm codegen #137947

Closed

[Inductor][ROCm][CK] Enable lowering conv2d instances in CK Inductor backend #138643

Closed

This was referenced Nov 15, 2024

[ROCm][Inductor][CK] Enable scaled mm with bias in gemm max autotune with CK backend #140674

Closed

[Inductor][ROCm][CK] Add standalone runner #139441

Closed

tenpercent mentioned this pull request Nov 26, 2024

[ROCm][Inductor][CK] Add batched gemms into gemm max autotune with CK backend #141520

Closed
