Add some AITER kernel routing for ROCm #46268
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
15efed5 to a43d58f
vasqu
left a comment
Very nice work! I just have a few smaller questions / suggestions
| repo_id="kernels-community/rotary", func_name="apply_rotary_transformers" | ||
| ), | ||
| }, | ||
| "rocm": { |
I'm reworking all of this in #46039. Would it be possible to move this to another PR until I resolve all the issues there?
Okay, let's keep just kernels-community/aiter-flash-attn for this PR then.
| # `"cuda"` on ROCm (HIP impersonates the CUDA API), which would mis-route to | ||
| # the "cuda" mapping entries; the kernels library refines this internally via | ||
| # `torch.version.hip` and picks `"rocm"` when appropriate. | ||
| kernelize(self, mode=mode) |
Interesting. Could we use their routing with a function instead? I feel a bit unsafe with auto-detection :D
Yes we can do that.
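For reference, a minimal sketch of what explicit routing through a function could look like. `resolve_kernel_device` is a hypothetical helper written for illustration, not an API of transformers or the kernels library; it only relies on the fact quoted in the diff comment above, namely that ROCm builds of PyTorch report the device type `"cuda"` while `torch.version.hip` is set.

```python
from typing import Optional


def resolve_kernel_device(device_type: str, hip_version: Optional[str]) -> str:
    """Hypothetical helper: pick the kernel-mapping key for a device.

    device_type: the value of `tensor.device.type` (e.g. "cuda", "cpu").
    hip_version: the value of `torch.version.hip`, which is a version string
                 on ROCm builds of PyTorch and None otherwise.
    """
    # HIP impersonates the CUDA API, so a ROCm GPU still reports "cuda".
    # Only remap that one case; every other device type passes through.
    if device_type == "cuda" and hip_version is not None:
        return "rocm"
    return device_type


# Intended call site (assumption, requires torch to be installed):
#   key = resolve_kernel_device(hidden_states.device.type, torch.version.hip)
```

Passing both values in explicitly keeps the decision visible at the call site and makes the helper testable without a GPU.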
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
b8bf8f1 to 857dae5
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46268&sha=5ba56a
Routes ROCm to AITER Triton kernels on AMD GPUs:
- `flash_attention_3` → `kernels-community/aiter-flash-attn`
- `megablocks` → `kernels-community/megablocks` now has rocm builds.
- `rotary_pos_emb` → `kernels-community/aiter-rope` (Will be added in another PR after the [Kernels] Sync to latest version and add new kernels (SwiGLU, CE) #46039)

Longer-term goal: ship the full AITER in Kernels (same model as `liger-kernels`) rather than one repo per kernel. We're starting with `aiter-rope` and `aiter-flash-attn` because those are the two we need in transformers right now.
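As a rough illustration of the routes listed in the description, here is a plain-dict sketch. The names `ROCM_ROUTES`, `DEFERRED_ROCM_ROUTES`, and `rocm_repo_for` are hypothetical and do not reflect how the mapping is actually declared in transformers; only the kernel names and repo ids come from the description.

```python
from typing import Optional

# Hypothetical view of the ROCm routes named in the PR description.
ROCM_ROUTES = {
    "flash_attention_3": "kernels-community/aiter-flash-attn",
    "megablocks": "kernels-community/megablocks",
}

# Listed in the description but deferred to a follow-up PR (after #46039).
DEFERRED_ROCM_ROUTES = {
    "rotary_pos_emb": "kernels-community/aiter-rope",
}


def rocm_repo_for(kernel_name: str) -> Optional[str]:
    """Return the repo id routed on ROCm, or None if there is no route yet."""
    return ROCM_ROUTES.get(kernel_name)
```

Keeping the deferred entry in a separate table makes it explicit that `rotary_pos_emb` is not routed by this PR.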