Add some AITER kernel routing for ROCm #46268

Open
Abdennacer-Badaoui wants to merge 6 commits into huggingface:main from Abdennacer-Badaoui:some-rocm-kernels

Conversation

@Abdennacer-Badaoui
Member

@Abdennacer-Badaoui Abdennacer-Badaoui commented May 28, 2026

Routes ROCm to AITER Triton kernels on AMD GPUs:

Longer-term goal: ship the full AITER in Kernels (same model as liger-kernels) rather than one repo per kernel. We're starting with aiter-rope and aiter-flash-attn because those are the two we need in transformers right now.

@Abdennacer-Badaoui Abdennacer-Badaoui marked this pull request as draft May 28, 2026 17:15
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Abdennacer-Badaoui Abdennacer-Badaoui requested a review from vasqu June 4, 2026 15:18
@Abdennacer-Badaoui Abdennacer-Badaoui marked this pull request as ready for review June 4, 2026 15:21
Contributor

@vasqu vasqu left a comment

Very nice work! I just have a few smaller questions / suggestions

repo_id="kernels-community/rotary", func_name="apply_rotary_transformers"
),
},
"rocm": {
Contributor

I'm reworking this whole part. Would it be possible to move this to another PR until I resolve all the issues there? See #46039.

Member Author

Okay, let's keep just kernels-community/aiter-flash-attn for this PR then.

Comment thread src/transformers/modeling_flash_attention_utils.py Outdated
Comment thread src/transformers/modeling_utils.py Outdated
# `"cuda"` on ROCm (HIP impersonates the CUDA API), which would mis-route to
# the "cuda" mapping entries; the kernels library refines this internally via
# `torch.version.hip` and picks `"rocm"` when appropriate.
kernelize(self, mode=mode)
Contributor

Interesting, could we use their routing with a function instead? I feel a bit unsafe with auto-detection :D

Member Author

Yes we can do that.
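
For illustration only, here is a minimal sketch of what explicit routing through a function could look like, as opposed to relying on auto-detection. The helper names `resolve_kernel_device_type` and `detect_hip_version` are hypothetical and not part of transformers or the kernels library; the only real API assumed is `torch.version.hip`, which is `None` on non-ROCm builds of PyTorch.

```python
from typing import Optional


def detect_hip_version() -> Optional[str]:
    """Return the HIP version string on a ROCm build of PyTorch, else None.

    Hypothetical helper. Returns None when torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None
    # torch.version.hip is None on CUDA and CPU-only builds.
    return getattr(torch.version, "hip", None)


def resolve_kernel_device_type(device_type: str, hip_version: Optional[str]) -> str:
    """Map a torch device type to the key used in a kernel mapping.

    Hypothetical helper. PyTorch reports "cuda" on ROCm because HIP
    impersonates the CUDA API, so "cuda" is refined to "rocm" whenever
    a HIP version is present. Other device types pass through unchanged.
    """
    if device_type == "cuda" and hip_version is not None:
        return "rocm"
    return device_type


# Example: decide the mapping key explicitly before selecting kernels.
mapping_key = resolve_kernel_device_type("cuda", detect_hip_version())
```

Taking the HIP version as an argument keeps the decision itself free of hardware probing, so it can be unit tested on any machine.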

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46268&sha=5ba56a

Contributor

@vasqu vasqu left a comment

thanks 🤗


3 participants