Remove unsupported modeling flags by dacorvo · Pull Request #950 · huggingface/optimum-neuron

dacorvo · 2025-09-03T13:32:14Z

What does this PR do?

This removes some optimized code paths that are not supported with the current models (mainly Llama and its variants) and/or deployment configurations (Trainium instances).

HuggingFaceDocBuilderDev · 2025-09-03T13:37:49Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

tengomucho

LGTM

dacorvo · 2025-09-05T14:02:06Z

@tengomucho thank you for the review, but I realized that while removing flash decoding I had also inadvertently removed flash attention for prefill. Will force-push.

tengomucho · 2025-09-05T14:05:03Z

optimum/neuron/models/inference/backend/modules/attention/attention_base.py

-
        return FlashAttentionStrategy.NONE

-    def compute_for_flash_decoding(self, Q, K, V, past_key_value, attention_mask, active_mask) -> Tensor:


I see, you need to restore these bits
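As a rough illustration of the distinction discussed in this thread (flash decoding removed, flash attention kept for prefill), here is a minimal sketch. Only `FlashAttentionStrategy.NONE` appears in the diff excerpt above; the `KERNEL` member, the `select_flash_attention_strategy` function, and the `min_prefill_len` threshold are hypothetical names and values chosen for the example, not the actual optimum-neuron implementation.

```python
from enum import Enum


class FlashAttentionStrategy(Enum):
    # NONE appears in the diff excerpt; KERNEL is a hypothetical placeholder
    # standing in for "use a flash attention kernel".
    NONE = 0
    KERNEL = 1


def select_flash_attention_strategy(
    q_len: int, is_prefill: bool, min_prefill_len: int = 4096
) -> FlashAttentionStrategy:
    """Pick an attention strategy; decoding never uses flash attention here."""
    if not is_prefill:
        # With flash decoding removed, token generation always takes the regular path.
        return FlashAttentionStrategy.NONE
    if q_len >= min_prefill_len:
        # Assumed heuristic: flash attention only pays off for long prefill sequences.
        return FlashAttentionStrategy.KERNEL
    return FlashAttentionStrategy.NONE
```

In this sketch, a decode step returns `NONE` regardless of sequence length, while a long prefill still selects the kernel path, which is the behavior the force-push was meant to restore.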

dacorvo force-pushed the remove_unsupported_modeling_flags branch from 2265176 to 63110ce Compare September 3, 2025 14:36

dacorvo added 6 commits September 5, 2025 09:46

chore: bump dev version

d8af873

refactor(inference): remove vocab_parallel

dad4069

refactor(inference): remove mlp_kernel code paths

01d5e8a

This kernel is not supported on Trainium 1, and its integration is likely to change anyway when we support Trainium 2.

refactor(inference): remove qkv_kernel code paths

6d796b2

This kernel is not supported on Trainium 1, and its integration is likely to change anyway when we support Trainium 2.

refactor(inference): remove sequence_parallel code paths

96ae4f4

refactor(inference): remove chunked prefill leftovers

3b76381

dacorvo force-pushed the remove_unsupported_modeling_flags branch from 63110ce to a13a4a2 Compare September 5, 2025 09:48

tengomucho approved these changes Sep 5, 2025

dacorvo added 5 commits September 5, 2025 13:59

refactor(decoder): remove flash decoding

411c372

Note that flash attention is still used for prefill whenever it is relevant.

refactor(inference): remove cc-pipeline-tiling-factor config

2576c9f

It is never set to anything other than 2, and it can be overridden in the model compiler flags anyway.

refactor(inference): remove kv cache tiling

bb6b348

refactor(inference): rename CustomRMSNorm

3d97532

refactor(inference): rename CustomRMSNorm

3523b40
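The cc-pipeline-tiling-factor commit above notes that the value can still be overridden through the model compiler flags. A minimal sketch of such an override on a flag string follows. The flag spelling `--cc-pipeline-tiling-factor` is inferred from the commit title; the `override_flag` helper and the example flag strings are hypothetical, and this thread does not show how optimum-neuron actually exposes compiler flags.

```python
import shlex


def override_flag(flags: str, name: str, value: str) -> str:
    """Return `flags` with `--name=value` set, replacing any existing `--name=...`."""
    option = f"--{name}"
    # Drop any previous `--name=...` token, keep everything else in order.
    tokens = [t for t in shlex.split(flags) if not t.startswith(option + "=")]
    tokens.append(f"{option}={value}")
    return shlex.join(tokens)


# Hypothetical default flag string, for illustration only.
default_flags = "-O1 --cc-pipeline-tiling-factor=2"
custom_flags = override_flag(default_flags, "cc-pipeline-tiling-factor", "4")
```

The helper only handles the `--name=value` form; a space-separated `--name value` form would need extra handling.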

dacorvo force-pushed the remove_unsupported_modeling_flags branch from a13a4a2 to 3523b40 Compare September 5, 2025 14:02

tengomucho reviewed Sep 5, 2025

dacorvo merged commit e5a0faf into main Sep 5, 2025
8 of 9 checks passed

dacorvo deleted the remove_unsupported_modeling_flags branch September 5, 2025 15:18

dacorvo merged 11 commits into main from remove_unsupported_modeling_flags
