[FA4] Add kernels fallback#44797

Merged
vasqu merged 3 commits into huggingface:main from vasqu:fa4-kernel-fallback
Mar 20, 2026

Conversation

@vasqu (Contributor) commented Mar 17, 2026

Depends on #44887 and on `kernels` being at version 12.3.

Works out of the box with only minor changes. Example script for demonstration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

fa_version = 4
# model_id = "openai/gpt-oss-20b"
model_id = "meta-llama/Llama-3.2-3B-Instruct"

# Request FlashAttention 4; if no local build is available,
# the kernels fallback added in this PR kicks in.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    attn_implementation=f"flash_attention_{fa_version}",
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(
    ["Hello, how are you?", "How are you? Tell me the name of the president of"],
    padding=True,
    padding_side="left",
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
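The fallback idea behind this PR can be sketched in plain Python: prefer a locally installed `flash-attn` build, and otherwise fall back to a pre-built kernel fetched via the `kernels` package. The function and the Hub identifier below are illustrative assumptions, not the actual transformers internals:

```python
import importlib.util


def resolve_attention_impl(requested: str) -> str:
    """Illustrative resolver: keep the requested implementation if a
    local flash-attn install exists, else fall back to a Hub kernel.
    (Hypothetical sketch, not the transformers code path.)"""
    if not requested.startswith("flash_attention_"):
        # Non-flash implementations (e.g. "sdpa") are passed through.
        return requested
    if importlib.util.find_spec("flash_attn") is not None:
        # A local flash-attn build takes priority.
        return requested
    # Fallback: a kernel identifier to be fetched via `kernels`
    # (the repo name here is made up for illustration).
    return f"kernels-community/{requested.replace('_', '-')}"


print(resolve_attention_impl("sdpa"))
print(resolve_attention_impl("flash_attention_4"))
```

The key design point is that the user-facing string (`flash_attention_4`) stays the same either way; only the backing kernel source changes.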

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment:

niice

@vasqu vasqu marked this pull request as ready for review March 20, 2026 17:57
@vasqu vasqu added this pull request to the merge queue Mar 20, 2026
Merged via the queue into huggingface:main with commit e6ed96c Mar 20, 2026
29 checks passed
@vasqu vasqu deleted the fa4-kernel-fallback branch March 20, 2026 19:03

3 participants