System Info
Latest transformers "main".
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
If you run the code snippet from https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512#transformers with dequantize=False, generation degenerates into infinite repetition.
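For reference, here is a minimal sketch of the repro, assuming the model card's transformers snippet is essentially a plain `from_pretrained` of the FP8 checkpoint followed by `generate`; the exact quantization-config class and the way `dequantize=False` is passed are as given on the model card and are only noted in a comment here.

```python
# Minimal repro sketch (assumption: the model card's transformers snippet boils down to
# from_pretrained of the FP8 checkpoint + generate).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # The model card passes dequantize=False via the FP8 quantization config, which keeps
    # the quantized kernels active; that kwarg is elided here, see the model card snippet.
)

messages = [{"role": "user", "content": "Write a function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# With the FP8 path active, generation degenerates into infinite repetition.
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```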
If, however, you run the model from this PR: #42744, everything works fine, which suggests that something is going wrong with the activation scales (maybe they produce inf values somewhere?).
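One way to probe the inf hypothesis would be to scan the scale tensors stored in the checkpoint for non-finite values. A sketch below, assuming the scales live in the safetensors shards under parameter names containing "scale" (the exact key names depend on the checkpoint format):

```python
# Diagnostic sketch: look for non-finite values in the checkpoint's scale tensors.
# Assumption: scales are stored in the safetensors shards under names containing "scale"
# (e.g. weight/activation scales); the exact key names depend on the checkpoint format.
import glob
import os

import torch
from huggingface_hub import snapshot_download
from safetensors import safe_open

path = snapshot_download(
    "mistralai/Devstral-Small-2-24B-Instruct-2512", allow_patterns=["*.safetensors"]
)
for shard in sorted(glob.glob(os.path.join(path, "*.safetensors"))):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if "scale" in name:
                t = f.get_tensor(name).float()
                if not torch.isfinite(t).all():
                    print(f"non-finite scale: {name} in {os.path.basename(shard)}")
```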
The same activation scales work fine in vLLM: https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512#vllm-recommended, so there is probably something we can fix on the transformers side.
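For comparison, a sketch of the vLLM cross-check (assumption: vLLM picks up the checkpoint's FP8 quantization config automatically; the extra flags recommended on the model card are not reproduced here):

```python
# vLLM cross-check sketch: the same FP8 checkpoint generates without repetition.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Devstral-Small-2-24B-Instruct-2512")
params = SamplingParams(temperature=0.0, max_tokens=256)
out = llm.chat(
    [{"role": "user", "content": "Write a function that reverses a string."}], params
)
print(out[0].outputs[0].text)
```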
Expected behavior
FP8 inference with dequantize=False should work correctly, without infinite repetition.