System Info
Latest transformers "main".
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
If you run the code snippet from https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512#transformers with dequantize=False, generation degenerates into infinite repetition.
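For reference, here is a minimal sketch of the repro, assuming the model card's transformers snippet is essentially a plain `from_pretrained` of the FP8 checkpoint followed by `generate`; the exact quantization-config class and the way `dequantize=False` is passed are as given on the model card and are only noted in a comment here.

```python
# Minimal repro sketch (assumption: the model card's transformers snippet boils down to
# from_pretrained of the FP8 checkpoint + generate).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # The model card passes dequantize=False via the FP8 quantization config, which keeps
    # the quantized kernels active; that kwarg is elided here, see the model card snippet.
)

messages = [{"role": "user", "content": "Write a function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# With the FP8 path active, generation degenerates into infinite repetition.
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```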
If, however, you run the model from this PR: #42744, everything works fine, which suggests that something is going wrong with the activation scales (maybe they produce inf values somewhere?).
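One way to probe the inf hypothesis would be to scan the scale tensors stored in the checkpoint for non-finite values. A sketch below, assuming the scales live in the safetensors shards under parameter names containing "scale" (the exact key names depend on the checkpoint format):

```python
# Diagnostic sketch: look for non-finite values in the checkpoint's scale tensors.
# Assumption: scales are stored in the safetensors shards under names containing "scale"
# (e.g. weight/activation scales); the exact key names depend on the checkpoint format.
import glob
import os

import torch
from huggingface_hub import snapshot_download
from safetensors import safe_open

path = snapshot_download(
    "mistralai/Devstral-Small-2-24B-Instruct-2512", allow_patterns=["*.safetensors"]
)
for shard in sorted(glob.glob(os.path.join(path, "*.safetensors"))):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if "scale" in name:
                t = f.get_tensor(name).float()
                if not torch.isfinite(t).all():
                    print(f"non-finite scale: {name} in {os.path.basename(shard)}")
```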
The same activation scales work fine in vLLM: https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512#vllm-recommended, so there is probably something we can fix on the transformers side.
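For comparison, a sketch of the vLLM cross-check (assumption: vLLM picks up the checkpoint's FP8 quantization config automatically; the extra flags recommended on the model card are not reproduced here):

```python
# vLLM cross-check sketch: the same FP8 checkpoint generates without repetition.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Devstral-Small-2-24B-Instruct-2512")
params = SamplingParams(temperature=0.0, max_tokens=256)
out = llm.chat(
    [{"role": "user", "content": "Write a function that reverses a string."}], params
)
print(out[0].outputs[0].text)
```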
Expected behavior
FP8 inference with dequantize=False should work correctly, without infinite repetition.