Fix t5 failures #43374

Merged
Cyrilvallez merged 12 commits into huggingface:main from Abdennacer-Badaoui:t5-failures
Feb 2, 2026

Conversation

@Abdennacer-Badaoui (Member) commented Jan 20, 2026:

Fix all the remaining T5 failures.

  • The test test_export_t5_summarization was failing because eos_token_id was not being passed to GenerationConfig in Seq2SeqLMExportableModule, so the generate loop never stopped at the end-of-sequence token and produced extra tokens (see the sketch after this list).
  • The test_fp16_fp32_conversion test was failing because T5's _keep_in_fp32_modules = ["wo"] was unconditionally forcing the wo layer to FP32 for both FP16 and BF16 loading. To fix this, I created a centralized _build_dtype_plan_for_loading() method that encapsulates all the dtype_plan building logic in one place: it takes the target dtype as input and applies _keep_in_fp32_modules (FP16 only) and _keep_in_fp32_modules_strict (FP16 and BF16) based on the actual loading dtype.
  • Fixed some expectations (tested on both AMD and Nvidia).
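
A minimal, self-contained sketch of the idea behind the first fix (the checkpoint name is only for illustration; the real change lives inside Seq2SeqLMExportableModule):

from transformers import GenerationConfig, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")

# Without an eos_token_id, a hand-rolled greedy decoding loop over the exported decoder
# has nothing to stop on and runs until max_new_tokens; forwarding it fixes that.
generation_config = GenerationConfig(
    use_cache=True,
    eos_token_id=model.generation_config.eos_token_id,
)
print(generation_config.eos_token_id)  # 1 (</s>) for the standard T5 checkpoints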

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Abdennacer-Badaoui marked this pull request as draft January 20, 2026 16:13
@Abdennacer-Badaoui marked this pull request as ready for review January 20, 2026 16:30
@vasqu (Contributor) commented Jan 20, 2026:

This is weird because we explicitly have _keep_in_fp32_modules and _keep_in_fp32_modules_strict - the latter being for bf16 as well, the former only for fp16.

Something must have changed for loading models

@Abdennacer-Badaoui (Member, Author) commented:

That's true. I think I found the problem.
So the dtype_plan initialization in PreTrainedModel.__init__ was unconditionally adding _keep_in_fp32_modules entries (intended for FP16 only) even when loading with BF16. I will fix the core loading logic in modeling_utils.py to only add _keep_in_fp32_modules to dtype_plan when dtype == torch.float16, respecting the intended distinction between _keep_in_fp32_modules (FP16 only) and _keep_in_fp32_modules_strict (FP16 and BF16).
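
A standalone sketch of that rule (plain Python for illustration only, not the transformers source; the attribute names follow the discussion above):

import torch

_keep_in_fp32_modules = ["wo"]       # should only take effect when loading in fp16
_keep_in_fp32_modules_strict = []    # would take effect for both fp16 and bf16

def build_dtype_plan(dtype):
    plan = {}
    if dtype == torch.float16:
        plan.update(dict.fromkeys(_keep_in_fp32_modules, torch.float32))
    if dtype in (torch.float16, torch.bfloat16):
        plan.update(dict.fromkeys(_keep_in_fp32_modules_strict, torch.float32))
    return plan

print(build_dtype_plan(torch.float16))   # {'wo': torch.float32}
print(build_dtype_plan(torch.bfloat16))  # {} -> "wo" simply follows the bf16 load dtype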

@Abdennacer-Badaoui (Member, Author) commented Jan 21, 2026:

I created a centralized _build_dtype_plan_for_loading() method that encapsulates all the dtype_plan building logic in one place, replacing my previous fragmented logic. It takes the target dtype as input and applies _keep_in_fp32_modules (FP16 only) and _keep_in_fp32_modules_strict (FP16 and BF16) based on the actual loading dtype.

"batch_size": batch_size,
"max_cache_len": max_cache_length,
},
eos_token_id=model.config.eos_token_id,
A Collaborator commented:

I am not sure why Seq2SeqLMExportableModule doesn't use something similar to other places, like generation_config = model.generation_config.
(maybe @vasqu or @zucchini-nlp could say sth here)

For the change here, I am also not sure whether model.config is the best thing to use, or whether it should use model.generation_config instead.

And one more thing: if we change this here, would other tests be affected? I guess not, as I only see the T5 test file using Seq2SeqLMExportableModule.
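
For illustration, the two candidate sources being discussed (both are real model attributes; the checkpoint is only an example):

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
print(model.config.eos_token_id)             # what the diff above reads
print(model.generation_config.eos_token_id)  # what this comment suggests reading instead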

The Member Author replied:

You're right. It's better to use model.generation_config, I'll do it.
Yes, only T5 uses Seq2SeqLMExportableModule.

@vasqu (Contributor) commented Jan 21, 2026:

Yes, it makes sense to set it via generation_config - note that generate is always overwritten with a custom implementation for ExecuTorch-related models, e.g. the encoder-decoder overwrite is here:

def generate(self, prompt_token_ids, max_new_tokens):
    with torch.no_grad():
        model_device = self.full_model.device
        # Move input to the model's device if it's on a different device
        if prompt_token_ids.device != model_device:
            prompt_token_ids = prompt_token_ids.to(model_device)
        # Run encoder
        encoder_output = self.exported_encoder.module()(prompt_token_ids)
        # Initialize with start token (0 for T5) on the correct device
        decoder_input_ids = torch.tensor([[0]], dtype=torch.long, device=model_device)
        generated_ids = [0]
        # Generate tokens one by one
        for i in range(max_new_tokens - 1):
            # Run decoder for next token prediction
            logits = self.exported_decoder.module()(
                decoder_input_ids, encoder_output, torch.tensor([i], dtype=torch.long, device=model_device)
            )
            # Get next token
            next_token = torch.argmax(logits[:, -1, :], dim=-1).item()
            generated_ids.append(next_token)
            # Update input for next iteration on the correct device
            decoder_input_ids = torch.tensor([[next_token]], dtype=torch.long, device=model_device)
            # Check if EOS token
            if next_token == self.generation_config.eos_token_id:
                break
    return generated_ids

It's always kept as simple as possible, mimicking plain do_sample=False; the other variants seem to rely on max length instead, for example:

while len(response_tokens) < max_generation_length:

A Contributor commented:

Note:

# Check if EOS token
if next_token == self.generation_config.eos_token_id:
    break

which needs the proper eos token.

@ydshieh (Collaborator) commented Jan 21, 2026:

> The test_fp16_fp32_conversion test was failing because T5's _keep_in_fp32_modules = ["wo"] was unconditionally forcing the wo layer to FP32 for both FP16 and BF16 loading (even though BF16 is numerically stable enough and doesn't need FP32), so we added a post_init() override in T5PreTrainedModel that checks torch.get_default_dtype() and removes wo from dtype_plan when loading with bfloat16, allowing wo to stay in FP32 only for FP16 loading while using BF16 for BF16 loading.

What kind of failures do we have here? I mean, could we just adjust the test logic (if that makes sense)? To change the logic in modeling_utils.py, it's best to have good and precise definitions of what those flags should do. So my question is: is the current logic (on main) around what you changed in modeling_utils.py wrong itself, or is the change mainly to pass the test (which would then have flawed testing logic and need to be adjusted)?
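
For reference, the approach quoted above would look roughly like this (a hypothetical sketch of that earlier idea, not the fix that ends up in this PR):

import torch

class T5PreTrainedModelSketch:
    # Simplified stand-in; in transformers the dtype plan lives on PreTrainedModel.
    _keep_in_fp32_modules = ["wo"]

    def __init__(self):
        self.dtype_plan = dict.fromkeys(self._keep_in_fp32_modules, torch.float32)

    def post_init(self):
        # Drop the FP16-only entries when the default load dtype is bfloat16.
        if torch.get_default_dtype() == torch.bfloat16:
            for name in self._keep_in_fp32_modules:
                self.dtype_plan.pop(name, None)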

mtf_score = -(labels.shape[-1] * loss.item())

-EXPECTED_SCORE = -59.0293
+EXPECTED_SCORE = -40.1645
A Collaborator commented:

Let's try to use the Expectations class more here.

BTW, do you know why such difference?
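
A rough sketch of what that could look like, assuming the Expectations helper from transformers.testing_utils; the device keys and values below are illustrative only:

from transformers.testing_utils import Expectations

expected_scores = Expectations(
    {
        ("cuda", 8): -40.1645,
        ("rocm", (9, 4)): -40.1645,
        (None, None): -40.1645,  # fallback for other devices
    }
)
EXPECTED_SCORE = expected_scores.get_expectation()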

A Collaborator commented:

For the line below:

self.assertTrue

let's avoid such usage and use torch.testing.assert_close instead.
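
A minimal sketch of the suggested replacement (values and tolerances are illustrative):

import torch

mtf_score = -40.1645
EXPECTED_SCORE = -40.1645

# Unlike self.assertTrue(...), assert_close reports both values and the tolerance on failure.
torch.testing.assert_close(mtf_score, EXPECTED_SCORE, rtol=0.0, atol=1e-4)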

A Collaborator commented:

It seems to have been failing since Nov 14, 2025. Has it been failing since this test was added, or did some PR break it at some point while it passed before?

When we see or make such a big change, it's better to check and give more context, so we are sure we are not doing something wrong.

The Member Author replied:

I think it's difficult to track down the origin of this change. There are many factors that could cause the expected scores or text completions to change. However, I checked that the new value has been consistent for a good period of time.

@Abdennacer-Badaoui (Member, Author) commented:

> The test_fp16_fp32_conversion test was failing because T5's _keep_in_fp32_modules = ["wo"] was unconditionally forcing the wo layer to FP32 for both FP16 and BF16 loading (even though BF16 is numerically stable enough and doesn't need FP32), so we added a post_init() override in T5PreTrainedModel that checks torch.get_default_dtype() and removes wo from dtype_plan when loading with bfloat16, allowing wo to stay in FP32 only for FP16 loading while using BF16 for BF16 loading.
>
> What kind of failures do we have here? I mean, could we just adjust the test logic (if that makes sense)? So my question is: is the current logic (on main) around what you changed in modeling_utils.py wrong itself, or is the change mainly to pass the test (which would then have flawed testing logic and need to be adjusted)?

The current logic in modeling_utils.py on main seems to be wrong: it doesn't respect the documented distinction between _keep_in_fp32_modules and _keep_in_fp32_modules_strict.
According to the documentation, _keep_in_fp32_modules should "avoid casting to anything other than float32, except bfloat16" (i.e. it should only prevent FP16 casting), while _keep_in_fp32_modules_strict is meant to prevent both FP16 and BF16 casting. However, the current implementation in __init__ unconditionally adds both to dtype_plan during initialization, before the target dtype is known, which means _keep_in_fp32_modules incorrectly forces modules to FP32 even for BF16 loading. This causes T5's wo layer to stay in FP32 when loading with BF16 (since T5 sets _keep_in_fp32_modules = ["wo"]). The test (tests/models/t5/test_modeling_t5.py::T5ModelFp16Tests::test_fp16_fp32_conversion) is correctly catching this bug by expecting wo to be BF16 when loading with dtype=torch.bfloat16.
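
A runnable sketch of what the test expects (the checkpoint and the exact module path are assumptions for illustration):

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small", dtype=torch.bfloat16)

# _keep_in_fp32_modules is FP16-only, so under BF16 loading the feed-forward "wo"
# projection should follow the requested dtype instead of being pinned to FP32.
wo = model.encoder.block[0].layer[1].DenseReluDense.wo
print(wo.weight.dtype)  # expected: torch.bfloat16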

@github-actions (bot) commented:

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43374&sha=030d5e

@vasqu (Contributor) commented Jan 21, 2026:

Re: keeping modules in fp32, see my comment #43374 (comment)

I'm pretty sure this behavior is wrong and was only recently changed - iirc @eustlb added the strict flag, maybe cc @Cyrilvallez for model loading?

@ArthurZucker (Collaborator) left a comment:

I think the change is just that dtype="auto" is now the default, which will follow the config (and if there is no config, IDK). We should try to just force fp32 in the test model loading imo.

Comment on lines 1362 to 1376
def _build_dtype_plan_for_loading(self, dtype: torch.dtype) -> dict:
    """
    Builds dtype_plan based on target dtype and keep_in_fp32 module settings.
    """
    # Add _keep_in_fp32_modules only for FP16 loading
    if isinstance(self._keep_in_fp32_modules, list) and dtype == torch.float16:
        self.dtype_plan.update(dict.fromkeys(self._keep_in_fp32_modules, torch.float32))

    # Add _keep_in_fp32_modules_strict for both FP16 and BF16
    if isinstance(self._keep_in_fp32_modules_strict, list) and dtype in (torch.float16, torch.bfloat16):
        self.dtype_plan.update(dict.fromkeys(self._keep_in_fp32_modules_strict, torch.float32))

    return self.dtype_plan

The Member Author replied:

I see @ArthurZucker
I actually added a function that correctly builds the dtype_plan during loading. Let me know what you think. Otherwise, I can revert it and just update the test.

@Cyrilvallez (Member) left a comment:

Opened #43683 (will merge ASAP when the tests finish running) for the dtype issue, because I wanted to add very strict tests on this!
The rest LGTM, let's simply revert that part and pull main! Sorry for the inconvenience 😬

@github-actions (bot) commented Feb 2, 2026:

[For maintainers] Suggested jobs to run (before merge)

run-slow: t5

@Cyrilvallez (Member) left a comment:

Alright LGTM! Thanks a lot!

@Cyrilvallez (Member) commented:

Force merging as failures are completely unrelated!

@Cyrilvallez Cyrilvallez merged commit 07aed9d into huggingface:main Feb 2, 2026
22 of 25 checks passed