Fix t5 failures #43374

Merged
Cyrilvallez merged 12 commits into huggingface:main from Abdennacer-Badaoui:t5-failures
Feb 2, 2026

Conversation

@Abdennacer-Badaoui (Member) commented Jan 20, 2026:

Fix all the remaining T5 failures.

  • The test test_export_t5_summarization was failing because eos_token_id was not being passed to GenerationConfig in Seq2SeqLMExportableModule, so the generate loop never stopped at the end-of-sequence token and produced extra tokens (see the sketch after this list).
  • The test_fp16_fp32_conversion test was failing because T5's _keep_in_fp32_modules = ["wo"] was unconditionally forcing the wo layer to FP32 for both FP16 and BF16 loading. To fix this, I created a centralized _build_dtype_plan_for_loading() method that encapsulates all the dtype_plan building logic in one place: it takes the target dtype as input and applies _keep_in_fp32_modules (FP16 only) and _keep_in_fp32_modules_strict (FP16 and BF16) based on the actual loading dtype.
  • Fixed some expectations (tested on both AMD and Nvidia).
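
A minimal, self-contained sketch of the idea behind the first fix (the checkpoint name is only for illustration; the real change lives inside Seq2SeqLMExportableModule):

from transformers import GenerationConfig, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")

# Without an eos_token_id, a hand-rolled greedy decoding loop over the exported decoder
# has nothing to stop on and runs until max_new_tokens; forwarding it fixes that.
generation_config = GenerationConfig(
    use_cache=True,
    eos_token_id=model.generation_config.eos_token_id,
)
print(generation_config.eos_token_id)  # 1 (</s>) for the standard T5 checkpoints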

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Abdennacer-Badaoui marked this pull request as draft January 20, 2026 16:13
@Abdennacer-Badaoui marked this pull request as ready for review January 20, 2026 16:30
@vasqu (Contributor) commented Jan 20, 2026:

This is weird because we explicitly have _keep_in_fp32_modules and _keep_in_fp32_modules_strict - the latter being for bf16 as well, the former only for fp16.

Something must have changed for loading models

@Abdennacer-Badaoui (Member, Author) commented:

That's true. I think I found the problem.
So the dtype_plan initialization in PreTrainedModel.__init__ was unconditionally adding _keep_in_fp32_modules entries (intended for FP16 only) even when loading with BF16. I will fix the core loading logic in modeling_utils.py to only add _keep_in_fp32_modules to dtype_plan when dtype == torch.float16, respecting the intended distinction between _keep_in_fp32_modules (FP16 only) and _keep_in_fp32_modules_strict (FP16 and BF16).
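
A standalone sketch of that rule (plain Python for illustration only, not the transformers source; the attribute names follow the discussion above):

import torch

_keep_in_fp32_modules = ["wo"]       # should only take effect when loading in fp16
_keep_in_fp32_modules_strict = []    # would take effect for both fp16 and bf16

def build_dtype_plan(dtype):
    plan = {}
    if dtype == torch.float16:
        plan.update(dict.fromkeys(_keep_in_fp32_modules, torch.float32))
    if dtype in (torch.float16, torch.bfloat16):
        plan.update(dict.fromkeys(_keep_in_fp32_modules_strict, torch.float32))
    return plan

print(build_dtype_plan(torch.float16))   # {'wo': torch.float32}
print(build_dtype_plan(torch.bfloat16))  # {} -> "wo" simply follows the bf16 load dtype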

@Abdennacer-Badaoui (Member, Author) commented Jan 21, 2026:

I created a centralized _build_dtype_plan_for_loading() method that encapsulates all the dtype_plan building logic in one place, replacing my previous fragmented logic. It takes the target dtype as input and applies _keep_in_fp32_modules (FP16 only) and _keep_in_fp32_modules_strict (FP16 and BF16) based on the actual loading dtype.

"batch_size": batch_size,
"max_cache_len": max_cache_length,
},
eos_token_id=model.config.eos_token_id,
A Collaborator commented:

I am not sure why Seq2SeqLMExportableModule doesn't use something similar to other places, like generation_config = model.generation_config.
(maybe @vasqu or @zucchini-nlp could say sth here)

For the change here, I am also not sure whether model.config is the best thing to use, or whether it should use model.generation_config instead.

And one more thing: if we change this here, would other tests be affected? I guess not, as I only see the T5 test file using Seq2SeqLMExportableModule.
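
For illustration, the two candidate sources being discussed (both are real model attributes; the checkpoint is only an example):

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
print(model.config.eos_token_id)             # what the diff above reads
print(model.generation_config.eos_token_id)  # what this comment suggests reading instead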

The Member Author replied:

You're right. It's better to use model.generation_config, I'll do it.
Yes, only T5 uses Seq2SeqLMExportableModule.

@vasqu (Contributor) commented Jan 21, 2026:

Yes, it makes sense to set it via generation_config - note that generate is always overwritten with a custom implementation for ExecuTorch-related models, e.g. the encoder-decoder overwrite is here:

def generate(self, prompt_token_ids, max_new_tokens):
    with torch.no_grad():
        model_device = self.full_model.device
        # Move input to the model's device if it's on a different device
        if prompt_token_ids.device != model_device:
            prompt_token_ids = prompt_token_ids.to(model_device)
        # Run encoder
        encoder_output = self.exported_encoder.module()(prompt_token_ids)
        # Initialize with start token (0 for T5) on the correct device
        decoder_input_ids = torch.tensor([[0]], dtype=torch.long, device=model_device)
        generated_ids = [0]
        # Generate tokens one by one
        for i in range(max_new_tokens - 1):
            # Run decoder for next token prediction
            logits = self.exported_decoder.module()(
                decoder_input_ids, encoder_output, torch.tensor([i], dtype=torch.long, device=model_device)
            )
            # Get next token
            next_token = torch.argmax(logits[:, -1, :], dim=-1).item()
            generated_ids.append(next_token)
            # Update input for next iteration on the correct device
            decoder_input_ids = torch.tensor([[next_token]], dtype=torch.long, device=model_device)
            # Check if EOS token
            if next_token == self.generation_config.eos_token_id:
                break
    return generated_ids

It's always kept as simple as possible, mimicking plain do_sample=False; the other variants seem to rely on max length instead, for example:

while len(response_tokens) < max_generation_length:

A Contributor commented:

Note:

# Check if EOS token
if next_token == self.generation_config.eos_token_id:
    break

which needs the proper eos token.

@ydshieh (Collaborator) commented Jan 21, 2026:

> The test_fp16_fp32_conversion test was failing because T5's _keep_in_fp32_modules = ["wo"] was unconditionally forcing the wo layer to FP32 for both FP16 and BF16 loading (even though BF16 is numerically stable enough and doesn't need FP32), so we added a post_init() override in T5PreTrainedModel that checks torch.get_default_dtype() and removes wo from dtype_plan when loading with bfloat16, allowing wo to stay in FP32 only for FP16 loading while using BF16 for BF16 loading.

What kind of failures do we have here? I mean, could we just adjust the test logic (if that makes sense)? To change the logic in modeling_utils.py, it's best to have good and precise definitions of what those flags should do. So my question is: is the current logic (on main) around what you changed in modeling_utils.py wrong itself, or is the change mainly to pass the test (which would then have flawed testing logic and need to be adjusted)?
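
For reference, the approach quoted above would look roughly like this (a hypothetical sketch of that earlier idea, not the fix that ends up in this PR):

import torch

class T5PreTrainedModelSketch:
    # Simplified stand-in; in transformers the dtype plan lives on PreTrainedModel.
    _keep_in_fp32_modules = ["wo"]

    def __init__(self):
        self.dtype_plan = dict.fromkeys(self._keep_in_fp32_modules, torch.float32)

    def post_init(self):
        # Drop the FP16-only entries when the default load dtype is bfloat16.
        if torch.get_default_dtype() == torch.bfloat16:
            for name in self._keep_in_fp32_modules:
                self.dtype_plan.pop(name, None)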

mtf_score = -(labels.shape[-1] * loss.item())

-EXPECTED_SCORE = -59.0293
+EXPECTED_SCORE = -40.1645
A Collaborator commented:

Let's try to use the Expectations class more here.

BTW, do you know why such difference?
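
A rough sketch of what that could look like, assuming the Expectations helper from transformers.testing_utils; the device keys and values below are illustrative only:

from transformers.testing_utils import Expectations

expected_scores = Expectations(
    {
        ("cuda", 8): -40.1645,
        ("rocm", (9, 4)): -40.1645,
        (None, None): -40.1645,  # fallback for other devices
    }
)
EXPECTED_SCORE = expected_scores.get_expectation()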

A Collaborator commented:

For the line below:

self.assertTrue

let's avoid such usage and use torch.testing.assert_close instead.
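
A minimal sketch of the suggested replacement (values and tolerances are illustrative):

import torch

mtf_score = -40.1645
EXPECTED_SCORE = -40.1645

# Unlike self.assertTrue(...), assert_close reports both values and the tolerance on failure.
torch.testing.assert_close(mtf_score, EXPECTED_SCORE, rtol=0.0, atol=1e-4)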

A Collaborator commented:

It seems to have been failing since Nov 14, 2025. Has it been failing since this test was added, or did some PR break it at some point while it passed before?

When we see or make such a big change, it's better to check and give more context, so we are sure we are not doing something wrong.

The Member Author replied:

I think it's difficult to track down the origin of this change. There are many factors that could cause the expected scores or text completions to change. However, I checked that the new value has been consistent for a good period of time.

@Abdennacer-Badaoui (Member, Author) commented:

> The test_fp16_fp32_conversion test was failing because T5's _keep_in_fp32_modules = ["wo"] was unconditionally forcing the wo layer to FP32 for both FP16 and BF16 loading (even though BF16 is numerically stable enough and doesn't need FP32), so we added a post_init() override in T5PreTrainedModel that checks torch.get_default_dtype() and removes wo from dtype_plan when loading with bfloat16, allowing wo to stay in FP32 only for FP16 loading while using BF16 for BF16 loading.
>
> What kind of failures do we have here? I mean, could we just adjust the test logic (if that makes sense)? So my question is: is the current logic (on main) around what you changed in modeling_utils.py wrong itself, or is the change mainly to pass the test (which would then have flawed testing logic and need to be adjusted)?

The current logic in modeling_utils.py on main seems to be wrong: it doesn't respect the documented distinction between _keep_in_fp32_modules and _keep_in_fp32_modules_strict.
According to the documentation, _keep_in_fp32_modules should "avoid casting to anything other than float32, except bfloat16" (i.e. it should only prevent FP16 casting), while _keep_in_fp32_modules_strict is meant to prevent both FP16 and BF16 casting. However, the current implementation in __init__ unconditionally adds both to dtype_plan during initialization, before the target dtype is known, which means _keep_in_fp32_modules incorrectly forces modules to FP32 even for BF16 loading. This causes T5's wo layer to stay in FP32 when loading with BF16 (since T5 sets _keep_in_fp32_modules = ["wo"]). The test (tests/models/t5/test_modeling_t5.py::T5ModelFp16Tests::test_fp16_fp32_conversion) is correctly catching this bug by expecting wo to be BF16 when loading with dtype=torch.bfloat16.
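
A runnable sketch of what the test expects (the checkpoint and the exact module path are assumptions for illustration):

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small", dtype=torch.bfloat16)

# _keep_in_fp32_modules is FP16-only, so under BF16 loading the feed-forward "wo"
# projection should follow the requested dtype instead of being pinned to FP32.
wo = model.encoder.block[0].layer[1].DenseReluDense.wo
print(wo.weight.dtype)  # expected: torch.bfloat16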

@github-actions (bot) commented:

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43374&sha=030d5e

@vasqu (Contributor) commented Jan 21, 2026:

Re: keeping modules in fp32, see my comment #43374 (comment)

I'm pretty sure this behavior is wrong and was only recently changed - iirc @eustlb added the strict flag, maybe cc @Cyrilvallez for model loading?

@ArthurZucker (Collaborator) left a comment:

I think the change is just that dtype="auto" is now the default, which will follow the config (and if there is no config, IDK). We should try to just force fp32 in the test model loading imo.

Comment on lines 1362 to 1376
def _build_dtype_plan_for_loading(self, dtype: torch.dtype) -> dict:
    """
    Builds dtype_plan based on target dtype and keep_in_fp32 module settings.
    """
    # Add _keep_in_fp32_modules only for FP16 loading
    if isinstance(self._keep_in_fp32_modules, list) and dtype == torch.float16:
        self.dtype_plan.update(dict.fromkeys(self._keep_in_fp32_modules, torch.float32))

    # Add _keep_in_fp32_modules_strict for both FP16 and BF16
    if isinstance(self._keep_in_fp32_modules_strict, list) and dtype in (torch.float16, torch.bfloat16):
        self.dtype_plan.update(dict.fromkeys(self._keep_in_fp32_modules_strict, torch.float32))

    return self.dtype_plan

The Member Author replied:

I see @ArthurZucker
I actually added a function that correctly builds the dtype_plan during loading. Let me know what you think. Otherwise, I can revert it and just update the test.

@Cyrilvallez (Member) left a comment:

Opened #43683 (will merge ASAP when the tests finish running) for the dtype issue, because I wanted to add very strict tests on this!
The rest LGTM, let's simply revert that part and pull main! Sorry for the inconvenience 😬

@github-actions (bot) commented Feb 2, 2026:

[For maintainers] Suggested jobs to run (before merge)

run-slow: t5

@Cyrilvallez (Member) left a comment:

Alright LGTM! Thanks a lot!

@Cyrilvallez (Member) commented:

Force merging as failures are completely unrelated!

@Cyrilvallez Cyrilvallez merged commit 07aed9d into huggingface:main Feb 2, 2026
22 of 25 checks passed