Improve `has_similar_generate_outputs` assertions by tarekziade · Pull Request #44166 · huggingface/transformers

tarekziade · 2026-02-20T07:20:15Z

CI does not output useful information when this flaky test fails (`tests.models.olmo.test_modeling_olmo.OlmoModelTest`, `testMethod=test_generate_with_static_cache`), which makes it harder to determine the root cause when the failure is not reproducible locally.

This patch improves this assertion, so we get full details when it fails.

Before (CI output):

```
AssertionError: False is not true
```

After (CI output would look like):

```
AssertionError: Generate outputs are not similar enough (atol=1e-05, rtol=1e-05).
    Sequence mismatch: 3/20 tokens differ (first at position 12).
    Batch index: 0
    Token at mismatch — output_1: 1542, output_2: 8903
    Score diff at first mismatch — max: 2.384186e-04, mean: 1.127943e-05
```

This will tell us immediately whether it's a tolerance issue (small diffs suggesting we might just need a slightly larger atol), a completely different generation path, or a specific batch element problem.
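To illustrate the idea, here is a minimal sketch of how such a diagnostic message could be assembled. It is not the code from this PR: the helper name `describe_generate_mismatch`, its signature, and the plain-list data layout (`tokens[batch][position]`, `scores[position][batch][vocab]`, generated tokens only) are assumptions made for the example, and the real test helpers operate on tensors.

```python
from typing import Optional, Sequence


def describe_generate_mismatch(
    tokens_1: Sequence[Sequence[int]],
    tokens_2: Sequence[Sequence[int]],
    scores_1: Optional[Sequence[Sequence[Sequence[float]]]] = None,
    scores_2: Optional[Sequence[Sequence[Sequence[float]]]] = None,
    atol: float = 1e-5,
    rtol: float = 1e-5,
) -> Optional[str]:
    """Return None when the token sequences match, else a diagnostic message.

    Hypothetical layout: tokens_x[batch][position] and
    scores_x[position][batch][vocab], with only generated tokens included
    so that sequence positions line up with score steps.
    """
    for batch_idx, (row_1, row_2) in enumerate(zip(tokens_1, tokens_2)):
        # Positions where the two generations picked different tokens.
        mismatches = [i for i, (a, b) in enumerate(zip(row_1, row_2)) if a != b]
        if not mismatches:
            continue
        first = mismatches[0]
        lines = [
            f"Generate outputs are not similar enough (atol={atol}, rtol={rtol}).",
            f"    Sequence mismatch: {len(mismatches)}/{len(row_1)} tokens differ "
            f"(first at position {first}).",
            f"    Batch index: {batch_idx}",
            f"    Token at mismatch: output_1={row_1[first]}, output_2={row_2[first]}",
        ]
        if scores_1 is not None and scores_2 is not None:
            # Absolute score differences over the vocabulary at the first mismatch.
            diffs = [
                abs(x - y)
                for x, y in zip(scores_1[first][batch_idx], scores_2[first][batch_idx])
            ]
            lines.append(
                f"    Score diff at first mismatch: max={max(diffs):.6e}, "
                f"mean={sum(diffs) / len(diffs):.6e}"
            )
        # Report only the first batch element that differs.
        return "\n".join(lines)
    return None
```

In a `unittest` test, the returned string could be passed as the failure message, for example `self.assertIsNone(msg, msg)`, so that CI logs show the details instead of `False is not true`. A small max score diff next to a token mismatch would point at a tolerance issue, while a large one would suggest a genuinely different generation path.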

While I've changed all the callers in our tests, I kept the original function intact for backward compatibility (even though it might be overkill).

tarekziade · 2026-02-20T07:22:22Z

run-slow: dia, gemma3n, glm_ocr, kosmos2, kyutai_speech_to_text, t5gemma, t5gemma2

github-actions · 2026-02-20T07:23:35Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/dia", "models/gemma3n", "models/glm_ocr", "models/kosmos2", "models/kyutai_speech_to_text", "models/t5gemma", "models/t5gemma2"]
quantizations: []

HuggingFaceDocBuilderDev · 2026-02-20T07:32:04Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-02-20T08:42:34Z

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 41e6d157 | workflow commit (merge commit) |
| PR | 6f6956ec | branch commit (from PR) |
| main | 1618d44b | base commit (on `main`) |

✅ No failing test specific to this PR 🎉 👏 !


Rocketknight1

One nit about the docstrings but this seems ready!

tests/generation/test_utils.py

tarekziade · 2026-02-26T16:19:14Z

run-slow: dia, gemma3n, glm_ocr, kosmos2, kyutai_speech_to_text, t5gemma, t5gemma2

github-actions · 2026-02-26T16:20:23Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/dia", "models/gemma3n", "models/glm_ocr", "models/kosmos2", "models/kyutai_speech_to_text", "models/t5gemma", "models/t5gemma2"]
quantizations: []

github-actions · 2026-02-26T17:46:34Z

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | aff24334 | workflow commit (merge commit) |
| PR | 519afc75 | branch commit (from PR) |
| main | af5dfb6c | base commit (on `main`) |

✅ No failing test specific to this PR 🎉 👏 !

github-actions · 2026-02-27T07:39:33Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: dia, gemma3n, glm_ocr, kosmos2, kyutai_speech_to_text, t5gemma, t5gemma2


tarekziade requested a review from Rocketknight1 February 20, 2026 07:20

tarekziade self-assigned this Feb 20, 2026

Rocketknight1 reviewed Feb 20, 2026


tarekziade force-pushed the tarekziade-improve-assertion branch from 38f9997 to 1209e55 Compare February 20, 2026 19:29

tarekziade added 2 commits February 26, 2026 14:13

Merge branch 'main' into tarekziade-improve-assertion

c51d642

remove unrelated fix

3ee4ee0

Rocketknight1 approved these changes Feb 26, 2026


tarekziade added 2 commits February 26, 2026 16:55

smaller docstrings

11b9654

Merge branch 'main' into tarekziade-improve-assertion

519afc7

Merge branch 'main' into tarekziade-improve-assertion

8f9ed3e

tarekziade merged commit 24db9db into main Feb 27, 2026
26 checks passed

tarekziade deleted the tarekziade-improve-assertion branch February 27, 2026 08:26

tarekziade merged 6 commits into main from tarekziade-improve-assertion
