Improve has_similar_generate_outputs assertions #44166

Merged
tarekziade merged 6 commits into main from tarekziade-improve-assertion
Feb 27, 2026

@tarekziade
Collaborator

CI does not output useful information for this flaky test, `tests.models.olmo.test_modeling_olmo.OlmoModelTest` (`testMethod=test_generate_with_static_cache`), which makes it hard to determine the root cause when the failure is not reproducible locally.

This patch improves the `has_similar_generate_outputs` assertion so that we get full details when it fails.

Before (CI output):

AssertionError: False is not true

After (CI output would look like):

AssertionError: Generate outputs are not similar enough (atol=1e-05, rtol=1e-05).
    Sequence mismatch: 3/20 tokens differ (first at position 12).
    Batch index: 0
    Token at mismatch — output_1: 1542, output_2: 8903
    Score diff at first mismatch — max: 2.384186e-04, mean: 1.127943e-05

This will tell us immediately whether it's a tolerance issue (small diffs suggesting we might just need a slightly larger atol), a completely different generation path, or a specific batch element problem.
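As a rough illustration of how a report like the one above could be assembled, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `first_mismatch_report`, the list-based inputs, and the `scores[batch][position]` layout are assumptions made for the example (the real test helper presumably works on tensors and accounts for prompt length). The tolerance check uses the allclose-style rule `|a - b| <= atol + rtol * |b|`, and a token mismatch is only reported when the scores at that position also fall outside tolerance.

```python
def first_mismatch_report(sequences_1, sequences_2, scores_1, scores_2, atol=1e-5, rtol=1e-5):
    """Return None if the outputs look similar, else a multi-line diagnostic string.

    Hypothetical layout: sequences_x[batch] is a list of token ids and
    scores_x[batch][position] is a list of floats for that generation step.
    """
    for batch_idx, (seq_1, seq_2) in enumerate(zip(sequences_1, sequences_2)):
        mismatches = [pos for pos, (a, b) in enumerate(zip(seq_1, seq_2)) if a != b]
        if not mismatches:
            continue  # this batch element generated identical tokens

        first = mismatches[0]
        step_1 = scores_1[batch_idx][first]
        step_2 = scores_2[batch_idx][first]
        diffs = [abs(a - b) for a, b in zip(step_1, step_2)]

        # allclose-style tolerance: small score differences are accepted
        if all(d <= atol + rtol * abs(b) for d, b in zip(diffs, step_2)):
            continue

        return "\n".join(
            [
                f"Generate outputs are not similar enough (atol={atol}, rtol={rtol}).",
                f"    Sequence mismatch: {len(mismatches)}/{len(seq_1)} tokens differ "
                f"(first at position {first}).",
                f"    Batch index: {batch_idx}",
                f"    Token at mismatch - output_1: {seq_1[first]}, output_2: {seq_2[first]}",
                f"    Score diff at first mismatch - max: {max(diffs):.6e}, "
                f"mean: {sum(diffs) / len(diffs):.6e}",
            ]
        )
    return None
```

In a `unittest` test case, a non-`None` report could be passed to `self.fail(report)`, so the details appear in the CI log instead of `False is not true`.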

While I've changed all the callers in our tests, I kept the original function intact for backward compatibility (even though that might be overkill).
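The backward-compatibility choice can be pictured with a small sketch, assuming a pattern where the boolean helper stays as it is and a new asserting helper is added next to it. Both function names below are hypothetical and the comparison is reduced to token lists; the actual helpers in the test suite may be named and structured differently.

```python
def has_similar_outputs(tokens_1, tokens_2):
    """Original-style helper (hypothetical): returns only a bool.

    Callers that wrap it in assertTrue() see "False is not true" on failure.
    """
    return tokens_1 == tokens_2


def assert_similar_outputs(tokens_1, tokens_2):
    """New-style helper (hypothetical): raises AssertionError with details."""
    if has_similar_outputs(tokens_1, tokens_2):
        return
    if len(tokens_1) != len(tokens_2):
        raise AssertionError(f"Length mismatch: {len(tokens_1)} vs {len(tokens_2)} tokens.")
    mismatches = [i for i, (a, b) in enumerate(zip(tokens_1, tokens_2)) if a != b]
    raise AssertionError(
        f"Sequence mismatch: {len(mismatches)}/{len(tokens_1)} tokens differ "
        f"(first at position {mismatches[0]})."
    )
```

Existing external callers of the boolean helper keep working unchanged, while tests that switch to the asserting helper get the richer failure message.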

@tarekziade
Collaborator Author

run-slow: dia, gemma3n, glm_ocr, kosmos2, kyutai_speech_to_text, t5gemma, t5gemma2

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/dia", "models/gemma3n", "models/glm_ocr", "models/kosmos2", "models/kyutai_speech_to_text", "models/t5gemma", "models/t5gemma2"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 41e6d157 | workflow commit (merge commit) |
| PR | 6f6956ec | branch commit (from PR) |
| main | 1618d44b | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@tarekziade self-assigned this on Feb 20, 2026
@tarekziade force-pushed the tarekziade-improve-assertion branch from 38f9997 to 1209e55 on February 20, 2026 19:29
Member

@Rocketknight1 left a comment

One nit about the docstrings but this seems ready!

@tarekziade
Collaborator Author

run-slow: dia, gemma3n, glm_ocr, kosmos2, kyutai_speech_to_text, t5gemma, t5gemma2

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/dia", "models/gemma3n", "models/glm_ocr", "models/kosmos2", "models/kyutai_speech_to_text", "models/t5gemma", "models/t5gemma2"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | aff24334 | workflow commit (merge commit) |
| PR | 519afc75 | branch commit (from PR) |
| main | af5dfb6c | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: dia, gemma3n, glm_ocr, kosmos2, kyutai_speech_to_text, t5gemma, t5gemma2

@tarekziade merged commit 24db9db into main on Feb 27, 2026
26 checks passed
@tarekziade deleted the tarekziade-improve-assertion branch on February 27, 2026 08:26
zvik pushed a commit to zvik/transformers that referenced this pull request Mar 1, 2026

3 participants