Conversation

@ydshieh (Collaborator) commented Sep 1, 2025

What does this PR do?

This test is very flaky for dtype bf16, for three reasons:

  • bf16 has poor precision, especially when the magnitude is large
  • siglip has larger outputs than, for example, llama or clip. The maximal abs. values for them:
    • siglip: 3.0 ~ 4.0
    • clip: 0.9 ~ 1.0
    • llama: ~0.1
  • siglip tests have sequence length 225, i.e. num_patches = (image_size: 30 / patch_size: 2) ** 2

All three combined make it much more flaky.

This PR addresses this.
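The precision point can be made concrete: bf16 keeps only 7 explicit mantissa bits, so the spacing between adjacent representable values grows with the magnitude of the number. A minimal pure-Python sketch (the helper names here are mine, not from the test suite; `to_bf16` truncates rather than rounds, which is close enough to show the effect):

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Truncate a float to bfloat16 by zeroing the low 16 bits of its float32 representation."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def bf16_ulp(x: float) -> float:
    """Spacing between adjacent bf16 values near x (7 explicit mantissa bits)."""
    return 2.0 ** (math.floor(math.log2(abs(x))) - 7)

# siglip-scale outputs (~4.0) have 64x coarser bf16 spacing than llama-scale (~0.1)
print(bf16_ulp(4.0))   # 0.03125
print(bf16_ulp(0.1))   # 0.00048828125
print(to_bf16(0.1))    # 0.099609375 -- even storing 0.1 loses precision
```

So an elementwise error of a few hundredths is entirely expected for siglip-scale outputs in bf16, while it would signal a real bug at llama's scale.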

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh requested a review from zucchini-nlp September 1, 2025 09:58
self.assertIsNotNone(model)

@parameterized.expand(TEST_EAGER_MATCHES_SDPA_INFERENCE_PARAMETERIZATION)
@is_flaky()
Collaborator Author (ydshieh)

Even with this (retrying 5 times), we still had some failures from time to time.

Now we don't need it anymore.

Comment on lines +480 to +487
# The choice of `3e-2` in `outputs_magnitude * 3e-2` might not work if a model has even larger outputs.
# (we can try to analyze the `rtol` more closely element-wise in the future and adjust `rtol` instead of `atol`).
computed_atol = outputs_magnitude * 3e-2
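A sketch of the idea in the diff above (the function wrapper and sample values are hypothetical; only `outputs_magnitude * 3e-2` comes from the PR): scaling `atol` by the largest absolute output value makes it behave like a relative tolerance anchored at the tensor's peak, so one factor can cover both siglip-scale (~4.0) and llama-scale (~0.1) outputs.

```python
def compute_atol(outputs, factor=3e-2):
    # Largest absolute value in the outputs; in the PR this is computed from the model's output tensor.
    outputs_magnitude = max(abs(v) for v in outputs)
    # As the diff's comment notes, a fixed factor might not work for a model
    # with even larger outputs; an element-wise rtol would be more principled.
    return outputs_magnitude * factor

print(compute_atol([3.7, -2.1, 0.4]))    # siglip-like range -> ~0.111
print(compute_atol([0.09, -0.1, 0.05]))  # llama-like range  -> ~0.003
```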
Member

yeah, i think with rtol we can find one value that works for siglip and llama

@ydshieh ydshieh (Collaborator, Author) commented Sep 1, 2025

Running this (parameterized) test 1000 times (for each set of parameters) for siglip: no failure. 😢

(but yeah, the comment here is more for future weird models :-) )

Contributor
github-actions bot commented Sep 1, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: siglip

ydshieh commented Sep 1, 2025

It turns out the very small hidden dimension causes some edge cases and fails this test for other outputs like logits_per_text.
I reverted those changes and only kept image_size very small (i.e. a small sequence length).

Works for all cases now.
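The sequence-length fix can be checked with simple arithmetic (the helper name and the smaller image_size value are mine, for illustration): siglip's sequence length is the number of patches, so shrinking image_size in the test config shrinks the sequence quadratically.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    # siglip's sequence length is the area of the patch grid
    return (image_size // patch_size) ** 2

print(num_patches(30, 2))  # original test config -> 225 tokens
print(num_patches(16, 2))  # a hypothetically smaller image_size -> 64 tokens
```

Fewer tokens means fewer chances for a single bf16-rounded element to exceed the tolerance, which is why this alone was enough to stabilize the test.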

@ydshieh ydshieh merged commit c99d43e into main Sep 1, 2025
25 checks passed
@ydshieh ydshieh deleted the fix_siglip_tests branch September 1, 2025 13:17