Conversation

@ydshieh (Collaborator) commented Sep 1, 2025

What does this PR do?

This test is very flaky for dtype bf16, for three reasons:

  • bf16 has poor precision, especially when the magnitude is large
  • siglip has larger outputs than, for example, llama or clip. The maximal abs. values for them:
    • siglip: 3.0 ~ 4.0
    • clip: 0.9 ~ 1.0
    • llama: ~0.1
  • siglip tests have sequence length 225, i.e. num_patches = (image_size: 30 / patch_size: 2) ** 2

All three combined make it much more flaky.

This PR addresses this.
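The precision point can be made concrete: bf16 keeps only 7 explicit mantissa bits, so the spacing between adjacent representable values grows with the magnitude of the number. A minimal pure-Python sketch (the helper names here are mine, not from the test suite; `to_bf16` truncates rather than rounds, which is close enough to show the effect):

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Truncate a float to bfloat16 by zeroing the low 16 bits of its float32 representation."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def bf16_ulp(x: float) -> float:
    """Spacing between adjacent bf16 values near x (7 explicit mantissa bits)."""
    return 2.0 ** (math.floor(math.log2(abs(x))) - 7)

# siglip-scale outputs (~4.0) have 64x coarser bf16 spacing than llama-scale (~0.1)
print(bf16_ulp(4.0))   # 0.03125
print(bf16_ulp(0.1))   # 0.00048828125
print(to_bf16(0.1))    # 0.099609375 -- even storing 0.1 loses precision
```

So an elementwise error of a few hundredths is entirely expected for siglip-scale outputs in bf16, while it would signal a real bug at llama's scale.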

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh requested a review from zucchini-nlp September 1, 2025 09:58
self.assertIsNotNone(model)

@parameterized.expand(TEST_EAGER_MATCHES_SDPA_INFERENCE_PARAMETERIZATION)
@is_flaky()
Collaborator Author (ydshieh)

Even with this (retrying 5 times), we still had some failures from time to time.

Now we don't need it anymore.

Comment on lines +480 to +487
# The choice of `3e-2` in `outputs_magnitude * 3e-2` might not work if a model has even larger outputs.
# (we can try to analyze the `rtol` more closely element-wise in the future and adjust `rtol` instead of `atol`).
computed_atol = outputs_magnitude * 3e-2
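A sketch of the idea in the diff above (the function wrapper and sample values are hypothetical; only `outputs_magnitude * 3e-2` comes from the PR): scaling `atol` by the largest absolute output value makes it behave like a relative tolerance anchored at the tensor's peak, so one factor can cover both siglip-scale (~4.0) and llama-scale (~0.1) outputs.

```python
def compute_atol(outputs, factor=3e-2):
    # Largest absolute value in the outputs; in the PR this is computed from the model's output tensor.
    outputs_magnitude = max(abs(v) for v in outputs)
    # As the diff's comment notes, a fixed factor might not work for a model
    # with even larger outputs; an element-wise rtol would be more principled.
    return outputs_magnitude * factor

print(compute_atol([3.7, -2.1, 0.4]))    # siglip-like range -> ~0.111
print(compute_atol([0.09, -0.1, 0.05]))  # llama-like range  -> ~0.003
```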
Member

yeah, i think with rtol we can find one value that works for siglip and llama

@ydshieh ydshieh (Collaborator, Author) commented Sep 1, 2025

Running this (parameterized) test 1000 times (for each set of parameters) for siglip: no failure. 😢

(but yeah, the comment here is more for future weird models :-) )

Contributor
github-actions bot commented Sep 1, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: siglip

ydshieh commented Sep 1, 2025

It turns out the very small hidden dimension causes some edge cases and fails this test for other outputs like logits_per_text.
I reverted those changes and only kept image_size very small (i.e. a small sequence length).

Works for all cases now.
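The sequence-length fix can be checked with simple arithmetic (the helper name and the smaller image_size value are mine, for illustration): siglip's sequence length is the number of patches, so shrinking image_size in the test config shrinks the sequence quadratically.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    # siglip's sequence length is the area of the patch grid
    return (image_size // patch_size) ** 2

print(num_patches(30, 2))  # original test config -> 225 tokens
print(num_patches(16, 2))  # a hypothetically smaller image_size -> 64 tokens
```

Fewer tokens means fewer chances for a single bf16-rounded element to exceed the tolerance, which is why this alone was enough to stabilize the test.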

@ydshieh ydshieh merged commit c99d43e into main Sep 1, 2025
25 checks passed
@ydshieh ydshieh deleted the fix_siglip_tests branch September 1, 2025 13:17