Fix siglip flaky test_eager_matches_sdpa_inference #40584
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
        self.assertIsNotNone(model)

    @parameterized.expand(TEST_EAGER_MATCHES_SDPA_INFERENCE_PARAMETERIZATION)
    @is_flaky()
```
Even with this (retrying 5 times), we still had some failures from time to time. Now we don't need this anymore.
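(For context, a minimal sketch of how a retry-on-failure decorator like `is_flaky` can work; the signature and defaults below are illustrative assumptions, not the exact `transformers.testing_utils` implementation.)

```python
import functools
import time


def is_flaky(max_attempts: int = 5, wait_before_retry: float = 0.0):
    """Re-run a test up to `max_attempts` times before reporting failure."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return test_func(*args, **kwargs)
                except AssertionError:
                    if attempt == max_attempts - 1:
                        raise  # out of retries: surface the failure
                    if wait_before_retry:
                        time.sleep(wait_before_retry)

        return wrapper

    return decorator
```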
```python
                # The choice of `3e-2` in `outputs_magnitude * 3e-2` might not work if a model has even larger outputs.
                # (We can try to analyze the `rtol` more closely element-wise in the future and adjust the `rtol` instead of the `atol`.)
                computed_atol = outputs_magnitude * 3e-2
```
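A minimal sketch of the magnitude-scaled tolerance idea, with made-up tensors standing in for the eager and SDPA attention outputs (variable names here are illustrative):

```python
import torch

# Stand-ins for the outputs of the eager and SDPA attention implementations.
eager_outputs = torch.randn(2, 16, 32) * 50.0
sdpa_outputs = eager_outputs + torch.randn_like(eager_outputs) * 0.1

# Scale the absolute tolerance with the overall magnitude of the outputs so that
# a fixed `atol` does not become overly strict for models with large activations.
outputs_magnitude = eager_outputs.abs().max().item()
computed_atol = outputs_magnitude * 3e-2

torch.testing.assert_close(sdpa_outputs, eager_outputs, atol=computed_atol, rtol=1e-2)
```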
Yeah, I think with `rtol` we can find one value that works for both siglip and llama.
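A rough sketch of what such an element-wise `rtol` analysis could look like (illustrative only, not code from this PR):

```python
import torch


def max_relative_error(actual: torch.Tensor, expected: torch.Tensor, eps: float = 1e-6) -> float:
    """Largest element-wise relative difference between two tensors."""
    return ((actual - expected).abs() / (expected.abs() + eps)).max().item()


# Illustrative data: the SDPA output as a slightly perturbed copy of the eager output.
eager = torch.randn(4, 8) * 10.0
sdpa = eager + torch.randn_like(eager) * 1e-3
print(f"max relative error: {max_relative_error(sdpa, eager):.2e}")
```

Note that the relative error is dominated by near-zero elements, which is one reason an `atol` component is still needed alongside `rtol`.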
Running this (parameterized) test 1000 times (for each set of parameters) for siglip: no failure. 😢
(But yeah, the comment here is more for future weird models :-) )
[For maintainers] Suggested jobs to run (before merge): run-slow: siglip
It turns out the very small hidden dimension will cause some edge cases and fail this test for other cases. Works for all cases now.
What does this PR do?
This test is very flaky for dtype `bf16` for a few reasons; in particular, `bf16` has bad precision, especially when the magnitude of the outputs is large. All of these combined make it much more flaky. This PR addresses this.
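As a standalone illustration of the `bf16` point (not code from the PR): bfloat16 stores only 7 mantissa bits, so the gap between adjacent representable values grows with the magnitude of the number.

```python
import torch

for value in (0.3, 33.3, 3333.3):
    x = torch.tensor(value, dtype=torch.float32)
    # Round-trip through bf16 and measure the absolute representation error.
    err = (x.to(torch.bfloat16).to(torch.float32) - x).abs().item()
    # Spacing between neighboring bf16 values near `value` (roughly eps * value).
    spacing = torch.finfo(torch.bfloat16).eps * value
    print(f"value={value:>8}: bf16 round-trip error={err:.3g}, spacing≈{spacing:.3g}")
```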