Fix uniform returning end point for BFloat16 and Half #96962
Conversation
Fixes pytorch#96947. If we generate `1.0 - float_eps`, the BFloat16 and Half constructors will round this to 1.0, which is outside of the half-open range. This change alters the rounding of the last bit in the BFloat16 representation to never round up. The result is that we never go outside the end point, and the `from` point is now equally likely, where before it was half as likely.
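To make the mechanism concrete, here is a minimal, self-contained sketch of the truncation idea. It is not the PyTorch implementation; `bf16_truncate` and `bf16_to_float` are hypothetical helpers, and only BFloat16 is shown (Half would also need exponent rebiasing, not just a shift).

```cpp
// A minimal sketch (not the PyTorch implementation) of why
// round-to-nearest-even escapes the half-open range while truncation
// does not.
#include <cmath>
#include <cstdint>
#include <cstring>
#include <cstdio>

// Hypothetical helper: convert float -> bfloat16 by truncating the low
// 16 bits. For positive values this never rounds up.
static uint16_t bf16_truncate(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

static float bf16_to_float(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

int main() {
  // Largest float below 1.0 (0x3F7FFFFF, about 0.99999994).
  float almost_one = std::nextafterf(1.0f, 0.0f);
  // Round-to-nearest-even (what the BFloat16 constructor does) would
  // round the 16 dropped one-bits up, producing exactly 1.0.
  // Truncation instead yields the largest bfloat16 below 1.0:
  printf("%.8f\n", bf16_to_float(bf16_truncate(almost_one)));  // 0.99609375
  return 0;
}
```

Since truncating the mantissa of a positive float never increases the value, a sample drawn from [0, 1) can never land on 1.0 after this conversion.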
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96962.
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Failure as of commit 0bf366a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
How nice is it to have `if constexpr`? :D
// Note for BFloat16 and Half, the default constructor does
// round to nearest even, which may return the end point of our
// range. Use truncation rounding instead.
return truncate_to<scalar_t>(reverse_bound_rand * range + from);
There is an FP precision issue. Let

float reverse_bound_rand = 0.99999994; // FF FF 7F 3F in memory
float range = 1.0;
half from = 100.0;

then `reverse_bound_rand * range + from` would be 101 (in float type), and `truncate_to<scalar_t>(101)` is also 101 (in half type). However, the function is not supposed to output the `to_` value.
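For what it's worth, this counterexample can be reproduced without any half-precision machinery, since the float addition alone already rounds up to the end point. A small standalone sketch, with plain float standing in for Half:

```cpp
// Repro of the counterexample above: the truncating conversion is
// elided because the damage is already done by the float addition.
#include <cassert>
#include <cstdio>

int main() {
  float reverse_bound_rand = 0.99999994f;  // 0x3F7FFFFF, largest float < 1.0
  float range = 1.0f;
  float from = 100.0f;  // stands in for half(100.0); to == 101.0
  float result = reverse_bound_rand * range + from;
  // The exact sum 100.99999994 lies within half an ulp of 101.0, so
  // round-to-nearest produces exactly 101.0 before truncation ever runs.
  assert(result == 101.0f);
  printf("%.8f\n", result);  // 101.00000000
  return 0;
}
```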
I suggest preserving the bounds after truncating: yuantailing@aee8c06
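A minimal sketch of that suggestion, assuming (based on the PR summary, not the linked commit's exact contents) that the bounds check is applied after the conversion, folding a result that rounded onto the excluded end point back into the range:

```cpp
// Sketch of a delayed bounds change: only after the conversion do we
// know whether rounding pushed the value onto `to`.
#include <cstdio>

template <typename scalar_t>
scalar_t uniform_transform(float rand, float from, float to) {
  auto value = static_cast<scalar_t>(rand * (to - from) + from);
  // If rounding produced the excluded end point, fold it back to
  // `from`, restoring the half-open range [from, to).
  if (value == static_cast<scalar_t>(to)) {
    value = static_cast<scalar_t>(from);
  }
  return value;
}

int main() {
  // With scalar_t = float the guard triggers on the case above:
  printf("%.8f\n", uniform_transform<float>(0.99999994f, 100.0f, 101.0f));
  // prints 100.00000000
  return 0;
}
```

The final PR summary describes the merged behavior the same way: the bounds change is delayed until after the value has been rounded.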
@pytorchbot merge

No ciflow labels are configured for this repo.

Merge failed. Reason: This PR needs a … label. If not, please add the … label. For more information, see …. Details for Dev Infra team: raised by workflow job.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
Hard to say if the test failure is valid or not: the dtype and number of elements fit the error case described in #96947. This check was added (or at least adapted) in #94009. But this comment doesn't really instill confidence: pytorch/test/test_transformers.py, lines 1704 to 1707 at a37b4fa.
@drisspg has there been any progress on investigating this? How do you propose we move forward?

In the meantime I think we should increase the tolerance for this case.

We would need to increase the …

atol is 0.0011 vs 0.0014 (that's 30%); rtol doesn't matter if atol is fine.

I agree with this. I have not had a chance to fully characterize the nature of these FP errors. I think it would be fine to bump the multiplier of 2 to something greater. There are 65k elements in that tensor and the parametrization sweeps 20k tests, so that entry could be an outlier (not that confidence-instilling either).
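For context on the tolerance discussion, a small sketch assuming `torch.allclose`-style semantics, where the check passes iff `|a - b| <= atol + rtol * |b|`:

```cpp
// Sketch of the tolerance check under discussion (assumed semantics
// matching torch.allclose: pass iff |a - b| <= atol + rtol * |b|).
#include <cmath>
#include <cstdio>

static bool allclose(double a, double b, double atol, double rtol) {
  return std::fabs(a - b) <= atol + rtol * std::fabs(b);
}

int main() {
  // An observed error of 0.0014 against an allowed atol of 0.0011:
  // when |b| is small the rtol term contributes almost nothing, which
  // is why "rtol doesn't matter if atol is fine".
  printf("%s\n", allclose(0.0014, 0.0, 0.0011, 1e-5) ? "pass" : "fail");
  // prints "fail"
  return 0;
}
```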
I've added the …

It seems the only related failure is the one we observed above. In light of #96962 (comment), I think we are good just upping the tolerance.
@pytorchbot merge -g

❌ 🤖 pytorchbot command failed: Try …

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed; the first few are: linux-binary-manywheel / manywheel-py3_8-cuda11_7-test / test. Details for Dev Infra team: raised by workflow job.

@pytorchbot merge -f 'unrelated triton version issue'

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #96947. If we generate `1.0 - float_eps`, the BFloat16 and Half constructors will round this to 1.0, which is outside of the half-open range. Instead, we delay the bounds change until after the value has been rounded.

Pull Request resolved: pytorch/pytorch#96962
Approved by: https://github.com/lezcano, https://github.com/ngimel
cc @pbelevich @pmeier