
Fix uniform returning end point for BFloat16 and Half #96962

Closed
wants to merge 5 commits

Conversation

peterbell10
Collaborator

@peterbell10 peterbell10 commented Mar 16, 2023

Fixes #96947

If we generate `1.0 - float_eps`, the BFloat16 and Half constructors will round this to 1.0, which is outside of the half-open range. Instead, we delay the bounds change until after the value has been rounded.
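
As a standalone illustration of the failure mode (a rough sketch that emulates the float-to-BFloat16 round-to-nearest-even conversion with bit manipulation; it is not the PyTorch kernel itself):

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Emulate float -> bfloat16 with round-to-nearest-even by keeping the top
// 16 bits of the float representation (NaN handling omitted for brevity).
static float to_bfloat16_rne(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);  // ties to even
  bits = (bits + rounding_bias) & 0xFFFF0000u;
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

int main() {
  float r = std::nextafterf(1.0f, 0.0f);    // largest float below 1.0, inside [0, 1)
  std::printf("%g\n", to_bfloat16_rne(r));  // prints 1, i.e. the excluded end point
}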

cc @pbelevich @pmeier

Fixes pytorch#96947

If we generate `1.0 - float_eps`, the BFloat16 and Half constructors will
round this to 1.0, which is outside of the half-open range. This
changes the rounding of the last bit in the BFloat16 representation to
never round up. The result is that we never go outside the end point,
and the `from` point is now equally likely, whereas before it was half as
likely.
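
A minimal sketch of the truncating conversion described above, again emulating bfloat16 by keeping only the top 16 bits of a float (illustrative only, not the code in this PR):

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Truncating float -> bfloat16: drop the low 16 bits instead of rounding,
// so the conversion can never round up past the excluded end point.
static float to_bfloat16_trunc(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= 0xFFFF0000u;  // round toward zero
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

int main() {
  float r = std::nextafterf(1.0f, 0.0f);      // largest float below 1.0
  std::printf("%g\n", to_bfloat16_trunc(r));  // prints 0.996094, still inside [0, 1)
}
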
@pytorch-bot

pytorch-bot bot commented Mar 16, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96962

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 0bf366a:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Collaborator

@lezcano lezcano left a comment


How nice is it to have `if constexpr`? :D

// Note for BFloat16 and Half, the default constructor does
// round to nearest even, which may return the end point of our
// range. Use truncation rounding instead.
return truncate_to<scalar_t>(reverse_bound_rand * range + from);
Contributor


There is an FP precision issue. Let

float reverse_bound_rand  = 0.99999994;  // FF FF 7F 3F in memory
float range = 1.0;
half from = 100.0;

then reverse_bound_rand * range + from would be 101 (in float type), and truncate_to<scalar_t>(101) is also 101 (in half type). However, the function is not supposed to output the to_ value.
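
The float-precision part of this counterexample can be checked in isolation (a small standalone sketch; to_ is assumed to be from + range = 101):

#include <cstdio>

int main() {
  float reverse_bound_rand = 0.99999994f;  // largest float below 1.0
  float range = 1.0f;
  float from = 100.0f;
  float r = reverse_bound_rand * range + from;
  std::printf("%.9g\n", r);  // prints 101: the sum already rounds up to 101 in float
  // Since 101 is exactly representable in half (and bfloat16), a truncating
  // conversion cannot pull the result back below to_ = 101.
}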

Contributor


I suggest preserving the bounds after truncating: yuantailing@aee8c06
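
One way such a post-conversion bound check could look (a hypothetical sketch reusing the emulated-bfloat16 representation from earlier, where a bfloat16 value is a float with zero low 16 bits; the linked commit may implement this differently):

#include <cstdint>
#include <cstdio>
#include <cstring>

// Step one representable bfloat16 value down from x (assumes x is finite and > 0).
static float bf16_prev(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits = ((bits >> 16) - 1u) << 16;
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

// Enforce the half-open bound after conversion: if rounding still landed on
// `to`, step back to the largest representable value below it.
static float clamp_below(float x, float to) {
  return x >= to ? bf16_prev(to) : x;
}

int main() {
  std::printf("%g\n", clamp_below(101.0f, 101.0f));  // prints 100.5, i.e. < to
}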

@peterbell10
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Mar 20, 2023
@pytorch-bot

pytorch-bot bot commented Mar 20, 2023

No ciflow labels are configured for this repo.
For information on how to enable CIFlow bot see this wiki

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job

@peterbell10 peterbell10 added the module: random (Related to random number generation in PyTorch (rng generator)), topic: bug fixes (topic category), and release notes: cuda (release notes category) labels Mar 20, 2023
@peterbell10
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@pmeier
Collaborator

pmeier commented Mar 20, 2023

Hard to say if the test failure is valid or not:

2023-03-20T18:20:38.1245440Z _ TestSDPA.test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_128_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_None _
2023-03-20T18:20:38.1245557Z Traceback (most recent call last):
2023-03-20T18:20:38.1245764Z   File "/var/lib/jenkins/workspace/test/test_transformers.py", line 1909, in test_flash_attention_vs_math_ref_grads
2023-03-20T18:20:38.1245943Z     self.assertEqual(query.grad, query_ref.grad.to(query.grad.dtype),
2023-03-20T18:20:38.1246248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3013, in assertEqual
2023-03-20T18:20:38.1246365Z     raise error_metas[0].to_error(
2023-03-20T18:20:38.1246514Z AssertionError: Tensor-likes are not close!
2023-03-20T18:20:38.1246520Z 
2023-03-20T18:20:38.1246632Z Mismatched elements: 1 / 65536 (0.0%)
2023-03-20T18:20:38.1246803Z Greatest absolute difference: 0.0014754831790924072 at index (0, 2, 2033, 0) (up to 0.0011385045945644379 allowed)
2023-03-20T18:20:38.1247035Z Greatest relative difference: 319.4129032258065 at index (0, 2, 2033, 0) (up to 71.4206771850586 allowed)

The dtype and number of elements fit the error case described in #96947.

This check was added (or at least adapted) in #94009. But this comment doesn't really instill confidence:

# TODO: Investigate why grad_q needs larger tolerances
grad_q_deviation = query_ref.grad - query_ref_lp.grad
grad_q_ref_atol = max(2 * torch.abs(grad_q_deviation).max().item(), default_atol[out.dtype])
grad_q_ref_rtol = max(get_rtol(query_ref.grad, query_ref_lp.grad), default_rtol[out.dtype])

@drisspg has there been any progress on investigating this? What do you propose to move forward?

@ngimel
Collaborator

ngimel commented Mar 20, 2023

In the meantime I think we should increase tolerance for this case

@pmeier
Collaborator

pmeier commented Mar 20, 2023

We would need to increase the rtol fivefold and the atol by roughly 50%. And we are currently stopping after the first failure, so maybe there are more parameters that fail here.

@pmeier pmeier added the keep-going (Don't stop on first failure, keep running tests until the end) label Mar 20, 2023
@ngimel
Collaborator

ngimel commented Mar 20, 2023

atol is 0.0011 vs 0.0014 (that's 30%); rtol doesn't matter if atol is fine.
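
For reference, the ratios worked out from the numbers in the failure log above:

0.0014754831790924072 / 0.0011385045945644379 ≈ 1.30  (greatest absolute difference vs. allowed atol)
319.4129032258065 / 71.4206771850586 ≈ 4.47  (greatest relative difference vs. allowed rtol)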

@drisspg
Contributor

drisspg commented Mar 20, 2023

In the meantime I think we should increase tolerance for this case

I agree with this. I have not had a chance to fully characterize the nature of these FP errors. I think it would be fine to bump the multiplier of 2 to something greater. There are 65k elements in that tensor and the parametrization sweeps 20k tests, so that entry could be an outlier (not that confidence-instilling either).

@pmeier
Collaborator

pmeier commented Mar 20, 2023

I've added the keep-going label so let's see what falls out.

@pmeier
Collaborator

pmeier commented Mar 21, 2023

It seems the only related failure is the one we observed above. In light of #96962 (comment), I think we are good with just upping the tolerance.

@pmeier
Collaborator

pmeier commented Mar 21, 2023

@pytorchbot merge -g

@pytorch-bot

pytorch-bot bot commented Mar 21, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: -g

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci} ...

Try @pytorchbot --help for more info.

@pmeier
Collaborator

pmeier commented Mar 21, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_8-cuda11_7-test / test

Details for Dev Infra team: raised by workflow job

@pmeier
Collaborator

pmeier commented Mar 21, 2023

@pytorchbot merge -f 'unrelated triton version issue'

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 23, 2023
Fixes #96947

If we generate `1.0 - float_eps`, the BFloat16 and Half constructors will round this to 1.0 which is outside of the half-open range. Instead, we delay the bounds change until after the value has been rounded.

Pull Request resolved: pytorch/pytorch#96962
Approved by: https://github.com/lezcano, https://github.com/ngimel
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 27, 2023
Labels
ciflow/trunk (Trigger trunk jobs on your pull request), keep-going (Don't stop on first failure, keep running tests until the end), Merged, module: random (Related to random number generation in PyTorch (rng generator)), open source, release notes: cuda (release notes category), topic: bug fixes (topic category)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

torch.rand can sample the upper bound for lower precision floating point dtypes on CUDA
8 participants