
Conversation

xwang233
Collaborator

This would fix #33485.

cc @ptrblck
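
For context, #33485 reports torch.rfft returning NaNs for some half-precision CUDA inputs. A minimal sketch of the kind of call involved, assuming the old torch.rfft(input, signal_ndim) API in use at the time (the sizes here are illustrative, not the exact reproduction from the issue):

    import torch

    # Half-precision input on CUDA goes through cuFFT; some inputs were
    # reported to produce NaN outputs before this change.
    x = torch.randn(64, 128, device='cuda', dtype=torch.half)
    y = torch.rfft(x, 2)          # 2-D real-to-complex FFT, onesided by default
    print(torch.isnan(y).any())   # True for the problematic inputs reported in the issue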

@xwang233 requested a review from ngimel on March 28, 2020 at 04:48
@dr-ci

dr-ci bot commented Mar 28, 2020

💊 CircleCI build failures summary and remediations

As of commit c2a8ce4 (more details on the Dr. CI page):


  • 1/2 failures introduced in this PR

  • 1/2 broken upstream at merge base ef511d8 from Mar 27 until Mar 28 (23 commits; 0c16ced - a9b540d)

    Please rebase on the viable/strict branch:

    If your commit is newer than viable/strict, you can try basing on an older, stable commit:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)
    

    If your commit is older than viable/strict:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    

    Check out the recency history of this "viable master" tracking branch.


🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_backward_compatibility_check_test (1/1)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 29 04:45:12 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
Mar 29 04:45:12 processing existing schema:  aten::sparse_coo_tensor.size(int[] size, *, int dtype, int layout, Device device, bool pin_memory=False) -> (Tensor) 
Mar 29 04:45:12 processing existing schema:  aten::sparse_coo_tensor.indices(Tensor indices, Tensor values, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor) 
Mar 29 04:45:12 processing existing schema:  aten::sparse_coo_tensor.indices_size(Tensor indices, Tensor values, int[] size, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor) 
Mar 29 04:45:12 processing existing schema:  aten::split_with_sizes(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[]) 
Mar 29 04:45:12 processing existing schema:  aten::squeeze(Tensor(a) self) -> (Tensor(a)) 
Mar 29 04:45:12 processing existing schema:  aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a)) 
Mar 29 04:45:12 processing existing schema:  aten::stft(Tensor self, int n_fft, int? hop_length=None, int? win_length=None, Tensor? window=None, bool normalized=False, bool onesided=True) -> (Tensor) 
Mar 29 04:45:12 skipping schema:  aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!)) 
Mar 29 04:45:12 skipping schema:  aten::sub_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> (Tensor(a!)) 
Mar 29 04:45:12 processing existing schema:  aten::t(Tensor(a) self) -> (Tensor(a)) 
Mar 29 04:45:12 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.  
Mar 29 04:45:12  
Mar 29 04:45:12 Broken ops: [ 
Mar 29 04:45:12 	aten::owner(RRef(t) self) -> (__torch__.torch.classes.dist_rpc.WorkerInfo) 
Mar 29 04:45:12 	prepacked::conv2d_clamp_run(Tensor X, __torch__.torch.classes.xnnpack.Conv2dOpContext W_prepack) -> (Tensor Y) 
Mar 29 04:45:12 	prepacked::conv2d_clamp_prepack(Tensor W, Tensor? B, int[2] stride, int[2] padding, int[2] dilation, int groups, float? output_min=None, float? output_max=None) -> (__torch__.torch.classes.xnnpack.Conv2dOpContext) 
Mar 29 04:45:12 	prepacked::linear_clamp_run(Tensor X, __torch__.torch.classes.xnnpack.LinearOpContext W_prepack) -> (Tensor Y) 
Mar 29 04:45:12 	prepacked::linear_clamp_prepack(Tensor W, Tensor? B=None, float? output_min=None, float? output_max=None) -> (__torch__.torch.classes.xnnpack.LinearOpContext) 
Mar 29 04:45:12 ] 
Mar 29 04:45:12 + cleanup 
Mar 29 04:45:12 + retcode=1 

🚧 1 upstream failure:

These were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions on the GitHub issue tracker.


@vadimkantorov
Contributor

Is the NaN/Inf issue related to "training on CUDA" per se?

From what I understood, it can happen in any setting; it's just related to how cuFFT functions.

@xwang233
Collaborator Author

Thanks for the comment. I have reworded that.

Contributor

@facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in e021c13.

@xwang233
Collaborator Author

@ngimel I found this PR was reverted in a15a4a5. Was that because of the lint failure? Can I reland it once I fix the lint?

@ngimel
Collaborator

ngimel commented Jun 25, 2020

Yeah, it was because of the lint. Sure, you can reland.

facebook-github-bot pushed a commit that referenced this pull request on Jun 26, 2020
Summary:
Reland of #35594
Pull Request resolved: #40551

Reviewed By: ezyang

Differential Revision: D22249831

Pulled By: ngimel

fbshipit-source-id: b221b3c0a490ccaaabba50aa698a2490536e0917

Development

Successfully merging this pull request may close these issues.

torch.rfft returns NaNs for some half precision CUDA inputs
