CUDA BFloat16 gelu, hardswish, hardsigmoid #44997

Closed
zasdfgbnm wants to merge 7 commits

zasdfgbnm
Collaborator

No description provided.

@dr-ci

dr-ci bot commented Sep 19, 2020

💊 CI failures summary and remediations

As of commit 977b89a (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_build (1/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Oct 19 18:48:14 caused by: Connection refused (os error 111)
Oct 19 18:48:14 ++++ extract_trap_cmd 
Oct 19 18:48:14 ++++ printf '%s\n' '' 
Oct 19 18:48:14 +++ printf '%s\n' cleanup 
Oct 19 18:48:14 ++ trap -- ' 
Oct 19 18:48:14 cleanup' EXIT 
Oct 19 18:48:14 ++ [[ pytorch-linux-xenial-py3-clang5-asan-build != *pytorch-win-* ]] 
Oct 19 18:48:14 ++ which sccache 
Oct 19 18:48:14 ++ sccache --stop-server 
Oct 19 18:48:14 Stopping sccache server... 
Oct 19 18:48:14 error: couldn't connect to server 
Oct 19 18:48:14 caused by: Connection refused (os error 111) 
Oct 19 18:48:14 ++ true 
Oct 19 18:48:14 ++ rm /var/lib/jenkins/sccache_error.log 
Oct 19 18:48:14 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory 
Oct 19 18:48:14 ++ true 
Oct 19 18:48:14 ++ [[ pytorch-linux-xenial-py3-clang5-asan-build == *rocm* ]] 
Oct 19 18:48:14 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Oct 19 18:48:14 ++ SCCACHE_IDLE_TIMEOUT=1200 
Oct 19 18:48:14 ++ RUST_LOG=sccache::server=error 
Oct 19 18:48:14 ++ sccache --start-server 
Oct 19 18:48:14 Starting sccache server... 

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_ort_test1 (2/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Oct 19 19:39:37 ERROR: No matching distribution found for ort-nightly==1.5.0.dev202009182
Oct 19 19:39:34 Collecting pip 
Oct 19 19:39:34   Downloading pip-20.2.4-py2.py3-none-any.whl (1.5 MB) 
Oct 19 19:39:34 Installing collected packages: pip 
Oct 19 19:39:34   Attempting uninstall: pip 
Oct 19 19:39:34     Found existing installation: pip 20.2.3 
Oct 19 19:39:34     Uninstalling pip-20.2.3: 
Oct 19 19:39:35       Successfully uninstalled pip-20.2.3 
Oct 19 19:39:36 Successfully installed pip-20.2.4 
Oct 19 19:39:36 + pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==1.5.0.dev202009182 
Oct 19 19:39:37 ERROR: Could not find a version that satisfies the requirement ort-nightly==1.5.0.dev202009182 (from versions: 1.5.2.dev202010091, 1.5.2.dev202010121, 1.5.2.dev202010141, 1.5.2.dev202010151, 1.5.2.dev202010161) 
Oct 19 19:39:37 ERROR: No matching distribution found for ort-nightly==1.5.0.dev202009182 

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_ort_test2 (3/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Oct 19 19:39:07 ERROR: No matching distribution found for ort-nightly==1.5.0.dev202009182
Oct 19 19:39:04 Collecting pip 
Oct 19 19:39:04   Downloading pip-20.2.4-py2.py3-none-any.whl (1.5 MB) 
Oct 19 19:39:05 Installing collected packages: pip 
Oct 19 19:39:05   Attempting uninstall: pip 
Oct 19 19:39:05     Found existing installation: pip 20.2.3 
Oct 19 19:39:05     Uninstalling pip-20.2.3: 
Oct 19 19:39:06       Successfully uninstalled pip-20.2.3 
Oct 19 19:39:07 Successfully installed pip-20.2.4 
Oct 19 19:39:07 + pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==1.5.0.dev202009182 
Oct 19 19:39:07 ERROR: Could not find a version that satisfies the requirement ort-nightly==1.5.0.dev202009182 (from versions: 1.5.2.dev202010091, 1.5.2.dev202010121, 1.5.2.dev202010141, 1.5.2.dev202010151, 1.5.2.dev202010161) 
Oct 19 19:39:07 ERROR: No matching distribution found for ort-nightly==1.5.0.dev202009182 

1 job timed out:

  • pytorch_python_doc_build


@codecov

codecov bot commented Sep 19, 2020

Codecov Report

Merging #44997 into master will decrease coverage by 0.11%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #44997      +/-   ##
==========================================
- Coverage   68.31%   68.19%   -0.12%     
==========================================
  Files         410      410              
  Lines       53582    53232     -350     
==========================================
- Hits        36602    36303     -299     
+ Misses      16980    16929      -51     
Impacted Files Coverage Δ
torch/testing/_internal/te_utils.py 0.00% <0.00%> (-82.15%) ⬇️
torch/nn/modules/channelshuffle.py 0.00% <0.00%> (-63.64%) ⬇️
torch/quantization/fx/utils.py 50.47% <0.00%> (-11.43%) ⬇️
torch/quantization/fx/quantization_patterns.py 89.35% <0.00%> (-7.17%) ⬇️
torch/multiprocessing/spawn.py 79.26% <0.00%> (-6.60%) ⬇️
torch/fx/experimental/GraphManipulation.py 94.28% <0.00%> (-5.72%) ⬇️
torch/utils/data/_utils/worker.py 21.49% <0.00%> (-2.83%) ⬇️
torch/autograd/__init__.py 84.28% <0.00%> (-1.43%) ⬇️
torch/distributions/binomial.py 94.93% <0.00%> (-1.01%) ⬇️
torch/quantization/quantize_fx.py 91.42% <0.00%> (-0.76%) ⬇️
... and 55 more


@ailzhang added the triaged label (This issue has been looked at a team member, and triaged and prioritized into an appropriate module) on Sep 22, 2020
- inputValues = [-1000, -4, -3, -2, 0, 2, 3, 4, 1000]
+ inputValues = [-1000, -4, -3, -2, 0, 2, 3, 4]
+ if dtype != torch.bfloat16:
+     inputValues.append(1000)
Collaborator

what's up with this? Error too large?

Collaborator Author

Yes

Collaborator

So maybe switching to acc_t for internal computations will actually make it ok?
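
To illustrate why 1000 may fail for bfloat16 and why wider intermediates could help, here is a rough pure-Python sketch. It assumes hardswish is evaluated as x * min(max(x + 3, 0), 6) / 6 with every intermediate rounded back to bfloat16. `to_bf16`, `hardswish_bf16` and `hardswish_acc` are hypothetical helpers written for this illustration, not PyTorch functions, and Python floats stand in for the wider accumulation type.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Illustrative helper: goes through float32 and keeps the top 16 bits.
    NaN/inf and overflow are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round half to even on the dropped bits
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def hardswish_bf16(x: float) -> float:
    """Assumed formula, with every intermediate rounded to bfloat16."""
    x = to_bf16(x)
    inner = to_bf16(x + 3.0)
    clamped = min(max(inner, 0.0), 6.0)
    prod = to_bf16(x * clamped)  # 1000 * 6 = 6000 rounds to 6016 here
    return to_bf16(prod / 6.0)


def hardswish_acc(x: float) -> float:
    """Same formula, computed in a wider type and rounded once at the end."""
    x = to_bf16(x)
    return to_bf16(x * min(max(x + 3.0, 0.0), 6.0) / 6.0)


print(hardswish_bf16(1000.0))  # 1004.0
print(hardswish_acc(1000.0))   # 1000.0
```

Under these assumptions the per-operation rounding turns the product 6000 into 6016, so the result lands on 1004 instead of 1000, while rounding only once keeps the exact value.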

Collaborator

Do you know if backward for hardswish and hardsigmoid is tested anywhere? Aha, looks like it is tested in test_nn, but only with gradcheck and only for float64. Ok, whatever.

if (self_val < neg_three) {
  return zero;
} else if (self_val <= three) {
  return grad_val * ((self_val / three) + one_half);
Collaborator

unrelated to this PR, but I wonder if computations here should be done in accscalar_t
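
As a rough illustration of the question, the backward formula from the snippet can be emulated in pure Python, once with every intermediate rounded to bfloat16 and once with a single final rounding. The helpers `to_bf16`, `hardswish_backward_bf16` and `hardswish_backward_acc` are hypothetical names for this sketch, Python floats stand in for accscalar_t, and the final branch (returning the gradient unchanged for inputs above 3) is an assumption, since the snippet is cut off before it.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Illustrative helper; NaN/inf and overflow are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def hardswish_backward_bf16(grad: float, x: float) -> float:
    """Backward formula with each intermediate rounded to bfloat16."""
    grad, x = to_bf16(grad), to_bf16(x)
    if x < -3.0:
        return 0.0
    elif x <= 3.0:
        ratio = to_bf16(x / 3.0)
        slope = to_bf16(ratio + 0.5)
        return to_bf16(grad * slope)
    return grad  # assumed branch for x > 3 (not shown in the snippet)


def hardswish_backward_acc(grad: float, x: float) -> float:
    """Same formula in a wider type, rounded to bfloat16 once at the end."""
    grad, x = to_bf16(grad), to_bf16(x)
    if x < -3.0:
        return 0.0
    elif x <= 3.0:
        return to_bf16(grad * (x / 3.0 + 0.5))
    return grad  # assumed branch for x > 3 (not shown in the snippet)


exact = 5.0 / 6.0  # true gradient at x = 1 with grad = 1
print(hardswish_backward_bf16(1.0, 1.0))  # 0.8359375
print(hardswish_backward_acc(1.0, 1.0))   # 0.83203125
```

In this emulation the single-rounding variant ends up about half as far from the exact value 5/6 as the per-operation variant, which is the kind of difference the comment is asking about.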

@zasdfgbnm
Collaborator Author

ping @ngimel

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zasdfgbnm zasdfgbnm deleted the bfloat-activations2 branch October 26, 2020 23:06
@facebook-github-bot
Contributor

@ngimel merged this pull request in 7731370.

Labels
Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
5 participants