
Enable BFloat support for gemms on arch other than ampere #50442

Closed
zasdfgbnm wants to merge 20 commits

Conversation

zasdfgbnm
Collaborator

Fixes #{issue number}

@zasdfgbnm zasdfgbnm changed the title Enable BFloat support for gemms on arch other than ampere [WIP]Enable BFloat support for gemms on arch other than ampere Jan 12, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Jan 12, 2021

💊 CI failures summary and remediations

As of commit c33a608 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda11.1_test2 (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: "Simulate error" does not match "grad can be implicitly created only for scalar outputs"

Traceback (most recent call last):
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 290, in instantiated_test
    result = test_fn(self, *args)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 687, in only_fn
    return fn(slf, device, *args, **kwargs)
  File "test_autograd.py", line 6652, in test_reentrant_parent_error_on_cpu
    self._test_reentrant_parent_error_on_cpu(device)
  File "test_autograd.py", line 6638, in _test_reentrant_parent_error_on_cpu
    torch.autograd.backward([t5.sum(), t7.sum()])
AssertionError: "Simulate error" does not match "grad can be implicitly created only for scalar outputs"

----------------------------------------------------------------------
Ran 2794 tests in 2826.455s

FAILED (failures=1, skipped=23, expected failures=1)

Generating XML reports...
Generated XML report: test-reports\python-unittest\TEST-TestAutograd-20210125173827.xml
Generated XML report: test-reports\python-unittest\TEST-TestAutogradComplex-20210125173827.xml
Generated XML report: test-reports\python-unittest\TEST-TestAutogradDeviceTypeCPU-20210125173827.xml

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@zasdfgbnm zasdfgbnm changed the title [WIP]Enable BFloat support for gemms on arch other than ampere Enable BFloat support for gemms on arch other than ampere Jan 14, 2021
@zasdfgbnm
Collaborator Author

This should be ready. Test failures are unrelated.

@mrshenli added the module: bfloat16 and triaged labels Jan 15, 2021
  } else {
    TORCH_CHECK(false, "BFloat16 gemm in CUDA requires Ampere or later GPU");
  }
  TORCH_CUDABLAS_CHECK(cublasGemmEx(
Collaborator

setting and resetting cublas MathMode is not required if you specify CUBLAS_GEMM_DFALT_TENSOR_OP?

Collaborator Author

According to https://docs.nvidia.com/cuda/cublas/index.html#cublasmath_t

CUBLAS_DEFAULT_MATH: This is the default and highest-performance mode that uses compute and intermediate storage precisions with at least the same number of mantissa and exponent bits as requested. Tensor Cores will be used whenever possible.
CUBLAS_TENSOR_OP_MATH: This mode is deprecated and will be removed in a future release. Allows the library to use Tensor Core operations whenever possible. For single precision GEMM routines cuBLAS will use the CUBLAS_COMPUTE_32F_FAST_16F compute type.

test/test_linalg.py (outdated review thread, resolved)
b1 = torch.randn(num_batches, M, N, device=device).to(dtype)
b2 = torch.randn(num_batches, N, O, device=device).to(dtype)
self.assertRaisesRegex(RuntimeError, "type|Type|not implemented|Ampere", lambda: torch.bmm(b1, b2))
if not is_cuda_bfloat:
Collaborator

is_supported=False, is_cuda_bfloat=False is an impossible situation?

Collaborator Author

@zasdfgbnm Jan 19, 2021

Some ops are supported on SM52 and some are not. I don't think it's worth the maintenance effort to keep a precise list of which op is supported on which SM, so what I implemented here is:

SM >= 53 ---> supported
SM < 53 ---> undefined behavior
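
For illustration only, a test could gate CUDA bfloat16 GEMM cases on this convention roughly as follows (a minimal sketch assuming the SM >= 53 rule above; the helper name is hypothetical and is not the code added in this PR):

    import torch

    def cuda_bf16_gemm_expected_to_work(device):
        # Hypothetical helper: follow the "SM >= 53 -> supported,
        # SM < 53 -> undefined behavior" convention described above.
        if not torch.cuda.is_available():
            return False
        major, minor = torch.cuda.get_device_capability(device)
        return (major, minor) >= (5, 3)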

torch/testing/_internal/common_cuda.py (outdated review thread, resolved)
test/test_linalg.py (3 outdated review threads, resolved)
@codecov

codecov bot commented Jan 20, 2021

Codecov Report

Merging #50442 (79a68e3) into master (8e9ed27) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #50442      +/-   ##
==========================================
- Coverage   81.00%   81.00%   -0.01%     
==========================================
  Files        1916     1916              
  Lines      209481   209484       +3     
==========================================
+ Hits       169690   169692       +2     
- Misses      39791    39792       +1     

@zasdfgbnm
Collaborator Author

@ngimel @mruberry I think I have resolved all review comments, and all tests pass.

Collaborator

@mruberry left a comment

Cool! Thanks @zasdfgbnm!

Would you just rebase this? Sorry PyTorch is especially popular these days.

@zasdfgbnm
Collaborator Author

@mruberry rebased

Contributor

@facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator

Internal builds are failing with:

    from torch.testing._internal.common_cuda import _get_torch_cuda_version
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/caffe2/fb/high_perf_models/pytorch/torchscript/test/test_ir_bench#binary,link-tree/torch/testing/_internal/common_cuda.py", line 18, in <module>
    CUDA11OrLater = torch.version.cuda and float(torch.version.cuda) >= 11
ValueError: could not convert string to float: '9.2.0'

We typically use LooseVersion for version comparisons. See

active_if=LooseVersion(scipy.__version__) < "1.4.0"),

for an example.
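
For reference, the failing comparison in common_cuda.py could be rewritten along these lines (a sketch only, mirroring the LooseVersion example quoted above; the actual fix in this PR may differ):

    from distutils.version import LooseVersion

    import torch

    # Parse the CUDA version as a version string rather than a float,
    # so values like '9.2.0' or '11.1' compare correctly.
    CUDA11OrLater = torch.version.cuda and LooseVersion(torch.version.cuda) >= "11.0"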

@zasdfgbnm
Collaborator Author

@mruberry fixed

Contributor

@facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in b822aba.

Labels
cla signed, Merged, module: bfloat16, open source, triaged