BFloat16 CUDA GEMM ops unsupported on Nvidia P100 (SM_60) on CUDA 11.3 #57773

Open
imaginary-person opened this issue May 6, 2021 · 0 comments
Labels
module: bfloat16 · module: cublas (Problem related to cublas support) · module: cuda (Related to torch.cuda, and CUDA support in general) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


imaginary-person commented May 6, 2021

🐛 Bug in CUDA?

As per #50442, BFloat16 CUDA GEMM ops are supposed to be supported on Nvidia GPUs of compute capability SM_53 & above.
However, on an Nvidia P100 (an SM_60 GPU) with CUDA 11.3, such ops produce an error stating that they're unsupported.
Please confirm which Nvidia compute capabilities actually support them. Thanks!
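
For reference, a quick way to check the compute capability PyTorch sees (a minimal sketch; device index 0 assumes the P100 is the first visible GPU):

```python
import torch

# Report the compute capability of the first visible CUDA device.
# A Tesla P100 reports (6, 0), i.e. SM_60 -- above the SM_53
# threshold that #50442 claims is sufficient for BFloat16 GEMMs.
major, minor = torch.cuda.get_device_capability(0)
print(f"SM_{major}{minor}")  # prints SM_60 on a P100
```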

To Reproduce

Steps to reproduce the behavior:

  1. Clone the PyTorch repo & run python test_ops.py in pytorch/test.
  2. Some tests, such as test_supported_backward_addbmm_cuda_bfloat16, fail with the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

The corresponding tests for addmm, baddbmm & bmm fail with the same error; a minimal standalone repro is sketched below.
For addmm, this test had even passed on SM_52 with CUDA 11.1.
The other three ops' BFloat16 variants are only enabled in common_methods_invocations.py if the SM version is above 52.
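
Here's a hypothetical minimal repro distilled from the failing addbmm test (the shapes are illustrative; any batched BFloat16 GEMM should hit the same cuBLAS path):

```python
import torch

# Batched BFloat16 GEMM: addbmm reduces a batch of matrix products,
# dispatching to cublasGemmStridedBatchedEx under the hood.
device = torch.device("cuda")
M  = torch.randn(3, 5,     device=device, dtype=torch.bfloat16)
b1 = torch.randn(10, 3, 4, device=device, dtype=torch.bfloat16)
b2 = torch.randn(10, 4, 5, device=device, dtype=torch.bfloat16)

# On a P100 (SM_60) with CUDA 11.3, this raises
# RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED ...
out = torch.addbmm(M, b1, b2)
print(out.shape)  # torch.Size([3, 5])
```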

Expected behavior

These BFloat16 CUDA GEMM tests should pass on SM_60 GPUs (and any GPU above SM_52), as they do on SM_75 (Tesla T4) in CI.
FWIW, addmm's corresponding test passes on SM_52 with CUDA 11.1, while baddbmm's fails on SM_52.
However, both fail on an Nvidia P100 with CUDA 11.3.
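
Until the actual support matrix is confirmed, tests could be gated on compute capability along these lines (a hedged sketch, not PyTorch's actual decorators; supports_bf16_gemm and the SM_53 threshold are assumptions based on #50442):

```python
import unittest
import torch

def supports_bf16_gemm(device=0):
    # Assumption: BFloat16 cuBLAS GEMMs need SM_53+, per #50442.
    # This issue suggests the real threshold on CUDA 11.3 may differ.
    return torch.cuda.get_device_capability(device) >= (5, 3)

class TestBF16Gemm(unittest.TestCase):
    @unittest.skipUnless(
        torch.cuda.is_available() and supports_bf16_gemm(),
        "BFloat16 GEMM unsupported on this GPU",
    )
    def test_addbmm_bfloat16(self):
        M  = torch.randn(3, 5, device="cuda", dtype=torch.bfloat16)
        b1 = torch.randn(10, 3, 4, device="cuda", dtype=torch.bfloat16)
        b2 = torch.randn(10, 4, 5, device="cuda", dtype=torch.bfloat16)
        self.assertEqual(torch.addbmm(M, b1, b2).shape, (3, 5))

if __name__ == "__main__":
    unittest.main()
```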

Environment

PyTorch version: 1.9.0a0+gitebd2c0a
Is debug build: False
CUDA used to build PyTorch: 11.3

OS: Ubuntu 18.04.1 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA Tesla P100-PCIE-12GB
Nvidia driver version: 465.19.01
cuDNN version: Could not collect

Additional context

cc @csarofeen @ptrblck @xwang233 @ngimel @zasdfgbnm

cc @mruberry @anjali411: some OpInfos check for Nvidia SM_53 or above, so they might have to be modified based on updated info.

@ngimel added the module: bfloat16, module: cuda, and triaged labels on May 7, 2021
@ngimel added the module: cublas label on Jun 22, 2021