BFloat16 CUDA GEMM ops unsupported on Nvidia P100 (SM_60) on CUDA 11.3 #57773

Open
imaginary-person opened this issue May 6, 2021 · 0 comments
Labels
module: bfloat16 · module: cublas (Problem related to cublas support) · module: cuda (Related to torch.cuda, and CUDA support in general) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


imaginary-person commented May 6, 2021

🐛 Bug in CUDA?

As per #50442, BFloat16 CUDA GEMM ops are supposed to be supported on Nvidia GPUs of compute capability SM_53 & above.
However, on an Nvidia P100 (an SM_60 GPU) with CUDA 11.3, such ops produce an error stating that they're unsupported.
Please confirm which Nvidia compute capabilities actually support them. Thanks!
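
For reference, a quick way to check the compute capability PyTorch sees (a minimal sketch; device index 0 assumes the P100 is the first visible GPU):

```python
import torch

# Report the compute capability of the first visible CUDA device.
# A Tesla P100 reports (6, 0), i.e. SM_60 -- above the SM_53
# threshold that #50442 claims is sufficient for BFloat16 GEMMs.
major, minor = torch.cuda.get_device_capability(0)
print(f"SM_{major}{minor}")  # prints SM_60 on a P100
```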

To Reproduce

Steps to reproduce the behavior:

  1. Clone the PyTorch repo & run python test_ops.py in pytorch/test.
  2. Some tests, such as test_supported_backward_addbmm_cuda_bfloat16, fail with the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

The corresponding tests for addmm, baddbmm & bmm fail with the same error; a minimal standalone repro is sketched below.
For addmm, this test had even passed on SM_52 with CUDA 11.1.
The other three ops' BFloat16 variants are only enabled in common_methods_invocations.py if the SM version is above 52.
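
Here's a hypothetical minimal repro distilled from the failing addbmm test (the shapes are illustrative; any batched BFloat16 GEMM should hit the same cuBLAS path):

```python
import torch

# Batched BFloat16 GEMM: addbmm reduces a batch of matrix products,
# dispatching to cublasGemmStridedBatchedEx under the hood.
device = torch.device("cuda")
M  = torch.randn(3, 5,     device=device, dtype=torch.bfloat16)
b1 = torch.randn(10, 3, 4, device=device, dtype=torch.bfloat16)
b2 = torch.randn(10, 4, 5, device=device, dtype=torch.bfloat16)

# On a P100 (SM_60) with CUDA 11.3, this raises
# RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED ...
out = torch.addbmm(M, b1, b2)
print(out.shape)  # torch.Size([3, 5])
```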

Expected behavior

These BFloat16 CUDA GEMM tests should pass on SM_60 GPUs (and any GPU above SM_52), as they do on SM_75 (Tesla T4) in CI.
FWIW, addmm's corresponding test passes on SM_52 with CUDA 11.1, while baddbmm's fails on SM_52.
However, both fail on an Nvidia P100 with CUDA 11.3.
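
Until the actual support matrix is confirmed, tests could be gated on compute capability along these lines (a hedged sketch, not PyTorch's actual decorators; supports_bf16_gemm and the SM_53 threshold are assumptions based on #50442):

```python
import unittest
import torch

def supports_bf16_gemm(device=0):
    # Assumption: BFloat16 cuBLAS GEMMs need SM_53+, per #50442.
    # This issue suggests the real threshold on CUDA 11.3 may differ.
    return torch.cuda.get_device_capability(device) >= (5, 3)

class TestBF16Gemm(unittest.TestCase):
    @unittest.skipUnless(
        torch.cuda.is_available() and supports_bf16_gemm(),
        "BFloat16 GEMM unsupported on this GPU",
    )
    def test_addbmm_bfloat16(self):
        M  = torch.randn(3, 5, device="cuda", dtype=torch.bfloat16)
        b1 = torch.randn(10, 3, 4, device="cuda", dtype=torch.bfloat16)
        b2 = torch.randn(10, 4, 5, device="cuda", dtype=torch.bfloat16)
        self.assertEqual(torch.addbmm(M, b1, b2).shape, (3, 5))

if __name__ == "__main__":
    unittest.main()
```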

Environment

PyTorch version: 1.9.0a0+gitebd2c0a
Is debug build: False
CUDA used to build PyTorch: 11.3

OS: Ubuntu 18.04.1 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA Tesla P100-PCIE-12GB
Nvidia driver version: 465.19.01
cuDNN version: Could not collect

Additional context

cc @csarofeen @ptrblck @xwang233 @ngimel @zasdfgbnm

cc @mruberry @anjali411: some OpInfos check for Nvidia SM_53 or above, so they might have to be modified based on updated info.

@ngimel added the module: bfloat16, module: cuda, and triaged labels on May 7, 2021
@ngimel added the module: cublas label on Jun 22, 2021