
use bfloat16 on nvidia V100 GPU #124996

Closed
bugm opened this issue Apr 26, 2024 · 2 comments
Labels
module: bfloat16, module: cuda (Related to torch.cuda, and CUDA support in general), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


bugm commented Apr 26, 2024

🐛 Describe the bug

Hello!
It is said that bfloat16 is only supported on GPUs with a compute capability of at least 8.0, which means the NVIDIA V100 (compute capability 7.0) should not support bfloat16.

But I have tested the code below on a V100 machine and it runs successfully.

import torch
a = torch.randn(3, 3, dtype=torch.bfloat16, device="cuda")
b = torch.randn(3, 3, dtype=torch.bfloat16, device="cuda")
c = torch.matmul(a, b)
print(c.dtype)
print(c.device)
print(torch.cuda.is_bf16_supported())

While the initialization and the operation succeed, torch.cuda.is_bf16_supported() here returns False:

torch.bfloat16
cuda:0
False

So I want to know what the situation is here. Thanks!

Versions

PyTorch version: 2.2.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.12.2 | packaged by Anaconda, Inc. | (main, Feb 27 2024, 17:35:02) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB
Nvidia driver version: 535.154.05

cc @ptrblck

malfet added the module: bfloat16 and module: cuda (Related to torch.cuda, and CUDA support in general) labels Apr 26, 2024

malfet commented Apr 26, 2024

I think the distinction here is "supported by software" (i.e. emulation) vs "supported by hardware". torch.cuda.is_bf16_supported() tells you that your GPU hardware does not have native bf16 instructions, but software can easily emulate some bf16 operations by widening the input values and then running the computation in float32, though it will be slower.
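
For illustration only (not PyTorch's internal implementation), here is a minimal sketch of the two notions of support: the hardware check can be approximated with torch.cuda.get_device_capability(), and "software support" amounts to widening bf16 inputs to float32, computing, and rounding back.

import torch

# Hardware side: the V100 reports compute capability (7, 0); native bf16 requires (8, 0)+.
print(torch.cuda.get_device_capability(0))   # (7, 0) on a V100
print(torch.cuda.is_bf16_supported())        # False: no native bf16 instructions

# Software side: emulate by widening bf16 to float32, computing, and rounding back.
a = torch.randn(3, 3, dtype=torch.bfloat16, device="cuda")
b = torch.randn(3, 3, dtype=torch.bfloat16, device="cuda")
c_emulated = (a.float() @ b.float()).to(torch.bfloat16)  # explicit float32 compute
c_direct = torch.matmul(a, b)                            # roughly what happens for you on this GPU
print(torch.allclose(c_emulated.float(), c_direct.float(), atol=1e-2))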

mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Apr 26, 2024

bugm commented Apr 29, 2024

I think the distinction here is "supported by software" (i.e. emulation) vs "supported by hardware". torch.cuda.is_bf16_supported() tells you that your GPU hardware does not have native bf16 instructions, but software can easily emulate some bf16 operations by widening the input values and then running the computation in float32, though it will be slower.

Thanks for your answer! I have tried bfloat16 mixed-precision training on a V100 GPU, and the time cost is almost the same as full fp32 training (even a little slower).
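
For reference, a minimal sketch of such a bf16 mixed-precision loop (hypothetical toy model, not the actual training code from this report) would look like the following; on a V100 the bf16 math is emulated via float32, so little or no speedup is expected:

import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(100):
    optimizer.zero_grad()
    # autocast runs eligible ops in bfloat16; no GradScaler is needed for bf16
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()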

bugm closed this as completed Apr 29, 2024