
RuntimeError: "upsample_bilinear2d_out_frame" not implemented for 'BFloat16' #88536

Closed
d4l3k opened this issue Nov 5, 2022 · 6 comments
Labels
feature A request for a proper, new feature. module: bfloat16 module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

d4l3k (Collaborator) commented Nov 5, 2022

🐛 Describe the bug

torch.nn.functional.interpolate doesn't support bfloat16 on CUDA for either the nearest or the bilinear mode. A number of CV models, such as mmseg, require this.

  File "/home/rice/venvs/v/lib/python3.10/site-packages/mmseg/ops/wrappers.py", line 27, in resize
    return F.interpolate(input, size, scale_factor, mode, align_corners)
  File "/home/rice/venvs/v/lib/python3.10/site-packages/torch/nn/functional.py", line 3950, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
RuntimeError: "upsample_bilinear2d_out_frame" not implemented for 'BFloat16'
import torch
import torch.nn.functional as F

device = torch.device('cuda')
t = torch.ones(2, 3, 240, 320, device=device, dtype=torch.bfloat16)
F.interpolate(t, scale_factor=0.5, mode='nearest')
F.interpolate(t, scale_factor=0.5, mode='bilinear')

Versions

Collecting environment information...
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 12.2.0
Clang version: 14.0.6
CMake version: version 3.24.3
Libc version: glibc-2.36

Python version: 3.10.8 (main, Oct 13 2022, 21:13:48) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-6.0.2-arch1-1-x86_64-with-glibc2.36
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080
GPU 1: NVIDIA GeForce RTX 3070 Ti
GPU 2: NVIDIA GeForce RTX 3090

Nvidia driver version: 520.56.06
cuDNN version: Probably one of the following:
/opt/cudnn6/lib64/libcudnn.so.6.0.21
/usr/lib/libcudnn.so.8.5.0
/usr/lib/libcudnn_adv_infer.so.8.5.0
/usr/lib/libcudnn_adv_train.so.8.5.0
/usr/lib/libcudnn_cnn_infer.so.8.5.0
/usr/lib/libcudnn_cnn_train.so.8.5.0
/usr/lib/libcudnn_ops_infer.so.8.5.0
/usr/lib/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.971
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] pytorch-memlab==0.2.4
[pip3] pytorch-msssim==0.2.1
[pip3] pytorch3d==0.7.1
[pip3] torch==1.13.0
[pip3] torchaudio==0.13.0
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.14.0
[conda] Could not collect

cc @ngimel

@lezcano lezcano added the feature A request for a proper, new feature. label Nov 6, 2022
@albanD albanD added module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Nov 10, 2022
ngimel (Collaborator) commented Nov 10, 2022

cc @ptrblck
@d4l3k note that even if bf16 is enabled, the backward for these ops will be very slow due to CUDA limitations on bfloat16 atomic ops before version 11.8; I think at least bilinear uses atomics in the backward. You might be better off converting inputs to fp32, running the upsampling, and converting them back.
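
A minimal sketch of that fp32 round-trip workaround (interpolate_bf16_safe is a made-up helper name, not an existing API):

import torch
import torch.nn.functional as F

# Hypothetical helper: upcast to fp32 for the kernel that lacks a bf16
# implementation, then cast the result back to the input dtype.
def interpolate_bf16_safe(x, **kwargs):
    if x.dtype == torch.bfloat16:
        return F.interpolate(x.float(), **kwargs).to(x.dtype)
    return F.interpolate(x, **kwargs)

t = torch.ones(2, 3, 240, 320, device='cuda', dtype=torch.bfloat16)
out = interpolate_bf16_safe(t, scale_factor=0.5, mode='bilinear', align_corners=False)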

d4l3k (Collaborator, Author) commented Nov 10, 2022

@ngimel that's what I ended up doing, but when running with amp it incorrectly calls the bf16 variant of the op instead of the fp32 version. Can we add the upsampling operators to the blacklist for bf16 amp?
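
Until autocast handles these ops itself, one stopgap is to step out of the autocast region around the resize (a sketch; resize_outside_autocast is a made-up name):

import torch
import torch.nn.functional as F

# Sketch: run the resize outside the bf16 autocast region in fp32,
# then cast the result back to the input's dtype.
def resize_outside_autocast(x, **kwargs):
    with torch.autocast(device_type='cuda', enabled=False):
        return F.interpolate(x.float(), **kwargs).to(x.dtype)

t = torch.randn(2, 3, 240, 320, device='cuda')
with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
    y = resize_outside_autocast(t, scale_factor=0.5, mode='bilinear', align_corners=False)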

ngimel (Collaborator) commented Nov 10, 2022

Sure, let's do it.

d4l3k (Collaborator, Author) commented Apr 6, 2023

Looks like this should be fixed as of #95500

GuillaumeTong commented Jul 5, 2023

Sorry for commenting on a closed issue, but this error still seems to be occurring for me on PyTorch 2.0.1+cu118.
Given that this issue was resolved recently, which version of PyTorch should contain the fix? Or does this require building PyTorch from source?

EDIT:
My bad, the info was included in the link from the previous comment:
#95500 (comment)

It appears the feature did not make it into the 2.0.0 stable release, but it is currently available in the nightly build and should make it into the 2.1.0 stable release.

d4l3k (Collaborator, Author) commented Oct 29, 2023

@GuillaumeTong this is fixed in PT2.1
