
RuntimeError: "upsample_bilinear2d_out_frame" not implemented for 'BFloat16' #88536

Closed
d4l3k opened this issue Nov 5, 2022 · 6 comments
Labels
feature A request for a proper, new feature. module: bfloat16 module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

d4l3k (Collaborator) commented Nov 5, 2022

🐛 Describe the bug

torch.nn.functional.interpolate doesn't support bfloat16 on CUDA for either the nearest or the bilinear mode. A number of CV models, such as mmseg, require this.

  File "/home/rice/venvs/v/lib/python3.10/site-packages/mmseg/ops/wrappers.py", line 27, in resize
    return F.interpolate(input, size, scale_factor, mode, align_corners)
  File "/home/rice/venvs/v/lib/python3.10/site-packages/torch/nn/functional.py", line 3950, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
RuntimeError: "upsample_bilinear2d_out_frame" not implemented for 'BFloat16'
import torch
import torch.nn.functional as F

device = torch.device('cuda')
t = torch.ones(2, 3, 240, 320, device=device, dtype=torch.bfloat16)
F.interpolate(t, scale_factor=0.5, mode='nearest')
F.interpolate(t, scale_factor=0.5, mode='bilinear')

Versions

Collecting environment information...
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 12.2.0
Clang version: 14.0.6
CMake version: version 3.24.3
Libc version: glibc-2.36

Python version: 3.10.8 (main, Oct 13 2022, 21:13:48) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-6.0.2-arch1-1-x86_64-with-glibc2.36
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080
GPU 1: NVIDIA GeForce RTX 3070 Ti
GPU 2: NVIDIA GeForce RTX 3090

Nvidia driver version: 520.56.06
cuDNN version: Probably one of the following:
/opt/cudnn6/lib64/libcudnn.so.6.0.21
/usr/lib/libcudnn.so.8.5.0
/usr/lib/libcudnn_adv_infer.so.8.5.0
/usr/lib/libcudnn_adv_train.so.8.5.0
/usr/lib/libcudnn_cnn_infer.so.8.5.0
/usr/lib/libcudnn_cnn_train.so.8.5.0
/usr/lib/libcudnn_ops_infer.so.8.5.0
/usr/lib/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.971
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] pytorch-memlab==0.2.4
[pip3] pytorch-msssim==0.2.1
[pip3] pytorch3d==0.7.1
[pip3] torch==1.13.0
[pip3] torchaudio==0.13.0
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.14.0
[conda] Could not collect

cc @ngimel

@lezcano lezcano added the feature A request for a proper, new feature. label Nov 6, 2022
@albanD albanD added module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Nov 10, 2022
ngimel (Collaborator) commented Nov 10, 2022

cc @ptrblck
@d4l3k note that even if bf16 is enabled, the backward for these ops will be very slow due to CUDA limitations on bfloat16 atomic ops before version 11.8; I think at least bilinear uses atomics in the backward. You might be better off converting inputs to fp32, running the upsampling, and converting them back.
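
A minimal sketch of that fp32 round-trip workaround (interpolate_bf16_safe is a made-up helper name, not an existing API):

import torch
import torch.nn.functional as F

# Hypothetical helper: upcast to fp32 for the kernel that lacks a bf16
# implementation, then cast the result back to the input dtype.
def interpolate_bf16_safe(x, **kwargs):
    if x.dtype == torch.bfloat16:
        return F.interpolate(x.float(), **kwargs).to(x.dtype)
    return F.interpolate(x, **kwargs)

t = torch.ones(2, 3, 240, 320, device='cuda', dtype=torch.bfloat16)
out = interpolate_bf16_safe(t, scale_factor=0.5, mode='bilinear', align_corners=False)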

d4l3k (Collaborator, Author) commented Nov 10, 2022

@ngimel that's what I ended up doing, but when running with amp it incorrectly calls the bf16 variant of the op instead of the fp32 version. Can we add the upsampling operators to the blacklist for bf16 amp?
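
Until autocast handles these ops itself, one stopgap is to step out of the autocast region around the resize (a sketch; resize_outside_autocast is a made-up name):

import torch
import torch.nn.functional as F

# Sketch: run the resize outside the bf16 autocast region in fp32,
# then cast the result back to the input's dtype.
def resize_outside_autocast(x, **kwargs):
    with torch.autocast(device_type='cuda', enabled=False):
        return F.interpolate(x.float(), **kwargs).to(x.dtype)

t = torch.randn(2, 3, 240, 320, device='cuda')
with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
    y = resize_outside_autocast(t, scale_factor=0.5, mode='bilinear', align_corners=False)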

ngimel (Collaborator) commented Nov 10, 2022

Sure, let's do it.

d4l3k (Collaborator, Author) commented Apr 6, 2023

Looks like this should be fixed as of #95500

GuillaumeTong commented Jul 5, 2023

Sorry for commenting on a closed issue, but this error still seems to be occurring for me on PyTorch 2.0.1+cu118.
Given that this issue was resolved recently, which version of PyTorch should contain the fix? Or does this require building PyTorch from source?

EDIT:
My bad, the info was included in the link from the previous comment:
#95500 (comment)

It appears the feature did not make it into the 2.0.0 stable release, but it is currently available in the nightly build and should make it into the 2.1.0 stable release.

d4l3k (Collaborator, Author) commented Oct 29, 2023

@GuillaumeTong this is fixed in PT2.1
