
[CUDA fuser] fails to run basic InceptionV3 #64062

Closed
ssnl opened this issue Aug 26, 2021 · 0 comments
Assignees
Labels
high priority · oncall: jit · triaged
Projects
Milestone

Comments


ssnl commented Aug 26, 2021

🐛 Bug

In PyTorch 1.9, the new CUDA fuser makes it impossible to run (part of) NVIDIA's InceptionV3 TorchScript model (ckpt url). After loading, the model works fine when run directly on an image (calling model(x)), but calling a submodule (model.layers(x)) fails

  • on PyTorch 1.9, with RuntimeError: MALFORMED INPUT: lanes don't match.
  • on latest nightly (1.10.0.dev20210826) at the time of writing, with either the above RuntimeError or segfault.

A couple of notes:

  1. Inspecting model.code and model.layers.code, the code looks very standard, with nothing too fancy. Is the fuser trying to fuse across the boundary (of the self.layers call in network.forward)?
  2. Because this still happens on the latest nightly, it is different from the fixed issue BN+ReLU cause "RuntimeError: MALFORMED INPUT: bad dtype in CompareSelect" error in fp16, traced module #61382.
  3. This has broken packages that depend on PyTorch JIT, e.g., RuntimeError: MALFORMED INPUT: lanes don't match GaParmar/clean-fid#5.
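On point 1: one way to narrow down which fuser is responsible would be to flip the two GPU fusers independently rather than toggling fusion wholesale. A minimal sketch, assuming the private torch._C toggles below exist in the installed build (they are internal APIs and may change between releases):

```python
# Sketch: named toggles for the two JIT GPU fusers, useful for bisecting
# which one produces "MALFORMED INPUT: lanes don't match".
FUSER_TOGGLES = {
    # legacy executor's GPU fuser (the switch used in the repro below)
    "legacy_gpu": "_jit_override_can_fuse_on_gpu",
    # NNC / TensorExpr fuser (default on CUDA as of 1.9)
    "tensorexpr": "_jit_set_texpr_fuser_enabled",
}

def set_fuser(name: str, enabled: bool) -> None:
    """Enable or disable one fuser by name, leaving the other untouched."""
    import torch
    getattr(torch._C, FUSER_TOGGLES[name])(enabled)
```

For example, calling set_fuser("tensorexpr", False) before net.layers(x) would show whether the TensorExpr fuser alone triggers the error.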

To Reproduce

import torch
import os
import urllib.request
import shutil

inception_url = "https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt"
inception_path = '/tmp/inception-2015-12-05.pt'

if not os.path.exists(inception_path):
    # download the checkpoint
    with urllib.request.urlopen(inception_url) as response, open(inception_path, 'wb') as f:
        shutil.copyfileobj(response, f)

net = torch.jit.load(inception_path).eval().cuda()

x = torch.randn(1, 3, 299, 299, device='cuda')

print('PyTorch version', torch.__version__)
print('full net(x)', net(x).sum())
torch._C._jit_override_can_fuse_on_gpu(False)
print('net.layers(x) w/o fuser', net.layers(x).sum())
torch._C._jit_override_can_fuse_on_gpu(True)
print('net.layers(x) w/ fuser', net.layers(x).sum())
  • PyTorch 1.8 output

    PyTorch version 1.8.1
    full net(x) tensor(1., device='cuda:0')
    net.layers(x) w/o fuser tensor(432.4347, device='cuda:0')
    net.layers(x) w/ fuser tensor(432.4347, device='cuda:0')
    
  • PyTorch 1.9 output

    PyTorch version 1.9.0
    XXX/lib/python3.8/site-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448278899/work/c10/core/TensorImpl.h:1156.)
    return forward_call(*input, **kwargs)
    full net(x) tensor(1., device='cuda:0')
    net.layers(x) w/o fuser tensor(423.2352, device='cuda:0')
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-1-ca859f30cada> in <module>
        21 print('net.layers(x) w/o fuser', net.layers(x).sum())
        22 torch._C._jit_override_can_fuse_on_gpu(True)
    ---> 23 print('net.layers(x) w/ fuser', net.layers(x).sum())
    
    XXX/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
    1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
    1052         # Do not call functions when jit is used
    1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    RuntimeError: MALFORMED INPUT: lanes dont match
    
  • PyTorch nightly 1.10.0.dev20210826

    • Output 1 (RuntimeError)

      PyTorch version 1.10.0.dev20210826
      full net(x) tensor(1., device='cuda:0')
      net.layers(x) w/o fuser tensor(416.9673, device='cuda:0')
      ---------------------------------------------------------------------------
      RuntimeError                              Traceback (most recent call last)
      <ipython-input-1-b52b7aee928d> in <module>
          21 print('net.layers(x) w/o fuser', net.layers(x).sum())
          22 torch._C._jit_override_can_fuse_on_gpu(True)
      ---> 23 print('net.layers(x) w/ fuser', net.layers(x).sum())
          24
      
      XXX/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
      1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
      1101                 or _global_forward_hooks or _global_forward_pre_hooks):
      -> 1102             return forward_call(*input, **kwargs)
      1103         # Do not call functions when jit is used
      1104         full_backward_hooks, non_full_backward_hooks = [], []
      
      RuntimeError: MALFORMED INPUT: lanes dont match
      
    • Output 2 (segfault)

      PyTorch version 1.10.0.dev20210826
      full net(x) tensor(1., device='cuda:0')
      net.layers(x) w/o fuser tensor(408.5045, device='cuda:0')
      [1]    107222 segmentation fault  ipython
      

Expected behavior

I expect the code to work as it did in PyTorch 1.8.1.
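As a stopgap until this is fixed, affected downstream packages could disable GPU fusion only on the versions that exhibit the crash. The helpers below are hypothetical (not part of any library); the torch._C._jit_override_can_fuse_on_gpu call is the same internal switch used in the repro script above, and the >= 1.9 cutoff is an assumption based on the outputs reported here.

```python
# Hypothetical helper: decide whether to disable the GPU fuser based on the
# running PyTorch version string. 1.9 and the 1.10 nightlies show the crash;
# 1.8.1 does not.
def needs_fuser_workaround(version: str) -> bool:
    # Strip any local suffix ("1.9.0+cu111") and tolerate extra segments
    # such as "1.10.0.dev20210826".
    parts = version.split("+")[0].split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= (1, 9)


def apply_fuser_workaround() -> None:
    import torch
    if needs_fuser_workaround(torch.__version__):
        # Same internal switch used in the repro script above.
        torch._C._jit_override_can_fuse_on_gpu(False)
```

Calling apply_fuser_workaround() before net.layers(x) matches the passing "w/o fuser" runs in the outputs above, at the cost of losing fusion speedups on the affected versions.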

Environment

PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.19.6
Libc version: glibc-2.27

Python version: 3.8.8 (default, Apr 13 2021, 19:58:26)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-153-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 3080
GPU 1: GeForce RTX 3080

Nvidia driver version: 460.91.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchfile==0.1.0
[pip3] torchvision==0.9.1
[conda] _tflow_select             2.3.0                       mkl
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.1.1               h6406543_8    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2020.2                      256
[conda] mkl-include               2020.2                      256
[conda] mkl-service               2.3.0            py38he904b0f_0
[conda] mkl_fft                   1.3.0            py38h54f3939_0
[conda] mkl_random                1.1.1            py38h0573a6f_0
[conda] numpy                     1.19.2           py38h54aff64_0
[conda] numpy-base                1.19.2           py38hfa32c7d_0
[conda] pytorch                   1.8.1           py3.8_cuda11.1_cudnn8.0.5_0    pytorch
[conda] tensorflow                2.4.1           mkl_py38hb2083e0_0
[conda] tensorflow-base           2.4.1           mkl_py38h43e0292_0
[conda] torchaudio                0.8.1                      py38    pytorch
[conda] torchfile                 0.1.0                    pypi_0    pypi
[conda] torchvision               0.9.1                py38_cu111    pytorch

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @ngimel

@iramazanli added the module: cuda and triaged labels Aug 27, 2021
@ngimel added the oncall: jit and high priority labels and removed the module: cuda label Aug 27, 2021
@github-actions github-actions bot added this to Need triage in JIT Triage Aug 27, 2021
@ngimel ngimel added this to the 1.10.0 milestone Aug 27, 2021
@bertmaher bertmaher self-assigned this Aug 27, 2021
bertmaher added a commit that referenced this issue Aug 28, 2021
JIT Triage automation moved this from Need triage to Done Aug 29, 2021