
[CUDA fuser] fails to run basic InceptionV3 #64062

Closed
ssnl opened this issue Aug 26, 2021 · 0 comments
Assignees
Labels
high priority · oncall: jit · triaged
Projects
Milestone

Comments


ssnl commented Aug 26, 2021

🐛 Bug

In PyTorch 1.9, the new CUDA fuser makes it impossible to run (part of) NVIDIA's InceptionV3 TorchScript model (ckpt url). After loading, the model works fine when run directly on an image (calling model(x)), but calling a submodule (model.layers(x)) fails

  • on PyTorch 1.9, with RuntimeError: MALFORMED INPUT: lanes don't match.
  • on latest nightly (1.10.0.dev20210826) at the time of writing, with either the above RuntimeError or segfault.

A couple of notes:

  1. Inspecting model.code and model.layers.code, the code looks very standard, with nothing too fancy. Is the fuser trying to fuse across the boundary (of the self.layers call in network.forward)?
  2. Because this still happens on the latest nightly, it is different from the fixed issue BN+ReLU cause "RuntimeError: MALFORMED INPUT: bad dtype in CompareSelect" error in fp16, traced module #61382.
  3. This has broken packages that depend on PyTorch JIT, e.g., RuntimeError: MALFORMED INPUT: lanes don't match GaParmar/clean-fid#5.
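On point 1: one way to narrow down which fuser is responsible would be to flip the two GPU fusers independently rather than toggling fusion wholesale. A minimal sketch, assuming the private torch._C toggles below exist in the installed build (they are internal APIs and may change between releases):

```python
# Sketch: named toggles for the two JIT GPU fusers, useful for bisecting
# which one produces "MALFORMED INPUT: lanes don't match".
FUSER_TOGGLES = {
    # legacy executor's GPU fuser (the switch used in the repro below)
    "legacy_gpu": "_jit_override_can_fuse_on_gpu",
    # NNC / TensorExpr fuser (default on CUDA as of 1.9)
    "tensorexpr": "_jit_set_texpr_fuser_enabled",
}

def set_fuser(name: str, enabled: bool) -> None:
    """Enable or disable one fuser by name, leaving the other untouched."""
    import torch
    getattr(torch._C, FUSER_TOGGLES[name])(enabled)
```

For example, calling set_fuser("tensorexpr", False) before net.layers(x) would show whether the TensorExpr fuser alone triggers the error.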

To Reproduce

import torch
import os
import urllib.request
import shutil

inception_url = "https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt"
inception_path = '/tmp/inception-2015-12-05.pt'

if not os.path.exists(inception_path):
    # download the checkpoint
    with urllib.request.urlopen(inception_url) as response, open(inception_path, 'wb') as f:
        shutil.copyfileobj(response, f)

net = torch.jit.load(inception_path).eval().cuda()

x = torch.randn(1, 3, 299, 299, device='cuda')

print('PyTorch version', torch.__version__)
print('full net(x)', net(x).sum())
torch._C._jit_override_can_fuse_on_gpu(False)
print('net.layers(x) w/o fuser', net.layers(x).sum())
torch._C._jit_override_can_fuse_on_gpu(True)
print('net.layers(x) w/ fuser', net.layers(x).sum())
  • PyTorch 1.8 output

    PyTorch version 1.8.1
    full net(x) tensor(1., device='cuda:0')
    net.layers(x) w/o fuser tensor(432.4347, device='cuda:0')
    net.layers(x) w/ fuser tensor(432.4347, device='cuda:0')
    
  • PyTorch 1.9 output

    PyTorch version 1.9.0
    XXX/lib/python3.8/site-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448278899/work/c10/core/TensorImpl.h:1156.)
    return forward_call(*input, **kwargs)
    full net(x) tensor(1., device='cuda:0')
    net.layers(x) w/o fuser tensor(423.2352, device='cuda:0')
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-1-ca859f30cada> in <module>
        21 print('net.layers(x) w/o fuser', net.layers(x).sum())
        22 torch._C._jit_override_can_fuse_on_gpu(True)
    ---> 23 print('net.layers(x) w/ fuser', net.layers(x).sum())
    
    XXX/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
    1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
    1052         # Do not call functions when jit is used
    1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    RuntimeError: MALFORMED INPUT: lanes dont match
    
  • PyTorch nightly 1.10.0.dev20210826

    • Output 1 (RuntimeError)

      PyTorch version 1.10.0.dev20210826
      full net(x) tensor(1., device='cuda:0')
      net.layers(x) w/o fuser tensor(416.9673, device='cuda:0')
      ---------------------------------------------------------------------------
      RuntimeError                              Traceback (most recent call last)
      <ipython-input-1-b52b7aee928d> in <module>
          21 print('net.layers(x) w/o fuser', net.layers(x).sum())
          22 torch._C._jit_override_can_fuse_on_gpu(True)
      ---> 23 print('net.layers(x) w/ fuser', net.layers(x).sum())
          24
      
      XXX/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
      1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
      1101                 or _global_forward_hooks or _global_forward_pre_hooks):
      -> 1102             return forward_call(*input, **kwargs)
      1103         # Do not call functions when jit is used
      1104         full_backward_hooks, non_full_backward_hooks = [], []
      
      RuntimeError: MALFORMED INPUT: lanes dont match
      
    • Output 2 (segfault)

      PyTorch version 1.10.0.dev20210826
      full net(x) tensor(1., device='cuda:0')
      net.layers(x) w/o fuser tensor(408.5045, device='cuda:0')
      [1]    107222 segmentation fault  ipython
      

Expected behavior

I expect the code to work as it did in PyTorch 1.8.1.
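As a stopgap until this is fixed, affected downstream packages could disable GPU fusion only on the versions that exhibit the crash. The helpers below are hypothetical (not part of any library); the torch._C._jit_override_can_fuse_on_gpu call is the same internal switch used in the repro script above, and the >= 1.9 cutoff is an assumption based on the outputs reported here.

```python
# Hypothetical helper: decide whether to disable the GPU fuser based on the
# running PyTorch version string. 1.9 and the 1.10 nightlies show the crash;
# 1.8.1 does not.
def needs_fuser_workaround(version: str) -> bool:
    # Strip any local suffix ("1.9.0+cu111") and tolerate extra segments
    # such as "1.10.0.dev20210826".
    parts = version.split("+")[0].split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= (1, 9)


def apply_fuser_workaround() -> None:
    import torch
    if needs_fuser_workaround(torch.__version__):
        # Same internal switch used in the repro script above.
        torch._C._jit_override_can_fuse_on_gpu(False)
```

Calling apply_fuser_workaround() before net.layers(x) matches the passing "w/o fuser" runs in the outputs above, at the cost of losing fusion speedups on the affected versions.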

Environment

PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.19.6
Libc version: glibc-2.27

Python version: 3.8.8 (default, Apr 13 2021, 19:58:26)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-153-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 3080
GPU 1: GeForce RTX 3080

Nvidia driver version: 460.91.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchfile==0.1.0
[pip3] torchvision==0.9.1
[conda] _tflow_select             2.3.0                       mkl
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.1.1               h6406543_8    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2020.2                      256
[conda] mkl-include               2020.2                      256
[conda] mkl-service               2.3.0            py38he904b0f_0
[conda] mkl_fft                   1.3.0            py38h54f3939_0
[conda] mkl_random                1.1.1            py38h0573a6f_0
[conda] numpy                     1.19.2           py38h54aff64_0
[conda] numpy-base                1.19.2           py38hfa32c7d_0
[conda] pytorch                   1.8.1           py3.8_cuda11.1_cudnn8.0.5_0    pytorch
[conda] tensorflow                2.4.1           mkl_py38hb2083e0_0
[conda] tensorflow-base           2.4.1           mkl_py38h43e0292_0
[conda] torchaudio                0.8.1                      py38    pytorch
[conda] torchfile                 0.1.0                    pypi_0    pypi
[conda] torchvision               0.9.1                py38_cu111    pytorch

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @ngimel

@iramazanli added the module: cuda and triaged labels Aug 27, 2021
@ngimel added the oncall: jit and high priority labels and removed the module: cuda label Aug 27, 2021
@github-actions github-actions bot added this to Need triage in JIT Triage Aug 27, 2021
@ngimel ngimel added this to the 1.10.0 milestone Aug 27, 2021
@bertmaher bertmaher self-assigned this Aug 27, 2021
bertmaher added a commit that referenced this issue Aug 28, 2021
JIT Triage automation moved this from Need triage to Done Aug 29, 2021