torch.jit.script'd function very slow on first invocation on latest nightly #75903

@cpuhrsch

Description

🐛 Describe the bug

It takes about a minute to run this function for the first time on the latest nightly. The same call takes only about a second on a version of PyTorch built from source.

To reproduce, run the code in this gist. All credit for finding this goes to @Linux-cpp-lisp. I suspect this is an environment issue, i.e. an old version of some dependency that we ship in the nightly vs. a newer version I'm using locally.

Clearly this prevents the optimization from being useful.
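For reference, a minimal sketch of how the first-invocation overhead can be measured (the actual gist code is not reproduced here; `fused_op` is a hypothetical toy function, not the one from the gist):

```python
import time
import torch

@torch.jit.script
def fused_op(x: torch.Tensor) -> torch.Tensor:
    # Toy elementwise chain; TorchScript profiles and optimizes it
    # during the first invocations.
    return (x * 2.0 + 1.0).relu()

x = torch.randn(1024)

t0 = time.perf_counter()
fused_op(x)  # first call: includes compilation/profiling overhead
first = time.perf_counter() - t0

t0 = time.perf_counter()
fused_op(x)  # subsequent calls run the already-optimized code
second = time.perf_counter() - t0

print(f"first call:  {first * 1e3:.2f} ms")
print(f"second call: {second * 1e3:.2f} ms")
```

On the affected nightly, the first timing would be on the order of a minute for the gist's function, while a source build shows roughly a second.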

Versions

The nightly in question here is 1.12.0.dev20220415-py3.9_cuda11.3_cudnn8.3.2_0

My local environment is:

Collecting environment information...
PyTorch version: 1.12.0a0+git075974e
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 10.0.1
CMake version: version 3.22.3
Libc version: glibc-2.27

Python version: 3.9.5 (default, Jun  4 2021, 12:28:51)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1051-aws-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: A100-SXM4-40GB
Nvidia driver version: 450.119.03
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.812
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.3
[pip3] torch==1.12.0a0+git075974e
[pip3] torch2trt==0.3.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h2bc3f7f_2
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-include               2021.2.0           h06a4308_296
[conda] mkl-random                1.2.1                    pypi_0    pypi
[conda] mkl-service               2.3.0                    pypi_0    pypi
[conda] mkl_fft                   1.3.0            py39h42c9631_2
[conda] mkl_random                1.2.2            py39h51133e4_0
[conda] mypy                      0.812              pyhd8ed1ab_0    conda-forge
[conda] mypy_extensions           0.4.3            py39h06a4308_0
[conda] numpy                     1.20.3                   pypi_0    pypi
[conda] numpy-base                1.20.2           py39hfae3a4d_0
[conda] torch                     1.11.0                   pypi_0    pypi

One notable difference is CUDA 11.1 locally vs 11.3 in the nightlies (note that the gist doesn't use CUDA).
