torch.jit.script'd function very slow on first invocation on latest nightly #75903

@cpuhrsch

Description

🐛 Describe the bug

It takes about a minute to run this function for the first time on the latest nightly. The same call takes only about a second on a version of PyTorch built from source.

To reproduce, run the code in this gist. All credit for finding this goes to @Linux-cpp-lisp. I suspect this is an environment issue, i.e. an old version of some dependency that we ship in the nightly vs. a newer version I'm using locally.

Clearly this prevents the optimization from being useful.
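For reference, a minimal sketch of how the first-invocation overhead can be measured (the actual gist code is not reproduced here; `fused_op` is a hypothetical toy function, not the one from the gist):

```python
import time
import torch

@torch.jit.script
def fused_op(x: torch.Tensor) -> torch.Tensor:
    # Toy elementwise chain; TorchScript profiles and optimizes it
    # during the first invocations.
    return (x * 2.0 + 1.0).relu()

x = torch.randn(1024)

t0 = time.perf_counter()
fused_op(x)  # first call: includes compilation/profiling overhead
first = time.perf_counter() - t0

t0 = time.perf_counter()
fused_op(x)  # subsequent calls run the already-optimized code
second = time.perf_counter() - t0

print(f"first call:  {first * 1e3:.2f} ms")
print(f"second call: {second * 1e3:.2f} ms")
```

On the affected nightly, the first timing would be on the order of a minute for the gist's function, while a source build shows roughly a second.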

Versions

The nightly in question here is 1.12.0.dev20220415-py3.9_cuda11.3_cudnn8.3.2_0

My local environment is:

Collecting environment information...
PyTorch version: 1.12.0a0+git075974e
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 10.0.1
CMake version: version 3.22.3
Libc version: glibc-2.27

Python version: 3.9.5 (default, Jun  4 2021, 12:28:51)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1051-aws-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: A100-SXM4-40GB
Nvidia driver version: 450.119.03
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.812
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.3
[pip3] torch==1.12.0a0+git075974e
[pip3] torch2trt==0.3.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h2bc3f7f_2
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-include               2021.2.0           h06a4308_296
[conda] mkl-random                1.2.1                    pypi_0    pypi
[conda] mkl-service               2.3.0                    pypi_0    pypi
[conda] mkl_fft                   1.3.0            py39h42c9631_2
[conda] mkl_random                1.2.2            py39h51133e4_0
[conda] mypy                      0.812              pyhd8ed1ab_0    conda-forge
[conda] mypy_extensions           0.4.3            py39h06a4308_0
[conda] numpy                     1.20.3                   pypi_0    pypi
[conda] numpy-base                1.20.2           py39hfae3a4d_0
[conda] torch                     1.11.0                   pypi_0    pypi

One notable difference is CUDA 11.1 locally vs 11.3 in the nightlies (note that the gist doesn't use CUDA).
