Labels: NNC, module: nvfuser, oncall: jit
Description
🐛 Describe the bug
On the nightly build, the first run of this function takes about a minute; on a version of PyTorch built from source, it takes only about a second.
To reproduce, run the code in this gist. All credit for finding this goes to @Linux-cpp-lisp. I suspect this is an environment issue, i.e. an older version of some component shipped in the nightly vs. a newer one in my local build.
Clearly this startup cost prevents the optimization from being useful.
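The gist itself isn't reproduced here, so as a rough stand-in the sketch below shows the general shape of the measurement: a small TorchScript function (hypothetical; `fused_op` and its body are placeholders, not the gist's code) timed over its first few calls, which is where the JIT's profiling and fusion work happens.

```python
# Minimal sketch (NOT the gist's code): time the first few calls to a
# TorchScript function, where profiling and JIT compilation overhead shows up.
import time
import torch

@torch.jit.script
def fused_op(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # A chain of elementwise ops that the fuser may combine into one kernel.
    return torch.sin(x) * torch.cos(y) + torch.tanh(x * y)

x = torch.randn(1024, 1024)
y = torch.randn(1024, 1024)

# The profiling executor typically compiles the optimized graph after a
# couple of warm-up runs, so time several calls rather than just the first.
for i in range(5):
    t0 = time.perf_counter()
    fused_op(x, y)
    t1 = time.perf_counter()
    print(f"call {i}: {t1 - t0:.3f} s")
```

On the affected nightly, the early calls are where the minute-long stall appears; subsequent calls run at the expected speed.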
Versions
The nightly in question here is 1.12.0.dev20220415-py3.9_cuda11.3_cudnn8.3.2_0
My local environment is:
Collecting environment information...
PyTorch version: 1.12.0a0+git075974e
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 10.0.1
CMake version: version 3.22.3
Libc version: glibc-2.27
Python version: 3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1051-aws-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: A100-SXM4-40GB
Nvidia driver version: 450.119.03
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy==0.812
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.3
[pip3] torch==1.12.0a0+git075974e
[pip3] torch2trt==0.3.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-include 2021.2.0 h06a4308_296
[conda] mkl-random 1.2.1 pypi_0 pypi
[conda] mkl-service 2.3.0 pypi_0 pypi
[conda] mkl_fft 1.3.0 py39h42c9631_2
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] mypy 0.812 pyhd8ed1ab_0 conda-forge
[conda] mypy_extensions 0.4.3 py39h06a4308_0
[conda] numpy 1.20.3 pypi_0 pypi
[conda] numpy-base 1.20.2 py39hfae3a4d_0
[conda] torch 1.11.0 pypi_0 pypi
One notable difference is CUDA 11.1 locally vs 11.3 in the nightlies (note that the gist doesn't use CUDA).
cc @Linux-cpp-lisp