[bug] the LTS torch==1.8.2 pip package is incomplete #69689

Open
stas00 opened this issue Dec 9, 2021 · 5 comments
Labels
module: cuda Related to torch.cuda, and CUDA support in general module: lts related to Enterprise PyTorch Stale triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@stas00
Contributor

stas00 commented Dec 9, 2021

🐛 Describe the bug

I suspect the LTS torch==1.8.2 pip package is incomplete, as it is missing at least the libnvrtc-builtins.so.11.1 file.
I followed the instructions at https://pytorch.org/get-started/locally/ to the letter:

pip3 install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html

The resulting installation is missing these files. With it I get:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: failed to open libnvrtc-builtins.so.11.1.

when using Megatron-LM, whereas conda installs these files just fine:

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia

and then everything works.
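A quick way to tell whether a given install is affected is to list the libnvrtc-builtins files that ship inside the installed torch package. The sketch below is a hypothetical helper, not part of PyTorch. It assumes the shared libraries live in a lib/ directory next to torch/__init__.py (the layout implied by the symlink workaround later in this thread) and locates that directory with importlib.util.find_spec, so torch itself is never imported.

```python
import importlib.util
from pathlib import Path


def torch_lib_dir():
    """Return the torch/lib directory of the installed torch package, or None.

    Assumption: the wheel keeps its shared libraries in torch/lib.
    find_spec only locates the package; it does not import torch.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / "lib"


def nvrtc_builtins_status(lib_dir, expected="libnvrtc-builtins.so.11.1"):
    """List libnvrtc-builtins* files in lib_dir and report whether `expected` is among them."""
    lib_dir = Path(lib_dir)
    found = sorted(p.name for p in lib_dir.glob("libnvrtc-builtins*"))
    return found, expected in found


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        found, ok = nvrtc_builtins_status(lib_dir)
        print(f"{lib_dir}: {found or 'no libnvrtc-builtins files'}")
        print("expected soname present" if ok else "expected soname MISSING")
```

On an affected pip install, this would be expected to list only an unversioned libnvrtc-builtins.so and report the .so.11.1 name as missing.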

I normally use pt-1.10 now, but needed to check whether my change still worked in pt-1.8.

Thanks.

Versions

Collecting environment information...
PyTorch version: 1.11.0.dev20211208
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.21.3
Libc version: glibc-2.31

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-90-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration:
GPU 0: NVIDIA GeForce GTX 1070 Ti
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 495.29.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.11.0.dev20211208+cu113
[pip3] torchaudio==0.11.0.dev20211208+cu113
[pip3] torchvision==0.12.0.dev20211208+cu113
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.20.3 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch 1.11.0.dev20211208 py3.8_cuda11.3_cudnn8.2.0_0 pytorch-nightly
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torch 1.11.0.dev20211208+cu113 pypi_0 pypi
[conda] torchaudio 0.11.0.dev20211208+cu113 pypi_0 pypi
[conda] torchvision 0.12.0.dev20211208+cu113 pypi_0 pypi

cc @ngimel

@malfet added the module: lts and module: cuda labels Dec 9, 2021
@mstfbl
Collaborator

mstfbl commented Dec 9, 2021

Hi @stas00, to reproduce this issue on my end, could you please include a link to the exact Megatron-LM PyTorch script you used when you saw the "failed to open libnvrtc-builtins.so.11.1" error? Also, the environment information in the issue appears to come from your PyTorch nightly 1.11.0 installation rather than your LTS installation. It would be helpful if you could replace it with the environment information of your PyTorch LTS installation. Thanks!

@ngimel
Collaborator

ngimel commented Dec 9, 2021

@mstfbl This is likely similar to #58101; you should be able to reproduce it with the reproducer script in that issue (and the fix is likely similar as well).

@mstfbl
Collaborator

mstfbl commented Dec 9, 2021

Hi @ngimel, yes, shortly after my comment I realized that cherry-picking the commit from issue #58101 would suffice. I've opened pytorch/builder PR #908 to address this. Thank you!

@VitalyFedyunin added the triaged label Dec 9, 2021
@stas00
Contributor Author

stas00 commented Dec 10, 2021

So you have everything you need, right?

Sorry, I no longer have that environment, as I originally reported this on the PyTorch Slack some weeks back.

@tgolsson

I found this issue while searching, after running into the same problem while setting up a Docker image. Until a fixed wheel is published, I worked around it by aliasing the existing library to the expected version. Note that it doesn't seem to matter whether you have a valid libnvrtc-builtins.so.11.1 elsewhere, such as in your CUDA installation.

The specific command I used for Python 3.8 is below; I did not check whether a relative symlink would work.

ln -s /usr/local/lib/python3.8/dist-packages/torch/lib/libnvrtc-builtins.so /usr/local/lib/python3.8/dist-packages/torch/lib/libnvrtc-builtins.so.11.1
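The same workaround can be scripted so that it does not hard-code the Python 3.8 dist-packages path. This is a minimal sketch resting on the same assumption as the ln -s command above: that the wheel ships an unversioned libnvrtc-builtins.so in torch/lib and nvrtc only needs the .so.11.1 name to resolve to it. alias_nvrtc_builtins is a hypothetical helper name, and like the original command it creates an absolute symlink.

```python
from pathlib import Path


def alias_nvrtc_builtins(lib_dir, version="11.1"):
    """Symlink libnvrtc-builtins.so.<version> -> libnvrtc-builtins.so inside lib_dir.

    Hypothetical helper mirroring the `ln -s` workaround; returns the alias path.
    Leaves any existing file or symlink with the alias name untouched.
    """
    lib_dir = Path(lib_dir)
    target = lib_dir / "libnvrtc-builtins.so"
    alias = lib_dir / f"libnvrtc-builtins.so.{version}"
    if not target.exists():
        raise FileNotFoundError(f"{target} not found; nothing to alias")
    # exists() follows symlinks, so also check is_symlink() to avoid
    # clobbering a dangling link that is already there.
    if not alias.exists() and not alias.is_symlink():
        alias.symlink_to(target.resolve())  # absolute link, as in the ln -s command
    return alias
```

Pass it the torch/lib directory, for example /usr/local/lib/python3.8/dist-packages/torch/lib as in the command above, or Path(importlib.util.find_spec("torch").origin).parent / "lib" to locate it from the active interpreter. Writing into site-packages may need the same privileges as the original pip install.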
