libgcc_s.so.1 must be installed for pthread_cancel to work #41661

Open
ruotianluo opened this issue Jul 20, 2020 · 40 comments
Labels
high priority
module: binaries (Anything related to official binaries that we release to users)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Comments

@ruotianluo
Contributor

ruotianluo commented Jul 20, 2020

🐛 Bug

I only get this error with the nightly build; 1.5.1 works fine.

(Edit: I have seen this error reported in other places. The main problem with it is that it hides the original error trace.)

How to reproduce

>>> import torchvision
>>> x = torchvision.models.resnet.resnet50(True)
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /home-nfs/rluo/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
100%|██████████| 97.8M/97.8M [00:01<00:00, 65.4MB/s]
libgcc_s.so.1 must be installed for pthread_cancel to work
Aborted

Stepping through the code, the error seems to occur at the end of the download.

Environment

Collecting environment information...
PyTorch version: 1.7.0.dev20200709+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 7.5.0
CMake version: version 3.14.0

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 418.43
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] detectron-pytorch==0.1
[pip] gluoncv-torch==0.0.3
[pip] numpy==1.18.4
[pip] numpydoc==0.9.2
[pip] pytorch-lightning==0.8.6.dev0
[pip] pytorch-pretrained-bert==0.6.2
[pip] torch==1.7.0.dev20200709+cu101
[pip] torchvision==0.8.0.dev20200719+cu101
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] detectron-pytorch 0.1 dev_0
[conda] gluoncv-torch 0.0.3 pypi_0 pypi
[conda] magma-cuda102 2.5.2 1 pytorch
[conda] mkl 2019.5 281 conda-forge
[conda] mkl-include 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.17.2 pypi_0 pypi
[conda] numpydoc 0.9.2 py_0
[conda] pytorch-lightning 0.8.6.dev0 dev_0
[conda] pytorch-pretrained-bert 0.6.2 pypi_0 pypi
[conda] torch 1.7.0.dev20200709+cu101 pypi_0 pypi
[conda] torchvision 0.8.0.dev20200719+cu101 pypi_0 pypi

cc @ezyang @gchanan @zou3519 @seemethere @malfet

@gchanan
Contributor

gchanan commented Jul 20, 2020

very likely an issue with your setup; googling around that error message shows a lot of examples.

@gchanan added the "module: binaries" and "triaged" labels on Jul 20, 2020
@gchanan
Contributor

gchanan commented Jul 20, 2020

@malfet would know more though.

@SteffenCzolbe

Dumped hours of debugging time into this issue since upgrading to torch 1.6.0, still have no clue what causes it. Some of my models work fine, while others abort with the previously mentioned error.

Only fix I found was downgrading to 1.5.1 :/

@ezyang
Contributor

ezyang commented Aug 2, 2020

raising priority based on activity

@gchanan
Contributor

gchanan commented Aug 3, 2020

@SteffenCzolbe can you post the environment information (see above example, which is on master).

@ruotianluo
Contributor Author

I did a binary search over the nightly builds. With 1.6.0.dev20200411+cu101 I have no trouble (I get the regular error trace).
With the builds from dev20200412 to dev20200421 I got:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted

With dev20200422 to dev20200424 I just got a segmentation fault.
For dev20200425 to dev20200427 there are no proper wheels.
Starting with dev20200428 I get the error in the title.

The code I use is here https://gist.github.com/ruotianluo/54c25460b2ca43a274f50e1a7daa409a.

@mattip
Collaborator

mattip commented Aug 9, 2020

I cannot reproduce in a cuda-less docker environment using the pypa/manylinux2014 image based on CentOS 7.8.2003 and torch 1.6.0 official wheels. It does seem strange that @ruotianluo has two different NumPy versions.

@mattip self-assigned this on Aug 9, 2020
@ruotianluo
Contributor Author

FYI: I now seem to get the correct error trace with stable 1.6.0, but it still fails with torch==1.6.0.dev20200428+cu101.
Current env:

Collecting environment information...
PyTorch version: 1.6.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 7.5.0
CMake version: version 3.14.0

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 418.43
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] detectron-pytorch==0.1
[pip] gluoncv-torch==0.0.3
[pip] numpy==1.18.4
[pip] numpydoc==0.9.2
[pip] pytorch-pretrained-bert==0.6.2
[pip] torch==1.6.0+cu101
[pip] torchvision==0.7.0+cu101
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.1.243             h6bb024c_0
[conda] detectron-pytorch         0.1                       dev_0    <develop>
[conda] gluoncv-torch             0.0.3                    pypi_0    pypi
[conda] magma-cuda102             2.5.2                         1    pytorch
[conda] mkl                       2019.5                      281    conda-forge
[conda] mkl-include               2020.1                      217
[conda] mkl-service               2.3.0            py37he904b0f_0
[conda] mkl_fft                   1.0.15           py37ha843d7b_0
[conda] mkl_random                1.1.0            py37hd6b4f25_0
[conda] numpy                     1.18.4                   pypi_0    pypi
[conda] numpydoc                  0.9.2                      py_0
[conda] pytorch-pretrained-bert   0.6.2                    pypi_0    pypi
[conda] torch                     1.6.0+cu101              pypi_0    pypi
[conda] torchvision               0.7.0+cu101              pypi_0    pypi

@rgommers
Collaborator

rgommers commented Aug 9, 2020

Other info that will be relevant:

  • how did you install GCC?
  • how did you install PyTorch and torchvision?

@ruotianluo looking at your package info, you are using a mix of pip-installed and conda-installed packages. This is never a good idea and will lead to the kind of issue you're seeing.

> FYI. I seem to be able to get correct error trace with stable 1.6.0 now.

If you could reproduce this in a clean environment where all packages are installed with the same package manager, then that would really help (and hint at a real issue). E.g.:

conda create -n issue41661 python=3.7
conda activate issue41661
pip install numpy
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
python my_script_that_is_failing.py

@ruotianluo
Contributor Author

ruotianluo commented Aug 9, 2020

I built my gcc from source. (I used to install torch with conda and later switched to pip; that is why some packages, magma and cudatoolkit, were left over in the conda env.)

Using a clean conda env as you suggested gives the same result: I get "libgcc_s.so.1 must be installed for pthread_cancel to work" with torch==1.6.0.dev20200428+cu101, and the correct error trace with stable 1.6.0.

Maybe it has been fixed in 1.6.0?

Collecting environment information...
PyTorch version: 1.6.0.dev20200428+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: CentOS Linux 7 (Core)
GCC version: (GCC) 7.5.0
CMake version: version 2.8.12.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 418.43
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0.dev20200428+cu101
[pip3] torchvision==0.7.0+cu101
[conda] numpy                     1.19.1                   pypi_0    pypi
[conda] torch                     1.6.0.dev20200428+cu101          pypi_0    pypi
[conda] torchvision               0.7.0+cu101              pypi_0    pypi

@mattip
Collaborator

mattip commented Aug 9, 2020

> I built my gcc from source.

Try installing the conda compilers with `conda install compilers`. You may have built your libgcc without compatible pthread support. Installing the conda compilers should supply a compatible path/to/conda/env/lib/libgcc_s.so.1

@ruotianluo
Contributor Author

Installed gcc:

(issue41661) [rluo@gpu20 ~]$ conda list | grep  gcc
_libgcc_mutex             0.1                        main
gcc_impl_linux-64         7.3.0                habb00fd_1
gcc_linux-64              7.3.0                h553295d_9
libgcc-ng                 9.1.0                hdf63c60_0
(issue41661) [rluo@gpu20 ~]$ ls ~/rluo/local/anaconda3/envs/issue41661/lib/libgcc_s.so.1
/home-nfs/rluo/rluo/local/anaconda3/envs/issue41661/lib/libgcc_s.so.1

It still fails with torch==1.6.0.dev20200428+cu101.

@mattip
Collaborator

mattip commented Aug 10, 2020

Are you using the pytorch you think you are?
$ python -c "import torch; print(torch._C)"
What does ldd show for that file?
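
The check above can also be scripted. This is a minimal sketch, not part of PyTorch: the helper names `ldd_output` and `resolved_libs` are hypothetical, and it assumes a Linux system where `ldd` is on the PATH and prints lines in the usual `name => path (address)` form.

```python
import re
import subprocess


def ldd_output(path):
    """Run `ldd` on a shared object and return its text output (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout


def resolved_libs(ldd_text):
    """Map each library name in `ldd` output to the path it resolved to.

    Lines without a resolved path (for example linux-vdso or the dynamic
    loader itself) are skipped.
    """
    libs = {}
    for line in ldd_text.splitlines():
        # Typical line: "\tlibfoo.so.1 => /some/dir/libfoo.so.1 (0x00007f...)"
        match = re.match(r"\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)", line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs
```

With torch importable, `resolved_libs(ldd_output(torch._C.__file__)).get("libgcc_s.so.1")` would show which copy of libgcc_s the dynamic linker selects for the extension module under the current environment.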

@ruotianluo
Contributor Author

	linux-vdso.so.1 =>  (0x00007ffc1bbfb000)
	libshm.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libshm.so (0x00007f30d3d43000)
	libtorch_python.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch_python.so (0x00007f30d2c9e000)
	libstdc++.so.6 => /share/data/vision-greg/rluo/local/gcc-7.5.0/lib64/libstdc++.so.6 (0x00007f30d291b000)
	libm.so.6 => /usr/lib64/libm.so.6 (0x00007f30d2619000)
	libgcc_s.so.1 => /share/data/vision-greg/rluo/local/gcc-7.5.0/lib64/libgcc_s.so.1 (0x00007f30d2402000)
	libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f30d21e6000)
	libc.so.6 => /usr/lib64/libc.so.6 (0x00007f30d1e19000)
	libtorch.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch.so (0x00007f30d1c05000)
	librt.so.1 => /usr/lib64/librt.so.1 (0x00007f30d19fd000)
	libtorch_cpu.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so (0x00007f30c2626000)
	libtorch_cuda.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so (0x00007f3085e2e000)
	libnvToolsExt-3965bdd0.so.1 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libnvToolsExt-3965bdd0.so.1 (0x00007f3085c24000)
	libc10_cuda.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libc10_cuda.so (0x00007f30859f5000)
	libc10.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libc10.so (0x00007f3085799000)
	libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f3085595000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f30d4168000)
	libgomp-7c85b1e2.so.1 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libgomp-7c85b1e2.so.1 (0x00007f308536b000)
	libcudart-1b201d85.so.10.1 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libcudart-1b201d85.so.10.1 (0x00007f30850ec000)

@mattip
Collaborator

mattip commented Aug 10, 2020

So installing the conda compilers was not useful, since the "wrong" libgcc_s.so.1 is being picked up. Do you have LD_LIBRARY_PATH defined?

@ruotianluo
Contributor Author

Yes.

/share/data/vision-greg/rluo/local/gcc-7.5.0/lib64:/share/data/vision-greg/rluo/local/gcc-7.5.0/lib:/share/data/vision-greg/rluo/local/anaconda3/lib:/share/data/vision-greg/common/libjpeg/lib:/share/data/vision-greg/rluo/local/nccl/lib:/share/data/vision-greg/common/boost-1.57/lib:/share/data/vision-greg/rluo/local/cuda-10.1/lib64:/share/data/vision-greg/rluo/local/cuda-10.1/extras/CUPTI/lib64:/share/data/vision-greg/rluo/local/cudnn-7.6.4-for-cuda-10.1/lib64:/share/data/vision-greg/rluo/local/gcc-7.5.0/lib64:/share/data/vision-greg/rluo/local/gcc-7.5.0/lib:/share/data/vision-greg/rluo/local/anaconda3/lib:/share/data/vision-greg/common/libjpeg/lib:/share/data/vision-greg/rluo/local/nccl/lib:/share/data/vision-greg/common/boost-1.57/lib:/share/data/vision-greg/rluo/local/cuda-10.1/lib64:/share/data/vision-greg/rluo/local/cuda-10.1/extras/CUPTI/lib64:/share/data/vision-greg/rluo/local/cudnn-7.6.4-for-cuda-10.1/lib64:

@mattip
Collaborator

mattip commented Aug 10, 2020

Can you put /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib before the rest so it picks up the conda-provided libgcc_s.so.1?
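
Reordering a long search path by hand is error-prone, especially when entries are repeated as they are above. The following sketch builds the new value; `prepend_path` is a hypothetical helper, not an existing tool. Note that the dynamic loader reads LD_LIBRARY_PATH when a process starts, so the result has to be exported in the shell before launching Python rather than assigned to `os.environ` inside the running interpreter.

```python
def prepend_path(path_value, new_dir):
    """Return a colon-separated search path with new_dir first.

    Duplicate entries are dropped (first occurrence wins) and empty entries,
    such as the one left by a trailing ':', are removed. Removing empty
    entries is a deliberate choice here: the loader treats them as the
    current directory.
    """
    seen = set()
    ordered = []
    for entry in [new_dir] + path_value.split(":"):
        if entry and entry not in seen:
            seen.add(entry)
            ordered.append(entry)
    return ":".join(ordered)
```

For example, printing `prepend_path(os.environ.get("LD_LIBRARY_PATH", ""), "/path/to/conda/env/lib")` gives a value that can be pasted into an `export LD_LIBRARY_PATH=...` line; the directory shown is a placeholder.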

@ruotianluo
Contributor Author

	linux-vdso.so.1 =>  (0x00007ffe8ffa5000)
	libshm.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libshm.so (0x00007f325d4cf000)
	libtorch_python.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch_python.so (0x00007f325c42a000)
	libstdc++.so.6 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/libstdc++.so.6 (0x00007f325d99e000)
	libm.so.6 => /usr/lib64/libm.so.6 (0x00007f325c128000)
	libgcc_s.so.1 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/libgcc_s.so.1 (0x00007f325d971000)
	libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f325bf0c000)
	libc.so.6 => /usr/lib64/libc.so.6 (0x00007f325bb3f000)
	libtorch.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch.so (0x00007f325b92b000)
	librt.so.1 => /usr/lib64/librt.so.1 (0x00007f325b723000)
	libtorch_cpu.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so (0x00007f324c34c000)
	libtorch_cuda.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so (0x00007f320fb54000)
	libnvToolsExt-3965bdd0.so.1 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libnvToolsExt-3965bdd0.so.1 (0x00007f320f94a000)
	libc10_cuda.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libc10_cuda.so (0x00007f320f71b000)
	libc10.so => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libc10.so (0x00007f320f4bf000)
	libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f320f2bb000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f325d8f4000)
	libgomp-7c85b1e2.so.1 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libgomp-7c85b1e2.so.1 (0x00007f320f091000)
	libcudart-1b201d85.so.10.1 => /share/data/vision-greg/rluo/local/anaconda3/envs/issue41661/lib/python3.7/site-packages/torch/lib/libcudart-1b201d85.so.10.1 (0x00007f320ee12000)

It picks up the conda libgcc_s now, but I still get "libgcc_s.so.1 must be installed for pthread_cancel to work".

@mattip
Collaborator

mattip commented Aug 12, 2020

I am thinking this is not connected to pytorch; rather, any program you compile that uses pthread_cancel will show this error on your system. The man page has an example. Does that compile and run?
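
A related check can be done from inside Python. The message comes from glibc, which loads libgcc_s.so.1 lazily with dlopen when thread cancellation first needs it and aborts if that load fails. The sketch below only tests whether the same library name can be loaded in the current process; `try_load` is a hypothetical helper, and a successful load here does not prove that the later lazy load inside glibc will also succeed.

```python
import ctypes


def try_load(lib_name="libgcc_s.so.1"):
    """Try to load lib_name into the current process.

    Returns (True, "loaded") on success, or (False, <loader error message>)
    if the dynamic loader cannot find or open the library.
    """
    try:
        ctypes.CDLL(lib_name)
    except OSError as exc:
        return False, str(exc)
    return True, "loaded"
```

Calling `try_load()` before `import torch` is sometimes suggested as a workaround for this message, on the assumption that the library is then already present when glibc asks for it. Whether that helps in the setups reported in this thread has not been verified here.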

@ruotianluo
Contributor Author

Yes. It runs correctly.

@rgommers
Collaborator

> So installing the conda compilers was not useful, since the "wrong" libgcc_s.so.1 is being picked up. Do you have LD_LIBRARY_PATH defined?

It looks to me like @ruotianluo installed compilers without rebuilding PyTorch and Torchvision with those compilers, or is still working in a conda env that's somehow messed up. I'd suggest the current back-and-forth isn't all that helpful. There are two people who reported this issue, but there's no reproducer. We need a full reproducer, either with Docker with a system GCC from the distro's package manager, or in a clean conda env with conda compilers.

@vanewu

vanewu commented Sep 1, 2020

After I upgraded to 1.6.0, I also encountered the same problem. I have checked that libgcc_s.so.1 exists. My GCC is also 7.5.

@rgommers
Collaborator

rgommers commented Sep 1, 2020

@kenjewu thanks for the report. Could you please add the output of python torch/utils/collect_env.py? And how did you install GCC?

@philokey

@ruotianluo Did you solve this problem? I also encounter it when I use pytorch 1.6.

@evaldsurtans

For me it consistently happens when I install the latest pytorch and use it with:

if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model, dim=0)

Without DataParallel it works fine. I checked that LD_LIBRARY_PATH contains libgcc_s.so.1.

@brando90

brando90 commented Nov 7, 2020

> Dumped hours of debugging time into this issue since upgrading to torch 1.6.0, still have no clue what causes it. Some of my models work fine, while others abort with the previously mentioned error.
>
> Only fix I found was downgrading to 1.5.1 :/

how do you downgrade?

@dnaaun

dnaaun commented Nov 24, 2020

@brando90, if you are using pip, you can downgrade by doing pip install torch==1.5.1 (assuming 1.5.1 is an actual version that exists, didn't check). What will definitely work is something like pip install 'torch<1.6'

@ritvik1512

Any updates on this? Updating to 1.6.0 gives the same error.

@mattip
Collaborator

mattip commented Nov 30, 2020

So far we have theorized that the wrong libgcc_s.so.1 is being picked up. This could be shown by someone with the problem compiling the test program in the man page for pthread_cancel and, if it runs correctly, trying to figure out which libgcc the test program is using versus which libgcc pytorch is using.
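
For the pytorch side of that comparison, one option on Linux is to look at what the running process has actually mapped, since that can differ from what `ldd` predicts. This is a rough sketch with hypothetical helper names (`mapped_libs`, `current_process_libs`); it assumes the usual /proc/<pid>/maps layout of address, permissions, offset, device, inode and pathname.

```python
def mapped_libs(maps_text, needle="libgcc_s"):
    """Return the sorted, de-duplicated mapped file paths containing needle."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def current_process_libs(needle="libgcc_s"):
    """Inspect the calling process via /proc/self/maps (Linux only)."""
    with open("/proc/self/maps") as maps_file:
        return mapped_libs(maps_file.read(), needle)
```

Running `import torch` followed by `current_process_libs()` in the failing environment lists the libgcc_s file that Python has loaded; an empty list means no such library is mapped yet. The same function can be pointed at the maps file of the compiled man-page test program while it is running.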

Since I cannot reproduce nor help without more information, I am unassigning myself from the issue.

@mattip removed their assignment on Nov 30, 2020
@rgommers
Collaborator

> So far we have theorized that the wrong libgcc_s.so.1 is being picked up.

I don't think it is, actually. A test with the exact conda env in #41661 (comment), which has a pip-installed pytorch 1.6.0, shows libgcc_s.so.1 => /usr/lib/libgcc_s.so.1, and running the 1.6.0 test suite against this pip-installed pytorch works just fine.

> Since I cannot reproduce nor help without more information

Yep, me neither. If anyone who encounters this issue could put together a reproducer, that would be super helpful. Try one of:

@shrubb

shrubb commented Feb 25, 2021

Had this error too with pip-installed PyTorch and with GCC built from source. This helped.

@tejas-gokhale

I can confirm that I got the same error with torch version '1.9.0+cu102', and downgrading to previous versions solved it.

@seyeeet

seyeeet commented Jun 25, 2021

I also get this error with pytorch 1.9.0+cu102

@malfet
Contributor

malfet commented Jun 25, 2021

@seyeeet can you run `python3 -m torch.utils.collect_env` and share its output here?

@seyeeet

seyeeet commented Jun 26, 2021

@malfet
yep

Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 7.6.1810 (Core)  (x86_64)
GCC version: (GCC) 8.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.9

Python version: 3.6 (64-bit runtime)
Python platform: Linux-3.10.0-957.1.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core
Is CUDA available: False
CUDA runtime version: 10.2.89
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] numpydoc==1.1.0
[pip3] torch==1.9.0
[pip3] torch-summary==1.4.5
[pip3] torchaudio==0.9.0a0+33b2469
[pip3] torchfile==0.1.0
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              hfd86e86_1
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2018.0.3                      1
[conda] mkl-service               1.1.2            py36h90e4bf4_5
[conda] mkl_fft                   1.0.4            py36h4414c95_1
[conda] mkl_random                1.0.1            py36h4414c95_1
[conda] numpy                     1.19.5                   pypi_0    pypi
[conda] numpydoc                  1.1.0              pyhd3eb1b0_1
[conda] pytorch                   1.9.0           py3.6_cuda10.2_cudnn7.6.5_0    pytorch
[conda] torch-summary             1.4.5                    pypi_0    pypi
[conda] torchaudio                0.9.0                      py36    pytorch
[conda] torchfile                 0.1.0                    pypi_0    pypi
[conda] torchtext                 0.10.0                     py36    pytorch
[conda] torchvision               0.10.0               py36_cu102    pytorch

@bonlime

bonlime commented Aug 4, 2021

I've also encountered this issue after installing a nightly build of PyTorch. On the latest stable 1.9 release it works. For me the output of python3 -m torch.utils.collect_env is as follows:

Collecting environment information...
PyTorch version: 1.10.0.dev20210804+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Clang version: 3.8.0 (tags/RELEASE_380/final)
CMake version: version 2.8.12.2
Libc version: glibc-2.17

Python version: 3.8.5 (default, Jul 29 2020, 13:59:36)  [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] (64-bit runtime)
Python platform: Linux-5.4.15-1.el7.elrepo.x86_64-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 10.2.89
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.10.0.dev20210804+cu111
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0+cu111
[conda] Could not collect

@zeakey

zeakey commented Nov 27, 2021

Same error with torch==1.8.1, gcc==7.3.0.
The output of python3 -m torch.utils.collect_env is:

Collecting environment information...
PyTorch version: 1.8.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Tencent tlinux 2.2 (Final) (x86_64)
GCC version: (GCC) 7.3.0
Clang version: Could not collect
CMake version: version 3.18.5

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: A100-SXM4-40GB
GPU 1: A100-SXM4-40GB
GPU 2: A100-SXM4-40GB
GPU 3: A100-SXM4-40GB
GPU 4: A100-SXM4-40GB
GPU 5: A100-SXM4-40GB
GPU 6: A100-SXM4-40GB
GPU 7: A100-SXM4-40GB

Nvidia driver version: 450.80.02
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.0.5
/usr/lib64/libcudnn_adv_infer.so.8.0.5
/usr/lib64/libcudnn_adv_train.so.8.0.5
/usr/lib64/libcudnn_cnn_infer.so.8.0.5
/usr/lib64/libcudnn_cnn_train.so.8.0.5
/usr/lib64/libcudnn_ops_infer.so.8.0.5
/usr/lib64/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] torch==1.8.1+cu111
[pip3] torchvision==0.9.1+cu111
[conda] Could not collect

@GangLiTarheel

GangLiTarheel commented Oct 26, 2022

Same error:

$ python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 7.8.2003 (Core) (x86_64)
GCC version: (GCC) 9.4.0
Clang version: Could not collect
CMake version: version 3.18.0
Libc version: glibc-2.17

Python version: 3.9.1 (default, Dec 11 2020, 14:32:07)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.15.2.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA GeForce GTX 1080
GPU 1: NVIDIA GeForce GTX 1080
GPU 2: NVIDIA GeForce GTX 1080
GPU 3: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.2.4
/usr/lib64/libcudnn_adv_infer.so.8.2.4
/usr/lib64/libcudnn_adv_train.so.8.2.4
/usr/lib64/libcudnn_cnn_infer.so.8.2.4
/usr/lib64/libcudnn_cnn_train.so.8.2.4
/usr/lib64/libcudnn_ops_infer.so.8.2.4
/usr/lib64/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.10.2
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchvision==0.9.1
[conda] _tflow_select             2.3.0                       mkl  
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.1.74              h6bb024c_0    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.2.0           h06a4308_296  
[conda] mkl-service               2.3.0            py39h27cfd23_1  
[conda] mkl_fft                   1.3.0            py39h42c9631_2  
[conda] mkl_random                1.2.1            py39ha9443f7_2  
[conda] numpy                     1.20.2           py39h2d18471_0  
[conda] numpy-base                1.20.2           py39hfae3a4d_0  
[conda] tensorflow                2.4.1           mkl_py39h4683426_0  
[conda] tensorflow-base           2.4.1           mkl_py39h43e0292_0  
[conda] torch                     1.10.2                   pypi_0    pypi
[conda] torchaudio                0.8.1                      py39    pytorch
[conda] torchvision               0.9.1                py39_cu111    pytorch

@elvinagam

It is weird that this issue shows up everywhere, but I believe it is related to the older OS version of the server/setup that you are using. Try reproducing the error on a server that runs Ubuntu 20 or newer.

@parvathirajan

+1

I'm getting this issue while running pipenv install
