libgcc_s.so.1 must be installed for pthread_cancel to work #41661
Comments
very likely an issue with your setup; googling around that error message shows a lot of examples. |
@malfet would know more though. |
Dumped hours of debugging time into this issue since upgrading to torch 1.6.0, still have no clue what causes it. Some of my models work fine, while others abort with the previously mentioned error. Only fix I found was downgrading to 1.5.1 :/ |
raising priority based on activity |
@SteffenCzolbe can you post the environment information (see above example, which is on master). |
I did a binary search. I am not having trouble (getting regular error trace) with 1.6.0.dev20200411+cu101.
Between 412-421 422-424 I just got a segmentation fault. The code I use is here https://gist.github.com/ruotianluo/54c25460b2ca43a274f50e1a7daa409a. |
I cannot reproduce in a cuda-less docker environment using the pypa/manylinux2014 image based on CentOS 7.8.2003 and torch 1.6.0 official wheels. It does seem strange that @ruotianluo has two different NumPy versions. |
FYI. I seem to be able to get correct error trace with stable 1.6.0 now. But still fail with torch==1.6.0.dev20200428+cu101.
|
Other info that will be relevant:
@ruotianluo looking at your package info, you are using a mix between
If you could reproduce this in a clean environment where all packages are installed with the same package manager, then that would really help (and hint at a real issue). E.g.:
|
I built my gcc from source. (I used to use conda to install torch and I switched to pip. That's why there is something left in the conda env. (magma/cudatoolkit)) Using a clean conda env as you suggested: the same result. Getting "libgcc_s.so.1 must be installed for pthread_cancel to work" with torch==1.6.0.dev20200428+cu101, and correct error trace with stable 1.6.0. Maybe it has been fixed in 1.6.0??
|
Try installing the conda compilers |
Installed gcc:
Still fail with torch==1.6.0.dev20200428+cu101 |
Are you using the pytorch you think you are? |
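One way to answer the question above is to ask the interpreter where it would load `torch` from. The following is only a minimal sketch: `describe_module` is a hypothetical helper name, not part of PyTorch, and it assumes the check runs in the same Python interpreter and environment that produces the error.

```python
import importlib.util


def describe_module(name="torch"):
    """Return (found, origin) for a top-level module without importing it.

    `describe_module` is a hypothetical helper written for this thread.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    return True, spec.origin


if __name__ == "__main__":
    found, origin = describe_module("torch")
    if not found:
        print("torch is not importable from this interpreter")
    else:
        import torch  # imported only once we know it can be found

        print("torch version:", torch.__version__)
        print("loaded from:  ", torch.__file__)
        print("spec origin:  ", origin)
```

If the printed path points into a different environment than expected (for example a leftover pip install inside a conda env), the version being debugged may not be the one actually running.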
|
So installing the conda compilers was not useful, since the "wrong" |
Yes.
|
Can you put |
It can pick up the conda libgcc_s now. But I still get "libgcc_s.so.1 must be installed for pthread_cancel to work" |
I am thinking this is not connected to pytorch, rather any program you compile that uses |
Yes. It runs correctly. |
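To see which copy of `libgcc_s.so.1` a process has actually mapped, rather than which copies exist on disk, one option on Linux is to load the library and then read `/proc/self/maps`. This is a sketch under assumptions: `mapped_libraries` and `load_and_locate` are hypothetical helper names, and the approach relies on the Linux `/proc` filesystem being available.

```python
import ctypes
import os


def mapped_libraries(substring, maps_path="/proc/self/maps"):
    """Return sorted unique paths of mapped files whose basename contains `substring`."""
    paths = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                # Format: address perms offset dev inode [pathname]
                fields = line.split(None, 5)
                if len(fields) == 6:
                    path = fields[5].strip()
                    if substring in os.path.basename(path):
                        paths.add(path)
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return sorted(paths)


def load_and_locate(soname="libgcc_s.so.1"):
    """Ask the dynamic loader for `soname`, then report where it was mapped from."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return [], str(exc)
    return mapped_libraries("libgcc_s"), None


if __name__ == "__main__":
    paths, error = load_and_locate()
    print("load error:", error)
    for path in paths:
        print("mapped:", path)
```

Running the same check after `import torch` in the failing environment would show whether the conda copy or a system copy ends up mapped in that process.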
It looks to me like @ruotianluo installed compilers without rebuilding PyTorch and Torchvision with those compilers, or is still working in a conda env that's somehow messed up. I'd suggest the current back-and-forth isn't all that helpful. There are two people who reported this issue, but there's no reproducer. We need a full reproducer, either with Docker with a system GCC from the distro's package manager, or in a clean conda env with conda compilers. |
After I upgraded to 1.6.0, I also encountered the same problem. I have checked that libgcc_s.so.1 exists. My GCC is also 7.5. |
@kenjewu thanks for the report. Could you please add the output of |
@ruotianluo Did you solve this problem? I also encounter it when I use PyTorch 1.6. |
For me it consistently happens when I install latest pytorch and use it with:
without dataparallel works fine, I checked that LD_LIBRARY_PATH contains libgcc_s.so.1 |
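The `LD_LIBRARY_PATH` check mentioned above can be scripted so that every matching directory is listed in search order. This is a rough sketch: `find_in_library_path` is a hypothetical helper name, and finding the file on `LD_LIBRARY_PATH` does not by itself prove the dynamic loader uses that copy, since other mechanisms such as a library's RPATH can take precedence.

```python
import os


def find_in_library_path(soname="libgcc_s.so.1", env=None):
    """Return every copy of `soname` reachable via LD_LIBRARY_PATH, in search order.

    `find_in_library_path` is a hypothetical helper written for this thread.
    """
    env = os.environ if env is None else env
    hits = []
    for directory in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if not directory:
            continue  # empty entries are skipped in this sketch
        candidate = os.path.join(directory, soname)
        if os.path.exists(candidate):
            # Resolve symlinks so duplicates pointing at one file are visible.
            hits.append(os.path.realpath(candidate))
    return hits


if __name__ == "__main__":
    for path in find_in_library_path():
        print(path)
```

More than one hit, or a hit from an unexpected toolchain, would be worth including in a report alongside the environment information.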
how do you downgrade? |
@brando90 , if you are using pip, you can downgrade by doing |
Any updates on this? Updating to 1.6.0 gives the same error. |
So far we have theorized that the wrong Since I cannot reproduce nor help without more information, I am unassigning myself from the issue. |
I don't think it is actually. A test with the exact
Yep, me neither. If anyone who encounters this issue could put together a reproducer, that would be super helpful. Try one of: |
Had this error too with |
I can confirm that I got the same error with torch version '1.9.0+cu102', and downgrading to previous versions solved it. |
I also get this error with pytorch 1.9.0+cu102 |
@seyeeet can you run `python3 -m torch.utils.collect_env` and share its output here?
@malfet
|
I've also encountered this issue after installing nightly build of PyTorch. On latest stable 1.9 release it works. For me the output of
|
Same error with
|
Same error:
|
It is so weird that this issue is everywhere. I believe it is related to an older Ubuntu version on the server/setup that you are using. Try reproducing the same error on a server that runs Ubuntu 20 or newer. |
+1 I'm getting this issue while running |
🐛 Bug
Only get error with nightly, 1.5.1 works fine.
(Edit: I saw this at other places. The main problem of getting this error is I can't see the original error trace.)
How to reproduce
By stepping in, it seems the error occurs at the end of downloading.
Environment
Collecting environment information...
PyTorch version: 1.7.0.dev20200709+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: CentOS Linux 7 (Core)
GCC version: (GCC) 7.5.0
CMake version: version 3.14.0
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
Nvidia driver version: 418.43
cuDNN version: Could not collect
Versions of relevant libraries:
[pip] detectron-pytorch==0.1
[pip] gluoncv-torch==0.0.3
[pip] numpy==1.18.4
[pip] numpydoc==0.9.2
[pip] pytorch-lightning==0.8.6.dev0
[pip] pytorch-pretrained-bert==0.6.2
[pip] torch==1.7.0.dev20200709+cu101
[pip] torchvision==0.8.0.dev20200719+cu101
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] detectron-pytorch 0.1 dev_0
[conda] gluoncv-torch 0.0.3 pypi_0 pypi
[conda] magma-cuda102 2.5.2 1 pytorch
[conda] mkl 2019.5 281 conda-forge
[conda] mkl-include 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.17.2 pypi_0 pypi
[conda] numpydoc 0.9.2 py_0
[conda] pytorch-lightning 0.8.6.dev0 dev_0
[conda] pytorch-pretrained-bert 0.6.2 pypi_0 pypi
[conda] torch 1.7.0.dev20200709+cu101 pypi_0 pypi
[conda] torchvision 0.8.0.dev20200719+cu101 pypi_0 pypi
cc @ezyang @gchanan @zou3519 @seemethere @malfet