
Please help me with OSError: libcusparse.so.10: cannot open shared object file: No such file or directory #1125

Closed
yrwangxd opened this issue Apr 13, 2020 · 39 comments


@yrwangxd

yrwangxd commented Apr 13, 2020

❓ Questions & Help

This is the traceback:

```
Traceback (most recent call last):
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch_sparse/__init__.py", line 15, in <module>
    library, [osp.dirname(__file__)]).origin)
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch/_ops.py", line 106, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch_geometric/__init__.py", line 2, in <module>
    import torch_geometric.nn
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
    from .data_parallel import DataParallel
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
    from torch_geometric.data import Batch
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch_geometric/data/data.py", line 7, in <module>
    from torch_sparse import coalesce
  File "/home/yrwang/.local/lib/python3.6/site-packages/torch_sparse/__init__.py", line 23, in <module>
    raise OSError(e)
OSError: libcusparse.so.10: cannot open shared object file: No such file or directory
```

My CUDA and cuDNN are installed correctly:

```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
```

My torch version:

```
>>> print(torch.__version__)
1.4.0
```

I used the following commands to install torch-geometric, but the problem occurs. Thanks for helping me!

```
pip3 install torch-scatter==2.0.4+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-sparse==0.6.1+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-cluster==1.5.4+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-spline-conv==1.2.0+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip3 install torch-geometric
```

@yrwangxd
Author

`libcusparse.so` and `libcusparse.so.10` are already present in `/usr/local/cuda/lib64`.
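On Linux, a file being present on disk is not enough: torch_sparse's compiled extension depends on `libcusparse.so.10` by name, so the dynamic loader has to find it on its search path (for example `LD_LIBRARY_PATH` or the `ld.so` cache). A minimal sketch for checking that lookup outside of torch_sparse, using only the standard `ctypes` module; the helper name `can_dlopen` is hypothetical:

```python
import ctypes


def can_dlopen(soname: str) -> bool:
    """Return True if the dynamic loader can resolve `soname` by name alone.

    A bare name such as "libcusparse.so.10" is looked up on the loader's
    search path (e.g. LD_LIBRARY_PATH and the ld.so cache), not in an
    arbitrary directory such as /usr/local/cuda/lib64.
    """
    try:
        ctypes.CDLL(soname)
    except OSError as err:
        # On failure the message has the same shape as in the traceback above.
        print(f"{soname}: {err}")
        return False
    return True


if __name__ == "__main__":
    print(can_dlopen("libcusparse.so.10"))
```

If this prints `False` while the file exists, the directory holding it is not on the loader's search path.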

@rusty1s
Member

rusty1s commented Apr 13, 2020

Is this path added to LD_LIBRARY_PATH?

@yrwangxd
Author

yrwangxd commented Apr 13, 2020

Thank you. Yes, I checked it; this is the result:

```
$ echo $LD_LIBRARY_PATH
/usr/lcoal/cuda-10.1/lib64:
```
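The value echoed above reads `/usr/lcoal/...` rather than `/usr/local/...`. If the variable really contains that spelling (and it is not just a typo in the comment), the loader would be searching a directory that does not exist. A small sketch that flags such entries; the function name `missing_ld_dirs` is hypothetical:

```python
import os
from typing import List, Optional


def missing_ld_dirs(value: Optional[str] = None) -> List[str]:
    """Return the LD_LIBRARY_PATH entries that are not existing directories."""
    if value is None:
        value = os.environ.get("LD_LIBRARY_PATH", "")
    missing = []
    for entry in value.split(os.pathsep):
        # Skip empty entries, e.g. the one produced by a trailing ':'.
        if entry and not os.path.isdir(entry):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    for entry in missing_ld_dirs():
        print(f"LD_LIBRARY_PATH entry does not exist: {entry}")
```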

@rusty1s
Member

rusty1s commented Apr 13, 2020

And what does torch.cuda.version say?

@yrwangxd
Author

yrwangxd commented Apr 13, 2020

Do you mean `torch.version.cuda`? The result of `torch.version.cuda` is:

```
>>> print(torch.version.cuda)
10.1
```

The result of `torch.cuda.version` is:

```
>>> torch.cuda.version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torch.cuda' has no attribute 'version'
```

@rusty1s
Member

rusty1s commented Apr 13, 2020

Can you do me a favor and check whether you can install torch-scatter from source?

@yrwangxd
Author

Yes, I am glad to do it. What should I do? How can I install torch-scatter from source?

@yrwangxd
Author

Where can I find instructions for installing torch-scatter from source?

@rusty1s
Member

rusty1s commented Apr 13, 2020

See here.

@yrwangxd
Author

yrwangxd commented Apr 13, 2020

I followed your instructions to install torch-scatter from source. The process and result are below, but the problem mentioned above persists. What should I do? PyG is really important for me; thank you very much.

    ~$ python3 -c "import torch; print(torch.__version__)"
    1.4.0
    ~$ echo $PATH
    /usr/local/cuda-10.1/bin:/home/yrwang/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
    ~$ echo $CPATH
    /usr/local/cuda/include:
    ~$ pip3 install torch-scatter
    Collecting torch-scatter
    Installing collected packages: torch-scatter
    Successfully installed torch-scatter-2.0.4

@rusty1s
Member

rusty1s commented Apr 13, 2020

Mh, this is super weird :( Do you have multiple CUDA versions installed on your system? There must be a reason why it tries to look in the wrong folder.

@yrwangxd
Author

No, I only have one CUDA version installed on my system.

@yrwangxd
Author

Dear author, I made it. Thank you for your help. I downgraded CUDA to 10.0, PyTorch to 1.4.0+cu100, and torchvision to 0.5.0+cu100, and installed torch-scatter, torch-sparse, torch-cluster, and torch-spline-conv from source. I had tried installing them from the cu100 `.whl` files, but that didn't work. These are the commands I used:

    pip3 install torch-scatter
    pip3 install torch-sparse
    pip3 install torch-cluster
    pip3 install torch-spline-conv
    pip3 install torch-geometric torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

If anyone needs help, please contact me.

@rusty1s
Member

rusty1s commented Apr 14, 2020

Glad that you made it, but why did downgrading help?

@chrislouis0106

> Mh, this is super weird :( Do you have multiple CUDA versions installed on your system? There must be a reason why it tries to look in the wrong folder.

I have a question about this. Under `/usr/local` I have both `cuda` and `cuda-10.0`. Does that count as one CUDA installation or multiple?
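On many Linux systems `/usr/local/cuda` is a symbolic link to a versioned directory such as `/usr/local/cuda-10.0`, in which case both names refer to one installation. One way to check, sketched with the standard library; the helper `list_cuda_dirs` is hypothetical and only inspects `cuda*` entries directly under `/usr/local`:

```python
import glob
import os
from typing import Dict


def list_cuda_dirs(root: str = "/usr/local") -> Dict[str, str]:
    """Map each cuda* entry under `root` to its fully resolved real path."""
    result = {}
    for path in sorted(glob.glob(os.path.join(root, "cuda*"))):
        # realpath follows symlinks, so a link and its target map to the same value.
        result[path] = os.path.realpath(path)
    return result


if __name__ == "__main__":
    dirs = list_cuda_dirs()
    for path, real in dirs.items():
        print(f"{path} -> {real}")
    print("distinct installations:", len(set(dirs.values())))
```

If `/usr/local/cuda` and `/usr/local/cuda-10.0` resolve to the same path, there is only one installation.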

@chrislouis0106

> I follow your instruction to install torch-scatter from source. The process and result is as following, but it still has the problem mentioned above.

Thank you very much. I used your first commands and the setup completed, which surprised me; I don't know why it worked.
My environment: a Conda env on Ubuntu, NVIDIA driver 418.67 (as reported by `nvidia-smi`), the default `cuda` directory plus `cuda-10.0`, PyTorch 1.4, and cuDNN 7.6.5.
I then installed torch_geometric successfully with your commands.

@aHuiWang

> Dear author, I made it. Thank you for your help. I downgrade CUDA to version 10.0, pytorch to version 1.4.0+cu100, torchvision to 0.5.0+cu100, and install torch-scatter torch-sparse torch-cluster torch-spline-conv from source.

Could you please share the versions of torch-scatter, torch-sparse, torch-cluster, and torch-spline-conv that you ended up with?

@aHuiWang

    CUDA 10.0
    torch 1.4.0+cu100
    torch-cluster 1.5.4
    torch-geometric 1.4.3
    torch-scatter 2.0.4
    torch-sparse 0.6.4
    torch-spline-conv 1.2.0
    torchvision 0.5.0+cu100

When I import torch_geometric, I get this error:

    Traceback (most recent call last):
      File "gcn.py", line 6, in <module>
        from torch_geometric.datasets import Planetoid
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_geometric/__init__.py", line 2, in <module>
        import torch_geometric.nn
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
        from .data_parallel import DataParallel
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
        from torch_geometric.data import Batch
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
        from .data import Data
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_geometric/data/data.py", line 7, in <module>
        from torch_sparse import coalesce
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_sparse/__init__.py", line 34, in <module>
        from .storage import SparseStorage  # noqa
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_sparse/storage.py", line 21, in <module>
        class SparseStorage(object):
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch/jit/__init__.py", line 1274, in script
        _compile_and_register_class(obj, _rcb, qualified_name)
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch/jit/__init__.py", line 1115, in _compile_and_register_class
        _jit_script_class_compile(qualified_name, ast, rcb)
    RuntimeError:

    __init__(torch.torch_sparse.storage.SparseStorage self, Tensor? row, Tensor? rowptr, Tensor? col, Tensor? value, (int, int)? sparse_sizes, Tensor? rowcount, Tensor? colptr, Tensor? colcount, Tensor? csr2csc, Tensor? csc2csr, bool is_sorted) -> (None):
    Expected a value of type 'Optional[Tensor]' for argument 'row' but instead found type 'int'.
    :
      File "/home/wanghui/anaconda3/envs/gnn/lib/python3.7/site-packages/torch_sparse/storage.py", line 283
            col = idx % num_cols

            return SparseStorage(row=row, rowptr=None, col=col, value=self._value,
                   ~~~~~~~~~~~~~ <--- HERE
                                 sparse_sizes=(num_rows, num_cols), rowcount=None,
                                 colptr=None, colcount=None, csr2csc=None,

@rusty1s
Member

rusty1s commented May 27, 2020

Can you hack `torch_sparse/storage.py` by replacing this line with:

    row = idx / num_cols
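Together with the existing `col = idx % num_cols`, this line recovers a row index from a flattened index, assuming `idx = row * num_cols + col` as the surrounding code suggests. The hack relies on `/` acting as integer division on the integer `idx` tensor, which is an assumption about the older PyTorch release in use here; in current PyTorch, `/` always performs true division and floor division is written `//`. The underlying arithmetic, shown with plain Python integers (the name `unflatten` is hypothetical):

```python
from typing import Tuple


def unflatten(idx: int, num_cols: int) -> Tuple[int, int]:
    """Invert idx = row * num_cols + col for a row-major matrix."""
    row = idx // num_cols  # floor division drops the column offset
    col = idx % num_cols   # the remainder is the column offset
    return row, col


if __name__ == "__main__":
    num_rows, num_cols = 3, 4
    for r in range(num_rows):
        for c in range(num_cols):
            assert unflatten(r * num_cols + c, num_cols) == (r, c)
    print(unflatten(7, 4))  # (1, 3)
```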

@aHuiWang

OK! It works! Thank you!

@jurasq

jurasq commented Jun 10, 2020

I've encountered a similar problem (`OSError: libcusparse.so.10.0: cannot open shared object file: No such file or directory`), and I think torch-sparse might be looking in the wrong place for the library. I have CUDA 10.2, but I was using an older torch build, torch==1.4.0+cu100, and installing the dependencies built for cu100 caused the problem. Updating torch to 1.4.0+cu101 and installing the dependencies with cu101 as described in the README made the issue disappear.
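The fix above comes down to keeping the CUDA tag of the torch build consistent with the tag of the extension wheels. A small sketch that converts a local version tag such as `+cu101` into the form that `torch.version.cuda` reports (`10.1`), so the two can be compared. The helper name `cuda_from_tag` is hypothetical, and the parsing rule (last digit is the minor version) is an assumption based on the tags quoted in this thread:

```python
import re
from typing import Optional


def cuda_from_tag(version: str) -> Optional[str]:
    """Turn '1.4.0+cu101' (or just 'cu101') into '10.1'; None if there is no cuXYZ tag.

    Assumes the last digit of the tag is the minor version, as in
    cu100 -> 10.0, cu101 -> 10.1, cu102 -> 10.2, cu110 -> 11.0.
    """
    match = re.search(r"cu(\d+)(\d)$", version)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("torch.__version__:", torch.__version__)
        print("CUDA from version tag:", cuda_from_tag(torch.__version__))
        print("torch.version.cuda:", torch.version.cuda)
```

A torch build without a `+cuXYZ` suffix returns `None` here; in that case `torch.version.cuda` is the value to match against the wheel tag.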

@aajanqd

aajanqd commented Jun 12, 2020

    torch 1.5.0+cu101
    torch-cluster 1.5.4
    torch-geometric 1.5.0
    torch-scatter 2.0.4
    torch-sparse 0.6.4
    torch-spline-conv 1.2.0
    torchvision 0.6.0+cu101

I'm having a similar but slightly different issue:

      File "/home/aqd215/pyenv/py3.7/lib/python3.7/site-packages/torch_geometric/__init__.py", line 2, in <module>
        import torch_geometric.nn
      File "/home/aqd215/pyenv/py3.7/lib/python3.7/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
        from .data_parallel import DataParallel
      File "/home/aqd215/pyenv/py3.7/lib/python3.7/site-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
        from torch_geometric.data import Batch
      File "/home/aqd215/pyenv/py3.7/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
        from .data import Data
      File "/home/aqd215/pyenv/py3.7/lib/python3.7/site-packages/torch_geometric/data/data.py", line 7, in <module>
        from torch_sparse import coalesce
      File "/home/aqd215/pyenv/py3.7/lib/python3.7/site-packages/torch_sparse/__init__.py", line 13, in <module>
        library, [osp.dirname(__file__)]).origin)
      File "/home/aqd215/pyenv/py3.7/lib/python3.7/site-packages/torch/_ops.py", line 105, in load_library
        ctypes.CDLL(path)
      File "/share/apps/anaconda3/5.3.1/lib/python3.7/ctypes/__init__.py", line 356, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: libcusparse.so.10: cannot open shared object file: No such file or directory

@xdwang0726

> Dear author, I made it. Thank you for your help. I downgrade CUDA to version 10.0, pytorch to version 1.4.0+cu100, torchvision to 0.5.0+cu100, and install torch-scatter torch-sparse torch-cluster torch-spline-conv from source.

Hey, I followed your installation steps, but the problem is still there.

    torch 1.4.0+cu100
    torch-cluster 1.5.4
    torch-scatter 2.0.4
    torch-sparse 0.6.1
    torch-spline-conv 1.2.0
    torchvision 0.5.0+cu100

@phamquiluan

phamquiluan commented Jun 16, 2020

Ubuntu 18.04. This is my procedure to fix this bug.

  1. `cd /usr/local/cuda`
  2. Run `find . -name 'libcus*'`

If you see `libcusparse.so.11`, continue with the following steps.

Remove the current CUDA:

  3. `sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"`
  4. `sudo apt-get --purge remove "*nvidia*"`

Install CUDA 10.2:

  5. `wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin`
  6. `sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600`
  7. `wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb`
  8. `sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb`
  9. `sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub`
  10. `sudo apt-get update`
  11. `sudo apt-get -y install cuda-10-2`

Add CUDA to the environment variables:

    $ export PATH=/usr/local/cuda/bin:$PATH
    $ echo $PATH
    /usr/local/cuda/bin:...
    $ export CPATH=/usr/local/cuda/include:$CPATH
    $ echo $CPATH
    /usr/local/cuda/include:...
    $ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    $ echo $LD_LIBRARY_PATH
    /usr/local/cuda/lib64:...
    $ export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
    $ echo $DYLD_LIBRARY_PATH
    /usr/local/cuda/lib:...

@Sanchez2020

First check this:

  1. `cd /usr/local/cuda`
  2. `find . -name 'libcus*'`

If you don't have `libcusparse.so.10` (you may find `libcusparse.so.10.0`, `libcusparse.so.11`, etc. instead), try installing the corresponding CUDA Toolkit from NVIDIA.

I installed CUDA Toolkit 10.1 and changed my environment variables, and it works.

@bsaberla

bsaberla commented Oct 28, 2020

> pip3 install torch-cluster
> pip3 install torch-spline-conv
> pip3 install torch-geometric torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

I followed your steps, and now I am getting this error:

    OSError: libtorch_cpu.so: cannot open shared object file: No such file or directory

Can you help me fix it?

@rusty1s
Member

rusty1s commented Oct 28, 2020

PyTorch 1.4.0 is no longer supported, and it is recommended to update your PyTorch version. I suggest using PyTorch 1.6.0 (wheels are not yet ready for PyTorch 1.7.0).

You can then install PyG as described here:
https://github.com/rusty1s/pytorch_geometric#pytorch-160

@vi-codes

vi-codes commented Nov 5, 2020

I followed your steps as described in: https://github.com/rusty1s/pytorch_geometric#pytorch-160

My setup is as follows:

    torch 1.6.0+cu102
    torch-cluster 1.5.8
    torch-scatter 2.0.5
    torch-sparse 0.6.8
    torch-spline-conv 1.2.0
    torchvision 0.7.0+cu102

Error: `OSError: libcusparse.so.10: cannot open shared object file: No such file or directory`

Trace:

    Traceback (most recent call last):
      File "/home/Documents/Graph-master/code/Test.py", line 15, in <module>
        from inputsdata import MyOwnDataset
      File "/home/Documents/Graph-master/code/inputsdata.py", line 11, in <module>
        from torch_geometric.data import Data, DataLoader, Dataset
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/site-packages/torch_geometric/__init__.py", line 2, in <module>
        import torch_geometric.nn
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
        from .data_parallel import DataParallel
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/site-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
        from torch_geometric.data import Batch
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
        from .data import Data
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/site-packages/torch_geometric/data/data.py", line 7, in <module>
        from torch_sparse import coalesce, SparseTensor
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/site-packages/torch_sparse/__init__.py", line 12, in <module>
        torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/site-packages/torch/_ops.py", line 105, in load_library
        ctypes.CDLL(path)
      File "/home/anaconda2/envs/torch_vi_Work/lib/python3.8/ctypes/__init__.py", line 381, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: libcusparse.so.10: cannot open shared object file: No such file or directory

I tried every method that worked for others in this thread, but the issue still persists for me. @rusty1s Could you please help me resolve this? It is key to proceeding with my work.

Thanks in advance!

@rusty1s
Copy link
Member

rusty1s commented Nov 5, 2020

`libcusparse.so.xxx` should be contained either in `$CUDA_HOME/lib` or in `.../miniconda3/lib`. If it is only in the latter, please add that path to `LD_LIBRARY_PATH`.
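A sketch of that check, assuming a Linux system. It looks for `libcusparse.so*` in `$CUDA_HOME/lib64` and `$CUDA_HOME/lib` (many toolkits use `lib64`), falling back to `/usr/local/cuda` when `CUDA_HOME` is unset (an assumption), and in the `lib` directory of the running Python environment (`sys.prefix`, which for an active conda env is that env's prefix). It then prints an `export` line for any match. The function name `find_cusparse_dirs` is hypothetical:

```python
import glob
import os
import sys
from typing import List, Optional


def find_cusparse_dirs(cuda_home: Optional[str] = None,
                       prefix: Optional[str] = None) -> List[str]:
    """Return candidate directories that contain a libcusparse.so* file."""
    if cuda_home is None:
        # Assumed default location when CUDA_HOME is not set.
        cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    if prefix is None:
        prefix = sys.prefix  # the running (e.g. conda) environment
    candidates = [
        os.path.join(cuda_home, "lib64"),
        os.path.join(cuda_home, "lib"),
        os.path.join(prefix, "lib"),
    ]
    found = []
    for directory in candidates:
        if glob.glob(os.path.join(directory, "libcusparse.so*")):
            found.append(directory)
    return found


if __name__ == "__main__":
    dirs = find_cusparse_dirs()
    if dirs:
        # Print a shell line, since the variable must be set before Python starts.
        print("export LD_LIBRARY_PATH=" + ":".join(dirs) + ":$LD_LIBRARY_PATH")
    else:
        print("no libcusparse.so* found in the candidate directories")
```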

@vi-codes

vi-codes commented Nov 5, 2020

> libcusparse.so.xxx should be either contained in $CUDA_HOME/lib or in .../miniconda3/lib. In case it is only included in the latter, please add that path to LD_LIBRARY_PATH.

I wasn't able to figure that out, but I managed to get it running with the `+cpu` version instead.
Thanks for the response and this library!

@vi-codes

vi-codes commented Nov 9, 2020

Just an update:
The CUDA installation on my system had been accidentally deleted, which caused the error. I reinstalled CUDA 11 and then installed the corresponding PyTorch 1.7.0, following step 3 of the installation via binaries here: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
I no longer have this issue.
Thanks again!

@filipekstrm
Contributor

filipekstrm commented Dec 10, 2020

I had this problem when using conda. In my conda environment I had installed PyTorch and torchvision with pip; uninstalling them with pip (`pip uninstall torch torchvision`) and reinstalling them through conda solved the issue.

@qwu01

qwu01 commented Feb 13, 2021

I had this problem on a cluster (CUDA 11, PyTorch 1.7). `export LD_LIBRARY_PATH=somewhere/cudacore/11.0.2/lib64` solved the problem.
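The export needs to happen before Python starts: on Linux the dynamic loader reads `LD_LIBRARY_PATH` at process startup, so assigning `os.environ["LD_LIBRARY_PATH"]` inside an already running interpreter generally does not change where later library loads look. If you want to drive it from Python, one option is to launch a child process with the updated environment. In this minimal sketch `run_with_lib_dir` is a hypothetical helper and `/path/to/cuda/lib64` is a placeholder:

```python
import os
import subprocess
import sys
from typing import List


def run_with_lib_dir(lib_dir: str, args: List[str]) -> subprocess.CompletedProcess:
    """Run `args` in a child process whose LD_LIBRARY_PATH starts with `lib_dir`."""
    env = os.environ.copy()
    old = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the given directory is searched before any existing entries.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + old if old else "")
    return subprocess.run(args, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder path: replace it with the directory that holds libcusparse.so.*.
    result = run_with_lib_dir(
        "/path/to/cuda/lib64",
        [sys.executable, "-c", "import torch_geometric; print('ok')"],
    )
    print(result.stdout or result.stderr)
```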

@janblumenkamp

I am having a similar issue. My system has CUDA 11.2 installed:

    /usr/local/cuda/bin/nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Sun_Feb_14_21:12:58_PST_2021
    Cuda compilation tools, release 11.2, V11.2.152
    Build cuda_11.2.r11.2/compiler.29618528_0

But torch uses 10.2:

    >>> import torch
    >>> torch.__version__
    '1.8.0'
    >>> torch.cuda.is_available()
    True
    >>> torch.version.cuda
    '10.2'

I learned that this torch build uses CUDA 10.2 and is installed with its own CUDA, so that makes sense. I install everything in a virtualenv. If I install torch geometric like this:

    pip install torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html
    pip install torch_geometric

I get the error

    OSError: libcusparse.so.10: cannot open shared object file: No such file or directory

So it appears that torch geometric is searching for the library in the system's CUDA installation and not in the CUDA installation delivered with PyTorch. However, I can't find the library in the virtualenv folder, and I do not want to downgrade the CUDA on my system. Is there a way to point torch geometric to the CUDA supplied with torch installed in a virtualenv?

@rusty1s
Member

rusty1s commented May 7, 2021

This should be possible by setting `$LD_LIBRARY_PATH` to the CUDA libraries provided by PyTorch. You can find this folder by searching your whole machine for `libcusparse.so.10`.
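A Python counterpart of a `find` search for the library, as a minimal sketch; the helper name `find_library_files` is hypothetical. Walking from `/` can take a long time, so it is usually worth starting from a narrower root first:

```python
import fnmatch
import os
from typing import List


def find_library_files(root: str, pattern: str = "libcusparse.so.10*") -> List[str]:
    """Return paths under `root` whose file name matches `pattern`."""
    matches = []
    # With os.walk's default onerror=None, unreadable directories are skipped silently.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, `find_library_files("/usr")` lists matching files below `/usr`; the directory containing a match (`os.path.dirname(path)`) is the one to add to `LD_LIBRARY_PATH`.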

@JD-ETH

JD-ETH commented Jun 2, 2021

@rusty1s I went through the answers but couldn't find one that answers my question, so I would like to ask you here:

Is it not possible to install pytorch-geometric without a system-wide install of the CUDA toolkit (nvcc)? I tried to avoid installing CUDA system-wide and to use cudatoolkit in the conda environment only, but the installation guide seems to require a path to `/usr/local/cuda`.

I'd like to be able to install a torch wheel built for a specific CUDA version and do the same for pytorch-geometric. Is that possible?

@rusty1s
Member

rusty1s commented Jun 4, 2021

You can use either a system-wide CUDA installation or cudatoolkit from conda. Which path PyTorch picks up depends on your system configuration, i.e. the environment variables `LD_LIBRARY_PATH` and `CUDA_HOME`.

If you want to build PyTorch from source against a specific CUDA installation, you can do the same for our extension packages (e.g., by cloning them and running `python setup.py install`).

@EJHyun

EJHyun commented Aug 9, 2021

I think I found the answer. The reason is that when you are inside a virtual environment, you cannot open the shared libraries of a cudatoolkit that lives outside the virtual environment.

You can use sudo, like:

    sudo python main.py

or just install cudatoolkit inside the virtual environment:

    conda activate [your env]
    conda install -c anaconda cudatoolkit=10.1
    python main.py

I used the second way (installing cudatoolkit inside the environment), because `sudo python` uses the `python` outside the virtual environment, so it does not use the version I want inside the virtual environment.
