Installation issue: undefined symbol: __cudaPopCallConfiguration #19

Closed
davidbau opened this issue Dec 17, 2018 · 10 comments

Comments

@davidbau

On Linux, when I try to install and use pytorch_scatter, I get undefined symbol: __cudaPopCallConfiguration immediately upon importing torch_scatter.

I'm using PyTorch 1.0.0, and CUDA 9.0 is on the PATH (with its include directory on the CPATH):

$ python -c "import torch; print(torch.__version__)"
1.0.0
$ echo $CPATH
/usr/local/cuda-9.0/include
$ echo $PATH
/usr/local/cuda-9.0/bin:/afs/csail.mit.edu/u/d/davidbau/.conda/envs/p3t1/bin...

I've tried uninstalling and reinstalling without the cache (pip install --no-cache-dir torch_scatter), but the error remains. Any tips?

Details: Ubuntu 16.04

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.5 LTS
Release:        16.04
Codename:       xenial

The environment was installed via conda using the following env.yml:

name: p3t1
channels:
  - pytorch
dependencies:
  - python=3.6
  - cudatoolkit=9.0
  - cudnn=7.1.2
  - pytorch=1.0
  - torchvision
  - mkl-include
  - numpy
  - scipy
  - scikit-learn
  - matplotlib
  - graphviz
  - numba
  - jupyter
  - pyyaml
  - mkl
  - setuptools
  - cmake
  - cffi
  - ujson
  - tqdm
  - pip
  - pip:
    - torch-scatter
@rayush7

rayush7 commented Dec 23, 2018

I am facing the same problem. Did you figure out how to solve the problem?

@davidbau
Author

Not yet.

@rusty1s
Owner

rusty1s commented Dec 24, 2018

Did you try to download the repo and run python setup.py install?

@davidbau
Author

Yes, the same issue occurs. python setup.py install looks fine and runs a lot of build steps, but then the import fails:

$ python -c "import torch; from torch_scatter import scatter_max"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/afs/csail.mit.edu/u/d/davidbau/git/pytorch_scatter/torch_scatter/__init__.py", line 3, in <module>
    from .mul import scatter_mul
  File "/afs/csail.mit.edu/u/d/davidbau/git/pytorch_scatter/torch_scatter/mul.py", line 3, in <module>
    from torch_scatter.utils.ext import get_func
  File "/afs/csail.mit.edu/u/d/davidbau/git/pytorch_scatter/torch_scatter/utils/ext.py", line 5, in <module>
    import torch_scatter.scatter_cuda
ImportError: /afs/csail.mit.edu/u/d/davidbau/git/pytorch_scatter/torch_scatter/scatter_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

@rusty1s
Owner

rusty1s commented Dec 24, 2018

Does the PyTorch CUDA version (torch.version.cuda) match the CUDA version installed on the system?

Can you make sure the official PyTorch extensions run on your machine?
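One way to run the first check is sketched below. It is only an illustration: the helper names (parse_nvcc_release, major_minor, versions_match, report) are hypothetical, and it assumes that comparing the major.minor part of torch.version.cuda with the "release" field printed by nvcc --version is a good enough test.

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def major_minor(version: str) -> str:
    """Reduce a version string such as '9.0.176' to '9.0'."""
    return ".".join(version.split(".")[:2])


def versions_match(torch_cuda: Optional[str], nvcc_output: str) -> bool:
    """True when nvcc's release equals PyTorch's CUDA major.minor."""
    if torch_cuda is None:  # CPU-only PyTorch builds report no CUDA version
        return False
    return parse_nvcc_release(nvcc_output) == major_minor(torch_cuda)


def report(nvcc: str = "nvcc") -> None:
    """Print both versions; `nvcc` may be a full path to a specific compiler."""
    import torch  # imported lazily so the helpers above work without PyTorch

    nvcc_text = subprocess.run(
        [nvcc, "--version"], stdout=subprocess.PIPE, universal_newlines=True
    ).stdout
    print("torch.version.cuda:", torch.version.cuda)
    print("nvcc release:", parse_nvcc_release(nvcc_text))
    print("match:", versions_match(torch.version.cuda, nvcc_text))
```

Calling report() checks the nvcc found on the PATH; passing an explicit path such as report("/usr/local/cuda-9.0/bin/nvcc") checks one particular installation.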

@davidbau
Author

davidbau commented Dec 24, 2018

Thanks for the tip! The system has multiple nvcc installations, and even though the version on the PATH matched torch.version.cuda, it looks like setup.py was picking up the wrong one. The torch extension API appears to look for CUDA_HOME, so doing this before pip install or python setup.py install fixes the problem (it's not necessary for the right nvcc to show up on PATH or for the right include directory to be on CPATH; everything is keyed off of CUDA_HOME):

export CUDA_HOME=/usr/local/cuda-9.0
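
To see whether CUDA_HOME and the PATH point at different compilers on a given machine, a small helper like the following can list both. This is a rough sketch: nvcc_candidates is a hypothetical name, and the assumption that nvcc lives at bin/nvcc under CUDA_HOME matches the usual Linux layout but is not taken from the PyTorch sources.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def nvcc_candidates(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Return the nvcc implied by CUDA_HOME and the nvcc found on PATH."""
    cuda_home = env.get("CUDA_HOME")
    # Assumed layout: <CUDA_HOME>/bin/nvcc
    from_home = os.path.join(cuda_home, "bin", "nvcc") if cuda_home else None
    # shutil.which returns None when nothing named nvcc is on the given PATH
    from_path = shutil.which("nvcc", path=env.get("PATH", ""))
    return {"CUDA_HOME": from_home, "PATH": from_path}
```

Running print(nvcc_candidates(os.environ)) in the shell used for the build shows both candidates side by side; if they differ, the mismatch described above is a likely suspect.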

Another problem I was having while testing configurations was failing to sufficiently clean out binaries built with the wrong compiler. For others following along, I found this was enough to clean things:

pip uninstall torch-scatter
rm -rf build torch_scatter/*.so; python setup.py clean # within the torch_scatter sources

And then use --no-cache-dir when reinstalling with the proper environment variable.

export CUDA_HOME=/usr/local/cuda-9.0
pip install --no-cache-dir torch-scatter
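
For anyone who prefers to script the cleanup, the rm step can be approximated in Python as below. It is a sketch under assumptions: clean_build_artifacts is a hypothetical helper, it only covers the build directory and the compiled .so files, and it does not replace pip uninstall or python setup.py clean.

```python
import glob
import os
import shutil
from typing import List


def clean_build_artifacts(source_dir: str) -> List[str]:
    """Delete build/ and torch_scatter/*.so under source_dir; return removed paths."""
    removed = []
    build_dir = os.path.join(source_dir, "build")
    if os.path.isdir(build_dir):
        shutil.rmtree(build_dir)
        removed.append(build_dir)
    # Compiled extension modules left over from a build with the wrong nvcc
    pattern = os.path.join(source_dir, "torch_scatter", "*.so")
    for so_file in sorted(glob.glob(pattern)):
        os.remove(so_file)
        removed.append(so_file)
    return removed
```

Calling clean_build_artifacts(".") from the root of the torch_scatter sources returns the list of paths it deleted, which makes it easy to confirm that stale binaries were actually present.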

Problem solved!

@rayush7

rayush7 commented Dec 24, 2018

@rusty1s Thanks for suggesting that I compare the PyTorch CUDA version with the CUDA version installed on the system.

I had CUDA 9.2 installed on my system, and PyTorch 1.0 installed CUDA 9.0 by default, so there was a mismatch.

I am using Ubuntu 18.04, so I switched to CUDA 10.0 instead of CUDA 9 to avoid a mismatch between the nvcc compiler and the gcc/g++ compilers, and then installed PyTorch for CUDA 10.0.

After that I followed the steps described by @davidbau (with cuda-10.0 in place of cuda-9.0) and everything worked.

Thank you both.

@rusty1s
Owner

rusty1s commented Dec 25, 2018

Cool that it works now :)

@zc-alexfan

Thanks for the great response! I have the same problem here, but I am still getting stuck. I am wondering if I could get some suggestions.

My PyTorch was installed on a Conda environment. Inside the environment, the CUDA version is:

>>> import torch
>>> torch.version.cuda
'9.0.176'

My cudatoolkit's nvcc gives:

➜  bin ./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Yes, there is a mismatch. I tried your suggestion with CUDA_HOME=/path/to/cudatoolkit9.2, but the same error occurs. Should I instead set CUDA_HOME to the path of the CUDA that came with conda install pytorch torchvision cudatoolkit=9.0 -c pytorch when I installed PyTorch?

However, I cannot find the path of that cudatoolkit.

Thanks in advance.

@davidbau
Author

No, you want nvcc for 9.0, not 9.2, since 9.0 is the version of CUDA you're running inside your environment. Since nvcc isn't included in the conda CUDA packages, you need to install it separately. E.g., on Ubuntu, apt-get install cuda-9-0 will end up installing it in /usr/local/cuda-9.0. This can happily coexist with your 9.2 installation. Then you can do the following to set CUDA_HOME to cuda-9.0 automatically whenever your conda environment is activated:

# Set up CUDA_HOME to set itself up correctly on every source activate
# https://stackoverflow.com/questions/31598963
mkdir -p ~/.conda/envs/${ENV_NAME}/etc/conda/activate.d
echo "export CUDA_HOME=/usr/local/cuda-9.0" > \
    ~/.conda/envs/${ENV_NAME}/etc/conda/activate.d/CUDA_HOME.sh
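
The same hook can be written from Python, which may be convenient when the environment lives somewhere other than ~/.conda/envs. This is a minimal sketch: write_cuda_home_hook is a hypothetical helper, and the CUDA path passed to it is whatever matches the local installation.

```python
import os


def write_cuda_home_hook(env_prefix: str, cuda_home: str) -> str:
    """Write an activate.d script exporting CUDA_HOME; return the script path."""
    hook_dir = os.path.join(env_prefix, "etc", "conda", "activate.d")
    os.makedirs(hook_dir, exist_ok=True)
    hook_path = os.path.join(hook_dir, "CUDA_HOME.sh")
    with open(hook_path, "w") as handle:
        # Scripts in activate.d are sourced each time the environment is activated
        handle.write('export CUDA_HOME="{}"\n'.format(cuda_home))
    return hook_path
```

With the target environment active, write_cuda_home_hook(os.environ["CONDA_PREFIX"], "/usr/local/cuda-9.0") writes the hook into that environment; it takes effect the next time the environment is activated.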
