error: can't copy 'build/lib.linux-x86_64-3.9/torchrl/_torchrl.so': doesn't exist or not a regular file #147

Closed
bkpcoding opened this issue May 17, 2022 · 13 comments
Labels
help wanted Extra attention is needed

Comments

@bkpcoding
When trying to install the package using "pip install -e ." or "python setup.py develop", I encounter this error.
OS: Ubuntu 22.04 LTS (x86_64)
Python version: 3.9.12
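
The error in the title says that setup.py could not find the compiled extension it was asked to copy, which usually means the compile step failed earlier. A minimal sketch for checking whether any shared object was produced, assuming the default `build` directory next to setup.py (the helper name `find_built_extensions` is hypothetical, not part of torchrl):

```python
from pathlib import Path
from typing import List


def find_built_extensions(build_dir: str = "build") -> List[Path]:
    """Return compiled shared objects found under build_dir (empty list if none)."""
    root = Path(build_dir)
    if not root.is_dir():
        # No build directory at all: nothing was compiled.
        return []
    # Search recursively, since setuptools nests output under lib.<platform>-<pyver>/.
    return sorted(p for p in root.rglob("*.so") if p.is_file())


if __name__ == "__main__":
    found = find_built_extensions()
    if found:
        for path in found:
            print(path)
    else:
        print("no compiled extension found under build/; check the compiler output above the copy error")
```

An empty result suggests looking further up in the build output for the first compiler or linker failure rather than at the final copy error.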

@Benjamin-eecs Benjamin-eecs added the bug Something isn't working label May 17, 2022
@Benjamin-eecs
Contributor

This may be caused by a missing CUDA driver installation.

@Benjamin-eecs Benjamin-eecs added help wanted Extra attention is needed and removed bug Something isn't working labels May 17, 2022
@vmoens
Contributor

vmoens commented May 18, 2022

Did you perhaps install the CUDA version of PyTorch without having a CUDA device on your machine? If so, try

$ conda install pytorch torchvision torchaudio cpuonly -c pytorch

instead of

$ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

If uninstalling / reinstalling pytorch doesn't work, could you post the logs following this?

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
python collect_env.py
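
The mismatch described above (a CUDA build of PyTorch on a machine without a usable GPU) can also be checked directly from Python. A minimal sketch, assuming PyTorch is importable; `torch.version.cuda` and `torch.cuda.is_available()` are real PyTorch attributes, while the helper name `diagnose_torch_build` is hypothetical:

```python
from typing import Optional


def diagnose_torch_build(cuda_build: Optional[str], cuda_available: bool) -> str:
    """Classify a PyTorch install from two facts.

    cuda_build: value of torch.version.cuda (None for builds without CUDA).
    cuda_available: value of torch.cuda.is_available().
    """
    if cuda_build is None:
        return "cpu-only build"
    if cuda_available:
        return f"cuda {cuda_build} build, GPU usable"
    # CUDA build installed, but no device or driver is visible at runtime.
    return f"cuda {cuda_build} build, but no usable GPU: consider the cpuonly install"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(diagnose_torch_build(torch.version.cuda, torch.cuda.is_available()))
```

If the last message is printed, the CPU-only install command above is the likely fix.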

@bkpcoding
Author

Yeah, it works. I had installed cudatoolkit even though I don't have CUDA on my machine. Thank you very much. I think it would be nice to provide a "CPU only" installation command in the "stable" installation section of the README.

@vmoens
Contributor

vmoens commented May 18, 2022

Will do! Thanks for the suggestion!

@vmoens vmoens closed this as completed May 18, 2022
@Andrewzh112

Andrewzh112 commented Jun 23, 2022

I got the same error. This is the output of collect_env.py:

Collecting environment information...

PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.27

Python version: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.15.0-177-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.4.48
GPU models and configuration: 
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB
GPU 2: NVIDIA A100-PCIE-40GB
GPU 3: NVIDIA A100-PCIE-40GB
GPU 4: NVIDIA A100-PCIE-40GB
GPU 5: NVIDIA A100-PCIE-40GB
GPU 6: NVIDIA A100-PCIE-40GB
GPU 7: NVIDIA A100-PCIE-40GB

Nvidia driver version: 470.82.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.1
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.8.3.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.3.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.3.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.3.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.3.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.3.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.3.2
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.1
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn.so.8.2.4
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.4
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.4
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.4
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.4
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.4
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] functorch==0.1.1
[pip3] numpy==1.22.3
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               h2bc3f7f_2  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] functorch                 0.1.1                    pypi_0    pypi
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.22.3           py39he7a7128_0  
[conda] numpy-base                1.22.3           py39hf524024_0  
[conda] pytorch                   1.11.0          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.11.0               py39_cu113    pytorch
[conda] torchvision               0.12.0               py39_cu113    pytorch

@vmoens
Contributor

vmoens commented Jun 23, 2022

@Andrewzh112 if you're on a cluster with cudnn, can you check if you have loaded that module (as well as cuda)?
Most of the time you can do that via module avail and module load cuda/xyz && module load cudnn/xyz
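
Where the `module` command is unavailable, a rough substitute is to look at what the current shell already exposes. A minimal sketch using only the standard library; the helper name `cuda_hints` and the choice of variables to inspect are assumptions, and a hit here does not prove that cuDNN is installed correctly:

```python
import os
import shutil
from typing import Callable, Dict, Mapping, Optional


def cuda_hints(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Collect hints about a visible CUDA toolkit without using `module`."""
    return {
        # Executables that a loaded CUDA toolkit or driver normally puts on PATH.
        "nvcc": which("nvcc"),
        "nvidia-smi": which("nvidia-smi"),
        # Environment variables commonly set by CUDA installs or module files.
        "CUDA_HOME": env.get("CUDA_HOME"),
        "CUDA_PATH": env.get("CUDA_PATH"),
        "LD_LIBRARY_PATH": env.get("LD_LIBRARY_PATH"),
    }


if __name__ == "__main__":
    for name, value in cuda_hints().items():
        print(f"{name}: {value or 'not found'}")
```

The `env` and `which` parameters exist only so the function can be exercised without a real CUDA installation.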

@Andrewzh112

> @Andrewzh112 if you're on a cluster with cudnn, can you check if you have loaded that module (as well as cuda)? Most of the time you can do that via module avail and module load cuda/xyz && module load cudnn/xyz

What if I don't have access to module or sudo to install module? Are there alternatives to check?

@vmoens
Contributor

vmoens commented Jun 24, 2022

Unfortunately AFAICT no, cudnn needs to be installed with cuda.

Can you post the error message when calling python setup.py develop?

@Andrewzh112

> Unfortunately AFAICT no, cudnn needs to be installed with cuda.
>
> Can you post the error message when calling python setup.py develop?

Performing C SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
Change Dir: /data/lyh/reward_machines/rl/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeTmp

Run Build Command(s):/data/lyh/anaconda3/envs/rl/bin/ninja cmTC_3ce42 && [1/2] Building C object CMakeFiles/cmTC_3ce42.dir/src.c.o
[2/2] Linking C executable cmTC_3ce42
FAILED: cmTC_3ce42 
: && /usr/bin/cc   CMakeFiles/cmTC_3ce42.dir/src.c.o -o cmTC_3ce42   && :
CMakeFiles/cmTC_3ce42.dir/src.c.o: In function `main':
src.c:(.text+0x3e): undefined reference to `pthread_create'
src.c:(.text+0x4a): undefined reference to `pthread_detach'
src.c:(.text+0x56): undefined reference to `pthread_cancel'
src.c:(.text+0x67): undefined reference to `pthread_join'
src.c:(.text+0x7b): undefined reference to `pthread_atfork'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.


Source file was:
#include <pthread.h>

static void* test_func(void* data)
{
  return data;
}

int main(void)
{
  pthread_t thread;
  pthread_create(&thread, NULL, test_func, NULL);
  pthread_detach(thread);
  pthread_cancel(thread);
  pthread_join(thread, NULL);
  pthread_atfork(NULL, NULL, NULL);
  pthread_exit(NULL);

  return 0;
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /data/lyh/reward_machines/rl/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeTmp

Run Build Command(s):/data/lyh/anaconda3/envs/rl/bin/ninja cmTC_5ce26 && [1/2] Building C object CMakeFiles/cmTC_5ce26.dir/CheckFunctionExists.c.o
[2/2] Linking C executable cmTC_5ce26
FAILED: cmTC_5ce26 
: && /usr/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create  CMakeFiles/cmTC_5ce26.dir/CheckFunctionExists.c.o -o cmTC_5ce26  -lpthreads && :
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

here you go

@vmoens
Contributor

vmoens commented Jun 28, 2022

I tried to google that and here's an SO issue related to this

Here are the related conda packages to install

conda install -c statiskit libboost-dev
conda install -c anaconda libboost
conda install -c conda-forge magics

Let me know if that helps!

@Andrewzh112

Andrewzh112 commented Jun 28, 2022

> I tried to google that and here's an SO issue related to this
>
> Here are the related conda packages to install
>
> conda install -c statiskit libboost-dev
> conda install -c anaconda libboost
> conda install -c conda-forge magics
>
> Let me know if that helps!

Unfortunately, I'm still getting the same errors. I even tried a new machine and followed the instructions in the README, with the same result.

@vmoens
Contributor

vmoens commented Jul 1, 2022

We're not using cmake anymore, wanna try installing the lib now?

@Andrewzh112

> We're not using cmake anymore, wanna try installing the lib now?

works perfectly now, thanks!
