torch version #5

Open
qingzi02010 opened this issue Dec 26, 2019 · 33 comments

Comments

@qingzi02010

Some errors occurred while compiling the code. Could you tell us which version of torch you used, along with the rest of the software environment (CUDA, cuDNN, gcc, ninja, re2c)? Thank you!
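A minimal sketch for gathering the versions asked about here into one report. The helper names `tool_version` and `collect_env` are hypothetical, and the sketch assumes each tool accepts a `--version` flag. PyTorch also ships its own report via `python -m torch.utils.collect_env`, which is likely the better choice when torch imports cleanly.

```python
import importlib
import platform
import shutil
import subprocess


def tool_version(tool, flag="--version"):
    """Return the output of `<tool> <flag>`, or None if the tool is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run([path, flag], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    return (result.stdout or result.stderr).strip() or None


def collect_env():
    """Collect Python, torch, and build-tool versions into a dict."""
    info = {"python": platform.python_version(), "torch": None, "torch_cuda": None}
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
    except Exception:
        pass  # torch missing or broken: leave both entries as None
    for tool in ("gcc", "nvcc", "ninja", "re2c"):
        info[tool] = tool_version(tool)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting this output into an issue makes it easier to compare against the combinations other posters report as working.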

@onion-liu
Contributor

I have tested it on PyTorch 1.3 + CUDA 10, and it runs successfully.

@rosinality
Owner

I used PyTorch 1.3.1 with CUDA 10.2. It seems that the PyTorch version is crucial. (See #1)

@qingzi02010
Author

@rosinality I installed PyTorch 1.3.1, torchvision 0.4.2, and CUDA 10.1, and got "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv". Your torchvision is 0.4.2, right?

@rosinality
Owner

Could you retry after removing the /tmp/torch_extensions directory?

@qingzi02010
Author

Sorry, I don't know how to remove /tmp/torch_extensions, and I am not familiar with PyTorch C++ extensions. Could you explain more?

@rosinality
Owner

I suspect it is trying to use cached binaries even after CUDA updates.
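A minimal sketch of clearing the cached binaries, assuming the cache lives in one of the directories that appear in this thread (`/tmp/torch_extensions`, `~/.cache/torch_extensions`) or in the directory named by the `TORCH_EXTENSIONS_DIR` environment variable. The function names are hypothetical, and the default is a dry run so nothing is deleted by accident.

```python
import os
import shutil
from pathlib import Path


def candidate_cache_dirs():
    """Directories where JIT-built extensions may be cached (an assumption, not exhaustive)."""
    dirs = []
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        dirs.append(Path(override))
    dirs.append(Path("/tmp/torch_extensions"))
    dirs.append(Path.home() / ".cache" / "torch_extensions")
    return dirs


def clear_extension_cache(dirs=None, dry_run=True):
    """Remove each existing cache directory; return those removed (or that would be)."""
    removed = []
    for directory in (candidate_cache_dirs() if dirs is None else dirs):
        directory = Path(directory)
        if directory.is_dir():
            if not dry_run:
                shutil.rmtree(directory)
            removed.append(directory)
    return removed


if __name__ == "__main__":
    # Dry run first: only report what would be deleted.
    for directory in clear_extension_cache(dry_run=True):
        print("would remove", directory)
```

Calling `clear_extension_cache(dry_run=False)` has the same effect as running `rm -rf` on those directories; the extension is then rebuilt on the next import.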

@qingzi02010
Author

I have now updated CUDA to 10.2 and added it to my .bashrc file, but the same error occurs. Do you have any suggestions? Should I reboot the machine?

@rosinality
Owner

I don't think you need to reboot after a CUDA update. Could you post the full error log?

@qingzi02010
Author

Traceback (most recent call last):
  File "train.py", line 20, in <module>
    from model import Generator, Discriminator
  File "/mnt/stylegan2_pytorch_rosinality/stylegan2-pytorch-master/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/mnt/stylegan2_pytorch_rosinality/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/mnt/stylegan2_pytorch_rosinality/stylegan2-pytorch-master/op/fused_act.py", line 6, in <module>
    fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 841, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1052, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/opt/anaconda3/lib/python3.7/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/anaconda3/lib/python3.7/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv
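The undefined symbol is a mangled C++ name, and decoding it shows which library the extension expected to find it in. Below is a minimal sketch of a demangler that handles only the simple shape seen in this thread (a nested name taking no arguments); `demangle_simple` is a hypothetical helper, and the standard tool for the general case is `c++filt`.

```python
import re


def demangle_simple(symbol):
    """Demangle a small subset of Itanium C++ ABI names.

    Handles only symbols shaped like _ZN<len><name>...Ev (a nested name with
    no arguments), optionally starting with St for the std namespace.
    Returns None for anything outside that subset.
    """
    if not symbol.startswith("_ZN"):
        return None
    rest = symbol[3:]
    parts = []
    if rest.startswith("St"):
        parts.append("std")
        rest = rest[2:]
    while rest and rest[0].isdigit():
        match = re.match(r"\d+", rest)
        length = int(match.group())
        start = match.end()
        name = rest[start:start + length]
        if len(name) != length:
            return None  # truncated symbol
        parts.append(name)
        rest = rest[start + length:]
    if rest != "Ev" or not parts:
        return None
    return "::".join(parts) + "()"


print(demangle_simple("_ZN3c1011CPUTensorIdEv"))  # c10::CPUTensorId()
```

The name lives in the `c10` namespace, which belongs to PyTorch's core library. A missing `c10` symbol suggests that fused.so was compiled against a different PyTorch build than the one loading it, which would be consistent with the stale-cache suspicion above.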

@qingzi02010
Author

What is your gcc version? Mine is 5.4, and I am hesitant to update to gcc 7.3.

@rosinality
Owner

I'm using gcc 5.4

Did you try removing the cached binaries in /tmp/torch_extensions? Then could you show me the output of the following command?

> ldd /tmp/torch_extensions/fused/fused.so

@qingzi02010
Author

ldd /tmp/torch_extensions/fused/fused.so
linux-vdso.so.1 => (0x00007ffdeb198000)
libcudart.so.10.0 => /usr/local/lib/libcudart.so.10.0 (0x00007f24bc54d000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f24bc1cb000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f24bbfb5000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f24bbbeb000)
/lib64/ld-linux-x86-64.so.2 (0x00007f24bca7c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f24bb9e7000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f24bb7ca000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f24bb5c2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f24bb2b9000)
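The `ldd` output above lists `libcudart.so.10.0`, resolved from /usr/local/lib, while the poster reports having installed CUDA 10.1 and later 10.2. That may mean the extension was linked against a different CUDA runtime than intended. Here is a minimal sketch for pulling that version out of `ldd` output so it can be compared with `torch.version.cuda`; the function names are hypothetical and the sample is abbreviated from the output above.

```python
import re

LDD_LINE = re.compile(r"^\s*(\S+)\s*=>\s*(\S*)")

SAMPLE_LDD = """\
linux-vdso.so.1 => (0x00007ffdeb198000)
libcudart.so.10.0 => /usr/local/lib/libcudart.so.10.0 (0x00007f24bc54d000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f24bc1cb000)
/lib64/ld-linux-x86-64.so.2 (0x00007f24bca7c000)
"""


def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path ('' if unresolved)."""
    libs = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            path = match.group(2)
            libs[match.group(1)] = path if path.startswith("/") else ""
    return libs


def linked_cudart_version(output):
    """Return the libcudart version suffix (e.g. '10.0'), or None if it is not linked."""
    for name in parse_ldd(output):
        match = re.fullmatch(r"libcudart\.so\.(\d+(?:\.\d+)*)", name)
        if match:
            return match.group(1)
    return None


print(linked_cudart_version(SAMPLE_LDD))  # 10.0
```

If the printed version differs from `torch.version.cuda`, the build probably picked up another toolkit through `PATH`, `CUDA_HOME`, or the linker search path.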

@rosinality
Owner

It seems there are cases where PyTorch can't resolve the CUDA shared libraries (NVIDIAGameWorks/kaolin#30), but I don't know how you can fix it. If you use Anaconda, you could create a new virtual environment, install PyTorch 1.3 and cudatoolkit 10.1 in it, and try again.

@qingzi02010
Author

qingzi02010 commented Dec 26, 2019 via email

@wosecz

wosecz commented Jan 6, 2020

I have the same problem, but I was unable to solve it by removing /tmp/torch_extensions. Did you do anything else to solve it? @qingzi02010

@qingzi02010
Author

No. I used the recommended version of torch, and once I ran 'rm -rf /tmp/torch_extensions', the error "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv" disappeared.

@qingzi02010
Author

qingzi02010 commented Jan 6, 2020 via email

@wosecz

wosecz commented Jan 6, 2020

Yes, this method is correct. I tried several times and it fixed the problem. (But I got another problem...) Thank you for your reply!

@qingzi02010
Author

https://www.cnblogs.com/rainsoul/p/12162779.html
I do not know what your problem is, but you can refer to the post above and try its method.

@kevinstan

> I have used pytorch 1.3.1, CUDA 10.2. It seems like that pytorch version is crucial. (See #1)

Does anyone know which TensorFlow version to use? Neither tf 1.14 nor 1.15 (see the original stylegan2 repo) is compatible with CUDA 10.2.

@rosinality
Owner

@kevinstan I use tf 1.15 on CUDA 10.2. It seems to run on it.

@sadransh

Something weird happens when I try to train it from screen:

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /tmp/torch_extensions/fused/fused.so)

I tried removing /tmp/torchextensions but no luck!
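The error says that the system library /lib64/libstdc++.so.6 does not provide the `GLIBCXX_3.4.21` version tag that fused.so was built to require. Below is a minimal sketch for listing the tags a given libstdc++ file contains, similar in spirit to `strings libstdc++.so.6 | grep GLIBCXX`. The function names are hypothetical, and the idea that the session inside screen resolves a different libstdc++ than the normal shell is an assumption worth checking, not a confirmed diagnosis.

```python
import re
from pathlib import Path


def glibcxx_versions(data):
    """Return the sorted GLIBCXX version tags found in a libstdc++ binary's bytes."""
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(part) for part in tag.decode().split(".")) for tag in tags)


def provides(data, required):
    """True if the library bytes contain the required tag, e.g. '3.4.21'."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(data)


if __name__ == "__main__":
    # Path taken from the error message above; adjust for other systems.
    lib = Path("/lib64/libstdc++.so.6")
    if lib.is_file():
        print("GLIBCXX_3.4.21 available:", provides(lib.read_bytes(), "3.4.21"))
```

Running the same check on the libstdc++ inside the conda environment, if there is one, shows whether the newer tag is available there and the loader is simply picking the older system copy.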

@Harsha-Musunuri

> @rosinality I installed pytorch 1.3.1,torchvision 0.4.2, cuda10.1, it occurred that "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv". Your torchvision is 0.4.2, right?

I face the same issue. Was this resolved, and if so, how?
Could you please share the .yml file of your conda env?

@denabazazian

denabazazian commented Apr 14, 2021

@Harsha-Musunuri were you able to resolve this issue?
I face the same problem: TensorFlow 1.14 is not compatible with CUDA 10.2, and PyTorch 1.3 is not compatible with gcc>5 and CUDA 10.2, but the convert_weight.py code requires gcc>5 and CUDA 10.2.
Do you have a conda env .yml file that is compatible with all the required library versions?

@denabazazian

> I have used pytorch 1.3.1, CUDA 10.2. It seems like that pytorch version is crucial. (See #1)

@rosinality PyTorch 1.3 is not compatible with CUDA 10.2. Did you install it locally and build PyTorch from source?

@rosinality
Owner

@denabazazian I don't remember the environment well. You can use a recent version of PyTorch.

@Harsha-Musunuri

> @rosinality I installed pytorch 1.3.1,torchvision 0.4.2, cuda10.1, it occurred that "ImportError: /tmp/torch_extensions/fused/fused.so: undefined symbol: _ZN3c1011CPUTensorIdEv". Your torchvision is 0.4.2, right?
>
> I face the same issue, did this resolve, if yes how ?
> Could you please pass .yml file of conda env ?

@denabazazian try this https://drive.google.com/file/d/1EaYl5IP0gBqjagX9mZfXr88l13eUzKay/view?usp=sharing

@MHRosenberg

I tried the conda env file to no avail. I'm using CUDA 10.1 with PyTorch 1.7.1 and failed to downgrade it to 1.3.1. I tried other PyTorch versions but ran into other problems which, once resolved, led back to this state:


CalledProcessError Traceback (most recent call last)
~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
1532 stdout_fileno = 1
-> 1533 subprocess.run(
1534 command,

~/miniconda3/envs/dG/lib/python3.9/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
527 if check and retcode:
--> 528 raise CalledProcessError(retcode, process.args,
529 output=stdout, stderr=stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
~/Documents/dG/alias-free-gan-pytorch/train.py in <module>
29 get_world_size,
30 )
---> 31 from stylegan2.op import conv2d_gradfix
32 from stylegan2.non_leaking import augment, AdaptiveAugment
33 from stylegan2.model import Discriminator

~/Documents/dG/alias-free-gan-pytorch/stylegan2/op/__init__.py in <module>
----> 1 from .fused_act import FusedLeakyReLU, fused_leaky_relu
2 from .upfirdn2d import upfirdn2d

~/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_act.py in <module>
9
10 module_path = os.path.dirname(__file__)
---> 11 fused = load(
12 "fused",
13 sources=[

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, keep_intermediates)
984 verbose=True)
985 '''
--> 986 return _jit_compile(
987 name,
988 [sources] if isinstance(sources, str) else sources,

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, keep_intermediates)
1191 clean_ctx=clean_ctx
1192 )
-> 1193 _write_ninja_file_and_build_library(
1194 name=name,
1195 sources=sources,

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build_library(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda)
1295 if verbose:
1296 print('Building extension module {}...'.format(name))
-> 1297 _run_ninja_build(
1298 build_directory,
1299 verbose,

~/miniconda3/envs/dG/lib/python3.9/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
1553 if hasattr(error, 'output') and error.output: # type: ignore
1554 message += ": {}".format(error.output.decode()) # type: ignore
-> 1555 raise RuntimeError(message) from e
1556
1557

RuntimeError: Error building extension 'fused': [1/2] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/TH -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/mr/miniconda3/envs/dG/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++14 -c /home/mr/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/TH -isystem /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/mr/miniconda3/envs/dG/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++14 -c /home/mr/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /usr/local/cuda-10.1/include/cuda_runtime.h:83,
from <command-line>:
/usr/local/cuda-10.1/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
| ^~~~~
In file included from /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC/THC.h:4,
from /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC/THCAtomics.cuh:5,
from /home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAApplyUtils.cuh:5,
from /home/mr/Documents/dG/alias-free-gan-pytorch/stylegan2/op/fused_bias_act_kernel.cu:11:
/home/mr/miniconda3/envs/dG/lib/python3.9/site-packages/torch/include/THC/THCGeneral.h:11:10: fatal error: cublas_v2.h: No such file or directory
11 | #include <cublas_v2.h>
| ^~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

@rosinality
Owner

@MHRosenberg It is not a PyTorch version problem but a problem with the CUDA build environment. You can check whether you can build CUDA programs, or use https://github.com/rosinality/alias-free-gan-pytorch/blob/main/Dockerfile.
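The build log above fails for two separate reasons: nvcc from CUDA 10.1 rejects gcc versions later than 8, and `cublas_v2.h` is not found. A minimal sketch that checks for both before attempting a build follows; `gcc_major` and `diagnose` are hypothetical helpers, and the limit of 8 is taken from the error message in this log, so other CUDA releases will need a different value.

```python
import shutil
import subprocess
from pathlib import Path


def gcc_major(dumpversion_output):
    """Parse the major version from `gcc -dumpversion` output such as '9.3.0' or '9'."""
    return int(dumpversion_output.strip().split(".")[0])


def diagnose(gcc_version, cuda_include_dir, max_gcc_major=8):
    """Return a list of problems matching the two errors in the build log.

    max_gcc_major=8 comes from the CUDA 10.1 message in this thread;
    it is not valid for other CUDA releases.
    """
    problems = []
    major = gcc_major(gcc_version)
    if major > max_gcc_major:
        problems.append(f"gcc {major} is newer than {max_gcc_major}, which nvcc rejects")
    if not (Path(cuda_include_dir) / "cublas_v2.h").is_file():
        problems.append(f"cublas_v2.h not found in {cuda_include_dir}")
    return problems


if __name__ == "__main__":
    if shutil.which("gcc"):
        version = subprocess.run(
            ["gcc", "-dumpversion"], capture_output=True, text=True
        ).stdout
        # Include path taken from the log above; adjust to the local install.
        for problem in diagnose(version, "/usr/local/cuda-10.1/include"):
            print("problem:", problem)
```

An empty result does not guarantee a working build; it only rules out these two failures.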

@Ameya-Deo

Hi, I was working on the SAM code and I am getting an import error: ImportError: /root/.cache/torch_extensions/fused/fused.so: cannot open shared object file: No such file or directory
The error appears after running from models.psp import pSp.
I am running on Deepnote. Could you please help me with this error?

@abdul756

> Hi, I was working on SAM code and I am getting error in imports: ImportError: /root/.cache/torch_extensions/fused/fused.so: cannot open shared object file: No such file or directory I am getting error after running from models.psp import pSp I am running on deepnote. Could you please help me with this error?

I am also facing the same issue.

@pk2203

pk2203 commented Dec 4, 2023

> I have used pytorch 1.3.1, CUDA 10.2. It seems like that pytorch version is crucial. (See #1)

How do we install these versions on Google Colab? It tells me that no such version is available.

@drscotthawley

Getting this same error: ImportError: /home/myusername/.cache/torch_extensions/fused/fused.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr10_M_releaseEv

This happens while running 2-year-old code in a conda environment with PyTorch 1.6 and CUDA 10.1. (10.1 instead of 10.2, because cudatoolkit-dev doesn't exist for 10.2, and without cudatoolkit-dev, conda doesn't set up CUDA_HOME properly.)

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 cudatoolkit-dev=10.1 -c pytorch

Deleting the torch_extensions/ directory didn't help; it was just recreated with the same error.

@Harsha-Musunuri's GDrive link has now expired, whatever it was linking to.

Anyone have any further tips for resolving this?
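Since CUDA_HOME comes up in the comment above, here is a minimal sketch for checking which toolkit root a build is likely to use and whether it looks complete. The lookup order (CUDA_HOME, then CUDA_PATH, then `nvcc` on PATH, then /usr/local/cuda) is an assumption that roughly mirrors what PyTorch is understood to do, and both function names are hypothetical. It does not address the undefined libstdc++ symbol itself.

```python
import os
import shutil
from pathlib import Path


def find_cuda_home(env=None, default="/usr/local/cuda"):
    """Guess the CUDA toolkit root: CUDA_HOME, CUDA_PATH, nvcc on PATH, then a default."""
    env = os.environ if env is None else env
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            return Path(env[var])
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return Path(nvcc).resolve().parent.parent  # <root>/bin/nvcc -> <root>
    return Path(default) if Path(default).is_dir() else None


def missing_toolkit_files(cuda_home):
    """List files an extension build needs that are absent under cuda_home."""
    wanted = [Path("bin") / "nvcc", Path("include") / "cuda_runtime.h"]
    return [str(rel) for rel in wanted if not (Path(cuda_home) / rel).is_file()]


if __name__ == "__main__":
    home = find_cuda_home()
    if home is None:
        print("no CUDA toolkit found")
    else:
        print("CUDA root:", home, "missing:", missing_toolkit_files(home))
```

If the reported root is not the toolkit that matches `torch.version.cuda`, exporting CUDA_HOME explicitly before importing the extension is a reasonable next step to try.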
