Segmentation fault on retrain the network or run evaluation #11

Closed
asharma-fy opened this issue May 31, 2021 · 2 comments

Comments

@asharma-fy

Hi Gernot,

Thank you for the great work!

I was trying to run the exp.py script for both evaluation and retraining. However, I consistently get a segmentation fault like this:

python exp.py --net resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16 --cmd retrain
.
.
.
.
[2021-05-31/09:52/INFO/mytorch] Setup training data loader and other stuff
invalid device function in /home/fyusion/Documents/projects/StableViewSynthesis/ext/mytorch/include/common_cuda.h at 171
[1]    633554 segmentation fault (core dumped)  python exp.py --net  --cmd retrain

Some more details of my system installation:

python -c 'from torch.utils.collect_env import main; main()'
Collecting environment information...
PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Ubuntu 20.04.2 LTS
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
CMake version: version 3.16.3

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: TITAN V
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti

Nvidia driver version: 460.73.01
cuDNN version: /usr/lib/cuda-10.0/lib64/libcudnn.so.7.4.1

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.6.0
[pip3] torch-geometric==1.7.0
[pip3] torch-scatter==2.0.6
[pip3] torch-sparse==0.6.9
[pip3] torchvision==0.7.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              hfd86e86_1
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py36he8ac12f_0
[conda] mkl_fft                   1.3.0            py36h54f3939_0
[conda] mkl_random                1.1.1            py36h0573a6f_0
[conda] numpy                     1.19.2           py36h54aff64_0
[conda] numpy-base                1.19.2           py36hfa32c7d_0
[conda] pytorch                   1.6.0           py3.6_cuda10.2.89_cudnn7.6.5_0    pytorch
[conda] torch-geometric           1.7.0                    pypi_0    pypi
[conda] torch-scatter             2.0.6                    pypi_0    pypi
[conda] torch-sparse              0.6.9                    pypi_0    pypi
[conda] torchvision               0.7.0                py36_cu102    pytorch
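
The report above lists two different CUDA versions: PyTorch was built with 10.2, while the reported CUDA runtime version is 10.0.130. Whether that mismatch is what triggers the "invalid device function" error is not established in this thread, but it is quick to check. Below is a minimal sketch, assuming the report is available as plain text; the helper name `find_cuda_mismatch` is hypothetical and not part of PyTorch or this repository.

```python
import re

# Patterns for the two CUDA lines printed by torch.utils.collect_env.
_BUILD = re.compile(r"^CUDA used to build PyTorch:\s*(\d+)\.(\d+)", re.MULTILINE)
_RUNTIME = re.compile(r"^CUDA runtime version:\s*(\d+)\.(\d+)", re.MULTILINE)


def find_cuda_mismatch(report):
    """Compare the two CUDA versions in a collect_env report (hypothetical helper).

    Returns None when both agree on major.minor, or when either line is
    missing or has no numeric version. Otherwise returns a tuple
    (build_version, runtime_version), each as (major, minor).
    """
    build = _BUILD.search(report)
    runtime = _RUNTIME.search(report)
    if build is None or runtime is None:
        return None
    build_version = (int(build.group(1)), int(build.group(2)))
    runtime_version = (int(runtime.group(1)), int(runtime.group(2)))
    if build_version == runtime_version:
        return None
    return build_version, runtime_version


# The two relevant lines, copied from the report above.
REPORT = """\
CUDA used to build PyTorch: 10.2
CUDA runtime version: 10.0.130
"""

print(find_cuda_mismatch(REPORT))
```

A non-None result only flags that the two toolkits differ; it does not by itself prove that the locally compiled extension is the culprit.
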
@KaLiMaLi555

Hey @asharma-fy, I am facing the same issue during training and evaluation.
Since this issue was closed, I am hoping you found a fix. Can you help me?
Thanks in advance.

@akashsharma02

akashsharma02 commented Jul 5, 2021

@KaLiMaLi555 I don't remember exactly, but my issue was fixed when I updated my cudatoolkit version from 10.1.135 to 11+ and then pip-installed the exact package versions listed in the repository, in a conda environment with Python 3.8+.

Hope this was helpful.
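
The setup described in this reply (Python 3.8+ and CUDA 11+) can be checked from inside the environment. This is a rough sketch, not a guaranteed fix: `matches_reported_fix` and `current_cuda_version` are hypothetical helper names, and the sketch assumes that `torch.version.cuda` (the CUDA version PyTorch was built with) reflects the installed cudatoolkit package.

```python
import sys


def matches_reported_fix(python_version, cuda_version):
    """Return True if Python is 3.8+ and the CUDA major version is 11+.

    python_version: a tuple such as (3, 8) or sys.version_info.
    cuda_version: a string such as "11.1", or None if unknown.
    """
    if cuda_version is None:
        return False
    cuda_major = int(cuda_version.split(".")[0])
    return tuple(python_version[:2]) >= (3, 8) and cuda_major >= 11


def current_cuda_version():
    """CUDA version PyTorch was built with, or None (CPU build or no torch)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    ok = matches_reported_fix(sys.version_info, current_cuda_version())
    print("matches the setup described above:", ok)
```

Run it in the same conda environment that is used for `exp.py`, so that it sees the same interpreter and the same PyTorch build.
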
