
Segmentation fault (core dumped) #14

Open
ding-hai opened this issue Aug 23, 2018 · 15 comments

Comments

@ding-hai

Doing 560 frames
Segmentation fault (core dumped)
#5

@tcwang0509
Contributor

This is due to a discrepancy in PyTorch version and we're trying to solve it. For now, only PyTorch 0.4.0 is supported. We'll update once we fix it.
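While only PyTorch 0.4.0 is supported, a version check at startup can turn a later segfault into a clear message. A minimal sketch, assuming you add it yourself near the top of your entry script; `SUPPORTED_TORCH`, `parse_version`, and `is_supported_torch` are hypothetical names, not part of vid2vid:

```python
import re

# Hypothetical constant: the maintainer's comment says only 0.4.0 works for now.
SUPPORTED_TORCH = (0, 4, 0)


def parse_version(version):
    """Turn '0.4.1' or '1.0.0a0+abc' into an int tuple of up to three parts."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_supported_torch(version):
    """True only for the exact release the maintainers reported as working."""
    return parse_version(version) == SUPPORTED_TORCH
```

For example, `is_supported_torch(torch.__version__)` returns False for the 0.4.1 and 1.0.x installs reported later in this thread.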

@tcwang0509
Contributor

This should be fixed now. Please pull the latest code and try again.

@zzzkk2009

@tcwang0509

I pulled the latest code, and when I reinstalled flownet2-pytorch, I got the following errors:

nvcc fatal : Unsupported gpu architecture 'compute_70'
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1

Is only CUDA 9.0 supported?
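The `compute_70` error suggests the installed nvcc predates CUDA 9.0, the first toolkit release with Volta (sm_70) support. A hedged sketch for checking that before building; the function names are hypothetical, and the nvcc path default is taken from the error message in this comment:

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from `nvcc --version` text, e.g. 'release 9.0, V9.0.176'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a CUDA release in nvcc output")
    return int(match.group(1)), int(match.group(2))


def nvcc_release(nvcc="/usr/local/cuda/bin/nvcc"):
    """Run nvcc and return its release; raises if nvcc is missing or fails."""
    out = subprocess.run([nvcc, "--version"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_nvcc_release(out)


def supports_compute_70(release):
    # Volta (sm_70 / compute_70) targets were added in CUDA 9.0.
    return release >= (9, 0)
```

If `supports_compute_70(nvcc_release())` is False, the `compute_70` gencode entries have to be dropped or CUDA upgraded.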

@ding-hai
Author

ding-hai commented Aug 24, 2018

@zzzkk2009
Now I use the previous version of the code and have changed the PyTorch version to 0.4.0, and it works.
〒▽〒

@ding-hai
Author

However, not everything works: the result directory doesn't contain a file named index.html.
There are many images in the result directory.
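Until the missing page is sorted out, the images can still be browsed with a hand-made index. A minimal sketch, assuming the images sit directly in one results directory (vid2vid may nest them in subfolders); `write_index` is a hypothetical helper, not the project's own HTML writer:

```python
import html
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def write_index(result_dir, title="results"):
    """Write result_dir/index.html with one <img> tag per image, sorted by name."""
    images = sorted(
        name for name in os.listdir(result_dir)
        if name.lower().endswith(IMAGE_EXTS)
    )
    lines = ["<html><head><title>%s</title></head><body>" % html.escape(title)]
    for name in images:
        src = html.escape(name, quote=True)  # file names may contain & or quotes
        lines.append('<p>%s<br><img src="%s" width="256"></p>' % (src, src))
    lines.append("</body></html>")
    path = os.path.join(result_dir, "index.html")
    with open(path, "w") as f:
        f.write("\n".join(lines))
    return path
```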

@zzzkk2009

@ding-hai

I commented out these two lines:

'-gencode', 'arch=compute_70,code=sm_70',

'-gencode', 'arch=compute_70,code=compute_70'

in the following files:

/vid2vid/models/flownet2_pytorch/networks/channelnorm_package/set_up.py
/vid2vid/models/flownet2_pytorch/networks/resample2d_package/set_up.py
/vid2vid/models/flownet2_pytorch/networks/correlation_package/set_up.py

With that I get the results directory, but there is no index.html file in it either, same as you.
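Rather than commenting lines out by hand in each file, the gencode list could be built from the toolkit version. A sketch only: it assumes each setup.py assembles a list of nvcc arguments, and the default architecture list below is a guess, so compare it with the entries actually in your files. `gencode_args`, `DEFAULT_ARCHS`, and `MIN_CUDA` are hypothetical names:

```python
# Hypothetical default list; check the nvcc arguments actually present in your setup files.
DEFAULT_ARCHS = (50, 52, 60, 61, 70)

# Minimum CUDA toolkit per target; only the compute_70 entry comes from this issue.
MIN_CUDA = {70: (9, 0)}


def gencode_args(cuda_release, archs=DEFAULT_ARCHS):
    """Return nvcc -gencode flags, dropping targets the toolkit cannot build."""
    usable = [a for a in archs if cuda_release >= MIN_CUDA.get(a, (0, 0))]
    args = []
    for arch in usable:
        args += ["-gencode", "arch=compute_%d,code=sm_%d" % (arch, arch)]
    if usable:
        newest = max(usable)
        # Also embed PTX for the newest target so newer GPUs can JIT-compile it.
        args += ["-gencode", "arch=compute_%d,code=compute_%d" % (newest, newest)]
    return args
```

For CUDA 8.0, `gencode_args((8, 0))` omits both compute_70 entries, matching the manual edit described here.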

@ding-hai
Author

@zzzkk2009
╮(╯▽╰)╭

@kekedan

kekedan commented Sep 28, 2018

@tcwang0509 Segmentation fault still exists

@tcwang0509
Contributor

@kekedan are you able to run flownet2?

@kekedan

kekedan commented Oct 10, 2018

@tcwang0509
Running flownet2 failed, so I compiled flownet2 with PyTorch 0.2, and now it works.

@yuanzhou15

@tcwang0509 Hi, I'm trying to test the model by running 'bash ./scripts/street/test_2048.sh', but I am also getting a segmentation fault. I am using CUDA 9.2 and torch 0.4.1. I also get a segmentation fault when I try training on my own dataset. I downloaded flownet2 by running the provided script.

@michaelshiyu

michaelshiyu commented Jul 18, 2019

Hi @tcwang0509, the segmentation fault issue still exists.

torch=0.4.1
cuda=9.0

flownet2 compiles OK, and testing the pre-trained Cityscapes model works. But training on either a single GPU or multiple GPUs triggers the segfault. Here's the complete stdout when training on a single GPU:

------------ Options -------------
TTUR: False
add_face_disc: False
basic_point_only: False
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: datasets/Cityscapes/
dataset_mode: temporal
debug: False
densepose_only: False
display_freq: 100
display_id: 0
display_winsize: 512
feat_num: 3
fg: True
fg_labels: [26]
fineSize: 512
fp16: False
gan_mode: ls
gpu_ids: [0]
input_nc: 3
isTrain: True
label_feat: False
label_nc: 35
lambda_F: 10.0
lambda_T: 10.0
lambda_feat: 10.0
loadSize: 256
load_features: False
load_pretrain: 
local_rank: 0
lr: 0.0002
max_dataset_size: inf
max_frames_backpropagate: 1
max_frames_per_gpu: 6
max_t_step: 1
model: vid2vid
nThreads: 2
n_blocks: 9
n_blocks_local: 3
n_downsample_E: 3
n_downsample_G: 2
n_frames_D: 3
n_frames_G: 3
n_frames_total: 6
n_gpus_gen: 1
n_layers_D: 3
n_local_enhancers: 1
n_scales_spatial: 1
n_scales_temporal: 2
name: label2city_256_g1
ndf: 64
nef: 32
netE: simple
netG: composite
ngf: 128
niter: 10
niter_decay: 10
niter_fix_global: 0
niter_step: 5
no_canny_edge: False
no_dist_map: False
no_first_img: False
no_flip: False
no_flow: False
no_ganFeat: False
no_html: False
no_vgg: False
norm: batch
num_D: 1
openpose_only: False
output_nc: 3
phase: train
pool_size: 1
print_freq: 100
random_drop_prob: 0.05
random_scale_points: False
remove_face_labels: False
resize_or_crop: scaleWidth
save_epoch_freq: 1
save_latest_freq: 1000
serial_batches: False
sparse_D: False
tf_log: False
use_instance: True
use_single_G: False
which_epoch: latest
-------------- End ----------------
CustomDatasetDataLoader
dataset [TemporalDataset] was created
#training videos = 6
vid2vid
---------- Networks initialized -------------
-----------------------------------------------
---------- Networks initialized -------------
-----------------------------------------------
create web directory ./checkpoints/label2city_256_g1/web...
Segmentation fault

Many thanks.
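Since the crash happens in native code, Python prints no traceback by default. The standard-library `faulthandler` module can dump the Python stack of every thread when the process receives SIGSEGV, which could help show whether the crash happens inside the flownet2 extensions or elsewhere. A sketch; `enable_crash_traceback` and its `log_path` argument are hypothetical helpers you would call at the very start of training:

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Dump Python tracebacks of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL.

    When log_path is given, tracebacks go to that file; otherwise to the
    interpreter's original stderr. The returned stream must stay open for the
    life of the process, so keep a reference to it.
    """
    stream = open(log_path, "w") if log_path else sys.__stderr__
    faulthandler.enable(file=stream, all_threads=True)
    return stream
```

Without editing any code, `python -X faulthandler` or the environment variable `PYTHONFAULTHANDLER=1` enables the same behaviour for the training command.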

@zhuhaozh

@michaelshiyu Hi, did you solve the segmentation fault? I have the same problem...

CUDA 9.0
Pytorch 1.0.0

@michaelshiyu

michaelshiyu commented Jul 26, 2019

Hi, @zhuhaozh!

Yes, it works now, after I installed everything by reading through the Dockerfile and following the setup there. I think there are some bugs in the Dockerfile, though. For example, the intended environment seems to use CUDA 9.0, but the torch install instruction there would have you install a PyTorch build compiled against CUDA 8.0, which results in extremely slow runtimes if your CUDA is actually 9.0.

I'm not sure what caused the segfault earlier, so I will post as much information about my current working setup as possible. Hopefully this works for you and anyone else stuck on this issue.

Right now my working environment has

GPU: NVIDIA Tesla V100 w/ driver version 390.30
python 3.5.6
cuda 9.0
cudnn 7

And here's the complete output of my conda list. This might be more information than you need though.

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.7.1                    pypi_0    pypi
astor                     0.8.0                    pypi_0    pypi
backcall                  0.1.0                    pypi_0    pypi
ca-certificates           2019.5.15                     0  
certifi                   2018.8.24                py35_1  
cffi                      1.12.3                   pypi_0    pypi
chardet                   3.0.4                    pypi_0    pypi
colorama                  0.3.7                    pypi_0    pypi
cycler                    0.10.0                   pypi_0    pypi
decorator                 4.4.0                    pypi_0    pypi
dill                      0.3.0                    pypi_0    pypi
dominate                  2.3.5                    pypi_0    pypi
future                    0.17.1                   pypi_0    pypi
gast                      0.2.2                    pypi_0    pypi
grpcio                    1.22.0                   pypi_0    pypi
h5py                      2.9.0                    pypi_0    pypi
idna                      2.8                      pypi_0    pypi
imageio                   2.5.0                    pypi_0    pypi
ipython                   7.6.1                    pypi_0    pypi
ipython-genutils          0.2.0                    pypi_0    pypi
jedi                      0.14.1                   pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.0                    pypi_0    pypi
kiwisolver                1.1.0                    pypi_0    pypi
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
markdown                  3.1.1                    pypi_0    pypi
matplotlib                3.0.3                    pypi_0    pypi
mock                      3.0.5                    pypi_0    pypi
ncurses                   6.1                  he6710b0_1  
networkx                  2.3                      pypi_0    pypi
numpy                     1.16.4                   pypi_0    pypi
opencv-python             4.1.0.25                 pypi_0    pypi
openssl                   1.0.2s               h7b6447c_0  
parso                     0.5.1                    pypi_0    pypi
pexpect                   4.7.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pillow                    6.1.0                    pypi_0    pypi
pip                       19.1.1                   pypi_0    pypi
prompt-toolkit            2.0.9                    pypi_0    pypi
protobuf                  3.9.0                    pypi_0    pypi
ptyprocess                0.6.0                    pypi_0    pypi
pycparser                 2.19                     pypi_0    pypi
pygments                  2.4.2                    pypi_0    pypi
pyparsing                 2.4.1                    pypi_0    pypi
python                    3.5.6                hc3d631a_0  
python-dateutil           2.8.0                    pypi_0    pypi
pytz                      2019.1                   pypi_0    pypi
pywavelets                1.0.3                    pypi_0    pypi
readline                  7.0                  h7b6447c_5  
requests                  2.22.0                   pypi_0    pypi
scikit-image              0.15.0                   pypi_0    pypi
scipy                     1.2.0                    pypi_0    pypi
setproctitle              1.1.10                   pypi_0    pypi
setuptools                40.2.0                   py35_0  
six                       1.12.0                   pypi_0    pypi
sqlite                    3.29.0               h7b6447c_0  
tensorboard               1.13.1                   pypi_0    pypi
tensorboardx              1.8                      pypi_0    pypi
tensorflow                1.13.1                   pypi_0    pypi
tensorflow-estimator      1.13.0                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tk                        8.6.8                hbc83047_0  
torch                     0.4.0                    pypi_0    pypi
torchvision               0.2.0                    pypi_0    pypi
tqdm                      4.32.2                   pypi_0    pypi
traitlets                 4.3.2                    pypi_0    pypi
urllib3                   1.25.3                   pypi_0    pypi
wcwidth                   0.1.7                    pypi_0    pypi
werkzeug                  0.15.5                   pypi_0    pypi
wheel                     0.31.1                   py35_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               h7b6447c_3  

@birdflyto

I have tested several CUDA, cuDNN, and PyTorch versions; the latest combination was PyTorch 1.0.1, CUDA 9.0, and cuDNN 7.1.2, but every combination hit the same error (Segmentation fault (core dumped)). I have no idea how to solve the problem.
Many thanks!!!


8 participants