Segmentation fault (core dumped) #14

ding-hai opened this issue Aug 23, 2018 · 15 comments



Doing 560 frames
Segmentation fault (core dumped)

This is due to a discrepancy in PyTorch version and we're trying to solve it. For now, only PyTorch 0.4.0 is supported. We'll update once we fix it.
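For anyone who wants to fail early instead of hitting the segfault, here is a minimal sketch of a version guard. It assumes the "only PyTorch 0.4.0" statement above still holds; `parse_version` and `is_supported` are hypothetical helper names, not part of the repository.

```python
import re

# Assumption taken from the comment above: only PyTorch 0.4.0 is supported.
SUPPORTED = (0, 4, 0)


def parse_version(version):
    """Return the leading (major, minor, patch) of a string like '0.4.1.post2'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_supported(version, supported=SUPPORTED):
    """True only when the installed version matches the supported one exactly."""
    return parse_version(version) == supported


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print(torch.__version__, "supported:", is_supported(torch.__version__))
```

Calling `is_supported(torch.__version__)` at the top of a script and exiting with a clear message is usually easier to debug than a bare segmentation fault.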

This should be fixed now. Please pull the latest code and try again.

I pulled the latest code, but when I reinstalled flownet2-pytorch I got the errors below:

nvcc fatal : Unsupported gpu architecture 'compute_70'
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1

Does it only support CUDA 9.0?
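The `compute_70` target (Volta) was added to nvcc in CUDA 9.0, so an older toolkit rejects it with exactly this message. A rough sketch for checking which toolkit nvcc belongs to, by parsing the text that `nvcc --version` prints; the helper names and the sample string are illustrative assumptions.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_compute_70(release):
    """Volta (compute_70 / sm_70) needs a CUDA 9.0 or newer toolkit."""
    return release is not None and release >= (9, 0)


if __name__ == "__main__":
    # Illustrative sample line; feed in the real output of `nvcc --version`.
    sample = "Cuda compilation tools, release 8.0, V8.0.61"
    release = parse_nvcc_release(sample)
    print(release, supports_compute_70(release))
```

To use it on a real machine, capture the output with `subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE)` and pass the decoded text to `parse_nvcc_release`.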

ding-hai commented Aug 24, 2018

I am now using the previous version of the code with PyTorch 0.4.0, and it works.

Not everything may be OK, though:
the result directory doesn't contain a file named index.html.
There are many images in the result directory.
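If the images are there but index.html is not, one stopgap is to build a simple page by hand. This is only a workaround sketch, not the repository's own HTML writer; `write_index` and the list of image extensions are assumptions.

```python
import html
import os

# Assumed set of image extensions to pick up.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def write_index(result_dir, out_name="index.html"):
    """Write a plain HTML page listing every image found under result_dir."""
    images = []
    for root, _dirs, files in os.walk(result_dir):
        for name in files:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                rel = os.path.relpath(os.path.join(root, name), result_dir)
                # Use forward slashes so the links also work on Windows.
                images.append(rel.replace(os.sep, "/"))
    images.sort()
    rows = [
        '<p>%s<br><img src="%s" width="256"></p>'
        % (html.escape(path), html.escape(path, quote=True))
        for path in images
    ]
    out_path = os.path.join(result_dir, out_name)
    with open(out_path, "w") as handle:
        handle.write("<html><body>\n%s\n</body></html>\n" % "\n".join(rows))
    return out_path, len(images)
```

Calling `write_index("path/to/results")` returns the path of the generated page and the number of images it lists.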

I commented out

'-gencode', 'arch=compute_70,code=sm_70',

'-gencode', 'arch=compute_70,code=compute_70'

these two lines in


files, and now I get the result directory. Like you, I have no index.html file in it.
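Rather than commenting the lines out by hand, the flag list could be filtered against the installed toolkit. A minimal sketch under stated assumptions: the `compute_61` entry is added for illustration only, `MIN_CUDA` is a small hand-written table (Pascal targets arrived with CUDA 8.0, Volta with CUDA 9.0), and `drop_unsupported` is a hypothetical helper rather than anything in the build scripts.

```python
import re

# Flag list in the same shape as the lines quoted above.
# The compute_61 entry is an illustrative assumption.
NVCC_FLAGS = [
    "-gencode", "arch=compute_61,code=sm_61",
    "-gencode", "arch=compute_70,code=sm_70",
    "-gencode", "arch=compute_70,code=compute_70",
]

# Oldest CUDA toolkit release that accepts each architecture (assumed table).
MIN_CUDA = {61: (8, 0), 70: (9, 0)}


def drop_unsupported(flags, cuda_release):
    """Keep only the -gencode pairs that the given CUDA release can compile."""
    kept = []
    # Flags come in ("-gencode", "arch=...") pairs; walk them two at a time.
    for switch, spec in zip(flags[0::2], flags[1::2]):
        arch = int(re.match(r"arch=compute_(\d+)", spec).group(1))
        if MIN_CUDA.get(arch, (0, 0)) <= cuda_release:
            kept += [switch, spec]
    return kept


if __name__ == "__main__":
    print(drop_unsupported(NVCC_FLAGS, (8, 0)))
```

With a CUDA 8.0 toolkit this drops both `compute_70` entries, which matches the manual workaround; with CUDA 9.0 or newer the list is left unchanged.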

kekedan commented Sep 28, 2018

@tcwang0509 Segmentation fault still exists

@kekedan are you able to run flownet2?

kekedan commented Oct 10, 2018

Running flownet2 failed, so I used PyTorch 0.2 to compile flownet2, and now it works.

@tcwang0509 Hi, I'm trying to test the model by running 'bash ./scripts/street/', but I am also getting a segmentation fault. I am using CUDA 9.2 and torch 0.4.1. I also get the segmentation fault when I try training on my own dataset. I downloaded flownet2 by running the provided script.

michaelshiyu commented Jul 18, 2019

Hi @tcwang0509, the segmentation fault issue still exists.


flownet2 compiles OK, and testing the pre-trained Cityscapes model works. But training triggers the segfault on both a single GPU and multiple GPUs. Here's the complete stdout when training on a single GPU:

------------ Options -------------
TTUR: False
add_face_disc: False
basic_point_only: False
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: datasets/Cityscapes/
dataset_mode: temporal
debug: False
densepose_only: False
display_freq: 100
display_id: 0
display_winsize: 512
feat_num: 3
fg: True
fg_labels: [26]
fineSize: 512
fp16: False
gan_mode: ls
gpu_ids: [0]
input_nc: 3
isTrain: True
label_feat: False
label_nc: 35
lambda_F: 10.0
lambda_T: 10.0
lambda_feat: 10.0
loadSize: 256
load_features: False
local_rank: 0
lr: 0.0002
max_dataset_size: inf
max_frames_backpropagate: 1
max_frames_per_gpu: 6
max_t_step: 1
model: vid2vid
nThreads: 2
n_blocks: 9
n_blocks_local: 3
n_downsample_E: 3
n_downsample_G: 2
n_frames_D: 3
n_frames_G: 3
n_frames_total: 6
n_gpus_gen: 1
n_layers_D: 3
n_local_enhancers: 1
n_scales_spatial: 1
n_scales_temporal: 2
name: label2city_256_g1
ndf: 64
nef: 32
netE: simple
netG: composite
ngf: 128
niter: 10
niter_decay: 10
niter_fix_global: 0
niter_step: 5
no_canny_edge: False
no_dist_map: False
no_first_img: False
no_flip: False
no_flow: False
no_ganFeat: False
no_html: False
no_vgg: False
norm: batch
num_D: 1
openpose_only: False
output_nc: 3
phase: train
pool_size: 1
print_freq: 100
random_drop_prob: 0.05
random_scale_points: False
remove_face_labels: False
resize_or_crop: scaleWidth
save_epoch_freq: 1
save_latest_freq: 1000
serial_batches: False
sparse_D: False
tf_log: False
use_instance: True
use_single_G: False
which_epoch: latest
-------------- End ----------------
dataset [TemporalDataset] was created
#training videos = 6
---------- Networks initialized -------------
---------- Networks initialized -------------
create web directory ./checkpoints/label2city_256_g1/web...
Segmentation fault

Many thanks.
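The log above stops without saying which Python line was running when the process died. One way to narrow that down is the standard-library `faulthandler` module, which dumps the Python stack when the interpreter receives a fatal signal such as SIGSEGV. A minimal sketch; `enable_crash_traceback` and the log file name are assumptions, and it shows Python frames only, not the C/CUDA stack.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Dump the Python traceback of all threads to log_path on a fatal signal."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The file must stay open for as long as faulthandler is enabled.
    return log_file


if __name__ == "__main__":
    handle = enable_crash_traceback("segfault_traceback.log")
    print("faulthandler enabled:", faulthandler.is_enabled())
```

Calling this at the very top of the training script, or running the interpreter with `python -X faulthandler`, should at least show whether the crash happens inside a compiled extension call.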

@michaelshiyu Hi, did you solve the segmentation fault error? I'm hitting the same problem.

CUDA 9.0
PyTorch 1.0.0

michaelshiyu commented Jul 26, 2019

Hi, @zhuhaozh!

Yes, it works now after I installed things by reading into the Dockerfile and following the set-ups there. There are some bugs in the Dockerfile, I think. For example, I think the desired environment uses cuda 9.0 but the torch install instruction there would have you install a PyTorch version compiled with cuda 8.0, which will result in extremely slow runtimes if your cuda is actually 9.0.

I'm not sure what caused the segfault earlier so I will just post as much info about my current working set-up as possible. Hopefully, this would work for you and anyone else stuck with this issue.

Right now my working environment has

GPU: NVIDIA Tesla V100 w/ driver version 390.30
python 3.5.6
cuda 9.0
cudnn 7
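To compare set-ups like the one listed above, and to spot the mismatch between the installed CUDA and the CUDA version PyTorch was compiled with, a small report script can help. A minimal sketch; `environment_report` is a hypothetical helper, and it falls back gracefully when PyTorch is not installed.

```python
import platform


def environment_report():
    """Collect the Python, PyTorch, CUDA and cuDNN versions visible to Python."""
    report = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built against (None for CPU builds).
    report["torch_built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["cudnn"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
```

If `torch_built_with_cuda` differs from the toolkit reported by `nvcc --version`, that is the kind of mismatch described above.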

And here's the complete output of my conda list. This might be more information than you need though.

I have tested several CUDA, cuDNN and PyTorch versions, most recently PyTorch 1.0.1 with CUDA 9.0 and cuDNN 7.1.2, but every combination hits the same error (Segmentation fault (core dumped)). I have no idea how to solve the problem.
Many thanks!

