unable to start train #27
Comments
@git-hcLee It works on my side. What are your OS version and gcc version?
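For anyone answering the question above, a minimal sketch for collecting the OS, Python, and compiler versions in one go. The helper names `tool_version` and `environment_report` are hypothetical and not part of ELF or pytorch. It only assumes that `gcc` and `g++`, if installed, accept `--version`.

```python
import platform
import shutil
import subprocess
import sys


def tool_version(tool):
    """Return the first line of `<tool> --version`, or None if unavailable."""
    path = shutil.which(tool)
    if path is None:
        # The tool is not on PATH, so there is nothing to report.
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def environment_report():
    """Collect the OS, Python, and compiler details asked for in this thread."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "gcc": tool_version("gcc"),
        "g++": tool_version("g++"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```

Pasting the printed lines into a comment gives the same information as running `gcc --version` by hand, plus the OS string.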
This looks like my situation in #14. You can try my workaround without the random seed.
Could you post the backtrace of the dump? For me, rebuilding pytorch from source using gcc 5.4.0-1 made it work fine.
@yuandong-tian
@EasyHard Thanks, I'll try it.
I met the same problem and also got the segmentation fault. I use gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0. I cannot start training using run.py.
Could any of you post a backtrace of the dump? Just for more information.
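One way to get at least the Python side of such a backtrace is the standard-library `faulthandler` module, which prints the Python traceback of the running threads when the process receives SIGSEGV. The sketch below assumes you can add a call near the top of your training entry point. The function name `enable_crash_traceback` is hypothetical. Note that this does not show the native C++ frames, so a `bt` from gdb on the core dump is still the more complete answer.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Ask Python to dump tracebacks of all threads on a fatal signal.

    `stream` must be a real file (it needs a file descriptor), because the
    dump is written from inside the signal handler.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

The same effect is available without touching the code by starting the interpreter with `python3 -X faulthandler` followed by the usual script and arguments.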
Hi, I can run the standalone backend game_MC successfully, but when I try to train, I get the message below:
@Liujiachen: Could you check your gcc and libcpp versions?
Hi, I am also having the segmentation fault problem. Here is what EasyHard was asking for: Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
@gchlodzinski Your stack looks similar to what I've encountered. Compiling pytorch from source with gcc-5.4 fixed this for me. I haven't had a chance to figure out why this happens, though.
@EasyHard Thanks, it helped to get things started.
RuntimeError: input and target have different number of elements: input[128 x 1] has 128 elements, while target[128 x 128] has 16384 elements at /home/grzegorz/pytorch/torch/lib/THCUNN/generic/SmoothL1Criterion.cu:12
Edit: moreover, I get the same result even after reinstalling the whole system from scratch, this time using conda for python and packages. It still crashes when I change the batch size to various other numbers (all powers of 2), just at a different iteration number.
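The numbers in that RuntimeError are just the element counts of the two shapes: 128 x 1 = 128 for the input and 128 x 128 = 16384 for the target. A minimal sketch of a check that makes the mismatch visible before the loss is computed is below. `numel` and `check_same_numel` are hypothetical helpers working on plain shape tuples (in pytorch these would come from `tensor.size()`). This does not explain why the target ends up as [128 x 128], it only reports it earlier.

```python
from functools import reduce
from operator import mul


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)


def check_same_numel(input_shape, target_shape):
    """Raise ValueError if input and target hold different element counts."""
    n_input = numel(input_shape)
    n_target = numel(target_shape)
    if n_input != n_target:
        raise ValueError(
            "input {} has {} elements, while target {} has {} elements".format(
                tuple(input_shape), n_input, tuple(target_shape), n_target
            )
        )
    return n_input
```

Calling `check_same_numel((128, 1), (128, 128))` raises with the same counts as the error above, while `check_same_numel((128, 1), (128,))` passes because both sides hold 128 elements.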
@gchlodzinski Hi, have you solved the above problem?
@yuandong-tian Version: 99b9e219b9e23bdc7c5e710c0aec531219d5e9e0_
./script.sh: line 1: 18981 Segmentation fault (core dumped) game=./rts/game_MC/game model=actor_critic model_file=./rts/game_MC/model python3 run.py --num_games 1024 --batchsize 128 --freq_update 50 --fs_opponent 20 --latest_start 500 --latest_start_decay 0.99 --opponent_type AI_SIMPLE --tqdm --gpu 0 --T 20
@LinZichuan, I was not able to find a solution to my runtime error problem. I also tried to run ELF on Mac OS, but it failed there as well (strange CUDA error message).
@LinZichuan See #45
@LinZichuan @gchlodzinski @git-hcLee This commit f268feb might address your issue.
Hi,
I can run the standalone backend game_MC successfully, but when I try to run the code below
I get this error message:
The program just terminates with a segmentation fault.