Thanks for your work. What hardware is suitable for training the model, specifically which type of GPU and how much GPU memory? I have trained a few times on a Quadro 6000, but the log reports a SIGKILL in the second epoch.
#7
Closed
nievuelo opened this issue
Feb 23, 2023
· 5 comments
I mean, the first epoch seems to work well, but in the second epoch, partway through the second batch (around 4660/5033), my process gets killed and PyCharm reports: "interrupted by signal 9: SIGKILL". I checked my Linux log and it says CPU out of memory. But my machine has 64 GB of main memory, which I think should be enough. I don't know what causes this kind of error.
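One way to narrow this down would be to log the process's host memory at regular points and see whether it keeps growing until the kernel kills it. This is only a minimal sketch using the Python standard library on Linux or macOS; `peak_rss_mb` and `log_memory` are hypothetical helper names, not functions from this project, and the loop is a stand-in for the real training loop.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(tag):
    """Print the peak host memory so far and return it (hypothetical helper)."""
    mb = peak_rss_mb()
    print(f"[mem] {tag}: peak RSS {mb:.1f} MiB")
    return mb


if __name__ == "__main__":
    log_memory("start")
    # Stand-in for a training loop that keeps results from every batch.
    history = []
    for step in range(5):
        history.append(bytearray(10 * 1024 * 1024))  # hold 10 MiB per step
        log_memory(f"step {step}")
```

If the logged value climbs steadily across batches, something is probably being accumulated in host memory (for example, stored outputs or cached data) rather than the GPU running out of memory.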
Thanks a lot for your generous help. I will star this project.
But I mean it's not the storage capacity (199 GB), it's the main memory capacity (64 GB). So I changed report_each_claim from 12 to 6 in eval_exp_fc5, and from 30 to 15 in train_exp_fc5_liar_raw2. But the process still gets killed. The difference is that before I changed the parameter it crashed during training, and after the change it crashes during evaluation. But all I had changed was the report_each_claim parameter in the evaluate_model function of eval_exp_fc5. Could you help me figure it out?
Hi, @nievuelo. I cannot figure it out with such limited information, but you could try this code on another machine with a 3090 GPU, since we have tested it successfully there.
My GPU is a Quadro 6000 (24 GB), and the main CPU memory is 64 GB.