
Thanks for your work. I wonder what hardware is suitable for training the model, specifically what type of GPU and how much GPU memory. I trained a few times on a Quadro 6000, but the log reported a SIGKILL in the second epoch. #7

Closed
nievuelo opened this issue Feb 23, 2023 · 5 comments

Comments

@nievuelo

nievuelo commented Feb 23, 2023

My GPU is a Quadro 6000 with 24 GB; my main CPU memory is 64 GB.

@nievuelo
Author

I mean, the first epoch seems to work well, but in the second epoch, around the middle of the epoch (like batch 4660/5033), my process is killed and PyCharm reports: "interrupted by signal 9: SIGKILL". I checked my Linux log and it said the CPU ran out of memory. But my machine has 64 GB of main memory, which I thought would be enough. I don't know what causes this kind of error.
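[Editor's note: a SIGKILL plus an out-of-memory message in the system log usually means the Linux OOM killer terminated the process. A minimal, hypothetical way to narrow down which batch exhausts RAM is to log the process's peak resident set size inside the training loop; `peak_rss_mb` below is an illustrative helper using Python's stdlib `resource` module, not code from this repo.]

```python
import resource

def peak_rss_mb():
    """Peak resident set size of this process in MB.

    On Linux, ru_maxrss is reported in KiB (on macOS it is in bytes,
    so the conversion below would need adjusting there).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

# Hypothetical instrumentation of a training loop: print memory after
# each batch so the point where RAM balloons is visible in the log
# before the OOM killer fires.
for epoch in range(2):
    for batch_idx in range(3):  # placeholder for the real dataloader
        # ... training step would go here ...
        print(f"epoch {epoch} batch {batch_idx}: peak RSS {peak_rss_mb():.0f} MB")
```

If the logged value climbs steadily across batches rather than plateauing after the first epoch, something (often cached reports or accumulated tensors) is not being released.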

@Nicozwy
Owner

Nicozwy commented Feb 25, 2023

I think the CPU memory is too small, because we have to load many reports.
Have you tried loading a smaller portion of the reports for each claim?
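[Editor's note: a minimal sketch of the suggestion above, assuming the retrieved reports are held per claim before loading. The names `reports_by_claim` and `report_each_claim` are hypothetical, chosen to mirror the `report_each_claim` parameter discussed later in this thread.]

```python
def cap_reports(reports_by_claim, report_each_claim=6):
    """Keep only the first `report_each_claim` reports for each claim,
    bounding the memory needed to hold the evidence in RAM."""
    return {claim: reports[:report_each_claim]
            for claim, reports in reports_by_claim.items()}

# Toy data: one claim with 30 retrieved reports, one with 10.
data = {"claim-1": list(range(30)), "claim-2": list(range(10))}
capped = cap_reports(data, report_each_claim=6)
print({claim: len(reports) for claim, reports in capped.items()})
# → {'claim-1': 6, 'claim-2': 6}
```

Memory use then scales with `num_claims * report_each_claim` rather than with the full number of retrieved reports.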

@nievuelo
Author

Thanks a lot for your generous help. I will star this project.
But I mean it's not the storage capacity 199g, it's main memory capacity 64g. So I revised the eval_exp_fc5 from report each claim=12->report_each_claim=6, and change the report _each_claim from 30 to 15 in train_exp_fc5_liar_raw2. But it seems still throw kill sig problem. But the difference is before I change the parameter, it would crash during training time but after I revised it, it would crash during evaluation time. But I just had changed the parameter of report_each_claim in evaluate_model func from eval_exp_fc5. So could you help me to figure it out?

@nievuelo
Author

Or in other words, how much memory is required?

@Nicozwy
Owner

Nicozwy commented Feb 25, 2023

Hi, @nievuelo. I cannot figure it out from the limited information, but you could try running this code on another machine with a 3090 GPU, because we have tested it successfully on that hardware.
