Regarding the training epochs #9

Closed
FutureOpenAI opened this issue Feb 2, 2022 · 8 comments

FutureOpenAI commented Feb 2, 2022

Thanks for sharing this awesome code!

Is it necessary to train the model for 120 epochs, given that there are more than 1M training samples? Can you share some performance numbers from during training, such as the performance at 30, 60, and 90 epochs? I trained the model for several epochs, but the loss is still very large.

To double-check my training process, can you also share how many training samples are used in each epoch?

Thanks so much!

kbrodt (Owner) commented Feb 7, 2022

Hi,

You can check the report. We use 136,200 samples per epoch (in section 3.3, implementation details, the total number of iterations is 340,500, i.e. 120 epochs with batch size 48). The learning curve is in figure 4 on page 3; one can see that ~115k iterations (~40 epochs) are enough to converge.
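
For reference, here is that arithmetic as a quick sanity check (just illustrative; the variable names are mine, not from the code):

```python
# Samples per epoch implied by the report's numbers:
# 340,500 total iterations at batch size 48 over 120 epochs.
total_iterations = 340_500
batch_size = 48
epochs = 120

samples_per_epoch = total_iterations * batch_size // epochs
print(samples_per_epoch)  # 136200
```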

FutureOpenAI (Author)

Thanks for answering!

May I ask how you arrive at 136,200 samples for each epoch? Do you only use 1/8 of the full training data to achieve the reported performance? (136,200 seems to be about 1/16 of the full training data.)

kbrodt (Owner) commented Feb 8, 2022

Yes, in prerender.py the default value is 1/8 (i.e. 340,500 iterations / 120 epochs × 48 batch size = 136,200 samples per epoch).
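
To make the 1/8 fraction concrete, here is a minimal sketch of what fractional subsampling could look like at prerender time. This is not the actual prerender.py logic; the function name, parameters, and random selection are all hypothetical:

```python
import random

def subsample_agents(agent_ids, fraction=1/8, seed=0):
    """Hypothetical illustration: keep a random `fraction` of the
    agents of interest before rendering training samples. The real
    prerender.py may select agents differently."""
    rng = random.Random(seed)
    n_keep = max(1, int(len(agent_ids) * fraction))
    return rng.sample(agent_ids, n_keep)

# e.g. ~2M agents of interest * 1/8 ≈ 250k samples
```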

FutureOpenAI (Author)

The training set contains about 480k scenes, each with 2-8 agents of interest, so the total number of agents of interest is about 2M, and 1/8 of that should be about 250k.

Actually, after running your code with the default settings, I got about 272,000 training samples instead of 136,200. Do you know why this happens?

FutureOpenAI (Author) commented Feb 8, 2022

I am just surprised that 1/8 of the training data can achieve a good performance of 20+ mAP.

kbrodt (Owner) commented Feb 8, 2022

I guess the problem is that the training batch size is 48 (train.py) while the validation batch size is 2 × 48 (train.py). Meanwhile, figure 4 in the report has a shared x-axis that is shown only in validation iterations, so for training the number of iterations should be twice as large (2 × 340,500 instead of 340,500). Hence you are right: 1/8 is ~250k.
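
Redoing the earlier arithmetic with the corrected iteration count (again, just an illustrative check):

```python
# The plotted x-axis counts validation iterations (batch size 2 * 48),
# so the training run spans twice as many iterations at batch size 48.
plotted_iterations = 340_500
train_iterations = 2 * plotted_iterations
batch_size = 48
epochs = 120

samples_per_epoch = train_iterations * batch_size // epochs
print(samples_per_epoch)  # 272400 -- matches the ~272,000 observed above
```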

kbrodt (Owner) commented Feb 8, 2022

If I remember correctly, more data increases training time but doesn't improve the performance much.

FutureOpenAI (Author)

Thanks so much for your answers!
