Regarding the training epochs #9

Closed
FutureOpenAI opened this issue Feb 2, 2022 · 8 comments

FutureOpenAI commented Feb 2, 2022

Thanks for sharing this awesome code!

Is it necessary to train the model for 120 epochs, given that there are more than 1M training samples? Can you share some performance numbers from during training, such as the performance at 30, 60, and 90 epochs? I trained the model for several epochs, but the loss is still very large.

To double-check my training process, can you also share how many training samples are used in each epoch?

Thanks so much!

kbrodt (Owner) commented Feb 7, 2022

Hi,

You can check the report. We use 136,200 samples per epoch (in section 3.3, implementation details, the total number of iterations is 340,500, i.e. 120 epochs with batch size 48). The learning curve is in figure 4 on page 3; one can see that ~115k iterations (~40 epochs) are enough to converge.
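
For reference, here is that arithmetic as a quick sanity check (just illustrative; the variable names are mine, not from the code):

```python
# Samples per epoch implied by the report's numbers:
# 340,500 total iterations at batch size 48 over 120 epochs.
total_iterations = 340_500
batch_size = 48
epochs = 120

samples_per_epoch = total_iterations * batch_size // epochs
print(samples_per_epoch)  # 136200
```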

FutureOpenAI (Author)

Thanks for answering!

May I ask how you arrive at 136,200 samples for each epoch? Do you only use 1/8 of the full training data to achieve the reported performance? (136,200 seems to be about 1/16 of the full training data.)

kbrodt (Owner) commented Feb 8, 2022

Yes, in prerender.py the default value is 1/8 (i.e. 340,500 iterations / 120 epochs × 48 batch size = 136,200 samples per epoch).
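
To make the 1/8 fraction concrete, here is a minimal sketch of what fractional subsampling could look like at prerender time. This is not the actual prerender.py logic; the function name, parameters, and random selection are all hypothetical:

```python
import random

def subsample_agents(agent_ids, fraction=1/8, seed=0):
    """Hypothetical illustration: keep a random `fraction` of the
    agents of interest before rendering training samples. The real
    prerender.py may select agents differently."""
    rng = random.Random(seed)
    n_keep = max(1, int(len(agent_ids) * fraction))
    return rng.sample(agent_ids, n_keep)

# e.g. ~2M agents of interest * 1/8 ≈ 250k samples
```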

FutureOpenAI (Author)

The training set contains about 480k scenes, each with 2-8 agents of interest, so the total number of agents of interest is about 2M, and 1/8 of that should be about 250k.

Actually, after running your code with the default settings, I got about 272,000 training samples instead of 136,200. Do you know why this happens?

FutureOpenAI (Author) commented Feb 8, 2022

I am just surprised that 1/8 of the training data can achieve a good performance of 20+ mAP.

kbrodt (Owner) commented Feb 8, 2022

I guess the problem is that the training batch size is 48 (train.py) while the validation batch size is 2 × 48 (train.py). Meanwhile, figure 4 in the report has a shared x-axis that is shown only in validation iterations, so for training the number of iterations should be twice as large (2 × 340,500 instead of 340,500). Hence you are right: 1/8 is ~250k.
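
Redoing the earlier arithmetic with the corrected iteration count (again, just an illustrative check):

```python
# The plotted x-axis counts validation iterations (batch size 2 * 48),
# so the training run spans twice as many iterations at batch size 48.
plotted_iterations = 340_500
train_iterations = 2 * plotted_iterations
batch_size = 48
epochs = 120

samples_per_epoch = train_iterations * batch_size // epochs
print(samples_per_epoch)  # 272400 -- matches the ~272,000 observed above
```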

kbrodt (Owner) commented Feb 8, 2022

If I remember correctly, more data increases training time but doesn't improve the performance much.

FutureOpenAI (Author)

Thanks so much for your answers!
