Inference #13
Comments
The best checkpoints were selected and renamed manually based on validation results.
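For anyone who wants to script that manual step, here is a minimal sketch, not the repo's own code. It assumes per-step checkpoints are saved as `model_step_<step>.pt` in the checkpoint directory (that naming pattern is a guess) and that you have collected a validation score per step yourself. The helper name `promote_best_checkpoint` is hypothetical.

```python
import shutil
from pathlib import Path


def promote_best_checkpoint(ckpt_dir, val_scores, pattern="model_step_{step}.pt"):
    """Copy the highest-scoring checkpoint to model_step_best.pt.

    val_scores maps a training step to its validation score (higher is better).
    ASSUMPTION: the per-step filename pattern is illustrative, not confirmed
    against the repository's actual checkpoint naming.
    """
    if not val_scores:
        raise ValueError("val_scores is empty")
    ckpt_dir = Path(ckpt_dir)
    # Pick the step with the highest validation score.
    best_step = max(val_scores, key=val_scores.get)
    src = ckpt_dir / pattern.format(step=best_step)
    if not src.is_file():
        raise FileNotFoundError(src)
    # Copy rather than rename, so the original per-step file is kept.
    dst = ckpt_dir / "model_step_best.pt"
    shutil.copyfile(src, dst)
    return dst
```

Copying instead of renaming keeps the original per-step file, so a different step can be promoted later if the validation metric changes.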
Thanks for your reply. How much time does it take to fine-tune a model, e.g. on MSVD? I am using 8 GPUs and 1 epoch takes almost 40 hours.
We use 8 x A100 (40 GB) for VideoQA models, with n_frames=16 and 224x224 input to TimeSformer. Total training time for 15 epochs is a couple of hours, and 10 epochs usually gave good enough results. 40 hours per epoch is unexpected, so please compare your setup against the spec above. You may also want to profile your code to better understand where the time goes.
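As a first profiling pass, something like the sketch below can show whether the time goes into waiting for data or into the training step itself. It is a generic helper, not part of this repo: `split_epoch_time`, `batches`, and `step_fn` are hypothetical names, where `batches` is any iterable (for example a data loader) and `step_fn` runs one forward/backward step.

```python
import time


def split_epoch_time(batches, step_fn, max_batches=20):
    """Return (seconds waiting for data, seconds inside step_fn, batches timed).

    NOTE: GPU kernels launch asynchronously, so when timing a CUDA training
    step, step_fn should synchronize (e.g. torch.cuda.synchronize()) before
    returning, otherwise compute time is undercounted.
    """
    data_s = 0.0
    step_s = 0.0
    n = 0
    it = iter(batches)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # time spent in the training step
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
        n += 1
    return data_s, step_s, n
```

If the data-wait time dominates, video decoding or disk I/O is the likely bottleneck rather than the model.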
Hi,
At inference time we always load the best model. However, after fine-tuning there is no checkpoint named $OUTPUT_DIR/ckpt/model_step_best.pt. Can you point to the line in the code where the best checkpoint is saved?
Thank you.