Inference #13

Closed
avinashsai opened this issue Jun 20, 2022 · 3 comments

Comments

@avinashsai

Hi,

At inference time, the best model is always loaded. However, after fine-tuning there is no checkpoint named $OUTPUT_DIR/ckpt/model_step_best.pt. Can you point to the line in the code where the best checkpoint is saved?

Thank you.

@dxli94
Contributor

dxli94 commented Jun 21, 2022

The best checkpoints were selected and renamed manually based on validation results.
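
For anyone who wants to script that manual step, here is a minimal sketch in Python. It assumes per-step checkpoints are saved as `model_step_<step>.pt` and that you have collected one validation score per step yourself. The file name pattern, the `promote_best_checkpoint` helper, and the score dictionary are all hypothetical and not taken from the repository; only the target name `model_step_best.pt` comes from this thread.

```python
import shutil
from pathlib import Path


def promote_best_checkpoint(ckpt_dir, val_scores, pattern="model_step_{step}.pt"):
    """Copy the checkpoint with the highest validation score to model_step_best.pt.

    val_scores maps training step -> validation metric (higher is better).
    The per-step file name pattern is an assumption, not taken from the repo.
    """
    ckpt_dir = Path(ckpt_dir)
    # Step whose validation score is highest.
    best_step = max(val_scores, key=val_scores.get)
    src = ckpt_dir / pattern.format(step=best_step)
    if not src.is_file():
        raise FileNotFoundError(f"missing checkpoint: {src}")
    dst = ckpt_dir / "model_step_best.pt"
    # Copy rather than rename, so the original per-step file is kept.
    shutil.copyfile(src, dst)
    return best_step, dst
```

Calling `promote_best_checkpoint("<output dir>/ckpt", {1000: 0.41, 2000: 0.47})` would copy the step-2000 file, under the naming assumption above. Copying instead of renaming is a design choice so that a wrong pick can be redone.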

@avinashsai
Author

Thanks for your reply. How long does it take to fine-tune a model, e.g. on MSVD? I am using 8 GPUs and one epoch takes almost 40 hours.

@dxli94
Contributor

dxli94 commented Jun 21, 2022

We use 8 x A100 (40 GB) for VideoQA models, with n_frames=16 and 224x224 input to TimeSformer. Total training time for 15 epochs is a couple of hours. 10 epochs usually gave good enough results.

40 hours per epoch sounds unexpected. Please use the spec above as a reference point.

You may also profile your code to better understand what's going on.
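
As a rough first pass at profiling, one option is to separate the time spent waiting for data from the time spent in the training step, since slow video decoding or data loading is a common cause of very long epochs. The sketch below uses only the Python standard library. The `time_training_loop` helper and its `loader` and `train_step` arguments are hypothetical stand-ins for your own data loader and step function, not names from this repository.

```python
import time


def time_training_loop(loader, train_step, max_steps=50):
    """Return (seconds waiting on data, seconds in train_step, steps timed).

    Note: on a GPU, kernels run asynchronously, so train_step should
    synchronize the device before returning (e.g. torch.cuda.synchronize()
    in PyTorch); otherwise compute time is attributed to the wrong phase.
    """
    data_s = 0.0
    step_s = 0.0
    n = 0
    it = iter(loader)
    while n < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading / decoding
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # forward, backward, and optimizer update
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
        n += 1
    return data_s, step_s, n
```

If the first number dominates, the bottleneck is likely the input pipeline (disk, decoding, number of workers) rather than the model itself. A full profiler can then be used to narrow it down further.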

@dxli94 dxli94 closed this as completed Jun 30, 2022