Inference #13

avinashsai · 2022-06-20T22:18:51Z

Hi,

At inference time, the code always loads the best model. However, after fine-tuning there is no checkpoint named $OUTPUT_DIR/ckpt/model_step_best.pt. Can you point to the line in the code where the best checkpoint is saved?

Thank you.

dxli94 · 2022-06-21T01:33:10Z

The best checkpoints were selected and renamed manually based on validation results.
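
For anyone who wants to script that manual step, here is a minimal sketch. It assumes intermediate checkpoints are saved as model_step_<step>.pt (that naming pattern is a guess, not confirmed by the repo) and that you have collected one validation score per step yourself. The function name promote_best_checkpoint and the val_scores dictionary are hypothetical.

```python
import shutil
from pathlib import Path


def promote_best_checkpoint(ckpt_dir, val_scores, best_name="model_step_best.pt"):
    """Copy the checkpoint with the highest validation score to `best_name`.

    val_scores maps a training step (int) to its validation metric,
    where higher is better. The model_step_<step>.pt file naming is an
    assumption for illustration; adjust it to match your output directory.
    """
    ckpt_dir = Path(ckpt_dir)
    # Pick the step whose validation score is largest.
    best_step = max(val_scores, key=val_scores.get)
    src = ckpt_dir / f"model_step_{best_step}.pt"
    if not src.exists():
        raise FileNotFoundError(f"expected checkpoint not found: {src}")
    dst = ckpt_dir / best_name
    # Copy rather than rename so the original checkpoint is kept.
    shutil.copyfile(src, dst)
    return dst
```

Called as promote_best_checkpoint("<output dir>/ckpt", {1000: 0.41, 2000: 0.47}), it would leave a model_step_best.pt next to the original files, which is the name the inference script looks for.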

avinashsai · 2022-06-21T02:38:03Z

Thanks for your reply. How long does it take to fine-tune a model, e.g. on MSVD? I am using 8 GPUs, and one epoch takes almost 40 hours.

dxli94 · 2022-06-21T13:41:28Z

We use 8 x A100 (40 GB) for VideoQA models, with n_frames=16 and 224x224 input to TimeSformer. Total training time for 15 epochs is a couple of hours. 10 epochs usually gave good enough results.

40 hours per epoch sounds unexpected. Please compare your setup against the spec above.

You may also profile your code to better understand what's going on.
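
As a rough starting point for that profiling, the sketch below separates time spent waiting for the next batch from time spent in the training step. It is not part of the repo: time_data_vs_compute, batches, and step_fn are hypothetical names, and it uses only the standard library so it works with any iterable of batches.

```python
import time


def time_data_vs_compute(batches, step_fn, max_steps=20):
    """Return (seconds waiting for data, seconds inside step_fn).

    batches is any iterable (for example a data loader) and step_fn is a
    callable that runs one training step on a batch. Both are placeholders
    for whatever your training loop uses. At most max_steps batches are timed.
    """
    data_s = 0.0
    compute_s = 0.0
    it = iter(batches)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent loading/decoding the batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # time spent in forward/backward/optimizer
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
    return data_s, compute_s
```

If the first number dominates, the bottleneck is likely data loading (video decoding, disk, or too few workers) rather than the GPUs. Note that CUDA kernels run asynchronously, so for the compute number to be meaningful on GPU, step_fn should call torch.cuda.synchronize() before it returns.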

dxli94 closed this as completed Jun 30, 2022
