Hello, I saw the results of your paper and they were truly outstanding.
I have a few questions.
Could you tell me how long pretraining and fine-tuning take for COCO image-to-text retrieval?
Also, from what I read in your paper, obtaining the grid pseudo labels using CLIP takes around 8 hours. Am I right in understanding that the grid pseudo labels are a corpus extracted to provide positional information through prompts?
Thank you😁!
The training time is included in the training logs.
For example, on 8 NVIDIA A100 GPUs:
The pretraining log reports: Training time 1 day, 2:32:57.
The fine-tuning log for COCO retrieval reports: Training time 6:53:15.
Exactly. Running only CLIP's forward pass (no training) to extract the most similar keywords/phrases is very fast.
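For readers wondering what "extract the most similar keywords" might look like in practice, here is a minimal sketch. It is not the repository's actual code. It assumes that the embeddings for each image block and for each candidate keyword have already been produced by CLIP's image and text encoders, and it picks the keyword with the highest cosine similarity for each block. The function name `label_blocks` and the toy 3-dimensional vectors are hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def label_blocks(block_embeddings, keyword_embeddings, keywords):
    """Return the best-matching keyword for each image block.

    Hypothetical helper: the embeddings are assumed to come from a
    forward pass of CLIP's image and text encoders.
    """
    labels = []
    for block in block_embeddings:
        scores = [cosine(block, kw) for kw in keyword_embeddings]
        best = max(range(len(keywords)), key=scores.__getitem__)
        labels.append(keywords[best])
    return labels


# Toy vectors standing in for real CLIP outputs (assumption).
keywords = ["dog", "ball", "grass"]
keyword_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
block_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]

labels = label_blocks(block_embeddings, keyword_embeddings, keywords)
print(labels)
```

Since this only needs a forward pass and a similarity lookup per block, there is no gradient computation involved, which fits the point above that the step is cheap compared with training.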