Training time and grid pseudo label extracting time #5

9115jin · 2023-03-29T06:58:30Z

Hello, I saw the results of your paper and they were truly outstanding.
I have a few questions.

Could you tell me how long pretraining and fine-tuning take for COCO image-to-text retrieval?
Also, from what I read in your paper, obtaining the grid pseudo labels using CLIP takes around 8 hours. Am I right in understanding that the grid pseudo labels are a corpus extracted to provide positional information through prompts?

Thank you😁!

FingerRec · 2023-03-29T07:43:42Z

Hi 9115jin:

The training time is included in the training logs.
For example, on 8 NVIDIA A100 GPUs:
Pretraining: Training time 1 day, 2:32:57.
Fine-tuning for COCO retrieval: Training time 6:53:15.
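
If it helps to compare against a different GPU budget, here is a small sketch that turns the two durations quoted above into GPU-hours. It assumes the log prints durations in the `1 day, 2:32:57` form shown here; the helper names `parse_training_time` and `gpu_hours` are made up for this example and are not part of the repo.

```python
import re
from datetime import timedelta

# Matches durations such as "1 day, 2:32:57" or "6:53:15".
# Assumption: the logs use this format, as in the lines quoted above.
_DURATION = re.compile(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})")


def parse_training_time(text: str) -> timedelta:
    """Parse the first duration found in a log line (hypothetical helper)."""
    match = _DURATION.search(text)
    if match is None:
        raise ValueError(f"no duration found in {text!r}")
    # The day group is None when the run took less than a day.
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def gpu_hours(duration: timedelta, num_gpus: int) -> float:
    """Wall-clock hours multiplied by the number of GPUs used."""
    return duration.total_seconds() / 3600 * num_gpus


pretrain = parse_training_time("Training time 1 day, 2:32:57")
finetune = parse_training_time("Training time 6:53:15")
print(f"pretrain:  {gpu_hours(pretrain, 8):.1f} GPU-hours")   # 212.4
print(f"fine-tune: {gpu_hours(finetune, 8):.1f} GPU-hours")   # 55.1
```

These are wall-clock numbers from one 8-GPU setup, so treat them as a rough budget rather than something that scales exactly with other hardware.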
As for the grid pseudo labels: exactly. Extraction only needs a CLIP forward pass to pick the most similar keywords/phrases, so it is very fast.
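
To make the "most similar keywords/phrases" step concrete, here is a rough, toy sketch of the matching only. It assumes that each grid cell of an image and each candidate keyword already has an embedding from CLIP's image and text encoders; the three-dimensional vectors, the keyword list, and the function names below are invented for illustration and are not taken from the PTP code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def best_keyword(cell_embedding, keyword_embeddings):
    """Return the keyword whose embedding is most similar to the cell's."""
    return max(
        keyword_embeddings,
        key=lambda word: cosine(cell_embedding, keyword_embeddings[word]),
    )


# Stand-in embeddings (assumption): real ones would come from CLIP.
keyword_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "sky": [0.0, 0.2, 0.9],
    "grass": [0.1, 0.9, 0.1],
}
# Grid-cell index -> stand-in image embedding for that cell.
cell_embeddings = {0: [0.1, 0.3, 0.8], 4: [0.8, 0.2, 0.1]}

pseudo_labels = {
    cell: best_keyword(embedding, keyword_embeddings)
    for cell, embedding in cell_embeddings.items()
}
print(pseudo_labels)  # {0: 'sky', 4: 'dog'}
```

Because this is inference only (no gradients, no parameter updates), the cost is one encoder pass per image plus a similarity lookup.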

9115jin · 2023-03-30T05:55:15Z

Thank you for your prompt and accurate response!

I'm planning to start researching image-to-text retrieval (TR),
and I believe your PTP-BLIP project will be very helpful.

9115jin closed this as completed Mar 30, 2023
