
question about ITM loss #53

Closed
Qiulin-W opened this issue Feb 17, 2022 · 7 comments

Comments

@Qiulin-W

Hi,

Thanks for the great work.
After reading the code for calculating ITM loss, I have a question below:
[screenshot of the ITM loss code]

The ITM labels for positive and negative samples are in a "fixed" order instead of being shuffled. I'm wondering whether this order could be an issue for the ITM loss to work correctly. In some other VLP models such as ViLT, the ITM loss is calculated on shuffled pos-neg batches, as detailed at https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/objectives.py#L207

Thanks in advance for your kind reply.

@LiJunnan1992
Contributor

Thanks for your question!

First, the negative samples are sampled from the mini-batch.
Second, the order of the itm_labels doesn't affect the result, because the loss takes an average across the batch.
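To make this concrete, here is a minimal sketch (not the actual ALBEF code; the tensor names are illustrative) of why a fixed positives-then-negatives ordering gives the same loss as a shuffled ordering when the cross-entropy is averaged over the batch:

```python
import torch
import torch.nn.functional as F

bs = 4
# Illustrative ITM-head outputs: logits for the matched (positive) pairs,
# followed by logits for the in-batch negative pairs.
pos_logits = torch.randn(bs, 2)        # positives, label = 1
neg_logits = torch.randn(2 * bs, 2)    # negatives, label = 0

logits = torch.cat([pos_logits, neg_logits], dim=0)
labels = torch.cat([torch.ones(bs, dtype=torch.long),
                    torch.zeros(2 * bs, dtype=torch.long)], dim=0)

# Fixed order, as in the question.
loss_fixed = F.cross_entropy(logits, labels)

# Shuffling (logit, label) pairs together gives the same mean loss.
perm = torch.randperm(logits.size(0))
loss_shuffled = F.cross_entropy(logits[perm], labels[perm])

assert torch.allclose(loss_fixed, loss_shuffled)
```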

@Qiulin-W
Author

Thanks so much for your reply! And what is the magnitude of ITM loss after pretraining?

@LiJunnan1992
Contributor

It is around 0.11-0.13

@Qiulin-W
Author

Thanks so much!

@4fee8fea

Hi, @LiJunnan1992
Thanks for your work and for making it public!

I'm trying to collect the negative samples from the whole batch, i.e., not limited to the same mini-batch.

Could you please tell me whether such a sampling strategy would break the ITM loss under the current label order?

Thanks in advance!

@LiJunnan1992
Contributor

Yes, you can do that. Please check out our BLIP code for hard negative mining across all GPUs:
https://github.com/salesforce/BLIP/blob/main/models/blip_retrieval.py
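For reference, a rough sketch of the idea behind cross-GPU negative mining (assuming a PyTorch distributed setup with equal per-GPU batch sizes; the function and variable names here are made up, see the linked blip_retrieval.py for the real implementation):

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def gather_all(tensor):
    # Collect this tensor from every GPU (gradients do not flow through the copies here).
    gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, tensor)
    return torch.cat(gathered, dim=0)

def sample_hard_negative_texts(image_feat, text_feat, temp=0.07):
    # Score each local image against text candidates gathered from ALL GPUs,
    # not just the local mini-batch.
    all_text_feat = gather_all(text_feat)           # (global_bs, dim)
    sim = image_feat @ all_text_feat.t() / temp     # (local_bs, global_bs)

    # Mask the true pairs so an image's own caption cannot be picked as a negative.
    local_bs = image_feat.size(0)
    pos_idx = torch.arange(local_bs, device=sim.device) + dist.get_rank() * local_bs
    sim[torch.arange(local_bs, device=sim.device), pos_idx] = float('-inf')

    # Draw one "hard" negative caption per image, weighted by similarity.
    weights = torch.softmax(sim, dim=1)
    neg_idx = torch.multinomial(weights, 1).squeeze(1)
    return all_text_feat[neg_idx]
```

The ITM labels for these mined negatives stay 0, so the fixed label ordering discussed above still works; only the pool of candidate negatives changes.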

@4fee8fea

Hi, @LiJunnan1992

Thanks for your reply! We will follow the BLIP work as well.

We want to build on the promising ALBEF and BLIP work; however, the dataset has become an obstacle.

The SBU Captions dataset is inaccessible; could you please share a copy with us? Thanks! We will do our best to move forward and appreciate your kind help!
