
question about ITM loss #53

Closed
Qiulin-W opened this issue Feb 17, 2022 · 7 comments

Comments

@Qiulin-W

Hi,

Thanks for the great work.
After reading the code for calculating ITM loss, I have a question below:
[screenshot of the ITM loss code]

The ITM labels for positive and negative samples are in a "fixed" order instead of being shuffled. I'm wondering whether this order could be an issue for the ITM loss to work correctly. In some other VLP models such as ViLT, the ITM loss is calculated on shuffled pos-neg batches, as detailed at https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/objectives.py#L207

Thanks in advance for your kind reply.

@LiJunnan1992
Contributor

Thanks for your question!

First, the negative samples are sampled from the mini-batch.
Second, the order of the itm_labels doesn't affect the result, because the loss takes an average across the batch.
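To make this concrete, here is a minimal sketch (not the actual ALBEF code; the tensor names are illustrative) of why a fixed positives-then-negatives ordering gives the same loss as a shuffled ordering when the cross-entropy is averaged over the batch:

```python
import torch
import torch.nn.functional as F

bs = 4
# Illustrative ITM-head outputs: logits for the matched (positive) pairs,
# followed by logits for the in-batch negative pairs.
pos_logits = torch.randn(bs, 2)        # positives, label = 1
neg_logits = torch.randn(2 * bs, 2)    # negatives, label = 0

logits = torch.cat([pos_logits, neg_logits], dim=0)
labels = torch.cat([torch.ones(bs, dtype=torch.long),
                    torch.zeros(2 * bs, dtype=torch.long)], dim=0)

# Fixed order, as in the question.
loss_fixed = F.cross_entropy(logits, labels)

# Shuffling (logit, label) pairs together gives the same mean loss.
perm = torch.randperm(logits.size(0))
loss_shuffled = F.cross_entropy(logits[perm], labels[perm])

assert torch.allclose(loss_fixed, loss_shuffled)
```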

@Qiulin-W
Author

Thanks so much for your reply! And what is the magnitude of ITM loss after pretraining?

@LiJunnan1992
Contributor

It is around 0.11-0.13

@Qiulin-W
Author

Thanks so much!

@4fee8fea

Hi, @LiJunnan1992
Thanks for your work and for making it public!

I'm trying to collect the negative samples from the whole batch, i.e., not limited to the same mini-batch.

Could you please tell me whether such a sampling strategy would break the ITM loss under the current label order?

Thanks in advance!

@LiJunnan1992
Contributor

Yes, you can do that. Please check out our BLIP code for hard negative mining across all GPUs:
https://github.com/salesforce/BLIP/blob/main/models/blip_retrieval.py
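For reference, a rough sketch of the idea behind cross-GPU negative mining (assuming a PyTorch distributed setup with equal per-GPU batch sizes; the function and variable names here are made up, see the linked blip_retrieval.py for the real implementation):

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def gather_all(tensor):
    # Collect this tensor from every GPU (gradients do not flow through the copies here).
    gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, tensor)
    return torch.cat(gathered, dim=0)

def sample_hard_negative_texts(image_feat, text_feat, temp=0.07):
    # Score each local image against text candidates gathered from ALL GPUs,
    # not just the local mini-batch.
    all_text_feat = gather_all(text_feat)           # (global_bs, dim)
    sim = image_feat @ all_text_feat.t() / temp     # (local_bs, global_bs)

    # Mask the true pairs so an image's own caption cannot be picked as a negative.
    local_bs = image_feat.size(0)
    pos_idx = torch.arange(local_bs, device=sim.device) + dist.get_rank() * local_bs
    sim[torch.arange(local_bs, device=sim.device), pos_idx] = float('-inf')

    # Draw one "hard" negative caption per image, weighted by similarity.
    weights = torch.softmax(sim, dim=1)
    neg_idx = torch.multinomial(weights, 1).squeeze(1)
    return all_text_feat[neg_idx]
```

The ITM labels for these mined negatives stay 0, so the fixed label ordering discussed above still works; only the pool of candidate negatives changes.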

@4fee8fea

Hi, @LiJunnan1992

Thanks for your reply! We will follow the BLIP work as well.

We want to build on the promising ALBEF and BLIP work; however, the dataset has become an obstacle.

The SBU Captions dataset is inaccessible; could you please share a copy with us? Thanks! We will do our best to move forward and appreciate your kind help!
