Hello, first of all, thank you for sharing this great work!
I am trying to use your work, but I have some questions about the dataset.
In your Dataset.md, you point out that the 4M dataset is cleaned following BLIP.
Does this mean that your 4M dataset is filtered and synthetically generated as BLIP did?
Moreover, in Tables 2 and 6, the PTP-BLIP scores appear to be different.
What is the difference between these two scores?
Thank you
Thanks for your attention to our work and for checking it carefully!
Yes, the dataset corpus is from OSCAR and BLIP.
I have updated dataset.md; please refer to that file for more details.
Thanks for pointing out the problem in Table 6. Previously, we used each COCO image only once during pre-training.
Later, we followed OSCAR and ViLT and used each COCO image five times (each image has 5 captions) during pre-training, which significantly outperforms the former setting. We have aligned this table in the camera-ready version.
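To make the difference between the two settings concrete, here is a minimal sketch of expanding one record per image into one record per image-caption pair, so that an image with 5 captions is seen five times per epoch. The field names (`image`, `captions`, `caption`) and the function `expand_captions` are made up for illustration and are not necessarily the annotation format or code used in this repository.

```python
from typing import Dict, List


def expand_captions(annotations: List[Dict]) -> List[Dict]:
    """Turn one record per image into one record per (image, caption) pair.

    Assumes each input record looks like
    {"image": <path>, "captions": [<str>, ...]} (hypothetical format).
    """
    pairs = []
    for ann in annotations:
        for caption in ann["captions"]:
            # Each caption becomes its own training sample for the same image.
            pairs.append({"image": ann["image"], "caption": caption})
    return pairs


# Toy annotation: a single image with five captions (placeholder values).
toy = [{"image": "coco/000001.jpg", "captions": [f"caption {i}" for i in range(5)]}]
pairs = expand_captions(toy)
print(len(pairs))  # 5
```

In the earlier setting, the same image would contribute only one pair per epoch; with the expansion above it contributes one pair per caption.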
Thanks for sharing the code and pre-training corpus!
In DATASET.md, there are two corpora.
If I want to reproduce the results in your paper,
could I just use the 4M dataset corpus (not the 2M dataset) without changing any other code or data?
(Except the dataset path.)