Dataset information #4

Closed
jaeseokbyun opened this issue Mar 15, 2023 · 2 comments

Comments

@jaeseokbyun

Hello, first of all, thank you for sharing this great work!

I am trying to use your work, but I have some questions about the dataset.
In your Dataset.md, you point out that the 4M dataset is cleaned following BLIP.

Does it mean that your 4M dataset is filtered and synthetically generated as BLIP did?
Moreover, in Tables 2 and 6, the PTP-BLIP scores seem to be different.
What is the difference between these two scores?

Thank you

@FingerRec
Collaborator

Hi jaeseokbyun:

Thanks for your attention to our work and for checking it so carefully!

Yes, the dataset corpus is from OSCAR and BLIP.
I have updated dataset.md; please refer to that file for more details.

Thanks for pointing out the problem in Table 6. Previously, we used each COCO image only once during pre-training.
Later, we followed OSCAR and ViLT and used each COCO image five times (each image has 5 captions) during pre-training, which significantly outperforms the former setting. We have aligned this table in the camera-ready version.
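
For illustration, the difference between the two COCO settings can be sketched as below. This is only a minimal sketch and not the repository's actual data-loading code: the function names and the `annotations` mapping (image file name to list of captions) are hypothetical, and picking the first caption in the "once" setting is an assumption, since the thread does not say which caption was used.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: image file name -> list of its captions.
Annotations = Dict[str, List[str]]


def pairs_one_per_image(annotations: Annotations) -> List[Tuple[str, str]]:
    """Earlier setting: each image appears once per epoch.

    Assumption: the first caption is used; the thread does not specify this.
    """
    return [
        (image, captions[0])
        for image, captions in annotations.items()
        if captions  # skip images without any caption
    ]


def pairs_all_captions(annotations: Annotations) -> List[Tuple[str, str]]:
    """Later setting: each image appears once per caption.

    With 5 captions per COCO image, every image is seen 5 times per epoch.
    """
    return [
        (image, caption)
        for image, captions in annotations.items()
        for caption in captions
    ]
```

Under this sketch, a corpus of N COCO images with 5 captions each yields N image-text pairs in the first setting and 5N pairs in the second.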

Thanks again.

@jaeseokbyun
Author

Thanks for sharing the code and pre-training corpus!
In DATASET.md, there are two corpora.

If I want to reproduce the results in your paper,
could I just use the 4M dataset corpus (not the 2M dataset) without changing any other code or data?
(Except for the dataset path.)

Thanks,
