VQA object tags are different from image feature #73

Open
kehanlu opened this issue Mar 22, 2021 · 5 comments

Comments

@kehanlu

kehanlu commented Mar 22, 2021

Hi, I am currently working on VQA datasets.
The Oscar-base VQA fine-tuning script in VinVL_MODEL_ZOO.md uses --data_label_type mask, so it reads the text data from train2014_qla_mrcnn.json, downloaded from https://biglmdiag.blob.core.windows.net/vinvl/datasets/vqa

I found that the object tags in train2014_qla_mrcnn.json differ from those in the prediction.tsv downloaded from the pre-extracted COCO 2014 Train/Val Image Features (~50G). The img_features lengths, however, are the same.

Because the script uses --img_feature_type faster_r-cnn and --data_label_type mask, I guess the input object tags (text) come from mask while the image features come from faster_r-cnn.

Can you explain this design choice? Do you have experimental results for --img_feature_type faster_r-cnn combined with --data_label_type faster?

Thanks!

@yangapku

Excuse me, may I ask whether you have these files: train+val2014_qla_mrcnn.json, test2015_qla_mrcnn.json and test-dev2015_qla_mrcnn.json? They appear to be missing, which makes inference and official evaluation difficult.

@kehanlu

kehanlu commented Mar 29, 2021

> Excuse me, may I ask whether you have these files: train+val2014_qla_mrcnn.json, test2015_qla_mrcnn.json and test-dev2015_qla_mrcnn.json? They appear to be missing, which makes inference and official evaluation difficult.

No, they aren't provided in DOWNLOAD. I think we have to create them ourselves somehow.

@yangapku
Copy link

yangapku commented Apr 6, 2021

In the closed issue #13, I noticed that the author mentions how to generate the Mask R-CNN-based object labels. I tried to reproduce those labels on the VQA training images. My generated labels are similar to the released image labels but not identical, and I'm not sure whether they would reproduce the same VQA scores.
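
For anyone who wants to put a number on "similar but not identical", here is a minimal sketch of a per-image tag comparison. It assumes the tags for an image have already been loaded as lists of strings from each source; the file layouts of the released JSON and TSV are not described in this thread, so loading is left out. The function name `tag_overlap` and the sample tags are hypothetical.

```python
from collections import Counter


def tag_overlap(tags_a, tags_b):
    """Multiset Jaccard overlap between two lists of object tags.

    Returns 1.0 when both lists contain the same tags with the same
    counts, and 0.0 when they share nothing. Tag order is ignored.
    """
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    shared = sum((count_a & count_b).values())  # per-tag minimum counts
    total = sum((count_a | count_b).values())   # per-tag maximum counts
    return shared / total if total else 1.0


# Hypothetical tags for one image, for illustration only.
released_tags = "dog grass frisbee tree".split()
generated_tags = ["dog", "grass", "frisbee", "sky", "sky"]

print(tag_overlap(released_tags, generated_tags))
```

Averaging this score over all training images, and listing the images with the lowest scores, would show whether the mismatch is a handful of outliers or a systematic difference between detectors.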

@CCYChongyanChen

I have exactly the same question. I am confused about which image features are used for VQA fine-tuning: predictions.tsv (VinVL features), img_feature_type (faster_r-cnn), or data_label_type (Mask R-CNN? see #13 (comment))?
Have you figured it out? Many thanks!

@shizhediao

Same question
