VQA object tags are different from image feature #73

kehanlu · 2021-03-22T09:38:25Z

Hi, I am currently working on VQA datasets.
The Oscar-base VQA fine-tuning script in VinVL_MODEL_ZOO.md uses --data_label_type mask, so it reads the text data from train2014_qla_mrcnn.json, downloaded from https://biglmdiag.blob.core.windows.net/vinvl/datasets/vqa

I found that the object tags in train2014_qla_mrcnn.json are different from those in the prediction.tsv downloaded from the pre-extracted COCO 2014 Train/Val Image Features (~50G). The lengths of the img_features are the same, though.
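For anyone who wants to check this on their own copy, here is a minimal sketch of how the two tag sources could be compared per image. The field names are assumptions, not something confirmed in this thread: it assumes each question entry in the JSON keeps its tags as a string under the key "o", and that each TSV row has the form `<image_id>\t<json>` where the JSON holds an "objects" list with a "class" field. The helper names are hypothetical; adjust them to whatever your files actually contain.

```python
import json
from collections import Counter


def tags_from_qla_entry(entry):
    """Return the tag words of one question entry.

    Assumption: the tags are a whitespace- or ';'-separated string under "o".
    """
    return entry["o"].replace(";", " ").split()


def tags_from_prediction_row(row):
    """Parse one TSV row of the assumed form '<image_id>\\t<json>'.

    Assumption: the JSON payload has an "objects" list whose items carry
    the predicted label under "class".
    """
    image_id, payload = row.rstrip("\n").split("\t", 1)
    objects = json.loads(payload)["objects"]
    # Split multi-word classes so both sources are compared word by word.
    words = [word for obj in objects for word in obj["class"].split()]
    return image_id, words


def tag_overlap(tags_a, tags_b):
    """Multiset Jaccard overlap between two tag lists (1.0 means identical)."""
    a, b = Counter(tags_a), Counter(tags_b)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


if __name__ == "__main__":
    # Toy data standing in for one image from each file.
    entry = {"o": "dog frisbee grass dog"}
    row = 'img1\t{"objects": [{"class": "dog"}, {"class": "grass"}, {"class": "tree"}]}'
    image_id, predicted = tags_from_prediction_row(row)
    print(image_id, tag_overlap(tags_from_qla_entry(entry), predicted))
```

An overlap well below 1.0 for most images would suggest the tags come from a different detector, rather than from a different confidence threshold on the same one.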

Because the script uses --img_feature_type faster_r-cnn and --data_label_type mask, I guess the input object tags (text) come from mask, while the image features come from faster_r-cnn.

Can you explain this design choice? Do you have experiment results for --img_feature_type faster_r-cnn with --data_label_type faster?

Thanks!

yangapku · 2021-03-23T11:53:42Z

Excuse me, may I ask whether you have these files train+val2014_qla_mrcnn.json, test2015_qla_mrcnn.json and test-dev2015_qla_mrcnn.json? I found these files are missing, making it difficult for inference and official evaluation.

kehanlu · 2021-03-29T11:15:42Z

> Excuse me, may I ask whether you have these files train+val2014_qla_mrcnn.json, test2015_qla_mrcnn.json and test-dev2015_qla_mrcnn.json? I found these files are missing, making it difficult for inference and official evaluation.

No, they aren't provided in DOWNLOAD. I think we have to create them ourselves somehow.

yangapku · 2021-04-06T09:16:52Z

In this closed issue (#13), I noticed the author mentioned how to generate the mask-rcnn-based object labels. I tried to reproduce the labels on the VQA training images. My generated labels are similar to the released image labels but still differ in places. I'm not sure whether these generated labels can reproduce the same VQA scores.

CCYChongyanChen · 2021-10-08T17:03:01Z

I have exactly the same question. I am confused about which image features are used for VQA fine-tuning: is it predictions.tsv (VinVL features), image_feature_type (faster_r-cnn), or data_label_type (mask r-cnn? see #13 (comment))?
Have you figured it out? Many thanks!

shizhediao · 2022-01-27T01:06:34Z

Same question
