VQA object tags are different from image feature #73
Excuse me, may I ask whether you have these files?
No, they are not provided in DOWNLOAD. I think we have to create them ourselves somehow.
In this closed issue (#13), I noticed the author has mentioned how to generate the Mask R-CNN-based object labels. I tried to reproduce the labels on the VQA training images. My generated labels are similar to the released image labels, but there are still some differences. I'm not sure whether these generated labels can reproduce the same VQA scores.
I have exactly the same question. I am very confused about which image features are used for VQA fine-tuning: predictions.tsv (VinVL features), image_feature_type (faster_r-cnn), or data_label_type (mask r-cnn? see #13 (comment)).
Same question.
Hi, I am currently working on VQA datasets. The VQA fine-tuning script for Oscar-base in VinVL_MODEL_ZOO.md uses --data_label_type mask, so it takes its text data from train2014_qla_mrcnn.json, downloaded from https://biglmdiag.blob.core.windows.net/vinvl/datasets/vqa. I found that the object tags in train2014_qla_mrcnn.json are different from those in prediction.tsv, downloaded from the pre-extracted COCO 2014 Train/Val Image Features (~50G), although the img_features lengths are the same.

Because the script uses --img_feature_type faster_r-cnn and --data_label_type mask, I guess the input object tags (text) come from mask, while the image features come from faster_r-cnn. Can you explain this design choice? Do you have experiment results for --img_feature_type faster_r-cnn combined with --data_label_type faster? Thanks!
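To quantify how far two sets of object tags diverge (for example, the tags in train2014_qla_mrcnn.json versus those in prediction.tsv, or self-generated Mask R-CNN labels versus the released ones), a per-image comparison such as the minimal sketch below may help. It assumes the tags from each source have already been loaded into a dict mapping an image id to a list of tag strings; the parsing step is left out because the exact file layouts are not described in this thread. The names compare_tags, jaccard and mean_jaccard are hypothetical, and Jaccard overlap is just one reasonable choice of similarity measure, not something the repository itself uses.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form: image id -> list of object tag strings.
TagMap = Dict[str, List[str]]
# Per-image result: (jaccard score, tags only in first source, tags only in second source).
Report = Dict[str, Tuple[float, List[str], List[str]]]


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of two tag lists, treated as sets (duplicates ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty tag lists are considered identical
    return len(sa & sb) / len(sa | sb)


def compare_tags(first: TagMap, second: TagMap) -> Report:
    """Compare tags for every image id present in both sources."""
    report: Report = {}
    for image_id in sorted(first.keys() & second.keys()):
        a, b = first[image_id], second[image_id]
        only_first = sorted(set(a) - set(b))
        only_second = sorted(set(b) - set(a))
        report[image_id] = (jaccard(a, b), only_first, only_second)
    return report


def mean_jaccard(report: Report) -> float:
    """Average per-image Jaccard score; 0.0 if no images overlap."""
    if not report:
        return 0.0
    return sum(score for score, _, _ in report.values()) / len(report)


if __name__ == "__main__":
    # Toy data made up for illustration only (not taken from the released files).
    mrcnn_tags = {"img1": ["man", "dog", "frisbee"], "img2": ["car", "road"]}
    vinvl_tags = {"img1": ["man", "dog", "grass"], "img2": ["car", "road"]}
    report = compare_tags(mrcnn_tags, vinvl_tags)
    for image_id, (score, only_a, only_b) in report.items():
        print(image_id, round(score, 2), only_a, only_b)
    print("mean Jaccard:", round(mean_jaccard(report), 3))
```

On the toy data, img1 scores 0.5 (frisbee vs. grass) and img2 scores 1.0, giving a mean of 0.75. A mean close to 1.0 on the real files would suggest near-identical tags, while the per-image lists show which labels each source adds or drops.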