Hi, guys:
Thank you for your diligent work. I'm trying to prepare VQA input for single sample inference.
I'm not sure about the architecture of the VQA model, such as the "decoder_prompts" and "prefix_tokens" fields in the automatically constructed "sample".
Also, the following sentence in the README description of VQA is unclear to me:
"we transform original VQA training questions with multiple golden answers into multiple training samples. "
Do you have any suggestions?
We use decoder_prompts and prefix_tokens for better VQA finetuning performance. Specifically, for VQA we have a hyperparameter option called --prompt-type, which determines whether the question is added before the answer in the decoder input sequence during finetuning and evaluation. The question has already been fed to the encoder; this option controls whether it is fed to the decoder as well. If --prompt-type is not none, decoder_prompts and prefix_tokens hold the prepended question, which is used to construct the decoder input sequence during evaluation. decoder_prompts is used for all-candidate evaluation, and prefix_tokens is used for beam-search generative evaluation. In our experiments, we found that concatenating the question with the answer in the decoder input sequence improves accuracy somewhat compared with no concatenation.
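To make the concatenation concrete, here is a minimal sketch of how the decoder input could be assembled with and without the question prefix. It is not OFA's actual code: the function name `build_decoder_input`, the `BOS` token string, and the use of plain string tokens are assumptions for illustration only, and any value other than "none" is simply treated as "prepend the question".

```python
from typing import List

BOS = "<s>"  # assumed beginning-of-sequence token, for illustration only


def build_decoder_input(
    question: List[str], answer: List[str], prompt_type: str = "none"
) -> List[str]:
    """Hypothetical helper: build the decoder input sequence.

    With prompt_type == "none" the decoder sees only the answer.
    With any other value the question is prepended to the answer.
    """
    if prompt_type == "none":
        return [BOS] + answer
    return [BOS] + question + answer


question = ["what", "color", "is", "the", "cat", "?"]
answer = ["black"]

without_prompt = build_decoder_input(question, answer, prompt_type="none")
with_prompt = build_decoder_input(question, answer, prompt_type="src")
```

In the second case the prepended question plays the role described above: at evaluation time it is the part of the decoder sequence that is already fixed before the answer is scored or generated.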
For the other question, note that in the original VQAv2 dataset, most questions are annotated with more than one ground-truth answer. However, OFA is a seq2seq model, which requires one source sequence (image & question) paired with exactly one target sequence (ground-truth answer) during training. We therefore split each original sample, in which one question is paired with several answers, into multiple seq2seq samples, each consisting of the question paired with one of the ground-truth answers.
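The splitting step can be sketched as follows. This is only an illustration under assumed field names ("image_id", "question", "answers"), not the actual OFA data format or preprocessing script.

```python
from typing import Dict, List


def split_multi_answer_sample(sample: Dict) -> List[Dict]:
    """Hypothetical helper: expand one question with several ground-truth
    answers into one seq2seq sample per answer."""
    return [
        {
            "image_id": sample["image_id"],
            "question": sample["question"],
            "answer": answer,  # exactly one target per sample
        }
        for answer in sample["answers"]
    ]


original = {
    "image_id": "img_001",
    "question": "what color is the cat?",
    "answers": ["black", "dark gray"],
}
seq2seq_samples = split_multi_answer_sample(original)
```

Each resulting sample shares the same image and question and differs only in its single target answer.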