Comparing Multimodal Representations in Co-Attention-Based Models for Visual Question Answering

This repository contains the code for our paper. We investigate the properties of joint multimodal representations derived from a task-specific model and from a multi-task model, with respect to their different training objectives and information streams. We compare MCAN and multi-task ViLBERT on the visual question answering (VQA) task and evaluate their performance on the VQA 2.0 and GQA datasets. Our code extends the implementations of both MCAN and multi-task ViLBERT.
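
As a minimal illustrative sketch (not code from this repository), one common way to compare two sets of joint embeddings, e.g. pooled representations extracted from MCAN and from multi-task ViLBERT for the same questions, is linear Centered Kernel Alignment (CKA). The function, variable names, and embedding shapes below are assumptions for illustration only.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices.

    x: (n_samples, d1) joint embeddings from model A
    y: (n_samples, d2) joint embeddings from model B (same samples, same order)
    """
    # Center each feature dimension.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mcan_emb = rng.normal(size=(512, 1024))     # placeholder for MCAN joint embeddings
    vilbert_emb = rng.normal(size=(512, 1024))  # placeholder for ViLBERT joint embeddings
    print(f"Linear CKA: {linear_cka(mcan_emb, vilbert_emb):.4f}")
```

A value near 1 indicates highly similar representation geometry across the two models; values near 0 indicate dissimilar representations.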