Comparing Multimodal Representations in Co-Attention-Based Models for Visual Question Answering

This repository contains the code for our paper. We investigate the properties of joint multimodal representations derived from a task-specific model and from a multi-task model, with respect to their different training objectives and information streams. We compare MCAN and multi-task ViLBERT on the visual question answering (VQA) task and evaluate their performance on the VQA 2.0 and GQA datasets. Our code extends the implementations of both MCAN and multi-task ViLBERT.
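
As a minimal illustrative sketch (not code from this repository), one common way to compare two sets of joint embeddings, e.g. pooled representations extracted from MCAN and from multi-task ViLBERT for the same questions, is linear Centered Kernel Alignment (CKA). The function, variable names, and embedding shapes below are assumptions for illustration only.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices.

    x: (n_samples, d1) joint embeddings from model A
    y: (n_samples, d2) joint embeddings from model B (same samples, same order)
    """
    # Center each feature dimension.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mcan_emb = rng.normal(size=(512, 1024))     # placeholder for MCAN joint embeddings
    vilbert_emb = rng.normal(size=(512, 1024))  # placeholder for ViLBERT joint embeddings
    print(f"Linear CKA: {linear_cka(mcan_emb, vilbert_emb):.4f}")
```

A value near 1 indicates highly similar representation geometry across the two models; values near 0 indicate dissimilar representations.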