
Siamese network for binary Visual Question Answering in Torch


sumehta/siamese_network_vqa


Siamese Network for Binary Visual Question Answering

Introduction

Binary Visual Question Answering is the task of visually verifying whether a concept is present in an image. Concepts are represented as <P, R, S> tuples, where P is the primary object, R is a relation, and S is a secondary concept. If the concept is present in the image, the answer is yes; if it is absent, the answer is no.

Example: <cat, in, room>
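To make the task definition concrete, here is a minimal Python sketch of how a <P, R, S> concept and its yes/no answer might be represented. The `Concept` type, the `binary_answer` helper, and the per-image annotation set are hypothetical illustrations, not structures taken from this repository.

```python
from typing import NamedTuple, Set


class Concept(NamedTuple):
    """A <P, R, S> tuple: primary object, relation, secondary concept."""
    primary: str
    relation: str
    secondary: str


def binary_answer(concept: Concept, present_concepts: Set[Concept]) -> str:
    """Return 'yes' if the concept is present in the image, else 'no'.

    `present_concepts` is a hypothetical set of ground-truth tuples
    annotated for a single image.
    """
    return "yes" if concept in present_concepts else "no"


# Hypothetical annotations for one image.
image_concepts = {Concept("cat", "in", "room"), Concept("cat", "on", "sofa")}

print(binary_answer(Concept("cat", "in", "room"), image_concepts))  # yes
print(binary_answer(Concept("dog", "in", "room"), image_concepts))  # no
```

In the actual task the model must infer presence from pixels; the lookup here only illustrates the yes/no semantics of a tuple.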

Model

Base Model - One branch extracts image features corresponding to the objects in the tuple, and another branch encodes the question using an LSTM. The two sets of features are fused by an MLP, whose output is trained with a cross-entropy loss.
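The fusion and loss stage of the base model can be sketched in plain Python. This is a minimal illustration under stated assumptions: the image branch and the LSTM question encoder are taken to have already produced fixed-length vectors, fusion is done by concatenation, the MLP is reduced to one hidden ReLU layer, and all weights and dimensions are hypothetical. The repository itself is implemented in Torch.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W @ x + b for a weight matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(img_feat: Vector, q_feat: Vector, w1: Matrix, b1: Vector,
                      w2: Matrix, b2: Vector) -> Vector:
    """Concatenate image and question features, then apply a 2-layer MLP.

    Returns probabilities over the two answers [no, yes].
    """
    fused = img_feat + q_feat  # list concatenation (assumed fusion scheme)
    hidden = relu(matvec(w1, fused, b1))
    return softmax(matvec(w2, hidden, b2))


def cross_entropy(probs: Vector, label: int) -> float:
    """Negative log-likelihood of the correct class (0 = no, 1 = yes)."""
    return -math.log(probs[label])


# Hypothetical toy sizes: 2-d image feature, 2-d question encoding.
img_feat = [0.5, 1.0]
q_feat = [1.0, -0.5]
w1 = [[0.1, 0.2, 0.3, 0.4], [-0.2, 0.1, 0.0, 0.3]]
b1 = [0.0, 0.1]
w2 = [[0.5, -0.5], [-0.3, 0.8]]
b2 = [0.0, 0.0]

probs = fuse_and_classify(img_feat, q_feat, w1, b1, w2, b2)
loss = cross_entropy(probs, label=1)
print(probs, loss)
```

In a trained model the MLP weights would be learned by backpropagating this loss; the values here only exercise the computation.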

Proposed Model - A siamese network based architecture for binary (Yes/No) visual question answering (VQA) trained with a max-margin loss. The basic idea is to have two VQA networks that share parameters. The loss function is designed so that the ‘yes’ probability output by the ‘yes’ network exceeds the ‘yes’ probability output by the ‘no’ network by at least a margin. For background and details on the binary VQA task, please refer to this paper.
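A minimal Python sketch of the max-margin objective described above. Assumptions, none taken from the repository: each branch is a toy logistic scorer over a precomputed feature vector, the ‘yes’ branch receives an example whose answer is yes and the ‘no’ branch one whose answer is no, both branches call the same function with the same weights (which is what parameter sharing means), and the loss uses the hinge form max(0, margin - (p_yes_from_yes_branch - p_yes_from_no_branch)). The margin of 0.5 and the feature values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def p_yes(features: Vector, weights: Vector, bias: float) -> float:
    """Shared VQA branch: probability that the answer is 'yes'.

    A toy logistic model stands in for the full image + LSTM network.
    """
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def siamese_margin_loss(
    yes_example: Vector,
    no_example: Vector,
    weights: Vector,
    bias: float,
    margin: float = 0.5,
) -> Tuple[float, float, float]:
    """Hinge loss on the gap between the two branches' 'yes' probabilities.

    Both branches call p_yes with the same weights and bias, i.e. they
    share parameters. The loss is zero once the 'yes' branch's yes
    probability exceeds the 'no' branch's by at least `margin`.
    """
    p_from_yes = p_yes(yes_example, weights, bias)
    p_from_no = p_yes(no_example, weights, bias)
    loss = max(0.0, margin - (p_from_yes - p_from_no))
    return loss, p_from_yes, p_from_no


weights = [1.5, -1.0]
bias = 0.0

# Well-separated pair: the gap exceeds the margin, so the loss is 0.
print(siamese_margin_loss([2.0, 0.0], [-2.0, 0.0], weights, bias))
# Poorly separated pair: the gap is below the margin, so the loss is positive.
print(siamese_margin_loss([0.2, 0.0], [0.0, 0.0], weights, bias))
```

Because gradients from both branches flow into the same weights, minimizing this loss pushes the shared network to raise the yes probability on the yes example and lower it on the no example at the same time.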

Project Structure
