Visual Feature representation

Name: Each feature file is named [File ID].npy, where [File ID] is the image's file ID in Flickr30K Entities.
Proposal generation: We use Selective Search to generate proposals for each image in Flickr30K Entities. For the Referit Game dataset, we use Edge Box instead. In both cases we keep the top 100 proposals per image.
Feature extractor: We apply a Faster-RCNN network pre-trained on PASCAL VOC 2012 for Flickr30K Entities and pre-trained on ImageNet for Referit Game. In both datasets, the visual feature for each image is a 100 x 4096 matrix. Each row is the visual feature (fc7 layer of Faster-RCNN) of one proposal bounding box.
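
As a minimal sketch of how one of these files might be read, the snippet below loads a `.npy` file with NumPy and checks that it has the 100 x 4096 shape described above. The helper name `load_region_features`, the constants, and the stand-in file `example_id.npy` are hypothetical and not part of this repository. The stored dtype is not stated here, so `float32` is only an assumption used for the stand-in file.

```python
import os
import tempfile

import numpy as np

NUM_PROPOSALS = 100  # top proposals kept per image
FEATURE_DIM = 4096   # fc7 feature size per proposal


def load_region_features(path):
    """Load one [File ID].npy file and check it has the documented shape.

    Hypothetical helper, not part of the released code.
    """
    feats = np.load(path)
    expected = (NUM_PROPOSALS, FEATURE_DIM)
    if feats.shape != expected:
        raise ValueError(f"expected shape {expected}, got {feats.shape}")
    return feats


if __name__ == "__main__":
    # Stand-in file with the documented shape; real files come from the dataset.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_id.npy")
        np.save(path, np.zeros((NUM_PROPOSALS, FEATURE_DIM), dtype=np.float32))
        feats = load_region_features(path)
        print(feats.shape)   # one row per proposal
        print(feats[0].shape)  # fc7 feature of the first proposal
```

Row `i` of the returned array is the feature of the `i`-th proposal, so the proposal boxes need to be kept in the same order as the rows.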