Visual Feature representation

Name: Each feature file is named [File ID].npy, where the file ID matches the corresponding file ID in Flickr30K Entities.
Proposal generation: We use Selective Search to generate proposals for each image in Flickr30K Entities. For the Referit Game dataset, we use Edge Boxes to generate proposals for each image. We keep the top 100 proposals for each image.
Feature extractor: We apply a Faster-RCNN network pre-trained on PASCAL VOC 2012 for Flickr30K Entities and pre-trained on ImageNet for Referit Game. The visual features of each image in both datasets are represented as a 100 x 4096 matrix. Each row is the visual feature (fc7 layer of Faster-RCNN) of one proposal bounding box.
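
The layout described above can be checked when loading a feature file. The following is a minimal sketch, assuming NumPy is installed; the helper name `load_features` and the file name `example.npy` are hypothetical, and the stored dtype is not stated in this README, so the sketch does not check it.

```python
import os
import tempfile

import numpy as np

NUM_PROPOSALS = 100   # top 100 proposals per image, as stated above
FEATURE_DIM = 4096    # fc7 feature size, as stated above


def load_features(path):
    """Load one [File ID].npy file and verify the 100 x 4096 layout.

    Hypothetical helper for illustration; not part of the released code.
    """
    features = np.load(path)
    expected = (NUM_PROPOSALS, FEATURE_DIM)
    if features.shape != expected:
        raise ValueError(
            f"{path}: expected shape {expected}, got {features.shape}"
        )
    return features


if __name__ == "__main__":
    # Stand-in file with synthetic zeros; replace with a real feature file.
    path = os.path.join(tempfile.mkdtemp(), "example.npy")
    np.save(path, np.zeros((NUM_PROPOSALS, FEATURE_DIM), dtype=np.float32))

    features = load_features(path)
    # Row i holds the fc7 feature of the i-th proposal bounding box.
    first_proposal = features[0]
    print(features.shape, first_proposal.shape)
```

Failing fast on an unexpected shape makes truncated or mismatched files visible before they reach a model.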
