Mapping-Images-to-Scene-Graphs

Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

Introduction

Scene graph prediction is the task of mapping an image into a set of bounding boxes, along with their categories and relations.

We present a new architecture for graph inference that has the following structural property: on the one hand, the architecture is invariant to input permutations; on the other hand, every permutation-invariant function can be implemented via this architecture.
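Informally, and deferring to the paper for the exact statement and proof, the property can be written as a single formula. The notation below is an assumption made for this README: z_i denotes the features of entity i, z_{i,j} the features of the pair (i, j), and phi, alpha and rho are learned functions.

```latex
\[
[\mathcal{F}(z)]_k
  = \rho\Big( z_k,\;
      \sum_{i=1}^{n} \alpha\Big( z_i,\;
        \sum_{j \neq i} \phi\big(z_i,\, z_{i,j},\, z_j\big) \Big) \Big),
  \qquad k = 1, \dots, n .
\]
```

Both sums run over unordered collections of nodes, so relabeling the nodes leaves the value assigned to each node unchanged.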

In this repository, we share our architecture implementation for the task of scene graph prediction.

Model implementation

Scene Graph Predictor (SGP) takes as input initial confidence distributions per entity and relation and processes them to obtain new labels. SGP satisfies the graph permutation invariance property introduced in the paper. The model is implemented in TensorFlow. For the initial confidence distributions per entity and relation, we simply reuse features learned by the baseline model from Zellers et al. (2017) (git repository: https://github.com/rowanz/neural-motifs).

SGP architecture

Our SGP implementation applies an RNN iteratively to refine predictions. Each step outputs improved predictions.

A schematic representation of the architecture. Given an image, a label predictor outputs initial predictions. Our SGP model then computes a feature for each element of these predictions. Next, the features are summed into a vector, which is concatenated with the corresponding initial prediction. A second function is then applied, and another summation creates the graph representation. Finally, one classifier labels objects and another labels relations. The SGP process can be repeated iteratively (in the paper we repeat it 2 times).
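The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in Module/Module.py: the names `sgp_step`, `run_sgp` and `linear_relu`, the single-layer stand-ins for the learned functions, the layer sizes, and the reuse of one set of weights across passes are all hypothetical.

```python
import numpy as np

def linear_relu(x, w):
    """One dense layer with ReLU, standing in for a small learned network."""
    return np.maximum(x @ w, 0.0)

def sgp_step(z_ent, z_rel, w_phi, w_alpha, w_rho_ent, w_rho_rel):
    """One pass over N entities (hypothetical sketch).

    z_ent: (N, E) entity confidences
    z_rel: (N, N, R) relation confidences for ordered pairs (i, j)
    """
    n, e = z_ent.shape
    # Pairwise input [z_i, z_ij, z_j] for every ordered pair (i, j).
    zi = np.broadcast_to(z_ent[:, None, :], (n, n, e))
    zj = np.broadcast_to(z_ent[None, :, :], (n, n, e))
    pair_in = np.concatenate([zi, z_rel, zj], axis=-1)
    phi = linear_relu(pair_in, w_phi)                       # (N, N, H)
    # Sum over neighbours j != i to get one vector per entity.
    mask = 1.0 - np.eye(n)[:, :, None]
    s = (phi * mask).sum(axis=1)                            # (N, H)
    # Concatenate with the entity's own prediction, apply the second function.
    a = linear_relu(np.concatenate([z_ent, s], axis=-1), w_alpha)
    g = a.sum(axis=0)                                       # (H,) graph representation
    # Classify entities and relations, each conditioned on g.
    h = g.shape[0]
    ent_in = np.concatenate([z_ent, np.broadcast_to(g, (n, h))], axis=-1)
    rel_in = np.concatenate([pair_in, np.broadcast_to(g, (n, n, h))], axis=-1)
    return ent_in @ w_rho_ent, rel_in @ w_rho_rel

def run_sgp(z_ent, z_rel, weights, steps=2):
    """Repeat the pass; sharing weights across passes is an assumption."""
    for _ in range(steps):
        z_ent, z_rel = sgp_step(z_ent, z_rel, *weights)
    return z_ent, z_rel

# Toy sizes: N entities, E entity classes, R relation classes, H hidden units.
N, E, R, H = 4, 5, 3, 8
rng = np.random.default_rng(0)
shapes = [(2 * E + R, H), (E + H, H), (E + H, E), (2 * E + R + H, R)]
weights = [0.1 * rng.normal(size=s) for s in shapes]
z_ent0 = rng.random((N, E))
z_rel0 = rng.random((N, N, R))
ent_out, rel_out = run_sgp(z_ent0, z_rel0, weights)
```

Because every aggregation is a sum over nodes, reordering the input entities (and the rows and columns of the relation tensor accordingly) only reorders the outputs in the same way, which is the behaviour the permutation invariance property asks for.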

For more information, please look at the code (Module/Module.py file) and the paper.

Attention with SGP architecture

Our SGP architecture uses feature-level attention for each node during inference. We weight the significance of each feature per node, so that the network can choose which features from adjacent nodes contribute the most information.
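One way to picture feature-level attention is to replace the plain sum over neighbours with a weighted sum in which every feature dimension has its own softmax over the neighbours. The sketch below assumes that reading; the function names, the tensor shapes, and the way the attention scores are produced are hypothetical and may differ from the repository code.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def feature_attention_sum(phi, logits):
    """Per-feature weighted sum over neighbours (needs at least 2 nodes).

    phi:    (N, N, H) pairwise features for node i and neighbour j
    logits: (N, N, H) unnormalised attention scores, one per feature
    """
    n = phi.shape[0]
    # Exclude self-pairs (i == j) by sending their score to -inf.
    diag = np.eye(n, dtype=bool)[:, :, None]
    logits = np.where(diag, -np.inf, logits)
    weights = softmax(logits, axis=1)     # normalise over neighbours j
    return (weights * phi).sum(axis=1)    # (N, H), one vector per node

rng = np.random.default_rng(0)
phi = rng.random((4, 4, 6))
uniform = feature_attention_sum(phi, np.zeros((4, 4, 6)))
```

With all scores equal, the weights are uniform and the result reduces to the mean over the other nodes; learned scores let each feature dimension focus on a different neighbour.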

An example of per-entity attention and global attention over all nodes. The size and location of objects provide a key signal to the attention mechanism. The model assigns a higher confidence to the label "tie" when the label "shirt" is detected (third panel from the left). Similarly, the model assigns a higher confidence to the label "eye" when it is located near "hair".

Dependencies

To get started with the framework, install the dependencies:

Run "pip install -r requirements.txt" to install all the requirements.

Usage

Run "python Run.py download" to download and extract the train, validation and test data. The data already contains the output of the baseline detector applied to the VisualGenome data.
Run "python Run.py eval gpi_linguistic_pretrained <gpu-number>" to evaluate the pre-trained model of our best variant, linguistic with multi-head attention (recall@100 SG Classification).
Run "python Run.py train gpi_linguistic <gpu-number>" to train a new model (linguistic with multi-head attention).
Run "python Run.py eval gpi_linguistic_best <gpu-number>" to evaluate the new model (recall@100 SG Classification).
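For readers unfamiliar with the metric reported by the eval commands, recall@K is, roughly, the fraction of ground-truth triplets that appear among the K highest-scoring predicted triplets. The sketch below shows that idea for a single image. It is a simplification and not the evaluation code in this repository: the function name and the triplet format are hypothetical, and a full SG Classification evaluation also ties each triplet to specific boxes and aggregates over images.

```python
def recall_at_k(predicted, ground_truth, k=100):
    """Fraction of ground-truth triplets found in the top-k predictions.

    predicted:    iterable of (score, (subject, predicate, object)) tuples
    ground_truth: collection of (subject, predicate, object) triplets
    """
    truth = set(ground_truth)
    if not truth:
        return 0.0
    # Keep the k highest-scoring predictions.
    top = sorted(predicted, key=lambda item: item[0], reverse=True)[:k]
    hits = {triplet for _, triplet in top} & truth
    return len(hits) / len(truth)

# Toy example with made-up scores and labels.
predicted = [
    (0.9, ("man", "wearing", "shirt")),
    (0.8, ("man", "has", "tie")),
    (0.1, ("tie", "on", "shirt")),
]
ground_truth = [("man", "wearing", "shirt"), ("tie", "on", "shirt")]
score_at_2 = recall_at_k(predicted, ground_truth, k=2)
```

Here only one of the two ground-truth triplets is in the top 2, so `score_at_2` is 0.5; raising k to 3 recovers both.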
