
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs

paper | video | slides

This repository contains the code for the paper "Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs", accepted at ICRA 2023.

Introduction

Dynamic scene graphs generated from video clips can help enhance semantic visual understanding in a wide range of challenging tasks. In the temporal and spatial modeling of dynamic scene graph generation, it is particularly difficult to learn the time-variant relations across frames. In this paper, we propose a Time-variant Relation-aware TRansformer (TR2), which aims to model the temporal change of relations in dynamic scene graphs.

Getting Started

To get started with the code and reproduce the results presented in the paper, follow the steps below:

  1. Clone this repository:
git clone https://github.com/qncsn2016/TR2.git
  2. Environment:

We use Python 3.7, PyTorch 1.10, and torchvision 0.11. Install the required dependencies:

pip install -r requirements.txt
  3. Dataset:

Our experiments are conducted on the Action Genome (AG) dataset, the benchmark dataset for dynamic scene graph generation. Download and process the dataset according to the official AG repository, then point data_path in the config at the processed dataset (see the sketch after this step).

Following STTran, we keep bounding boxes whose short edges are larger than 16 pixels for the SGCls and SGDet tasks. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader directory.
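
Where exactly data_path lives depends on the config implementation; as a rough sketch, assuming an STTran-style argparse config (the flag spelling and default below are assumptions, not TR2's actual values):

import argparse

parser = argparse.ArgumentParser()
# assumed flag name; point the default at your processed Action Genome root
parser.add_argument('--data_path', default='/path/to/ActionGenome/', type=str,
                    help='root directory containing the AG frames and annotations')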

  4. Compile the bbox-operation code we borrow:
cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace
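
If the builds succeed, both Cython extensions should import cleanly from the repository root; a quick sanity check (the module and function names follow the STTran layout that TR2 builds on, so treat them as assumptions):

# run from the repository root after both build_ext commands finish
from lib.draw_rectangles.draw_rectangles import draw_union_boxes  # assumed name
from lib.fpn.box_intersections_cpu.bbox import bbox_overlaps      # assumed name
print('bbox extensions imported OK')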

For the object detector, please follow the compilation instructions from https://github.com/jwyang/faster-rcnn.pytorch. Following STTran, we use a FasterRCNN model pretrained for Action Genome. Please download it here and put it at

fasterRCNN/models/faster_rcnn_ag.pth
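
To verify the downloaded weights are readable before training, you can load the checkpoint on CPU and inspect it (a sketch, assuming a state-dict-style dictionary; the exact keys depend on how the checkpoint was saved):

import torch

# map_location='cpu' lets you inspect the file without a GPU
ckpt = torch.load('fasterRCNN/models/faster_rcnn_ag.pth', map_location='cpu')
print(list(ckpt.keys()))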

For the cross-modality guidance module, please download the CLIP ViT-B/32 model and put it at

lib/models/clip/ViT-B-32.pt
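
For reference, the OpenAI clip package can read this checkpoint directly from the local path, since clip.load accepts a file path as well as a model name; a minimal sketch (the device choice is illustrative):

import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# passing the local path avoids re-downloading the model
model, preprocess = clip.load('lib/models/clip/ViT-B-32.pt', device=device)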

To accelerate training, we recommend downloading the precomputed features and setting pre_path in the config to the corresponding download path.

We borrowed some code from STTran.

Train

# PredCls
python train.py --mode predcls
# SgCls
python train.py --mode sgcls
# SgDet
python train_amp.py
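
train_amp.py presumably uses PyTorch automatic mixed precision to keep the heavier SgDet setting within GPU memory; a minimal sketch of that pattern, where model, loss_fn, optimizer, and loader are placeholders rather than TR2 internals:

import torch

scaler = torch.cuda.amp.GradScaler()   # rescales the loss so fp16 gradients do not underflow
for frames, targets in loader:         # placeholder training loop
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in mixed precision
        loss = loss_fn(model(frames), targets)
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # unscales gradients, then steps
    scaler.update()                    # adapts the scale factor for the next step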

Evaluation

python test.py --mode {predcls,sgcls,sgdet} --model_path path_to_ckpt
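
For example, to evaluate a PredCls checkpoint (the checkpoint path below is illustrative):

python test.py --mode predcls --model_path checkpoints/predcls.tar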

Results

setting   ckpt   With Constraint R@20   No Constraint R@20   Top 6 R@20
PredCls   link          73.8                  96.6              93.5
SgCls     link          48.7                  64.4              62.4
SgDet     link          35.5                  39.2              39.1

Citation

If you find this work useful in your research, please consider citing our paper:

@inproceedings{tr2,
    title = {Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs},
    author = {Jingyi Wang and Jinfa Huang and Can Zhang and Zhidong Deng},
    booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
    year = {2023}
}
