
Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs

paper | video | slides

This repository contains the code for the paper "Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs", accepted at ICRA 2023.

Introduction

Dynamic scene graphs generated from video clips can help enhance semantic visual understanding in a wide range of challenging tasks. In the temporal and spatial modeling of dynamic scene graph generation, it is particularly difficult to learn the time-variant relations across frames. In this paper, we propose a Time-variant Relation-aware TRansformer (TR2), which aims to model the temporal change of relations in dynamic scene graphs.

Getting Started

To get started with the code and reproduce the results presented in the paper, follow the steps below:

  1. Clone this repository:
git clone https://github.com/qncsn2016/TR2.git
  2. Environment:

We use Python 3.7, PyTorch 1.10, and torchvision 0.11. Install the required dependencies:

pip install -r requirements.txt
  3. Dataset:

Our experiments are conducted on the Action Genome (AG) dataset, the benchmark dataset for dynamic scene graph generation. Download and process the dataset according to the official AG repository, then point data_path in the config at the processed dataset (see the sketch after this step).

Following STTran, we keep bounding boxes whose short edges are larger than 16 pixels for the SGCls and SGDet tasks. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader directory.
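
Where exactly data_path lives depends on the config implementation; as a rough sketch, assuming an STTran-style argparse config (the flag spelling and default below are assumptions, not TR2's actual values):

import argparse

parser = argparse.ArgumentParser()
# assumed flag name; point the default at your processed Action Genome root
parser.add_argument('--data_path', default='/path/to/ActionGenome/', type=str,
                    help='root directory containing the AG frames and annotations')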

  4. Compile the bbox-operation code we borrow:
cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace
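
If the builds succeed, both Cython extensions should import cleanly from the repository root; a quick sanity check (the module and function names follow the STTran layout that TR2 builds on, so treat them as assumptions):

# run from the repository root after both build_ext commands finish
from lib.draw_rectangles.draw_rectangles import draw_union_boxes  # assumed name
from lib.fpn.box_intersections_cpu.bbox import bbox_overlaps      # assumed name
print('bbox extensions imported OK')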

For the object detector, please follow the compilation instructions from https://github.com/jwyang/faster-rcnn.pytorch. Following STTran, we use a FasterRCNN model pretrained for Action Genome. Please download it here and put it at

fasterRCNN/models/faster_rcnn_ag.pth
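
To verify the downloaded weights are readable before training, you can load the checkpoint on CPU and inspect it (a sketch, assuming a state-dict-style dictionary; the exact keys depend on how the checkpoint was saved):

import torch

# map_location='cpu' lets you inspect the file without a GPU
ckpt = torch.load('fasterRCNN/models/faster_rcnn_ag.pth', map_location='cpu')
print(list(ckpt.keys()))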

For the cross-modality guidance module, please download the CLIP ViT-B/32 model and put it at

lib/models/clip/ViT-B-32.pt
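
For reference, the OpenAI clip package can read this checkpoint directly from the local path, since clip.load accepts a file path as well as a model name; a minimal sketch (the device choice is illustrative):

import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# passing the local path avoids re-downloading the model
model, preprocess = clip.load('lib/models/clip/ViT-B-32.pt', device=device)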

To accelerate training, we recommend downloading the precomputed features and setting pre_path in the config to the corresponding download path.

We borrowed some code from STTran.

Train

# PredCls
python train.py --mode predcls
# SgCls
python train.py --mode sgcls
# SgDet
python train_amp.py
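
train_amp.py presumably uses PyTorch automatic mixed precision to keep the heavier SgDet setting within GPU memory; a minimal sketch of that pattern, where model, loss_fn, optimizer, and loader are placeholders rather than TR2 internals:

import torch

scaler = torch.cuda.amp.GradScaler()   # rescales the loss so fp16 gradients do not underflow
for frames, targets in loader:         # placeholder training loop
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in mixed precision
        loss = loss_fn(model(frames), targets)
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # unscales gradients, then steps
    scaler.update()                    # adapts the scale factor for the next step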

Evaluation

python test.py --mode {predcls,sgcls,sgdet} --model_path path_to_ckpt
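
For example, to evaluate a PredCls checkpoint (the checkpoint path below is illustrative):

python test.py --mode predcls --model_path checkpoints/predcls.tar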

Results

setting   ckpt   With Constraint R@20   No Constraint R@20   Top 6 R@20
PredCls   link          73.8                  96.6              93.5
SgCls     link          48.7                  64.4              62.4
SgDet     link          35.5                  39.2              39.1

Citation

If you find this work useful in your research, please consider citing our paper:

@inproceedings{tr2,
    title = {Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs},
    author = {Jingyi Wang and Jinfa Huang and Can Zhang and Zhidong Deng},
    booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
    year = {2023}
}
