PyTorch implementation of TEMPURA, the framework proposed in our paper Unbiased Scene Graph Generation in Videos, accepted at CVPR 2023.
The inherent challenges in dynamic scene graph generation, such as the long-tailed distribution of visual relationships, noisy annotations, and temporal fluctuation of model predictions, make existing methods prone to generating biased scene graphs. We address this by introducing a new framework called TEMPURA: TEmporal consistency and Memory Prototype guided UnceRtainty Attenuation for unbiased dynamic scene graph generation. TEMPURA enforces object-level temporal consistency via transformer-based sequence modeling, learns to synthesize unbiased relationship representations using memory-guided training, and tackles the inherent noise in the dataset by attenuating the predictive uncertainty of visual relations using a Gaussian Mixture Model (GMM).
Please install the packages listed in environment.yml.
We borrow some compiled code for bbox operations.
cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace
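The two build steps above can also be scripted from the repository root. The following is a minimal sketch, assuming each extension directory contains a setup.py (the usual entry point for build_ext) and that the second directory sits under lib/, as the cd sequence implies. The names `build_commands` and `build_all` are hypothetical helpers and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Extension directories taken from the cd sequence above,
# written relative to the repository root.
EXTENSION_DIRS = ["lib/draw_rectangles", "lib/fpn/box_intersections_cpu"]


def build_commands(repo_root="."):
    """Return (working_dir, argv) pairs for building each extension in place."""
    argv = [sys.executable, "setup.py", "build_ext", "--inplace"]
    return [(Path(repo_root) / d, list(argv)) for d in EXTENSION_DIRS]


def build_all(repo_root="."):
    """Run every build command; raises CalledProcessError if a build fails."""
    for workdir, argv in build_commands(repo_root):
        subprocess.run(argv, cwd=workdir, check=True)
```

Calling `build_all()` from the repository root should be equivalent to running the commands by hand, as long as the assumptions above hold.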
For the object detector part, please follow the compilation from We provide a pretrained FasterRCNN model for Action Genome. Please download here and put it in
We use the Action Genome dataset to train and evaluate our method. Please process the downloaded dataset with the Toolkit. The dataset directory should look like this:
|-- ag
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos
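Before training, it can help to confirm that the processed dataset matches this layout. The snippet below is a small sketch, assuming the path you pass points at the ag directory itself. `missing_ag_subdirs` is a hypothetical helper name, not a function from the repository.

```python
from pathlib import Path

# Sub-directories named in the layout above.
EXPECTED_SUBDIRS = ("annotations", "frames", "videos")


def missing_ag_subdirs(data_path):
    """Return the expected sub-directories that are absent under data_path."""
    root = Path(data_path)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means all three sub-directories are present; any returned names are the ones still missing.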
In the SGCLS/SGDET experiments, we only keep bounding boxes whose short edge is larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader directory.
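To illustrate the filtering criterion only (the provided .pkl file already contains the filtered annotations), here is a minimal sketch. It assumes boxes are given as (x1, y1, x2, y2) pixel coordinates, which is an assumption about the format and not something stated above. The function names are hypothetical.

```python
def short_edge(box):
    """Length of the shorter side of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return min(x2 - x1, y2 - y1)


def keep_large_boxes(boxes, min_short_edge=16):
    """Keep boxes whose shorter side is strictly larger than min_short_edge pixels."""
    return [box for box in boxes if short_edge(box) > min_short_edge]
```

With this rule a 16x16 box is dropped, because its short edge is not strictly larger than 16 pixels.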
- For PREDCLS:
python -mode predcls -datasize large -data_path $DATAPATH -rel_mem_compute joint -rel_mem_weight_type simple -mem_fusion late -mem_feat_selection manual -mem_feat_lambda 0.5 -rel_head gmm -obj_head linear -K 6 -lr 1e-5 -save_path output/
- For SGCLS:
python -mode sgcls -datasize large -data_path $DATAPATH -rel_mem_compute joint -rel_mem_weight_type simple -mem_fusion late -mem_feat_selection manual -mem_feat_lambda 0.3 -rel_head gmm -obj_head linear -obj_con_loss euc_con -lambda_con 1 -eos_coef 1 -K 4 -tracking -lr 1e-5 -save_path output/
- For SGDET:
python -mode sgdet -datasize large -data_path $DATAPATH -rel_mem_compute joint -rel_mem_weight_type simple -mem_fusion late -mem_feat_selection manual -mem_feat_lambda 0.5 -rel_head gmm -obj_head linear -obj_con_loss euc_con -lambda_con 1 -eos_coef 1 -K 4 -tracking -lr 1e-5 -save_path output/
- For PREDCLS:
python -mode predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH -rel_mem_compute joint -rel_mem_weight_type simple -mem_fusion late -mem_feat_selection manual -mem_feat_lambda 0.5 -rel_head gmm -obj_head linear -K 6
- For SGCLS:
python -mode sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH -rel_mem_compute joint -rel_mem_weight_type simple -mem_fusion late -mem_feat_selection manual -mem_feat_lambda 0.3 -rel_head gmm -obj_head linear -K 4 -tracking
- For SGDET:
python -mode sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH -rel_mem_compute joint -rel_mem_weight_type simple -mem_fusion late -mem_feat_selection manual -mem_feat_lambda 0.5 -rel_head gmm -obj_head linear -K 4 -tracking
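The three commands that take -model_path differ only in -mode, -mem_feat_lambda, -K, and whether -tracking is passed. The sketch below collects those differences in one place and builds the argument list for a given mode. It returns only the arguments, since the entry-point script name is not given in the commands above, and it assumes the script does not depend on flag order. `eval_args` and the two constants are hypothetical names.

```python
# Flags shared by all three commands that take -model_path.
SHARED_FLAGS = [
    "-datasize", "large",
    "-rel_mem_compute", "joint",
    "-rel_mem_weight_type", "simple",
    "-mem_fusion", "late",
    "-mem_feat_selection", "manual",
    "-rel_head", "gmm",
    "-obj_head", "linear",
]

# Values that differ between modes, copied from the commands above.
MODE_SETTINGS = {
    "predcls": {"mem_feat_lambda": "0.5", "K": "6", "tracking": False},
    "sgcls": {"mem_feat_lambda": "0.3", "K": "4", "tracking": True},
    "sgdet": {"mem_feat_lambda": "0.5", "K": "4", "tracking": True},
}


def eval_args(mode, data_path, model_path):
    """Build the argument list for one mode (script name not included)."""
    settings = MODE_SETTINGS[mode]
    args = ["-mode", mode, "-data_path", data_path, "-model_path", model_path]
    args += SHARED_FLAGS
    args += ["-mem_feat_lambda", settings["mem_feat_lambda"], "-K", settings["K"]]
    if settings["tracking"]:
        args.append("-tracking")
    return args
```

For example, `eval_args("sgcls", data_path, model_path)` yields the SGCLS flags with -K 4, -mem_feat_lambda 0.3, and -tracking appended.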
We would like to acknowledge the authors of the following repositories, from which we borrowed some code.
If our work is helpful for your research, please cite our publication:
title={Unbiased Scene Graph Generation in Videos},
author={Nag, Sayak and Min, Kyle and Tripathi, Subarna and Roy-Chowdhury, Amit K},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},