Visiting the Invisible
Paper | ArXiv | Project Page | Video
This repository implements the training and testing code for "Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition" by Chuanxia Zheng, Duy-Son Dao (instance segmentation), Guoxian Song (data rendering), Tat-Jen Cham, and Jianfei Cai.
Example
Example results of scene decomposition and recomposition. Given a single RGB image, the proposed CSDNet model structurally decomposes the scene into semantically complete instances and background, while completing the RGB appearance of previously invisible regions, such as the cup. The fully decomposed instances can then be used for image editing and scene recomposition, such as object removal and moving, without manually annotated input.
Getting started
Requirements
- The code architecture is based on mmdetection (version 1.0rc1+621ecd2) and mmcv (version 0.2.15); please see https://github.com/open-mmlab/mmdetection for installation details. We tried updating to the latest versions, but this failed because many functions differ between versions.
Installation
- The original code was tested with PyTorch 1.4.0, CUDA 10.0, Python 3.6, and Ubuntu 16.04 (18.04 is also supported):

```bash
conda create -n viv python=3.6 -y
conda activate viv
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
```
- Install mmdetection (version 1.0rc1+621ecd2) and mmcv (version 0.2.15):

```bash
pip install Cython==0.29.21
pip install mmcv==0.2.15
pip install -r requirements.txt
pip install -v -e .
```
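As a quick sanity check that the pinned versions were installed correctly, a minimal sketch like the following can be run inside the `viv` environment:

```python
# Sanity check for the pinned environment (a minimal sketch; adjust as needed).
import torch
import mmcv
import mmdet

print("PyTorch:", torch.__version__)         # expected 1.4.0
print("mmcv:", mmcv.__version__)             # expected 0.2.15
print("mmdet:", mmdet.__version__)           # expected 1.0rc1+621ecd2
print("CUDA available:", torch.cuda.is_available())
```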
Datasets
- CSD: our rendered synthetic dataset, which contains 8,298 images with 95,030 instances for training and 1,012 images with 11,648 instances for testing. The dataset is built upon SUNCG; when we built it (a process that took more than half a year), the SUNCG dataset was publicly available.
- COCOA: annotated from COCO2014, in which 5,000 images are selected and manually labeled with pairwise occlusion orders and amodal masks.
- KINS: derived from KITTI, in which 14,991 images are labeled with absolute layer orders and amodal masks.
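The expected on-disk layout is defined by the dataset settings in each config file. As a rough, mmdetection-style illustration only (the dataset type and paths below are hypothetical, not the repository's actual values; check the files under `configs/` for the real settings):

```python
# Illustrative mmdetection-style data config (hypothetical type/paths).
dataset_type = 'CocoDataset'      # amodal datasets commonly follow the COCO format
data_root = 'data/csd/'           # hypothetical root for the CSD dataset

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/train.json',
        img_prefix=data_root + 'train/'),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/test.json',
        img_prefix=data_root + 'test/'))
```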
Testing
- Test the model:

```bash
cd tools
bash test.sh
```
- The testing and evaluation configuration can be found in the `test.py` file.
- Please select the corresponding configuration and pre-trained model for each dataset.
- More settings need to be modified in the code.
- Single-image visualization testing (demo); please modify the configuration for different inputs (a rough sketch of the call pattern follows this list):

```bash
cd demo
python predictor.py
```
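The demo script wraps the repository's own CSDNet visualization. For orientation only, a minimal sketch of the stock mmdetection 1.0 single-image call pattern is shown below; the config, checkpoint, and image paths are hypothetical:

```python
# Rough sketch of the stock mmdetection 1.0 single-image API; predictor.py
# wraps its own CSDNet-specific visualization, so treat this only as a
# reference for the general call pattern.
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/rgba/some_config.py'      # hypothetical path
checkpoint_file = 'checkpoints/some_model.pth'   # hypothetical path

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')  # hypothetical image
```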
Training
- Train a model (three phases on the synthetic dataset):

```bash
cd tools
bash tran.sh
```
- Configuration files are stored in the `configs/rgba` directory.
- The synthetic model is trained in three phases: decomposition, completion, and end, which can be selected in the corresponding configuration file by setting `mode` (see the sketch after this list).
- Further settings follow the previous works Mask R-CNN and HTC in MMDetection, and PICNet.
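A minimal sketch of that phase switch, assuming only what this README states (the field is named `mode` and takes the three phase names); everything else here is hypothetical, so check the files under `configs/rgba` for the real layout:

```python
# Hedged sketch of the phase switch described above; only the field name
# `mode` and the three phase names come from this README.
mode = 'decomposition'   # phase 1; later runs switch to 'completion', then 'end'
```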
Pretrained Models
Download the pre-trained models using the following links and put them under the `checkpoints` directory.
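A hedged sketch of loading one of these checkpoints manually with the mmcv 0.2.x runner API (paths are hypothetical; in practice `init_detector` from `mmdet.apis` performs the same steps in one call):

```python
# Hedged sketch: build a detector from a config and load a downloaded
# checkpoint (hypothetical paths; see the testing scripts for actual usage).
from mmcv import Config
from mmcv.runner import load_checkpoint
from mmdet.models import build_detector

cfg = Config.fromfile('configs/rgba/some_config.py')           # hypothetical
model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
load_checkpoint(model, 'checkpoints/some_model.pth')           # hypothetical
```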
Citation
If you find our code or paper useful, please cite our paper:
```
@article{zheng2021vinv,
  title={Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition},
  author={Zheng, Chuanxia and Dao, Duy-Son and Song, Guoxian and Cham, Tat-Jen and Cai, Jianfei},
  journal={International Journal of Computer Vision},
  year={2021},
  publisher={Springer}
}
```