This repo contains the implementation of the method described in the paper
Unsupervised Moving Object Detection via Contextual Information Separation
Published in the Conference on Computer Vision and Pattern Recognition (CVPR) 2019.
For a brief overview, check out the project VIDEO!
If you use this code in an academic context, please cite the following publication:
@inproceedings{yang_loquercio_2019,
title={Unsupervised Moving Object Detection via Contextual Information Separation},
author={Yang, Yanchao and Loquercio, Antonio and Scaramuzza, Davide and Soatto, Stefano},
booktitle = {Conference on Computer Vision and Pattern Recognition {(CVPR)}},
year={2019}
}
Visit the project webpage for more details. For any question, please contact Antonio Loquercio.
This code was tested with the following packages. Earlier versions might work but are untested.
- Ubuntu 18.04
- Python3
- Tensorflow 1.13.1
- python-opencv
- CUDA 10.1
- python-gflags
- Keras 2.2.4
We used three publicly available datasets for our experiments:
DAVIS 2016 | FBMS59 | SegTrackV2
The datasets can be used without any pre-processing.
We generate optical flows with a TensorFlow implementation of PWCNet, which is an adapted version of this repository. To compute flows, please download the PWCNet model checkpoint we used for our experiments, available at this link.
Additionally, you can find our trained models on the project webpage.
Once you have downloaded the datasets (at least one of the three), you can start training the model. All the required flags (and their defaults) are explained in the common_flags.py file.
The folder scripts contains an example of how to train a model on the DAVIS dataset. To start training, edit the file train_DAVIS2016.sh and add the paths to the dataset and to the PWCNet checkpoint. After that, you should be able to start training with the following command:
bash ./scripts/train_DAVIS2016.sh
You can monitor the training process in TensorBoard by running the following command
tensorboard --logdir=/path/to/tensorflow/log/files
and by opening http://localhost:6006 in your browser.
To speed up training, we pre-trained the recover network on the task of optical flow in-painting on box-shaped occlusions. We used the Flying Chairs dataset for this training. The resulting checkpoint is used to initialize the recover network before the adversarial training. This checkpoint can be found together with our trained models on the project webpage. Although not strictly required, pre-training the recover network significantly speeds up model convergence.
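To make the pre-training task concrete, here is a minimal NumPy sketch of a box-shaped occlusion applied to a flow field. It is not the code used in this repository: the function names, the box-size range, and the L1 reconstruction loss are all assumptions chosen for illustration.

```python
import numpy as np


def occlude_flow_with_box(flow, rng, min_frac=0.2, max_frac=0.5):
    """Zero out a random box in an (H, W, 2) optical flow field.

    Returns the occluded flow and a mask of shape (H, W, 1) that is 1
    inside the box. The size range is a hypothetical choice.
    """
    h, w, _ = flow.shape
    box_h = max(1, int(rng.uniform(min_frac, max_frac) * h))
    box_w = max(1, int(rng.uniform(min_frac, max_frac) * w))
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))

    mask = np.zeros((h, w, 1), dtype=flow.dtype)
    mask[top:top + box_h, left:left + box_w] = 1.0
    return flow * (1.0 - mask), mask


def inpainting_l1_loss(predicted_flow, true_flow, mask):
    """Mean absolute error over the occluded pixels only (assumed loss)."""
    diff = np.abs(predicted_flow - true_flow) * mask
    n_values = max(float(mask.sum()) * true_flow.shape[-1], 1.0)
    return float(diff.sum() / n_values)


rng = np.random.default_rng(0)
flow = np.ones((64, 96, 2), dtype=np.float32)
occluded, mask = occlude_flow_with_box(flow, rng)
# A network that returned the occluded input unchanged would pay the full
# error inside the box.
loss = inpainting_l1_loss(occluded, flow, mask)
```

During pre-training, the recover network would receive the occluded flow (and, presumably, the mask) and be asked to reconstruct the hidden region.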
You can test a trained model with the script test_generator.py. An example is provided for the DAVIS 2016 dataset in the scripts folder. To run it, edit the file test_DAVIS2016_raw.sh with the paths to the dataset, the optical flow and the model checkpoint. After that, you can test the model with the following command:
bash ./scripts/test_DAVIS2016_raw.sh
Raw predictions are post-processed to increase model accuracy. The post-processing consists of two steps: (i) averaging the predictions over different time shifts between the first and second image, as well as over multiple central crops, and (ii) refining the averaged predictions with a Conditional Random Field (CRF) and selecting the best candidate mask.
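Step (i) can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the repository's implementation: `predict_fn` is a hypothetical stand-in for the trained model that maps an image pair to a per-pixel mask probability, and the crop fractions and time shifts are placeholders.

```python
import numpy as np


def central_crop_box(height, width, fraction):
    """Return (top, left, crop_h, crop_w) of a centered crop."""
    crop_h = int(round(height * fraction))
    crop_w = int(round(width * fraction))
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w


def average_predictions(predict_fn, frames, index, time_shifts, crop_fractions):
    """Average mask probabilities over time shifts and central crops.

    predict_fn(first, second) is hypothetical: it returns an (h, w) array
    of mask probabilities for `first`, given `second` as the paired frame.
    """
    height, width = frames[index].shape[:2]
    total = np.zeros((height, width), dtype=np.float64)
    count = np.zeros((height, width), dtype=np.float64)

    for shift in time_shifts:
        other = index + shift
        if other < 0 or other >= len(frames):
            continue  # skip pairs that fall outside the sequence
        for fraction in crop_fractions:
            top, left, ch, cw = central_crop_box(height, width, fraction)
            window = (slice(top, top + ch), slice(left, left + cw))
            pred = predict_fn(frames[index][window], frames[other][window])
            total[window] += pred
            count[window] += 1.0

    # Pixels never covered by any crop keep a probability of zero.
    return total / np.maximum(count, 1.0)


frames = [np.full((8, 8, 3), i, dtype=np.float64) for i in range(5)]
dummy_predict = lambda first, second: np.full(first.shape[:2], 0.5)
averaged = average_predictions(dummy_predict, frames, 2, [-1, 1], [1.0, 0.5])
```

Accumulating a per-pixel count keeps the average correct where crops of different sizes overlap.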
To generate predictions over multiple time steps and crops for the DAVIS 2016 dataset, please use the generate_buffer_DAVIS2016.sh script. Edit the script to add the paths to the dataset, the PWCNet checkpoint, and the trained model checkpoint.
After the prediction buffers are generated, please use the post-processing script to compute refined predictions.
Our final, post-processed results are available for the DAVIS 2016, FBMS59 and SegTrackV2 datasets at this link. If you evaluate on other datasets and would like to share your predictions, please contact us!
The training loss seems symmetric for the mask and its complement. How do you tell which one is the foreground and which the background?
For the training process, it is very important to keep this symmetry. Without it, the optimum of the training process is no longer guaranteed to be the separation of independent components. However, to detect whether the mask covers the object or the background, we use the heuristic that the background usually occupies more than two boundaries of the image. You can find the implementation of this heuristic in the function disambiguate_forw_back.
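The boundary heuristic can be illustrated with a short NumPy sketch. This is not the body of disambiguate_forw_back: the function names, the 0.5 thresholds, and the rule that a border counts as "occupied" when the mask covers most of it are assumptions made for this example.

```python
import numpy as np


def mask_covers_background(mask, threshold=0.5):
    """Guess whether a soft mask marks the background.

    A border counts as occupied when more than half of its pixels are
    inside the mask (an assumed criterion). The mask is treated as
    background when it occupies more than two of the four image borders.
    """
    binary = mask > threshold
    borders = [binary[0, :], binary[-1, :], binary[:, 0], binary[:, -1]]
    occupied = sum(bool(border.mean() > 0.5) for border in borders)
    return occupied > 2


def select_foreground(mask, threshold=0.5):
    """Return the mask or its complement, whichever looks like foreground."""
    if mask_covers_background(mask, threshold):
        return 1.0 - mask
    return mask


object_mask = np.zeros((10, 10))
object_mask[3:7, 3:7] = 1.0           # a centered object
flipped_mask = 1.0 - object_mask      # the complementary solution

from_object = select_foreground(object_mask)
from_flipped = select_foreground(flipped_mask)
```

Both complementary solutions resolve to the same foreground mask, which is why the symmetry of the training loss does not matter at test time.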
Some of the code in this project was inspired by the following repositories: SfMLearner, TFOptflow, Generative_Inpainting. We would like to thank all the authors of these repositories.