This repository contains the TensorFlow prototype implementation of my bachelor thesis Motion R-CNN: Instance-level 3D Motion Estimation with Region-based CNNs.
In addition to the functionality provided by the TensorFlow Object Detection API (at the time of writing), the code supports:
- prediction of instance masks for detected objects
- Feature Pyramid Networks
- prediction of inter-frame 3D camera ego-motion (translation and rotation) given a second image temporally consecutive to the first image
- prediction of 3D motions (translation and rotation) between the two frames for all objects detected in the first frame
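
To make "translation and rotation" concrete, a 3D rigid motion can be written as a rotation matrix and a translation vector applied to a point. The following is a minimal, dependency-free sketch. The Euler-angle convention (R = Rz · Ry · Rx), the function names, and the optional pivot argument are illustrative assumptions and are not taken from this repository's code.

```python
import math


def euler_to_rotation(rx, ry, rz):
    """Build a 3x3 rotation matrix R = Rz * Ry * Rx from angles in radians.

    The composition order is an assumption made for this sketch.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_rigid_motion(point, rotation, translation, pivot=(0.0, 0.0, 0.0)):
    """Return rotation * (point - pivot) + pivot + translation.

    With the default pivot at the origin this is the usual R * p + t.
    """
    centered = [p - c for p, c in zip(point, pivot)]
    rotated = [sum(r * v for r, v in zip(row, centered)) for row in rotation]
    return [r + c + t for r, c, t in zip(rotated, pivot, translation)]
```

Camera ego-motion would apply one such motion to the whole scene, while an object motion would apply its own motion only to the points of that object, possibly rotating about a pivot such as the object center.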
Note that the code only supports training on the Virtual KITTI dataset,
but it is easy to adapt it to other datasets.
Motion prediction and frame pair input are fully optional, and the code can be used as a Mask R-CNN
implementation with single-image input.
Support for Cityscapes is implemented, but using the Cityscapes records may require adapting the
data_decoder or the record writing, as the record interface has changed.
Motion R-CNN is released under the MIT License (refer to the LICENSE file for details).
- tensorflow (>= 1.3.0) with GPU support.
- sudo apt-get install protobuf-compiler
- pip install opencv-python pandas pillow lxml matplotlib
- from the project root directory, run
- download and extract the
- download all of the
ground truth and extract the folders into a directory named
- cd to the project root directory
protoc object_detection/protos/*.proto --python_out=.
python create_vkitti_tf_record.py --data_dir=<data_parent_dir> --output_dir=data/records --set val
python create_vkitti_tf_record.py --data_dir=<data_parent_dir> --output_dir=data/records --set train
<data_parent_dir> is the directory containing the
Training & evaluating
python train.py --logtostderr --pipeline_config_path=data/configs/motion_rcnn_vkitti_cam.config --train_dir=output/train/motion_rcnn_vkitti_cam --gpu 0
python eval.py --logtostderr --pipeline_config_path=data/configs/motion_rcnn_vkitti_cam.config --checkpoint_dir=output/train/motion_rcnn_vkitti_cam --eval_dir=output/eval/motion_rcnn_vkitti_cam
These two commands train and evaluate a model with camera and instance motion prediction.
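
The training and evaluation commands share one naming pattern: the config name determines the config path, the training directory, and the evaluation directory. A small sketch of that pattern is given here. The helper name `build_commands` is hypothetical and not part of this repository, and the directory convention is inferred only from the example commands above.

```python
def build_commands(config_name, gpu=0):
    """Return (train_cmd, eval_cmd) argument lists for a config in data/configs/.

    Only flags that appear in the example commands are used.
    """
    config_path = "data/configs/{}.config".format(config_name)
    train_dir = "output/train/{}".format(config_name)
    eval_dir = "output/eval/{}".format(config_name)
    train_cmd = [
        "python", "train.py", "--logtostderr",
        "--pipeline_config_path={}".format(config_path),
        "--train_dir={}".format(train_dir),
        "--gpu", str(gpu),
    ]
    eval_cmd = [
        "python", "eval.py", "--logtostderr",
        "--pipeline_config_path={}".format(config_path),
        # Evaluation reads checkpoints from the training directory.
        "--checkpoint_dir={}".format(train_dir),
        "--eval_dir={}".format(eval_dir),
    ]
    return train_cmd, eval_cmd
```

The returned lists can be passed to `subprocess.check_call` from the project root, or joined with spaces to obtain the shell commands.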
You can adapt the configurations found in
data/configs/. For a description of the configuration parameters, see
Navigating the code
The following files were added or modified from the original Object Detection API code:
- motion_util.py: losses for training the motion estimation and post-processing utilities
- np_motion_util.py: composition of optical flow using motion predictions and performance evaluation utilities
- create_vkitti_tf_record.py: processes the Virtual KITTI dataset
- create_cityscapes_tf_record.py: processes the Cityscapes dataset
- faster_rcnn_meta_arch.py: adapted to support instance mask, instance motion, and camera motion training
- target_assigner.py: updated to support mask and motion target assignment
- box_predictor.py: updated to support FPN as well as mask and motion prediction
- post_processing.py: updated to pass through instance motions
- faster_rcnn_resnet_v1_fpn_feature_extractor.py: FPN feature extractor
- faster_rcnn_resnet_v1_feature_extractor.py: added support for camera branch
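
The list above mentions composing optical flow from motion predictions. One common way to do this, sketched here under a pinhole camera model, is to back-project a pixel with known depth to 3D, apply the predicted rigid motion, reproject, and take the pixel displacement as the flow. This is an illustration only: the function names and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are assumptions and do not describe the actual interface of np_motion_util.py.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in camera coordinates."""
    return [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-space point to pixel coordinates."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def flow_from_motion(u, v, depth, rotation, translation, fx, fy, cx, cy):
    """Pixel displacement induced by moving the 3D point by (rotation, translation).

    rotation is a 3x3 nested list, translation a 3-element sequence.
    """
    p = backproject(u, v, depth, fx, fy, cx, cy)
    moved = [
        sum(r * c for r, c in zip(row, p)) + t
        for row, t in zip(rotation, translation)
    ]
    u2, v2 = project(moved, fx, fy, cx, cy)
    return u2 - u, v2 - v
```

For example, with `fx = fy = 100`, a point on the optical axis at depth 10 that is translated by 1 unit along x yields a horizontal flow of 10 pixels and no vertical flow. A full flow field would repeat this per pixel, using the motion of the object the pixel belongs to, or the camera motion for static background.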
The following tests were added or modified:
This repository is based on the TensorFlow Object Detection API.