
Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments

This repository contains the code for an adaptive gating sensor fusion approach for object detection based on a mixture of convolutional neural networks. Our approach learns how best to combine different sensor modalities to handle sensor noise induced by dynamic environments, such as lighting changes, out-of-range readings in the depth sensor, and motion blur. More information is available at our project page.
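To make the idea of gated fusion concrete, here is a minimal sketch in plain Python. It is not the Caffe layer from this repository: the function names `softmax` and `gated_fusion` are hypothetical, and normalizing the gate scores with a softmax is an assumption made for illustration. Each expert (for example one for RGB and one for depth) produces class scores, and the gate assigns each expert a weight that determines how much it contributes to the fused result.

```python
import math


def softmax(scores):
    """Turn raw gate scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(expert_outputs, gate_scores):
    """Combine per-expert class scores with gate-derived weights.

    expert_outputs: one list of class scores per expert (equal lengths).
    gate_scores: one raw score per expert (assumed to come from a gating network).
    Returns the fused class scores and the weights that were applied.
    """
    weights = softmax(gate_scores)
    num_classes = len(expert_outputs[0])
    fused = [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(num_classes)
    ]
    return fused, weights


# Hypothetical scores: the RGB expert is confident, the depth expert is not.
rgb_scores = [0.9, 0.1]
depth_scores = [0.2, 0.8]
fused, weights = gated_fusion([rgb_scores, depth_scores], gate_scores=[2.0, 0.0])
```

With a higher gate score for the RGB expert, the fused scores lean toward the RGB prediction. In a degraded scene (for instance at night) a trained gate would be expected to shift weight toward the depth expert instead.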


If you find the code helpful, please consider citing our work:

  author = {Oier Mees and Andreas Eitel and Wolfram Burgard},
  title = {Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments},
  booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
  year = 2016,
  address = {Daejeon, South Korea},
  url = {},


Please refer to for setup instructions. The code was tested on Ubuntu 14.04. The gating layer implementing the adaptive fusion scheme can be found at gating_inner_product_layer.cpp and


We provide several models from the paper. An RGB-D gating network trained on the InOutDoorPeople dataset is available at googlenet_rgb_depth_gating_iter_2500.caffemodel. To run inference, use:

./tools/ --gpu 1 --def models/googlenet_rgb_depth_gating/deploy.prototxt \
    --net models/googlenet_rgb_depth_gating/googlenet_rgb_depth_gating_iter_2500.caffemodel \
    --cfg experiments/cfgs/day_night.yml


The gating network learns how best to combine the convolutional neural networks trained on the RGB and Depth modalities. It can be trained with the following command:

./tools/ --gpu 1 --solver models/googlenet_rgb_depth_gating/solver.prototxt  \
--weights models/googlenet_rgb_depth_gating/googLeNet_fus.caffemodel --imdb inria_train \
--rand --cfg experiments/cfgs/day_night.yml --iters 10000  2>&1 | tee /tmp/caffe_google_fus.log.$(date +%Y%m%d-%H%M)

Before training the adaptive weighting, we merged the two separately trained RGB and Depth deep experts, both based on the GoogLeNet-xxs architecture, into a single model, googLeNet_fus.caffemodel. We provide a script to do the merging if you want to use your own models.
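The merging step can be pictured as follows. This is only a conceptual sketch and not the provided script: it represents each expert's parameters as a plain dictionary from layer name to weights rather than as a Caffe model, and the function name `merge_experts` and the prefixes `rgb_` and `depth_` are hypothetical. The point is that both experts' layers are copied into one model under distinct names, so that neither set of weights overwrites the other.

```python
def merge_experts(rgb_params, depth_params, rgb_prefix="rgb_", depth_prefix="depth_"):
    """Copy the layers of two expert models into a single parameter dictionary.

    Each argument maps layer names to weights. Layer names are prefixed per
    modality (hypothetical naming scheme) so identical names do not collide.
    """
    merged = {}
    for prefix, params in ((rgb_prefix, rgb_params), (depth_prefix, depth_params)):
        for layer_name, weights in params.items():
            key = prefix + layer_name
            if key in merged:
                raise ValueError("duplicate layer name after prefixing: " + key)
            merged[key] = weights
    return merged


# Toy parameters standing in for two separately trained experts.
rgb_expert = {"conv1": [0.1, 0.2], "fc": [0.3]}
depth_expert = {"conv1": [0.5, 0.6], "fc": [0.7]}
fused_params = merge_experts(rgb_expert, depth_expert)
```

The merged dictionary holds four layers, two per modality. The layers of the gating network itself are not part of either expert and would still need to be initialized before training.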


We provide a script to evaluate the detections for our InOutDoorPeople dataset.
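Detection evaluation commonly relies on the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below shows that computation in plain Python. It assumes boxes in `(x1, y1, x2, y2)` corner format and a hypothetical function name `iou`; the provided evaluation script may use a different box format or matching protocol.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# A detection is typically counted as correct when its IoU with a
# ground-truth box exceeds a chosen threshold (0.5 is a common choice).
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```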


Our InOutDoorPeople dataset containing 8305 annotated frames of RGB and Depth data can be found here.


If you want to integrate our multimodal mixture of deep experts into a semantic segmentation task, check out the following repo.

If you want to implement our approach with a different network architecture you should take a look at the fusion layers in our reference implementation.


For academic usage, the code related to the gating layers, our Caffe models, and the utility scripts are released under the GPLv3 license. For any commercial purpose, please contact the authors. For Fast-RCNN and Caffe, see their respective licenses.
