Handheld Multi-Frame Neural Depth Refinement

This is the official code repository for the work: The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement, presented at CVPR 2022.

If you use parts of this work, or otherwise take inspiration from it, please consider citing our paper:

  title={The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement},
  author={Chugunov, Ilya and Zhang, Yuxuan and Xia, Zhihao and Zhang, Xuaner and Chen, Jiawen and Heide, Felix},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},


  • Developed using PyTorch 1.10.0 on a Linux x64 machine
  • Condensed package requirements are in \requirements.txt. Note that this file lists the package versions at the time of publishing; if you update to, for example, a newer version of PyTorch, you will need to watch out for changes in class/function calls.
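
If you want to see how far your environment has drifted from the pinned versions, something like the following may help. It is a minimal sketch, not part of this repository: the helper name check_pins is hypothetical, and it assumes the requirements file uses plain name==version lines (anything else is skipped).

```python
from importlib import metadata
from pathlib import Path


def check_pins(requirements_path):
    """Return {package: (pinned, installed_or_None)} for every mismatch.

    Hypothetical helper: only exact pins of the form name==version are
    checked; other lines (comments, ranges, blank lines) are ignored.
    """
    mismatches = {}
    for raw in Path(requirements_path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.is_file():
        for name, (pinned, installed) in check_pins(req_file).items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

A mismatch is not necessarily a problem; it only tells you where to look first if a class or function call fails.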


  • Download data from this Google Drive link and unpack into the \data folder
  • Each folder corresponds to a scene [castle, double, eagle, elephant, embrace, frog, ganesha, gourd, rocks, thinker] and contains five files:
    • is the frozen, trained MLP corresponding to the scene
    • frame_bundle.npz is the recorded bundle of data (images, depth, and poses)
    • pose_bundle.npz is a much smaller recorded bundle of data (poses only)
    • reprojected_lidar.npy is the merged LiDAR depth baseline as described in the paper
    • snapshot.mp4 is a video of the recorded snapshot for visualization purposes
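
After unpacking, a quick check that every scene folder is complete can save some debugging later. The sketch below is illustrative only: find_missing_files is a hypothetical helper, it assumes the data was unpacked into a folder named data relative to the working directory, and it checks only the four file names spelled out in the list above.

```python
from pathlib import Path

# Scene names and file names as listed above. The trained-MLP file is
# deliberately not checked because its name is not spelled out here.
SCENES = ("castle", "double", "eagle", "elephant", "embrace",
          "frog", "ganesha", "gourd", "rocks", "thinker")
EXPECTED_FILES = ("frame_bundle.npz", "pose_bundle.npz",
                  "reprojected_lidar.npy", "snapshot.mp4")


def find_missing_files(data_root, scenes=SCENES, expected=EXPECTED_FILES):
    """Return {scene: [missing file names]} for incomplete scene folders."""
    root = Path(data_root)
    missing = {}
    for scene in scenes:
        absent = [name for name in expected
                  if not (root / scene / name).is_file()]
        if absent:
            missing[scene] = absent
    return missing


if __name__ == "__main__":
    # "data" is an assumed location; adjust it to where you unpacked.
    for scene, names in find_missing_files("data").items():
        print(f"{scene}: missing {', '.join(names)}")
```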

An explanation of the format and contents of the frame bundles (frame_bundle.npz) is given interactively in \0_data_format.ipynb. We recommend going through this Jupyter notebook before you record your own bundles or otherwise manipulate the data.
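
For a first look at a bundle outside the notebook, NumPy alone is enough to list what an .npz archive contains. This is a minimal sketch that makes no claims about the actual keys; summarize_bundle is a hypothetical helper, and allow_pickle=True is an assumption in case the bundle stores Python objects (only enable it for files you trust).

```python
import numpy as np


def summarize_bundle(path, allow_pickle=True):
    """Return {key: (shape, dtype)} for every array stored in an .npz file.

    allow_pickle=True is an assumption: it is only needed if the archive
    holds object arrays, and should only be used on trusted files.
    """
    summary = {}
    with np.load(path, allow_pickle=allow_pickle) as bundle:
        for key in bundle.files:
            value = bundle[key]
            summary[key] = (value.shape, str(value.dtype))
    return summary
```

For example, print(summarize_bundle("data/gourd/frame_bundle.npz")) would list the stored arrays, assuming the gourd scene was unpacked to that path. The notebook remains the authoritative description of what each entry means.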

Project Structure:

  ├── checkpoints  
  │   └── // folder for network checkpoints
  ├── data  
  │   └── // folder for recorded bundle data
  ├── utils  
  │   ├──  // dataloader class for bundle data
  │   ├──  // MLP blocks and positional encoding
  │   └──  // miscellaneous helper functions (e.g. grid/patch sample)
  ├── 0_data_format.ipynb  // interactive tutorial for understanding bundle data
  ├── 1_reconstruction.ipynb  // interactive tutorial for depth reconstruction
  ├──  // the learned implicit depth model
  │             // -> reproject points, query MLP for offsets, visualization
  ├──  // a README in the README, how meta
  ├── requirements.txt  // frozen package requirements
  ├──  // wrapper class for arg parsing and setting up training loop
  └──  // example script to run training


The Jupyter notebook \1_reconstruction.ipynb contains an interactive tutorial for depth reconstruction: loading a model, loading a bundle, and generating depth.


The script \ demonstrates a basic call of \ to train a model on the gourd scene data. It contains the arguments:

  • checkpoint_path - path to save model and tensorboard checkpoints
  • device - device for training [cpu, cuda]
  • bundle_path - path to the bundle data

For other training arguments, see the argument parser section of \
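
To illustrate how the three arguments above fit together, here is a small standalone argparse sketch. It is not the repository's parser: build_parser is hypothetical, the cpu default and the example paths are assumptions, and the real training code defines further arguments that are not shown here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the three arguments listed above."""
    parser = argparse.ArgumentParser(
        description="Illustration only; not the repository's own parser.")
    parser.add_argument("--checkpoint_path", type=str, required=True,
                        help="path to save model and tensorboard checkpoints")
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cpu",
                        help="device for training (the default is an assumption)")
    parser.add_argument("--bundle_path", type=str, required=True,
                        help="path to the bundle data")
    return parser


# Example invocation with assumed paths based on the folder layout above.
example_args = build_parser().parse_args([
    "--checkpoint_path", "checkpoints/gourd",
    "--device", "cuda",
    "--bundle_path", "data/gourd/frame_bundle.npz",
])
print(example_args)
```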

Best of luck,


