Project Website | Video | Paper
Dynamic View Synthesis from Dynamic Monocular Video
Chen Gao, Ayush Saraf, Johannes Kopf, Jia-Bin Huang
in ICCV 2021
The code is tested with:
- Linux (tested on CentOS Linux release 7.4.1708)
- Anaconda 3
- Python 3.7.11
- CUDA 10.1
- 1 V100 GPU
To get started, please create the conda environment dnerf by running:
conda create --name dnerf python=3.7
conda activate dnerf
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy opencv -c pytorch
pip install imageio scikit-image configargparse timm lpips
and install COLMAP manually (one possible route is sketched after the weight download below). Then download the MiDaS and RAFT weights:
ROOT_PATH=/path/to/the/DynamicNeRF/folder
cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/weights.zip
unzip weights.zip
rm weights.zip
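If you prefer not to build COLMAP from source, one possible route (our suggestion, not the project's official instruction) is the conda-forge package:
conda install -c conda-forge colmap
Any installation that puts the colmap executable on your PATH works for the preprocessing commands below.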
The Dynamic Scene Dataset is used to quantitatively evaluate our method. Please download the pre-processed data by running:
cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/data.zip
unzip data.zip
rm data.zip
You can train a model from scratch by running:
cd $ROOT_PATH/
python run_nerf.py --config configs/config_Balloon2.txt
Every 100k iterations, you should get result videos, saved to the following locations:
- The novel view-time synthesis results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/novelviewtime.
- The reconstruction results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset.
- The fix-view-change-time results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_view000.
- The fix-time-change-view results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_time000.
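Training progress can also be followed in TensorBoard (the i_img parameter below controls image logging frequency). Assuming the event files are written under the default basedir, $ROOT_PATH/logs, something like this works:
tensorboard --logdir $ROOT_PATH/logs --port 6006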
We also provide pre-trained models. You can download them by running:
cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/logs.zip
unzip logs.zip
rm logs.zip
Then you can render the results directly by running:
python run_nerf.py --config configs/config_Balloon2.txt --render_only --ft_path $ROOT_PATH/logs/Balloon2_H270_DyNeRF_pretrain/300000.tar
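If you have downloaded all pre-trained models, a small loop can render every scene. This is only a sketch: it assumes the other scenes follow the same config and checkpoint naming pattern as Balloon2, which you should verify against the unzipped configs/ and logs/ folders.
cd $ROOT_PATH/
for SCENE in Jumping Skating Truck Umbrella Balloon1 Balloon2 Playground; do
    python run_nerf.py --config configs/config_${SCENE}.txt --render_only \
        --ft_path $ROOT_PATH/logs/${SCENE}_H270_DyNeRF_pretrain/300000.tar
done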
Our goal is to make the evaluation as simple as possible for you. We have collected the fix-view-change-time results of the following methods:
- NeRF
- NeRF + t
- Yoon et al.
- Non-Rigid NeRF
- NSFF
- DynamicNeRF (ours)
Please download the results by running:
cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/results.zip
unzip results.zip
rm results.zip
Then you can calculate the PSNR/SSIM/LPIPS by running:
cd $ROOT_PATH/utils
python evaluation.py
Method (PSNR↑ / LPIPS↓) | Jumping | Skating | Truck | Umbrella | Balloon1 | Balloon2 | Playground | Average |
---|---|---|---|---|---|---|---|---|
NeRF | 20.99 / 0.305 | 23.67 / 0.311 | 22.73 / 0.229 | 21.29 / 0.440 | 19.82 / 0.205 | 24.37 / 0.098 | 21.07 / 0.165 | 21.99 / 0.250 |
NeRF + t | 18.04 / 0.455 | 20.32 / 0.512 | 18.33 / 0.382 | 17.69 / 0.728 | 18.54 / 0.275 | 20.69 / 0.216 | 14.68 / 0.421 | 18.33 / 0.427 |
NR NeRF | 20.09 / 0.287 | 23.95 / 0.227 | 19.33 / 0.446 | 19.63 / 0.421 | 17.39 / 0.348 | 22.41 / 0.213 | 15.06 / 0.317 | 19.69 / 0.323 |
NSFF | 24.65 / 0.151 | 29.29 / 0.129 | 25.96 / 0.167 | 22.97 / 0.295 | 21.96 / 0.215 | 24.27 / 0.222 | 21.22 / 0.212 | 24.33 / 0.199 |
Ours | 24.68 / 0.090 | 32.66 / 0.035 | 28.56 / 0.082 | 23.26 / 0.137 | 22.36 / 0.104 | 27.06 / 0.049 | 24.15 / 0.080 | 26.10 / 0.082 |
Please note:
- The numbers reported in the paper are calculated with the TensorFlow code. The numbers here are calculated with this improved PyTorch version.
- In Yoon's results, the first and last frames are missing. To compare with Yoon's results, we therefore omit both frames; to do so, please uncomment line 72 and comment line 73 in evaluation.py.
- We obtain the results of NSFF and NR NeRF using the official implementations with default parameters.
To train a model on your own video sequence, follow the steps below (the full pipeline is also collected into a single script sketch after this list).
- Set some paths
ROOT_PATH=/path/to/the/DynamicNeRF/folder
DATASET_NAME=name_of_the_video_without_extension
DATASET_PATH=$ROOT_PATH/data/$DATASET_NAME
- Prepare training images and background masks from a video.
cd $ROOT_PATH/utils
python generate_data.py --videopath /path/to/the/video
- Use COLMAP to obtain camera poses.
colmap feature_extractor \
--database_path $DATASET_PATH/database.db \
--image_path $DATASET_PATH/images_colmap \
--ImageReader.mask_path $DATASET_PATH/background_mask \
--ImageReader.single_camera 1
colmap exhaustive_matcher \
--database_path $DATASET_PATH/database.db
mkdir $DATASET_PATH/sparse
colmap mapper \
--database_path $DATASET_PATH/database.db \
--image_path $DATASET_PATH/images_colmap \
--output_path $DATASET_PATH/sparse \
--Mapper.num_threads 16 \
--Mapper.init_min_tri_angle 4 \
--Mapper.multiple_models 0 \
--Mapper.extract_colors 0
- Save camera poses into the format that NeRF reads.
cd $ROOT_PATH/utils
python generate_pose.py --dataset_path $DATASET_PATH
- Estimate monocular depth.
cd $ROOT_PATH/utils
python generate_depth.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/midas_v21-f6b98070.pt
- Predict optical flows.
cd $ROOT_PATH/utils
python generate_flow.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/raft-things.pth
- Obtain motion mask (code adapted from NSFF).
cd $ROOT_PATH/utils
python generate_motion_mask.py --dataset_path $DATASET_PATH
- Train a model. Please change expname and datadir in configs/config.txt.
cd $ROOT_PATH/
python run_nerf.py --config configs/config.txt
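For convenience, here is the preprocessing-and-training pipeline above collected into one shell script. It is only a sketch under the same assumptions as the individual steps (paths set as above, MiDaS/RAFT weights in $ROOT_PATH/weights, and configs/config.txt edited for your sequence); it adds nothing beyond the per-step commands.
#!/bin/bash
set -e
ROOT_PATH=/path/to/the/DynamicNeRF/folder
DATASET_NAME=name_of_the_video_without_extension
DATASET_PATH=$ROOT_PATH/data/$DATASET_NAME

# Extract frames and background masks from the input video
cd $ROOT_PATH/utils
python generate_data.py --videopath /path/to/the/video

# Camera poses via COLMAP
colmap feature_extractor --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --ImageReader.mask_path $DATASET_PATH/background_mask \
    --ImageReader.single_camera 1
colmap exhaustive_matcher --database_path $DATASET_PATH/database.db
mkdir -p $DATASET_PATH/sparse
colmap mapper --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --output_path $DATASET_PATH/sparse \
    --Mapper.num_threads 16 --Mapper.init_min_tri_angle 4 \
    --Mapper.multiple_models 0 --Mapper.extract_colors 0

# Convert poses, then estimate depth, optical flow, and motion masks
python generate_pose.py --dataset_path $DATASET_PATH
python generate_depth.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/midas_v21-f6b98070.pt
python generate_flow.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/raft-things.pth
python generate_motion_mask.py --dataset_path $DATASET_PATH

# Train
cd $ROOT_PATH
python run_nerf.py --config configs/config.txt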
Explanation of each parameter:
- expname: experiment name
- basedir: where to store ckpts and logs
- datadir: input data directory
- factor: downsample factor for the input images
- N_rand: number of random rays per gradient step
- N_samples: number of samples per ray
- netwidth: channels per layer
- use_viewdirs: whether to enable view dependency for StaticNeRF
- use_viewdirsDyn: whether to enable view dependency for DynamicNeRF
- raw_noise_std: std dev of noise added to regularize the sigma_a output
- no_ndc: do not use normalized device coordinates
- lindisp: sample linearly in disparity rather than depth
- i_video: frequency of novel view-time synthesis video saving
- i_testset: frequency of testset video saving
- N_iters: number of training iterations
- i_img: frequency of tensorboard image logging
- DyNeRF_blending: whether to use DynamicNeRF to predict the blending weight
- pretrain: whether to pre-train StaticNeRF
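For reference, a minimal configs/config.txt could look like the sketch below. All values here are illustrative assumptions (loosely mirroring the Balloon2 example), not the project's shipped defaults; at minimum, set expname and datadir for your own sequence.
cd $ROOT_PATH/
cat > configs/config.txt <<'EOF'
# Illustrative values only -- adjust for your own sequence.
expname = my_video_H270_DyNeRF
basedir = ./logs
datadir = ./data/my_video
factor = 2
N_rand = 1024
N_samples = 64
netwidth = 256
use_viewdirs = True
use_viewdirsDyn = False
N_iters = 300000
i_video = 100000
i_testset = 100000
DyNeRF_blending = True
pretrain = True
EOF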
This work is licensed under the MIT License. See LICENSE for details.
If you find this code useful for your research, please consider citing the following paper:
@inproceedings{Gao-ICCV-DynNeRF,
author = {Gao, Chen and Saraf, Ayush and Kopf, Johannes and Huang, Jia-Bin},
title = {Dynamic View Synthesis from Dynamic Monocular Video},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
year = {2021}
}
Our training code is built upon NeRF, NeRF-pytorch, and NSFF. Our flow prediction code is modified from RAFT. Our depth prediction code is modified from MiDaS.