This is the official implementation for the method described in
Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation
Jiaxing Yan, Hong Zhao, Penghui Bu and YuSheng Jin.
Assuming a fresh Anaconda distribution, you can install the dependencies with:
conda install pytorch=1.7.0 torchvision=0.8.1 -c pytorch
pip install tensorboardX==2.1
pip install opencv-python==3.4.7.28
pip install albumentations==0.5.2 # we use albumentations for faster image preprocessing
This project uses Python 3.7.8 and CUDA 11.4. Experiments were conducted on a single NVIDIA RTX 3090 GPU with an Intel Core i9-9900KF CPU.
We recommend using a conda environment to avoid dependency conflicts.
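The albumentations dependency is used for faster image preprocessing. The snippet below is a minimal sketch of an albumentations pipeline you can run to verify the installation; the transforms and parameters are illustrative examples, not the exact preprocessing used by this repository:

```python
import albumentations as A
import cv2

# Illustrative pipeline only: these transforms and parameters are examples,
# not the exact preprocessing used in this repository.
transform = A.Compose([
    A.Resize(height=192, width=640),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = cv2.imread("images/test_image.jpg")  # HxWx3 BGR numpy array
augmented = transform(image=image)["image"]
print(augmented.shape)  # (192, 640, 3)
```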
You can predict scaled disparity for a single image with:
python test_simple.py --image_path images/test_image.jpg --model_name MS_1024x320
On its first run this command will download the MS_1024x320 pretrained model (272MB) into the models/ folder.
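test_simple.py outputs a scaled disparity map. To recover depth, monodepth2 (on which this code builds, see the acknowledgement below) linearly rescales the sigmoid disparity into a fixed depth range and inverts it; a minimal sketch of that conversion, assuming the same 0.1–100 m convention:

```python
import torch

def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Convert a sigmoid disparity prediction in [0, 1] to depth.

    Follows the monodepth2 convention: disparity is linearly rescaled to
    [1/max_depth, 1/min_depth] and then inverted.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth

# Example with a dummy disparity map at the MS_1024x320 resolution
disp = torch.rand(1, 1, 320, 1024)
_, depth = disp_to_depth(disp)
```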
We provide the following options for --model_name:

| --model_name | Training modality | Resolution | Abs_Rel | Sq_Rel | δ < 1.25 |
|---|---|---|---|---|---|
| M_640x192 | Mono | 640 x 192 | 0.105 | 0.769 | 0.892 |
| M_1024x320 | Mono | 1024 x 320 | 0.102 | 0.734 | 0.898 |
| M_1280x384 | Mono | 1280 x 384 | 0.102 | 0.715 | 0.900 |
| MS_640x192 | Mono + Stereo | 640 x 192 | 0.102 | 0.752 | 0.894 |
| MS_1024x320 | Mono + Stereo | 1024 x 320 | 0.096 | 0.694 | 0.908 |
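The downloaded folder stores one checkpoint file per sub-network. The sketch below assumes the monodepth2 layout (encoder.pth / depth.pth, with the training resolution saved in the encoder checkpoint) and a monodepth2-style API; the class names and the ResNet-50 depth encoder for high-resolution models are assumptions, so check the downloaded folder and this repository's networks module for the exact names:

```python
import torch
import networks  # repo-local module, following the monodepth2 layout

model_path = "models/MS_1024x320"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumptions: the folder contains encoder.pth / depth.pth as in monodepth2,
# and the high-resolution models use a ResNet-50 depth encoder.
encoder = networks.ResnetEncoder(num_layers=50, pretrained=False)
enc_dict = torch.load(model_path + "/encoder.pth", map_location=device)
feed_height, feed_width = enc_dict["height"], enc_dict["width"]
encoder.load_state_dict({k: v for k, v in enc_dict.items()
                         if k in encoder.state_dict()})

decoder = networks.DepthDecoder(num_ch_enc=encoder.num_ch_enc)
decoder.load_state_dict(torch.load(model_path + "/depth.pth",
                                   map_location=device))
encoder.to(device).eval()
decoder.to(device).eval()
```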
You can download the entire raw KITTI dataset by running:
wget -i splits/kitti_archives_to_download.txt -P kitti_data/
Then unzip with
cd kitti_data
unzip "*.zip"
cd ..
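The unzipped dataset is large (on the order of 175GB). As a quick sanity check that extraction completed, you can count the image files:

```python
from pathlib import Path

# Quick completeness check after unzipping: the raw KITTI archives ship
# their images as .png files.
n_images = sum(1 for _ in Path("kitti_data").rglob("*.png"))
print(f"found {n_images} png images under kitti_data/")
```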
Splits
The train/test/validation splits are defined in the splits/ folder.
By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training.
You can also train a model using the new benchmark split or the odometry split by setting the --split flag.
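Each split is a folder of plain-text file lists. In the monodepth2 convention each line names a sequence folder, a frame index, and a camera side (l or r); a minimal sketch for inspecting a split, treating that path and format as assumptions:

```python
# Path and line format follow the monodepth2 convention
# (folder, frame index, side); treat both as assumptions.
with open("splits/eigen_zhou/train_files.txt") as f:
    samples = f.read().splitlines()

print(len(samples), "training samples")
folder, frame_index, side = samples[0].split()
print(folder, int(frame_index), side)
```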
Monocular training:
python train.py --model_name mono_model
Stereo training:
Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set.
python train.py --model_name stereo_model \
--frame_ids 0 --use_stereo --split eigen_full
Monocular + stereo training:
python train.py --model_name mono+stereo_model \
--frame_ids 0 -1 1 --use_stereo
Note: For high-resolution input, e.g. 1024x320 and 1280x384, we use a lightweight pose encoder (ResNet-18 at 640x192) during training to save memory. The following example command trains a model named M_1024x320:
python train.py --model_name M_1024x320 --num_layers 50 --height 320 --width 1024 --num_layers_pose 18 --height_pose 192 --width_pose 640
Here DepthNet uses a ResNet-50 encoder at 1024 x 320, while PoseNet uses a ResNet-18 encoder at 640 x 192.
Add the following to the training command to load an existing model for finetuning:
python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19
Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.
To prepare the ground truth depth maps run:
python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark
...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.
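The export script writes the ground truth into the corresponding split folder as a compressed numpy archive; the filename and key below follow the monodepth2 convention and are assumptions here. You can inspect the result with:

```python
import numpy as np

# Path and key follow the monodepth2 convention: export_gt_depth.py writes
# splits/<split>/gt_depths.npz with a single "data" entry.
gt_depths = np.load("splits/eigen/gt_depths.npz", fix_imports=True,
                    encoding="latin1", allow_pickle=True)["data"]
print(len(gt_depths), "ground-truth maps; first shape:", gt_depths[0].shape)
```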
The following example command evaluates the weights of a model named MS_1024x320:
python evaluate_depth.py --load_weights_folder ./log/MS_1024x320 --eval_mono --data_path ./kitti_data --eval_split eigen
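The Abs_Rel and Sq_Rel numbers reported above are the standard monocular depth metrics; a minimal sketch of how they (together with the δ < 1.25 accuracy) are computed from matched ground-truth/prediction arrays:

```python
import numpy as np

def compute_errors(gt, pred):
    """Standard KITTI depth metrics, computed over valid pixels
    (for --eval_mono, predictions are first median-scaled to the gt)."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()              # delta < 1.25 accuracy
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    return abs_rel, sq_rel, rmse, a1
```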
You can download our precomputed disparity predictions from the following links:

| Training modality | Input size | .npy filesize | Eigen disparities |
|---|---|---|---|
| Mono | 640 x 192 | 326M | Download 🔗 |
| Mono | 1024 x 320 | 871M | Download 🔗 |
| Mono | 1280 x 384 | 1.27G | Download 🔗 |
| Mono + Stereo | 640 x 192 | 326M | Download 🔗 |
| Mono + Stereo | 1024 x 320 | 871M | Download 🔗 |
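Each download is a single .npy file of per-image disparities; a minimal loading sketch, where the filename is a hypothetical placeholder for whichever file you downloaded:

```python
import numpy as np

# "MS_1024x320_eigen.npy" is a hypothetical placeholder name; the array
# layout (one disparity map per Eigen test image) is an assumption.
disps = np.load("MS_1024x320_eigen.npy")
print(disps.shape)
```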
Monodepth2 - https://github.com/nianticlabs/monodepth2