This repository is the official implementation of PFKV Swin Transformer: Polarized Fusion of Key and Value Swin Transformer for Monocular Depth Estimation.
conda create -n fkvswin python=3.8
conda activate fkvswin
conda install pytorch=1.10.0 torchvision cudatoolkit=11.1
pip install matplotlib tqdm tensorboardX timm mmcv
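After installation, you can confirm that all dependencies resolve in the active environment. This is a hypothetical sanity-check helper, not part of the repository:

```python
import importlib.util

# Packages installed in the steps above
DEPS = ["torch", "torchvision", "matplotlib", "tqdm", "tensorboardX", "timm", "mmcv"]

def missing_deps(deps=DEPS):
    """Return the subset of deps that cannot be imported in this environment."""
    return [d for d in deps if importlib.util.find_spec(d) is None]

if __name__ == "__main__":
    gone = missing_deps()
    print("all dependencies found" if not gone else f"missing: {gone}")
```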
You can prepare the KITTI and NYUv2 datasets following the instructions here, then modify the data paths in the config files to point at your dataset locations.
First, download the pretrained encoder backbone from here, then modify the pretrain path in the config files.
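For reference, the config files are plain-text argument lists. The sketch below is illustrative only; the flags `--data_path`, `--dataset`, and `--max_depth` appear in the test command later in this README, while the remaining names and paths are assumptions — consult the repo's own `configs/` files for the exact options:

```
--mode train
--dataset nyu
--data_path /path/to/your/nyu
--max_depth 10
--pretrain /path/to/pretrained/backbone.ckpt
```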
Training the NYUv2 model:
python fkvswin/train.py configs/arguments_train_nyu.txt
Training the KITTI model:
python fkvswin/train.py configs/arguments_train_kittieigen.txt
Evaluate the NYUv2 model:
python fkvswin/eval.py configs/arguments_eval_nyu.txt
Evaluate the KITTI model:
python fkvswin/eval.py configs/arguments_eval_kittieigen.txt
Model | Abs.Rel. | Sq.Rel. | RMSE | RMSE log | a1 | a2 | a3 |
---|---|---|---|---|---|---|---|
NYUv2 | 0.0900 | 0.0433 | 0.3280 | 0.1170 | 0.929 | 0.990 | 0.998 |
KITTI_Eigen | 0.0520 | 0.1532 | 2.111 | 0.079 | 0.974 | 0.997 | 0.999 |
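The a1/a2/a3 columns are the standard threshold accuracies (fraction of pixels with max(pred/gt, gt/pred) < 1.25^k). A minimal sketch of how these metrics are conventionally computed — `depth_metrics` is a hypothetical helper, not the repo's own evaluation code:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics, computed over pixels with valid ground truth."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    mask = gt > 0                      # ignore invalid (zero-depth) pixels
    pred, gt = pred[mask], gt[mask]
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(pred - gt) / gt),
        "sq_rel": np.mean((pred - gt) ** 2 / gt),
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "a1": np.mean(thresh < 1.25),
        "a2": np.mean(thresh < 1.25 ** 2),
        "a3": np.mean(thresh < 1.25 ** 3),
    }
```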
The extraction code is: pfkv
Test images with the indoor model:
python fkvswin/test.py --data_path datasets/test_data --dataset nyu --filenames_file data_splits/test_list.txt --checkpoint_path model_nyu.ckpt --max_depth 10 --save_viz
Thanks to Jin Han Lee for open-sourcing the excellent work BTS. Thanks to Microsoft Research Asia for open-sourcing the excellent work Swin Transformer. Thanks to Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, and Ping Tan for open-sourcing the excellent work NewCRF.