release DTU
jzhangbs committed May 13, 2021
1 parent 3c880ab commit c35d8c6
Showing 7 changed files with 631 additions and 16 deletions.
README.md (91 additions & 5 deletions)
# Visibility-aware Multi-view Stereo Network
## Introduction

![Intro](readme_media/method2.png)

This is the official implementation of the BMVC 2020 paper [Visibility-aware Multi-view Stereo Network](https://arxiv.org/abs/2008.07928). In this paper, we explicitly infer and integrate pixel-wise occlusion information in the MVS network via matching uncertainty estimation. The pair-wise uncertainty map is jointly inferred with the pair-wise depth map and is further used as weighting guidance during the multi-view cost volume fusion. As such, the adverse influence of occluded pixels is suppressed in the cost fusion. The proposed framework, Vis-MVSNet, significantly improves depth accuracy in scenes with severe occlusion.
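
A conceptual sketch of this idea is shown below (illustrative only, not the paper's exact formulation and not code from this repository): each pair-wise cost volume is weighted by a confidence derived from its uncertainty map before fusion, so likely-occluded pixels contribute less to the fused volume.

```python
# Conceptual sketch only: weight each pair-wise cost volume by a confidence
# derived from its uncertainty map before summing, so pixels that are likely
# occluded (high uncertainty) are suppressed in the fused volume.
import torch

def fuse_pairwise_costs(pair_costs, pair_uncertainties, eps=1e-6):
    # pair_costs: list of tensors [B, D, H, W]
    # pair_uncertainties: list of tensors [B, 1, H, W]
    weights = [torch.exp(-u) for u in pair_uncertainties]  # low uncertainty -> high weight
    weighted = [c * w for c, w in zip(pair_costs, weights)]
    weight_sum = torch.stack(weights, dim=0).sum(dim=0) + eps
    return torch.stack(weighted, dim=0).sum(dim=0) / weight_sum
```
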
## How to Use
### Environment Setup
The code is tested in the following environment. Newer versions of the packages should also work.
It is highly recommended to use Anaconda.

You need to install `apex` manually. See https://github.com/NVIDIA/apex for more details.

### Quick test on your own data
Vis-MVSNet requires camera parameters and a view selection file. If you do not have them, you can use `Colmap` to estimate the cameras and convert them to the MVSNet format with `colmap2mvsnet.py`. Please arrange your files as follows.
```
- <dense_folder>
- images_col # input images of Colmap
- sparse_col # SfM output from colmap in .txt format
- cams # output MVSNet cameras, to be generated
- images # output MVSNet input images, to be generated
- pair.txt # output view selection file, to be generated
```

An example of running `Colmap`
```
colmap feature_extractor \
--database_path <dense_folder>/database.db \
--image_path <dense_folder>/images_col
colmap exhaustive_matcher \
--database_path <dense_folder>/database.db
colmap mapper \
--database_path <dense_folder>/database.db \
--image_path <dense_folder>/images_col \
--output_path <dense_folder>/sparse_col
colmap model_converter \
--input_path <dense_folder>/sparse_col/0 \
--output_path <dense_folder>/sparse_col \
--output_type TXT
```

Run `colmap2mvsnet.py` by
```bash
python colmap2mvsnet.py --dense_folder <dense_folder> --max_d 256 --convert_format
```

Vis-MVSNet will first resize the inputs (keeping the aspect ratio). Please determine the target size, e.g. `1280,720` for a `16:9` image. Then run Vis-MVSNet by
```bash
python test.py --data_root <dense_folder> --dataset_name general --num_src 4 --max_d 256 --resize 1280,720 --crop 1280,720 --load_path pretrained_model/vis --write_result --result_dir <output_dir>
```
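
If helpful, the small helper below (illustrative only, not part of this repository) computes a `--resize` target that preserves the aspect ratio of the source images.

```python
# Illustrative helper (not part of the repo): choose a --resize/--crop target
# that keeps the aspect ratio of the source images.
def resize_target(orig_w, orig_h, target_w=1280):
    target_h = round(target_w * orig_h / orig_w)
    return f"{target_w},{target_h}"

print(resize_target(1920, 1080))  # 16:9 input -> "1280,720"
```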

For depth fusion, please refer to the `Post-Processing` section.

### Data preparation
Download the [BlendedMVS low-res set](https://drive.google.com/open?id=1ilxls-VJNvJnB7IaFj7P0ehMPr7ikRCb) and the [Tanks and Temples testing set](https://drive.google.com/open?id=1YArOJaX9WVLJh4757uE8AEREYkgszrCo). For more information, please visit [MVSNet](https://github.com/YoYo000/MVSNet).

For the pre-processed DTU dataset, please download the [rectified images](http://roboimagedata.compute.dtu.dk/?page_id=36) from the official website, and the ground-truth depths and cameras: [part1](https://hkustconnect-my.sharepoint.com/:u:/g/personal/jzhangbs_connect_ust_hk/EfZTR-JYiGBJqC873IoQnWgBYCljQBMYv5N7PKQvrCwNbw?e=ThTK8U) [part2](https://hkustconnect-my.sharepoint.com/:u:/g/personal/jzhangbs_connect_ust_hk/ESY13vX9JkBPoAr8sEOfAmgBIQqWoNaEsS0Y10nQqjI-LA?e=uaqZtF). The data should be arranged as follows.
```
- <data_root>
- Rectified
- scan*
- rect_*.png
- Cameras
- *_cam.txt
- Depths
- scan*
- depth_map_*.pfm
```
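
A quick Python check (illustrative only) that `<data_root>` matches the layout above:

```python
# Illustrative sanity check that <data_root> matches the expected DTU layout.
from pathlib import Path

data_root = Path('<data_root>')  # replace with your actual path
for sub in ('Rectified', 'Cameras', 'Depths'):
    print(f"{sub}: {'found' if (data_root / sub).is_dir() else 'missing'}")
print('depth maps found:', len(list(data_root.glob('Depths/scan*/depth_map_*.pfm'))))
```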

### Training & validation
First, set the machine-dependent parameters, e.g. the dataset directory, in `sh/dir.json`.

Set the job name, and run `python sh/bld.py local` or `python sh/dtu.py local` to train the network on BlendedMVS/DTU.

Set the job name to load and the number of sources, and run `python sh/bld_val.py local` or `python sh/dtu_val.py local` to validate the network on BlendedMVS/DTU.

### Testing

Set the dataset directory, the directory of the models, the job name to load, and the output directory, then run `sh/tnt.sh` or `sh/dtu.sh` to generate the outputs for point cloud fusion on Tanks and Temples/DTU. (Note that array indexing in your shell should start from 0; otherwise you need to modify the scripts.)

For advanced usage, please see `python train.py/val.py/test.py --help` for the explanation of all the flags.

### Explanation of depth number and interval scale
`max_d` and `interval_scale` define a standard depth sampling. Similar to MVSNet, in the preprocessing, `depth_start` is kept, `depth_interval` is scaled by `interval_scale`, and `depth_num` is set to `max_d`. So if you want to keep the depth range in the cam files, you need to manually ensure `max_d*interval_scale=<depth num in the cam file>`.
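
For example (a worked check with illustrative cam-file values, not code from the repo):

```python
# Worked example (illustrative cam-file values): choose interval_scale so that
# the depth range given in the cam file is preserved when sampling max_d planes.
depth_start = 425.0      # example value from a cam file
depth_interval = 2.5     # example value from a cam file
depth_num = 192          # example value from a cam file

max_d = 256
interval_scale = depth_num / max_d                  # 0.75, so max_d * interval_scale == depth_num
scaled_interval = depth_interval * interval_scale   # 1.875

# The total covered range is unchanged: max_d * scaled_interval == depth_num * depth_interval
assert abs(max_d * scaled_interval - depth_num * depth_interval) < 1e-6
```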

### Post-Processing
Run the depth fusion by
```bash
python fusion.py --data <dir_of_depths> --pair <dir_of_pair> --vthresh 4 --pthresh .8,.7,.8
```
where `--data` is the same as the `--result_dir` in `test.py`. This script uses PyTorch, so it can be accelerated by a GPU.

Note that this depth fusion script is different from the one used in the experiments, which depends on the Altizure internal library and cannot be released. The provided script is a re-implementation, so it cannot guarantee exactly the same results, but it should still produce results of top-tier quality.

## File Formats
The format of all the I/O files follows MVSNet, except that we output three probability maps instead of one.

## Output File Structure
```
- <dir_of_depth>
- %08d.jpg # images with the same size as depth maps
- %08d_flow3.pfm # depth maps
- %08d_flow*_prob.pfm # probability maps with the same size as depth maps
- cam_%08d_flow3.txt # cameras with the same size as depth maps
- all_torch.ply # fused point cloud
```
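
The `.pfm` depth and probability maps can be read with any standard PFM loader; a minimal sketch (independent of the repo's own I/O code) is shown below.

```python
# Minimal PFM loader sketch (not the repo's own I/O code) for inspecting the
# output depth and probability maps.
import re
import numpy as np

def read_pfm(path):
    with open(path, 'rb') as f:
        header = f.readline().decode('ascii').rstrip()
        if header not in ('PF', 'Pf'):
            raise ValueError(f'Not a PFM file: {path}')
        channels = 3 if header == 'PF' else 1
        match = re.match(r'^(\d+)\s+(\d+)\s*$', f.readline().decode('ascii'))
        width, height = int(match.group(1)), int(match.group(2))
        scale = float(f.readline().decode('ascii').rstrip())
        endian = '<' if scale < 0 else '>'              # negative scale means little-endian
        data = np.frombuffer(f.read(), dtype=endian + 'f4')
        shape = (height, width, channels) if channels == 3 else (height, width)
        return np.flipud(data.reshape(shape)).copy()    # PFM stores rows bottom-to-top

depth = read_pfm('00000000_flow3.pfm')        # a depth map from <dir_of_depth>
prob = read_pfm('00000000_flow3_prob.pfm')    # one of the %08d_flow*_prob.pfm maps
print(depth.shape, float(prob.mean()))
```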

## Citation
If you find our work useful in your research, please kindly cite
```
@article{zhang2020visibility,
title={Visibility-aware Multi-view Stereo Network},
author={Zhang, Jingyang and Yao, Yao and Li, Shiwei and Luo, Zixin and Fang, Tian},
journal={British Machine Vision Conference (BMVC)},
year={2020}
}
```

## Changelog
### May 11 2021
- Update README
- Add `colmap2mvsnet.py`
- Release high-res DTU depth ground truth
### Oct 21 2020
- Add pretrained model (`pretrained_model`)
- Add script for depth fusion