Visual Token Matching

(Update 08/23/2023) We have uploaded pretrained checkpoints at this link. Meta-trained checkpoints for each fold are in the TRAIN directory, and fine-tuned checkpoints for each task (and each channel) are in the FINETUNE directory.

(News) Our paper received the Outstanding Paper Award at ICLR 2023!

This repository contains the official code for Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching (ICLR 2023 oral).

Setup

Download the Taskonomy dataset (tiny split) from the official GitHub page: https://github.com/StanfordVL/taskonomy/tree/master/data.

You may download the data for depth_euclidean, depth_zbuffer, edge_occlusion, keypoints2d, keypoints3d, normal, principal_curvature, reshading, segment_semantic, and rgb.
(Optional) Resize the images and labels to (256, 256) resolution.
To reduce the dataloader's I/O bottleneck, we store the data from all buildings in a single directory per task. The directory structure looks like:

<root>
|--<task1>
|   |--<building1>_<file_name1>
|   | ...
|   |--<building2>_<file_name1>
|   |...
|
|--<task2>
|   |--<building1>_<file_name1>
|   | ...
|   |--<building2>_<file_name1>
|   |...
|
|...
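
The flattening step can be sketched in Python as follows. This is a minimal sketch, not the authors' preprocessing script: it assumes the downloaded data is laid out as <src_root>/<task>/<building>/<file_name>, and the function name flatten_task_dirs and its parameters are hypothetical. Adjust the source layout to match how you extracted the dataset.

```python
import shutil
from pathlib import Path


def flatten_task_dirs(src_root, dst_root, tasks):
    """Copy <src_root>/<task>/<building>/<file> to <dst_root>/<task>/<building>_<file>.

    The source layout is an assumption; only the destination layout
    follows the directory structure described above.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = 0
    for task in tasks:
        task_src = src_root / task
        if not task_src.is_dir():
            continue  # skip tasks that were not downloaded
        task_dst = dst_root / task
        task_dst.mkdir(parents=True, exist_ok=True)
        for building_dir in sorted(p for p in task_src.iterdir() if p.is_dir()):
            for file_path in sorted(p for p in building_dir.iterdir() if p.is_file()):
                # Prefix the building name so files from different buildings never collide.
                target = task_dst / f"{building_dir.name}_{file_path.name}"
                shutil.copy2(file_path, target)
                copied += 1
    return copied
```

For example, flatten_task_dirs("raw", "taskonomy", ["rgb", "normal"]) would copy raw/rgb/<building1>/<file_name1> to taskonomy/rgb/<building1>_<file_name1> and return the number of files copied.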

Create a data_paths.yaml file and record the root directory path (<root> in the structure above) as taskonomy: PATH_TO_YOUR_TASKONOMY.
Install the prerequisites with pip install -r requirements.txt.
Create the model/pretrained_checkpoints directory and download the BEiT pre-trained checkpoints into it.

We used the beit_base_patch16_224_pt22k checkpoint for our experiments.
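
If you prefer to generate data_paths.yaml from a script, something like the sketch below should work. The helper name write_data_paths is hypothetical, and the sketch assumes the file is read with a standard YAML loader, so the path is written as a double-quoted string to survive spaces or colons in directory names.

```python
import json
from pathlib import Path


def write_data_paths(taskonomy_root, config_path="data_paths.yaml"):
    """Write a one-line YAML file mapping the key `taskonomy` to the dataset root."""
    # json.dumps produces a double-quoted string, which is also a valid YAML scalar.
    line = f"taskonomy: {json.dumps(str(taskonomy_root))}\n"
    Path(config_path).write_text(line, encoding="utf-8")
    return line
```

Calling write_data_paths("/data/taskonomy") from the repository root creates data_paths.yaml with a single taskonomy entry.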

Usage

Training

python main.py --stage 0 --task_fold [0/1/2/3/4]

Fine-tuning

python main.py --stage 1 --task [segment_semantic/normal/depth_euclidean/depth_zbuffer/edge_texture/edge_occlusion/keypoints2d/keypoints3d/reshading/principal_curvature]

Evaluation

python main.py --stage 2 --task [segment_semantic/normal/depth_euclidean/depth_zbuffer/edge_texture/edge_occlusion/keypoints2d/keypoints3d/reshading/principal_curvature]

After evaluation, you can print the test results by running python print_results.py.
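
The three stages can also be chained from a small driver script. The sketch below is illustrative only: build_commands and run_pipeline are hypothetical helpers, and they use only the flags shown above (--stage, --task_fold, --task). Which fold corresponds to which task is not stated here, so the fold is left as a parameter for the caller to choose.

```python
import subprocess
import sys

# Task names as listed in the fine-tuning and evaluation commands above.
TASKS = [
    "segment_semantic", "normal", "depth_euclidean", "depth_zbuffer",
    "edge_texture", "edge_occlusion", "keypoints2d", "keypoints3d",
    "reshading", "principal_curvature",
]


def build_commands(task, fold, python=sys.executable):
    """Return the command lines for training, fine-tuning, evaluation, and printing results."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return [
        [python, "main.py", "--stage", "0", "--task_fold", str(fold)],  # training
        [python, "main.py", "--stage", "1", "--task", task],            # fine-tuning
        [python, "main.py", "--stage", "2", "--task", task],            # evaluation
        [python, "print_results.py"],                                   # print test results
    ]


def run_pipeline(task, fold, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(task, fold):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one stage fails.
            subprocess.run(cmd, check=True)
```

With dry_run left at its default, run_pipeline("normal", fold=0) only prints the four commands, which is a cheap way to check them before committing to a long training run.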

References

Our code refers to the following repositories:

Citation

If you find this work useful, please consider citing:

@inproceedings{kim2023universal,
  title={Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching},
  author={Donggyun Kim and Jinwoo Kim and Seongwoong Cho and Chong Luo and Seunghoon Hong},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=88nT0j5jAn}
}

Acknowledgements

The development of this open-sourced code was supported in part by the National Research Foundation of Korea (NRF) (No. 2021R1A4A3032834).
