ViC-MAE

Official PyTorch/GPU codebase for ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders.

This repo is based on:

Requirements

Create a conda environment and install the requirements:

conda create -y -n vicmae python=3.9 cupy pkg-config compilers libjpeg-turbo libwebp opencv=4.7.0 numba ffmpeg av tmux cudatoolkit=11.8 -c conda-forge
conda activate vicmae
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install wandb ffmpeg-python git+https://github.com/rwightman/pytorch-image-models glances[all]
pip install ffcv
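After installing, a quick sanity check can confirm that the packages above are importable. The sketch below is not part of the repository; `check_requirements` is a hypothetical helper, and the module names it probes (for example `timm` for pytorch-image-models, `ffmpeg` for ffmpeg-python, and `cv2` for opencv) are assumptions about how those packages expose themselves to Python.

```python
import importlib.util

# Hypothetical list: module names assumed to correspond to the packages
# installed above (pytorch-image-models -> timm, ffmpeg-python -> ffmpeg,
# opencv -> cv2).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "timm", "wandb",
    "ffmpeg", "ffcv", "cv2", "av", "numba", "cupy",
]


def check_requirements(modules=REQUIRED_MODULES):
    """Return a mapping from module name to whether it can be found."""
    # find_spec locates a top-level module without importing it, so one
    # missing package does not stop the remaining checks.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_requirements()
    for name, ok in status.items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
    # If PyTorch is present, report whether it sees a GPU and which CUDA
    # version the wheel was built against (None for CPU-only builds).
    if status["torch"]:
        import torch
        print("CUDA available:", torch.cuda.is_available(), "| CUDA build:", torch.version.cuda)
```

Using `find_spec` rather than a plain `import` keeps the check cheap and lets it report every missing package in one pass.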

Checkpoints

The following table lists the strongest pre-trained checkpoints used in the paper.

Model	Dataset	Epochs	Batch Size	Download
ViC-MAE-B/16	IN1K + K400	800	4096	Link
ViC-MAE-B/16	IN1K + K400 + K600 + K700 + MiT	800	4096	Link
ViC-MAE-L/16	IN1K + K400	800	4096	Link
ViC-MAE-L/16	IN1K + K400 + K600 + K700 + MiT	800	4096	Link
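The layout of the released checkpoint files is not described here. Assuming they follow the common MAE-style convention of a dict that stores the weights under a `model` key (possibly saved from a `DistributedDataParallel` wrapper, which adds a `module.` prefix), a minimal loading sketch might look like the following. `extract_state_dict`, `load_encoder_weights`, and the `model` key are all hypothetical; check the keys of the downloaded file before relying on them.

```python
from collections import OrderedDict


def extract_state_dict(checkpoint, key="model", prefix="module."):
    """Pull the weights out of a checkpoint dict and drop a DDP prefix.

    Assumption: weights live under checkpoint[key]; if that key is absent,
    the checkpoint is treated as a bare state dict.
    """
    state = checkpoint.get(key, checkpoint)
    cleaned = OrderedDict()
    for name, value in state.items():
        # Keys saved from DistributedDataParallel start with "module.".
        if name.startswith(prefix):
            name = name[len(prefix):]
        cleaned[name] = value
    return cleaned


def load_encoder_weights(model, path):
    """Load a downloaded checkpoint into `model`, tolerating extra or missing keys."""
    import torch  # imported lazily so extract_state_dict has no torch dependency

    # weights_only=False allows checkpoints that also pickle non-tensor objects
    # (e.g. training args); only use it on files from a trusted source.
    checkpoint = torch.load(path, map_location="cpu", weights_only=False)
    state = extract_state_dict(checkpoint)
    # strict=False lets a backbone skip weights (e.g. a decoder) that only
    # exist in the pre-training model; inspect the reported keys to confirm.
    result = model.load_state_dict(state, strict=False)
    print("missing keys:", result.missing_keys)
    print("unexpected keys:", result.unexpected_keys)
    return model
```

Printing the missing and unexpected keys is the practical safeguard here: with `strict=False`, a mismatched architecture would otherwise load silently.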

Training

See PRETRAIN.md for pre-training instructions.
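The checkpoint table lists a batch size of 4096, which is a global batch rather than a per-GPU one. How it is split across hardware is set by the pre-training instructions; as a hedged illustration, assuming the convention of the original MAE codebase (effective batch = per-GPU batch × number of GPUs × gradient-accumulation steps, with the learning rate scaled as base_lr × effective_batch / 256), the arithmetic can be sketched as below. The function names, the 8-GPU setup, and the base learning rate of 1.5e-4 are hypothetical values for illustration only.

```python
def effective_batch_size(batch_per_gpu, num_gpus, accum_iter=1):
    """Global batch seen by one optimizer step (MAE-style accounting, assumed)."""
    return batch_per_gpu * num_gpus * accum_iter


def scaled_lr(base_lr, eff_batch, reference_batch=256):
    """Linear learning-rate scaling rule: lr = base_lr * eff_batch / 256."""
    return base_lr * eff_batch / reference_batch


def accum_steps_for(target_batch, batch_per_gpu, num_gpus):
    """Gradient-accumulation steps needed to reach target_batch exactly."""
    per_step = batch_per_gpu * num_gpus
    if target_batch % per_step:
        raise ValueError(f"{target_batch} is not a multiple of {per_step}")
    return target_batch // per_step


if __name__ == "__main__":
    # Hypothetical hardware: 8 GPUs holding 64 samples each.
    accum = accum_steps_for(4096, batch_per_gpu=64, num_gpus=8)
    eff = effective_batch_size(64, 8, accum)
    print(accum, eff)  # 8 4096
    print(scaled_lr(1.5e-4, eff))
```

Keeping the global batch fixed and adjusting accumulation steps is what lets a smaller GPU count reproduce the same optimization setting, at the cost of longer wall-clock time.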

Fine-tuning

See FINETUNE.md for fine-tuning instructions.

Citation

@article{hernandez2023visual,
  title={Visual Representation Learning from Unlabeled Video using Contrastive Masked Autoencoders},
  author={Hernandez, Jefferson and Villegas, Ruben and Ordonez, Vicente},
  journal={arXiv preprint arXiv:2303.12001},
  year={2023}
}
