- This is a PyTorch/GPU implementation of the paper [Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction](https://arxiv.org/abs/2206.00790).
- This repo is a modification of MAE; installation and data preparation follow that repo.
- This repo is based on `timm==0.3.2`, for which a fix is needed to work with PyTorch 1.8.1+.
- The relative position encoding follows iRPE. To enable iRPE with CUDA support, build the ops:

```bash
cd rpe_ops/
python setup.py install --user
```
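
After the build, you can sanity-check that the extension imports. The module name `rpe_index_cpp` below is an assumption based on the iRPE sources; adjust it if your build exports a different name:

```python
# Sanity check for the iRPE CUDA ops build.
# NOTE: "rpe_index_cpp" is an assumed extension name taken from the iRPE
# sources; if your build exports a different module, import that instead.
import torch

try:
    import rpe_index_cpp  # assumed module name, see note above
    print("iRPE extension imported; CUDA available:", torch.cuda.is_available())
except ImportError as err:
    print("iRPE extension missing; the code may fall back to a slower "
          "pure-PyTorch path:", err)
```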
| Backbone | Method | Pretrain Epochs | Pretrained Weights | Pretrain Logs | Finetune Logs |
|---|---|---|---|---|---|
| ViT-B/16 | LoMaR | 1600 | download | download | download |
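
To inspect a downloaded checkpoint before fine-tuning, here is a minimal sketch assuming the MAE-style convention that the weights live under a `model` key; the file name `lomar_pretrain.pth` is a placeholder for whatever file you downloaded from the table above:

```python
import torch

# Load the pretrained checkpoint on CPU; "lomar_pretrain.pth" is a
# placeholder for the file downloaded from the table above.
checkpoint = torch.load("lomar_pretrain.pth", map_location="cpu")

# MAE-style checkpoints keep the state dict under a "model" key
# (an assumption carried over from the MAE codebase this repo extends);
# fall back to the raw dict if that key is absent.
state_dict = checkpoint.get("model", checkpoint)
print(f"{len(state_dict)} tensors, e.g.:")
for name, tensor in list(state_dict.items())[:5]:
    print(f"  {name}: {tuple(tensor.shape)}")
```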
Pretrain the model:

```bash
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 \
    --master_addr=127.0.0.1 --master_port=29517 main_pretrain_lomar.py \
    --batch_size 256 \
    --accum_iter 4 \
    --output_dir ${LOG_DIR} \
    --log_dir ${LOG_DIR} \
    --model mae_vit_base_patch16 \
    --norm_pix_loss \
    --distributed \
    --epochs 400 \
    --warmup_epochs 20 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --window_size 7 \
    --num_window 4 \
    --mask_ratio 0.8 \
    --data_path ${IMAGENET_DIR}
```
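
The `--window_size`, `--num_window`, and `--mask_ratio` flags control LoMaR's local masked reconstruction: a few small windows are sampled on the patch grid, and masking and reconstruction happen inside each window rather than over the whole image. Below is a minimal illustrative sketch of that sampling, assuming a 14×14 patch grid (224px images, 16px patches); the helper `sample_local_windows` is hypothetical, not the repo's actual implementation:

```python
import torch

def sample_local_windows(grid_size=14, window_size=7, num_window=4,
                         mask_ratio=0.8):
    """Sample square local windows on the patch grid and mask a fraction
    of the patches inside each one. Illustrative sketch only."""
    windows = []
    for _ in range(num_window):
        # Random top-left corner so the window stays inside the grid.
        top = torch.randint(0, grid_size - window_size + 1, (1,)).item()
        left = torch.randint(0, grid_size - window_size + 1, (1,)).item()
        rows = torch.arange(top, top + window_size)
        cols = torch.arange(left, left + window_size)
        # Flattened patch indices covered by this window.
        idx = (rows[:, None] * grid_size + cols[None, :]).flatten()
        # Mask a random mask_ratio fraction of patches inside the window.
        num_mask = int(mask_ratio * idx.numel())
        perm = torch.randperm(idx.numel())
        windows.append((idx[perm[num_mask:]], idx[perm[:num_mask]]))
    return windows

for i, (visible, masked) in enumerate(sample_local_windows()):
    print(f"window {i}: {visible.numel()} visible / {masked.numel()} masked")
```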
Finetune the model:

```bash
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 \
    --master_addr=127.0.0.1 --master_port=29510 main_finetune_lomar.py \
    --batch_size 256 \
    --accum_iter 1 \
    --model vit_base_patch16 \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 100 \
    --log_dir ${LOG_DIR} \
    --blr 5e-4 --layer_decay 0.65 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path ${IMAGENET_DIR}
```
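
One thing worth checking before launching: in the MAE codebase this repo modifies, the actual learning rate is scaled from `--blr` by the effective batch size (per-GPU batch size × `--accum_iter` × number of GPUs) divided by 256; assuming LoMaR keeps that logic, the commands above resolve as follows:

```python
# Effective batch size and learning rate for the commands above,
# following the MAE convention: lr = blr * eff_batch_size / 256.
def effective_lr(blr, batch_size, accum_iter, num_gpus):
    eff_batch_size = batch_size * accum_iter * num_gpus
    return eff_batch_size, blr * eff_batch_size / 256

print(effective_lr(1.5e-4, 256, 4, 4))  # pretrain: (4096, 0.0024)
print(effective_lr(5e-4, 256, 1, 4))    # finetune: (1024, 0.002)
```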
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
If you find this work useful, please cite:

```bibtex
@article{chen2022efficient,
  title={Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction},
  author={Chen, Jun and Hu, Ming and Li, Boyang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2206.00790},
  year={2022}
}
```