- This is a PyTorch/GPU implementation of the paper [Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction](https://arxiv.org/abs/2206.00790).
- This repo is a modification of MAE; installation and data preparation follow that repo.
- This repo is based on `timm==0.3.2`, for which a fix is needed to work with PyTorch 1.8.1+.
- The relative position encoding follows iRPE. To enable iRPE with CUDA support, build the ops:

```bash
cd rpe_ops/
python setup.py install --user
```
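
After the build, you can sanity-check that the extension imports. The module name `rpe_index_cpp` below is an assumption based on the iRPE sources; adjust it if your build exports a different name:

```python
# Sanity check for the iRPE CUDA ops build.
# NOTE: "rpe_index_cpp" is an assumed extension name taken from the iRPE
# sources; if your build exports a different module, import that instead.
import torch

try:
    import rpe_index_cpp  # assumed module name, see note above
    print("iRPE extension imported; CUDA available:", torch.cuda.is_available())
except ImportError as err:
    print("iRPE extension missing; the code may fall back to a slower "
          "pure-PyTorch path:", err)
```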
| Backbone | Method | Pretrain Epochs | Pretrained Weights | Pretrain Logs | Finetune Logs |
|---|---|---|---|---|---|
| ViT-B/16 | LoMaR | 1600 | download | download | download |
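
To inspect a downloaded checkpoint before fine-tuning, here is a minimal sketch assuming the MAE-style convention that the weights live under a `model` key; the file name `lomar_pretrain.pth` is a placeholder for whatever file you downloaded from the table above:

```python
import torch

# Load the pretrained checkpoint on CPU; "lomar_pretrain.pth" is a
# placeholder for the file downloaded from the table above.
checkpoint = torch.load("lomar_pretrain.pth", map_location="cpu")

# MAE-style checkpoints keep the state dict under a "model" key
# (an assumption carried over from the MAE codebase this repo extends);
# fall back to the raw dict if that key is absent.
state_dict = checkpoint.get("model", checkpoint)
print(f"{len(state_dict)} tensors, e.g.:")
for name, tensor in list(state_dict.items())[:5]:
    print(f"  {name}: {tuple(tensor.shape)}")
```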
Pretrain the model:

```bash
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 \
    --master_addr=127.0.0.1 --master_port=29517 main_pretrain_lomar.py \
    --batch_size 256 \
    --accum_iter 4 \
    --output_dir ${LOG_DIR} \
    --log_dir ${LOG_DIR} \
    --model mae_vit_base_patch16 \
    --norm_pix_loss \
    --distributed \
    --epochs 400 \
    --warmup_epochs 20 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --window_size 7 \
    --num_window 4 \
    --mask_ratio 0.8 \
    --data_path ${IMAGENET_DIR}
```
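
The `--window_size`, `--num_window`, and `--mask_ratio` flags control LoMaR's local masked reconstruction: a few small windows are sampled on the patch grid, and masking and reconstruction happen inside each window rather than over the whole image. Below is a minimal illustrative sketch of that sampling, assuming a 14×14 patch grid (224px images, 16px patches); the helper `sample_local_windows` is hypothetical, not the repo's actual implementation:

```python
import torch

def sample_local_windows(grid_size=14, window_size=7, num_window=4,
                         mask_ratio=0.8):
    """Sample square local windows on the patch grid and mask a fraction
    of the patches inside each one. Illustrative sketch only."""
    windows = []
    for _ in range(num_window):
        # Random top-left corner so the window stays inside the grid.
        top = torch.randint(0, grid_size - window_size + 1, (1,)).item()
        left = torch.randint(0, grid_size - window_size + 1, (1,)).item()
        rows = torch.arange(top, top + window_size)
        cols = torch.arange(left, left + window_size)
        # Flattened patch indices covered by this window.
        idx = (rows[:, None] * grid_size + cols[None, :]).flatten()
        # Mask a random mask_ratio fraction of patches inside the window.
        num_mask = int(mask_ratio * idx.numel())
        perm = torch.randperm(idx.numel())
        windows.append((idx[perm[num_mask:]], idx[perm[:num_mask]]))
    return windows

for i, (visible, masked) in enumerate(sample_local_windows()):
    print(f"window {i}: {visible.numel()} visible / {masked.numel()} masked")
```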
Finetune the model:

```bash
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 \
    --master_addr=127.0.0.1 --master_port=29510 main_finetune_lomar.py \
    --batch_size 256 \
    --accum_iter 1 \
    --model vit_base_patch16 \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 100 \
    --log_dir ${LOG_DIR} \
    --blr 5e-4 --layer_decay 0.65 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path ${IMAGENET_DIR}
```
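
One thing worth checking before launching: in the MAE codebase this repo modifies, the actual learning rate is scaled from `--blr` by the effective batch size (per-GPU batch size × `--accum_iter` × number of GPUs) divided by 256; assuming LoMaR keeps that logic, the commands above resolve as follows:

```python
# Effective batch size and learning rate for the commands above,
# following the MAE convention: lr = blr * eff_batch_size / 256.
def effective_lr(blr, batch_size, accum_iter, num_gpus):
    eff_batch_size = batch_size * accum_iter * num_gpus
    return eff_batch_size, blr * eff_batch_size / 256

print(effective_lr(1.5e-4, 256, 4, 4))  # pretrain: (4096, 0.0024)
print(effective_lr(5e-4, 256, 1, 4))    # finetune: (1024, 0.002)
```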
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
If you find this work useful, please cite:

```bibtex
@article{chen2022efficient,
  title={Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction},
  author={Chen, Jun and Hu, Ming and Li, Boyang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2206.00790},
  year={2022}
}
```