Good helper is around you: Attention-driven Masked Image Modeling

This repository contains a PyTorch implementation of AMT (Attention-driven Masking and Throwing strategy) with MAE and SimMIM.
For details, see the paper Good helper is around you: Attention-driven Masked Image Modeling.

Preparation

Requirements

  • PyTorch >= 1.11.0
  • Python >= 3.8
  • timm == 0.5.4

To configure the environment, you can run:

conda create -n AMT python=3.8 -y
conda install pytorch=1.11 torchvision cudatoolkit=11.3 -c pytorch -y
pip install -r requirements.txt
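
To sanity-check the resulting environment, you can run a minimal script like the following (our illustration, not part of the repo; it assumes the packaging module, which pip installs, is available):

import torch
import timm
from packaging import version

# Verify the pinned versions listed above
print(f"PyTorch {torch.__version__}, timm {timm.__version__}")
assert version.parse(torch.__version__) >= version.parse("1.11.0"), "need PyTorch >= 1.11.0"
assert timm.__version__ == "0.5.4", "timm must be exactly 0.5.4"
print("CUDA available:", torch.cuda.is_available())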

Points for Attention

  • We slightly modify vision_transformer.py in timm.models to provide the attention map during pre-training (a sketch of how such a map can drive masking and throwing follows this list):
class Attention(nn.Module):
    ...
        return x, attn  # also return the attention map
    ...
class Block(nn.Module):
    ...
    def forward(self, x, return_attention=False):  # add return_attention as a flag
        y, attn = self.attn(self.norm1(x))
        if return_attention:
            return attn
        x = x + self.drop_path(y)
        ...
    ...
  • We only support ViT-B/16 now.
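
To make the role of the attention map concrete, here is a minimal sketch of attention-driven masking and throwing. This is our illustration, not the repo's implementation: the actual scoring rule, and which patches get masked versus thrown, follow the paper; the amt_split helper and the least-attended-first ordering below are assumptions for demonstration. Setting throw_ratio to 0 recovers plain attention-driven masking.

import torch

def amt_split(attn, mask_ratio, throw_ratio):
    # attn: (B, heads, N, N) map from the modified Block; token 0 is CLS.
    # Average the CLS token's attention over heads to score each patch: (B, N-1)
    score = attn.mean(dim=1)[:, 0, 1:]
    num_patches = score.shape[1]
    num_throw = int(num_patches * throw_ratio)
    num_mask = int(num_patches * mask_ratio)
    order = score.argsort(dim=1)  # least-attended patches first (illustrative choice)
    thrown = order[:, :num_throw]                      # discarded from the pipeline entirely
    masked = order[:, num_throw:num_throw + num_mask]  # hidden, used as reconstruction targets
    kept = order[:, num_throw + num_mask:]             # visible patches fed to the encoder
    return thrown, masked, kept

# ViT-B/16 on 224x224 images: 196 patch tokens + 1 CLS token
attn = torch.rand(2, 12, 197, 197)
thrown, masked, kept = amt_split(attn, mask_ratio=0.6, throw_ratio=0.2)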

Datasets

Please download and organize the datasets in this structure:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
      ...
    class2/
      img2.jpeg
      ...
  val/
    class1/
      img3.jpeg
      ...
    class2/
      img4.jpeg
      ...
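
Since this is the standard torchvision ImageFolder layout, you can quickly verify the directories before launching a long run (an illustrative snippet, not part of the repo):

from torchvision import datasets

# Both splits should load without errors and expose the same class folders
train_set = datasets.ImageFolder("/path/to/imagenet/train")
val_set = datasets.ImageFolder("/path/to/imagenet/val")
print(len(train_set), "train images,", len(val_set), "val images")
assert train_set.classes == val_set.classes, "train/val class folders must match"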

Getting Started

This repo supports both the attention-driven masking strategy and the full AMT strategy; you can switch between them by setting mask_ratio and throw_ratio.

We test on a 4-GPU server; accum_iter is used to maintain the same effective batch size as the original method. The following script is an example of AMT with MAE.

Pre-training with MAE

OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py \
    --model vit_base_patch16 \
    --batch_size 128 \
    --accum_iter 8 \
    --blr 1.5e-4 \
    --data_path ${IMAGENET_DIR}
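
Note that --blr is a base learning rate: the MAE codebase scales it by the effective batch size as lr = blr * effective_batch_size / 256. With the settings above, the numbers work out as follows:

# Effective batch size and actual learning rate for the pre-training command above
batch_size, num_gpus, accum_iter, blr = 128, 4, 8, 1.5e-4
eff_batch = batch_size * num_gpus * accum_iter  # 4096, matching MAE's default
lr = blr * eff_batch / 256                      # 2.4e-3
print(eff_batch, lr)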

Finetuning with MAE

OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=4 main_finetune.py \
    --accum_iter 8 \
    --batch_size 32 \
    --model vit_base_patch16 \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 100 \
    --blr 1e-3 --layer_decay 0.75 \
    --weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
    --dist_eval --data_path ${IMAGENET_DIR}

Linear probing with MAE

OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=4 main_linprobe.py \
    --accum_iter 8 \
    --batch_size 512 \
    --model vit_base_patch16 --cls_token \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 90 \
    --blr 0.1 \
    --weight_decay 0.0 \
    --dist_eval --data_path ${IMAGENET_DIR}

This code was written by Zhengqi Liu. If you have questions or find bugs in the code, feel free to contact him.

ATTN: This package is free for academic use; run it at your own risk. For other purposes, please contact Jie Gui (guijie@ustc.edu).
