
ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding

This repository contains the implementation of "ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding".

arXiv | PDF

Figure: Overview of the ParFormer framework.

ImageNet

| Model Name   | Resolution | Params | GFLOPs | Top-1 (%) | Download |
|--------------|------------|--------|--------|-----------|----------|
| ParFormer-B1 | 224×224    | 11M    | 1.5    | 80.5      | model    |
| ParFormer-B2 | 224×224    | 23M    | 3.4    | 82.1      | model    |
| ParFormer-B3 | 224×224    | 34M    | 6.5    | 83.1      | model    |
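The released checkpoints can be loaded with plain PyTorch for evaluation. Below is a minimal sketch; the `models.parformer_b1` import path, the checkpoint filename, and the `'model'` state-dict key are assumptions about the repo layout, not its documented API — adjust to match the actual code.

```python
# Hedged sketch: load a downloaded ParFormer-B1 checkpoint and run one
# forward pass. The import path, filename, and 'model' key below are
# assumptions about the repo layout -- adjust as needed.
import torch
from models import parformer_b1  # hypothetical import path

model = parformer_b1(num_classes=1000)
ckpt = torch.load('parformer_b1.pth', map_location='cpu')
model.load_state_dict(ckpt.get('model', ckpt))  # raw or nested state dict
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # 224x224, per the table above
print(logits.shape)  # torch.Size([1, 1000])
```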

Prerequisites

A conda virtual environment is recommended.

```bash
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
pip install timm==0.6.13
pip install wandb
pip install fvcore
```
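After installing, a quick sanity check confirms the pinned versions and GPU visibility (a minimal sketch; the printed values depend on your machine):

```python
# Sanity-check the environment created above.
import torch
import timm
import fvcore  # the import alone verifies fvcore is installed

print('torch:', torch.__version__)
print('timm:', timm.__version__)               # pinned to 0.6.13 above
print('CUDA available:', torch.cuda.is_available())
print('GPU count:', torch.cuda.device_count())
```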

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The training and validation data are expected to be in the `train` and `val` folders, respectively:

```
|-- /path/to/imagenet/
    |-- train
    |-- val
```
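Once extracted, the folders can be read with torchvision's `ImageFolder`. A minimal sketch using the standard ImageNet preprocessing (the path is a placeholder for your own data path):

```python
# Load the extracted validation split with torchvision's ImageFolder.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # matches the 224x224 evaluation resolution
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=transform)
print(len(val_set), 'images across', len(val_set.classes), 'classes')
```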

Single machine multi-GPU training

We provide an example training script, `train_imnet.sh`, which uses PyTorch DistributedDataParallel (DDP).

To train ParFormer-B1 on a 2-GPU machine:

```bash
sh train_imnet.sh parformer_b1 2
```

Tip: specify your data path and experiment name in the script before launching.
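For reference, the DDP boilerplate that a launcher script like `train_imnet.sh` typically drives looks roughly like the sketch below. The placeholder model and the `torchrun` launch line are illustrative assumptions, not the repo's actual code.

```python
# Minimal DDP skeleton, launched e.g. with:
#   torchrun --nproc_per_node=2 train.py
# torchrun sets RANK/LOCAL_RANK/WORLD_SIZE in the environment.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend='nccl')
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for the actual ParFormer constructor.
    model = torch.nn.Linear(10, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # ... build a DataLoader with DistributedSampler and run the usual
    # training loop here ...

    dist.destroy_process_group()

if __name__ == '__main__':
    main()
```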
