
mit-han-lab/efficientvit


EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction (paper, poster)

News

If you are interested in getting updates, please join our mailing list here.

  • [2024/04/23] We released the training code for EfficientViT-SAM.
  • [2024/04/06] EfficientViT-SAM is accepted by eLVM@CVPR'24.
  • [2024/03/19] Online demo of EfficientViT-SAM is available: https://evitsam.hanlab.ai/.
  • [2024/02/07] We released EfficientViT-SAM, the first accelerated SAM model that matches/outperforms SAM-ViT-H's zero-shot performance, delivering the SOTA performance-efficiency trade-off.
  • [2023/11/20] EfficientViT is available in the NVIDIA Jetson Generative AI Lab.
  • [2023/09/12] EfficientViT is highlighted on the MIT home page and in MIT News.
  • [2023/07/18] EfficientViT is accepted by ICCV 2023.

About EfficientViT Models

EfficientViT is a new family of ViT models for efficient high-resolution dense prediction vision tasks. The core building block of EfficientViT is a lightweight, multi-scale linear attention module that achieves a global receptive field and multi-scale learning using only hardware-efficient operations, making EfficientViT TensorRT-friendly and well suited for GPU deployment.
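To make the core idea concrete, here is a minimal, dependency-free sketch of linear attention. This is an illustration of the general technique, not the repository's actual module (which additionally uses multi-scale token aggregation and depthwise convolutions); the ReLU feature map and the helper names are assumptions for the example. The key point is that softmax(QKᵀ)V is replaced by φ(Q)(φ(K)ᵀV), so the cost is linear rather than quadratic in the number of tokens N.

```python
# Sketch of linear attention (illustrative only; NOT the exact EfficientViT
# module, which adds multi-scale aggregation and depthwise convolutions).
# softmax(Q K^T) V is replaced by phi(Q) (phi(K)^T V) / (phi(Q) (phi(K)^T 1)),
# reducing cost from O(N^2 d) to O(N d^2) in the number of tokens N.

def relu_kernel(x):
    """ReLU feature map phi (one common choice for linear attention)."""
    return [max(v, 0.0) for v in x]

def linear_attention(Q, K, V, eps=1e-6):
    """Q, K: N x d lists; V: N x dv lists. Returns the N x dv output."""
    d, dv = len(K[0]), len(V[0])
    phiK = [relu_kernel(k) for k in K]
    # Accumulate phi(K)^T V (d x dv) and phi(K)^T 1 (d) once: linear in N.
    KtV = [[0.0] * dv for _ in range(d)]
    Kt1 = [0.0] * d
    for k, v in zip(phiK, V):
        for i in range(d):
            Kt1[i] += k[i]
            for j in range(dv):
                KtV[i][j] += k[i] * v[j]
    out = []
    for q in Q:
        phiq = relu_kernel(q)
        denom = sum(phiq[i] * Kt1[i] for i in range(d)) + eps
        out.append([sum(phiq[i] * KtV[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out
```

Because the d x dv summary matrices are shared across all queries, doubling the input resolution only doubles the cost, which is what makes a global receptive field affordable for high-resolution dense prediction.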

Third-Party Implementation/Integration

Getting Started

conda create -n efficientvit python=3.10
conda activate efficientvit
conda install -c conda-forge mpi4py openmpi
pip install -r requirements.txt

EfficientViT Applications

| Model | Resolution | COCO mAP | LVIS mAP | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| EfficientViT-SAM-L0 | 512x512 | 45.7 | 41.8 | 34.8M | 35G | 8.2ms | 762 images/s | link |
| EfficientViT-SAM-L1 | 512x512 | 46.2 | 42.1 | 47.7M | 49G | 10.2ms | 638 images/s | link |
| EfficientViT-SAM-L2 | 512x512 | 46.6 | 42.7 | 61.3M | 69G | 12.9ms | 538 images/s | link |
| EfficientViT-SAM-XL0 | 1024x1024 | 47.5 | 43.9 | 117.0M | 185G | 22.5ms | 278 images/s | link |
| EfficientViT-SAM-XL1 | 1024x1024 | 47.8 | 44.4 | 203.3M | 322G | 37.2ms | 182 images/s | link |

Table 1: Summary of all EfficientViT-SAM variants. COCO mAP and LVIS mAP are measured using ViTDet's predicted bounding boxes as the prompt. End-to-end Jetson Orin latency and A100 throughput are measured with TensorRT and fp16.
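Note when reading the table that the batch-1 latency (Jetson Orin) and batch-16 throughput (A100) are measured on different hardware, so the two columns are not directly comparable. A quick unit conversion shows what the batch-1 latency implies as a rate:

```python
# Unit conversion from Table 1 (EfficientViT-SAM-L0 row).
# The 8.2 ms latency is on Jetson Orin at batch size 1; the 762 images/s
# throughput is on A100 at batch size 16 -- different hardware, so this
# only illustrates the ms -> images/s conversion, not a speedup claim.
latency_ms = 8.2
implied_rate = 1000.0 / latency_ms  # images/s at batch size 1
print(round(implied_rate, 1))       # ~122 images/s on Jetson Orin
```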


Contact

Han Cai: hancai@mit.edu

TODO

  • ImageNet pretrained models
  • Segmentation pretrained models
  • ImageNet training code
  • EfficientViT L series, designed for cloud
  • EfficientViT for segment anything
  • EfficientViT for image generation
  • EfficientViT for CLIP
  • EfficientViT for super-resolution
  • Segmentation training code

Citation

If EfficientViT is useful or relevant to your research, please recognize our contributions by citing our paper:

@article{cai2022efficientvit,
  title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}