![](assets/logo.jpg)
Yuxuan Zhang<sup>1,*</sup>, Tianheng Cheng<sup>1,*</sup>, Lei Liu<sup>2</sup>, Heng Liu<sup>2</sup>, Longjin Ran<sup>2</sup>, Xiaoxin Chen<sup>2</sup>, Wenyu Liu<sup>1</sup>, Xinggang Wang<sup>1,📧</sup>

<sup>1</sup> Huazhong University of Science and Technology, <sup>2</sup> vivo AI Lab

(<sup>*</sup> equal contribution, <sup>📧</sup> corresponding author)
- EVF-SAM extends SAM's capabilities with text-prompted segmentation, achieving high accuracy in Referring Expression Segmentation.
- EVF-SAM is designed for efficient computation, enabling rapid inference within a few seconds per image on a T4 GPU.
- Release code
- Release weights
- Release demo
- Clone this repository.
- Install PyTorch for your CUDA version.
- Install the remaining dependencies: `pip install -r requirements.txt` (see the sketch after this list).
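Putting the steps together, a minimal setup sketch (the PyTorch line is a placeholder; pick the exact command for your CUDA version from pytorch.org):

```bash
# Clone the repository and enter it
git clone https://github.com/hustvl/EVF-SAM.git
cd EVF-SAM

# Install PyTorch first (placeholder: use the CUDA-specific command from pytorch.org)
pip install torch torchvision

# Install the remaining dependencies
pip install -r requirements.txt
```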
| Name | SAM | BEIT-3 | Params | Reference Score |
|------|-----|--------|--------|-----------------|
| EVF-SAM | SAM-H | BEIT-3-L | 1.32B | 83.7 |
| EVF-Effi-SAM-L | EfficientSAM-S | BEIT-3-L | 700M | 83.5 |
| EVF-Effi-SAM-B | EfficientSAM-T | BEIT-3-B | 232M | 80.0 |
```bash
python inference.py \
  --version <path to evf-sam> \
  --precision='fp16' \
  --vis_save_path <path to your output directory> \
  --model_type <"ori" or "effi", depending on your loaded ckpt> \
  --image_path <path to your input image> \
  --prompt <customized text prompt>
```

`--load_in_8bit` and `--load_in_4bit` are optional (see the quantized variant after the example below).
For example:

```bash
python inference.py \
  --version evf-sam-21 \
  --precision='fp16' \
  --vis_save_path "infer" \
  --model_type ori \
  --image_path "assets/zebra.jpg" \
  --prompt "zebra top left"
```
```bash
python demo.py <path to evf-sam>
```
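For instance, reusing the checkpoint path from the inference example above:

```bash
python demo.py evf-sam-21
```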
Referring segmentation datasets: refCOCO, refCOCO+, refCOCOg, refCLEF (saiapr_tc-12), and COCO2014train. Arrange them as follows:
```
├── dataset
│   ├── refer_seg
│   │   ├── images
│   │   │   ├── saiapr_tc-12
│   │   │   └── mscoco
│   │   │       └── images
│   │   │           └── train2014
│   │   ├── refclef
│   │   ├── refcoco
│   │   ├── refcoco+
│   │   └── refcocog
```
```bash
torchrun --standalone --nproc_per_node <num_gpus> eval.py \
  --version <path to evf-sam> \
  --dataset_dir <path to your data root> \
  --val_dataset "refcoco|unc|val"
```
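For example, evaluating the refCOCO val split on 8 GPUs (the GPU count and the `./dataset` root are illustrative placeholders; the root should contain the `refer_seg` folder from the tree above):

```bash
torchrun --standalone --nproc_per_node 8 eval.py \
  --version evf-sam-21 \
  --dataset_dir ./dataset \
  --val_dataset "refcoco|unc|val"
```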
We borrow some code from LISA, unilm, SAM, and EfficientSAM.
```bibtex
@article{zhang2024evfsamearlyvisionlanguagefusion,
  title={EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model},
  author={Yuxuan Zhang and Tianheng Cheng and Rui Hu and Lei Liu and Heng Liu and Longjin Ran and Xiaoxin Chen and Wenyu Liu and Xinggang Wang},
  year={2024},
  eprint={2406.20076},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2406.20076},
}
```