
LV-VIS: Large-Vocabulary Video Instance Segmentation dataset

📄 [arXiv] 📄 [ICCV (Oral)] 🔥 [Dataset Download] 🔥 [Evaluation Server]

This repo is the official implementation of Towards Open-Vocabulary Video Instance Segmentation (ICCV 2023, Oral).

News

We are working on the final revision of the annotations. The CodaLab evaluation server will be released in November.

Haochen Wang1, Cilin Yan2, Shuai Wang1, Xiaolong Jiang3, Xu Tang3, Yao Hu3, Weidi Xie4, Efstratios Gavves1

1University of Amsterdam, 2Beihang University, 3Xiaohongshu Inc., 4Shanghai Jiao Tong University

LV-VIS dataset

LV-VIS is a dataset and benchmark for Open-Vocabulary Video Instance Segmentation. It contains 4,828 videos with pixel-level segmentation masks for 26,099 objects from 1,196 unique categories.


Dataset Download

| Split      | Videos   | Annotations | Annotations (oracle) | Submission Example |
|------------|----------|-------------|----------------------|--------------------|
| Training   | Download | Download    | -                    | -                  |
| Validation | Download | Download    | Download             | -                  |
| Test       | Download | -           | -                    | Download           |

Dataset Structure

## JPEGImages

|- train
  |- 00000
    |- 00000.jpg
    |- 00001.jpg
       ...
  |- 00001
    |- 00000.jpg
    |- 00001.jpg
       ...
    ...
|- val
    ...
|- test
    ...
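Given the layout above, the frames of one video can be read in temporal order with a short helper. This is only a sketch; `list_frames` is a name introduced here for illustration, not part of the repo:

```python
from pathlib import Path

def list_frames(video_dir):
    """Return the frame files of one video folder, e.g. JPEGImages/train/00000.

    Frame names are zero-padded (00000.jpg, 00001.jpg, ...), so a plain
    lexicographic sort yields temporal order.
    """
    return sorted(Path(video_dir).glob("*.jpg"))
```

The zero-padded naming is what makes the simple `sorted` call safe; without padding, `10.jpg` would sort before `2.jpg`.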

## Annotations
train_instances.json
val_instances.json
image_val_instances.json

The annotation files follow the same format as YouTube-VIS 2019.
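To make the YouTube-VIS 2019-style layout concrete, here is a minimal, hand-written sketch of its structure; all values below are toy placeholders, not real LV-VIS entries:

```python
import json

# Toy example of the YouTube-VIS 2019-style annotation layout
# (top-level keys: "videos", "categories", "annotations").
toy = {
    "videos": [
        {"id": 1, "length": 2, "width": 1280, "height": 720,
         "file_names": ["00000/00000.jpg", "00000/00001.jpg"]},
    ],
    "categories": [
        {"id": 1, "name": "toy-category"},
    ],
    "annotations": [
        {"id": 1, "video_id": 1, "category_id": 1,
         # one entry per frame; None marks frames where the object is absent
         "segmentations": [None, {"size": [720, 1280], "counts": "..."}]},
    ],
}

# Round-trip through JSON, as a loader for val_instances.json would
data = json.loads(json.dumps(toy))
num_videos = len(data["videos"])
num_objects = len(data["annotations"])
```

Each annotation's `segmentations` list is aligned with the video's `file_names`, one (possibly `None`) per-frame mask.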

We used this platform to annotate LV-VIS. It is a smart video segmentation annotation tool built on Labelme, SAM, and STCN. See segment-anything-annotator.

We provide our baseline OV2Seg code for LV-VIS. Please check Baseline.md for more details.

TODO

  • Training and inference code of OV2Seg
  • Leaderboard for Val/test set

NOTE:

  • We have not yet decided whether to release the annotation file for the test set. Please be patient.
  • The training set is not exhaustively annotated.
  • If you find mistakes in the annotations, please contact us (h.wang3@uva.nl). We will update the annotations.

Cite

@inproceedings{wang2023towards,
  title={Towards Open-Vocabulary Video Instance Segmentation},
  author={Wang, Haochen and Yan, Cilin and Wang, Shuai and Jiang, Xiaolong and Tang, Xu and Hu, Yao and Xie, Weidi and Gavves, Efstratios},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

Acknowledgement

This repo is built on Mask2Former and Detic; thanks to these excellent projects.
