Awesome Video Segmentation

A list of some recent papers on different types of video segmentation tasks.

Updates

- Tasks and datasets
- Recent papers list
  - List of papers
  - Links to papers and code
- SOTA result tables on various datasets
- Discussion on recent trends
- Potential future directions

0. Tasks and Datasets

1. Automatic Video Object Segmentation

2. Semi Automatic Video Object Segmentation

3. Interactive Video Object Segmentation

4. Video Instance Segmentation

5. Actor Action Segmentation

6. Video Semantic Segmentation

7. Video Panoptic Segmentation

8. Depth Aware Video Panoptic Segmentation

9. Panoramic Video Panoptic Segmentation

10. Text Referring Video Object Segmentation

11. Audio Referring Video Object Segmentation

Tasks and Datasets

| Task Category | Task | Target | Instances | Tracking | Datasets |
|---|---|---|---|---|---|
| Objects | Automatic Video Object Segmentation (AVOS) | Primary moving object | - | - | DAVIS 2016, MoCA, YouTube-VOS, YouTube-Objects |
| | Semi-automatic VOS (SVOS) | Mask-guided object | - | - | DAVIS 2017 |
| | Interactive VOS (IVOS) | Scribble-guided object | - | - | DAVIS 2017 |
| | Video Instance Segmentation (VIS) | All objects | ✔️ | ✔️ | YouTube-VIS, OVIS |
| Actor-action | Actor-action segmentation | Primary object related to actions | - | - | A2D |
| Scene | Video Semantic Segmentation / Video Scene Parsing (VSS/VSP) | All thing and stuff classes | - | - | VIPER, VSPW |
| | Video Panoptic Segmentation (VPS) | All thing and stuff classes | ✔️ | ✔️ | Cityscapes-VPS, VIPER, VIPSeg |
| | Depth-aware Video Panoptic Segmentation (DVPS) | All thing and stuff classes and depth | ✔️ | ✔️ | Cityscapes-DVPS, SemanticKITTI-DVPS |
| | Panoramic Video Panoptic Segmentation (PVPS) | All thing and stuff classes | ✔️ | ✔️ | WOD:PVPS |
| Multimodal | Text Referring VOS / Referring-VOS (RVOS) | Text reference guided object | - | - | A2D-Sentence, RE-DAVIS, RVOS |
| | Audio Referring VOS (ARVOS) | Audio reference guided object | - | - | AVOS |

Automatic Video Object Segmentation

[MED-VT] MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. CVPR 2023, Paper, Code
[PMN] Unsupervised video object segmentation via prototype memory network. WACV 2023, Paper
[TMO] Treating motion as option to reduce motion dependency in unsupervised video object segmentation. WACV 2023, Paper
[HFAN] Hierarchical feature alignment network for unsupervised video object segmentation. ECCV 2022, Paper, Code
[IMP] Iteratively selecting an easy reference frame makes unsupervised video object segmentation easier. AAAI 2022, Paper
[RTNet] Reciprocal transformations for unsupervised video object segmentation. CVPR 2021, Paper, Code
[MATNet] Motion-attentive transition for zero-shot video object segmentation. AAAI 2020, Paper, Code
[COSNet] See more, know more: Unsupervised video object segmentation with co-attention Siamese networks. CVPR 2019, Paper, Code

Semi Automatic Video Object Segmentation

[PCVOS] Per-clip video object segmentation. CVPR 2022, Paper, Code
[AOT] Associating objects with transformers for video object segmentation. NeurIPS 2021, Paper
[CFBI+] Collaborative video object segmentation by multiscale foreground-background integration. PAMI 2021, Paper, Code
[SST] SSTVOS: Sparse spatiotemporal transformers for video object segmentation. CVPR 2021, Paper, Code
[STCN] Rethinking space-time networks with improved memory coverage for efficient video object segmentation. NeurIPS 2021, Paper, Code
[HMMN] Hierarchical memory matching network for video object segmentation. ICCV 2021, Paper, Code
[KMN] Kernelized memory network for video object segmentation. ECCV 2020, Paper

Interactive Video Object Segmentation

[MiVOS] Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion. CVPR 2021, Paper, Code
[GIS] Guided interactive video object segmentation using reliability-based attention maps. CVPR 2021, Paper, Code
[ATNet] Interactive video object segmentation using global and local transfer modules. ECCV 2020, Paper, Code
[MANet] Memory aggregation networks for efficient interactive video object segmentation. CVPR 2020, Paper, Code
[IPNet] Fast user-guided video object segmentation by interaction-and-propagation networks. CVPR 2019, Paper, Code

Video Instance Segmentation

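| Methods | Transformer-based | YouTube-VIS-2019 | YouTube-VIS-2021 | OVIS | UVO |
|---|---|---|---|---|---|
| CrossVIS | - | 36.6 | 34.2 | 14.9 | - |
| VisTR | ✔️ | 40.1 | - | - | - |
| IFC | ✔️ | 42.6 | 35.2 | - | - |
| Seq Mask R-CNN | - | 47.6 | - | - | - |
| EfficientVIS | ✔️ | 39.8 | - | - | - |
| TeViT | ✔️ | 46.6 | 37.9 | 17.4 | - |
| SeqFormer | ✔️ | 59.3 | - | - | - |
| TubeFormer-DeepLab | ✔️ | 47.5 | 41.2 | - | - |
| Video K-Net | ✔️ | 51.4 | - | - | - |
| FreeSOLO | - | - | - | - | 4.8 |
| IDOL | ✔️ | 62.2 | 56.1 | 42.6 | - |
| VMT | ✔️ | 59.7 | - | 19.8 | - |
| MS-STS VIS | ✔️ | 61.0 | - | - | - |
| InstMove | ✔️ | - | - | 30.7 | - |
| GenVIS | ✔️ | 64.0 | 59.6 | 45.4 | - |
| CAROQ | ✔️ | 61.4 | 54.5 | 38.2 | - |
| CutLER | - | - | - | - | 10.1 |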

[CutLER] Cut and learn for unsupervised object detection and instance segmentation. CVPR 2023, Paper, Code
[CAROQ] Context-aware relative object queries to unify video instance and panoptic segmentation. CVPR 2023, Paper, Code
[GenVIS] A generalized framework for video instance segmentation. CVPR 2023, Paper, Code
[InstMove] InstMove: Instance motion for object-centric video segmentation. CVPR 2023, Paper, Code
[MS-STS VIS] Video instance segmentation via multi-scale spatio-temporal split attention transformer. ECCV 2022, Paper, Code
[VMT] Video mask transfiner for high-quality video instance segmentation. ECCV 2022, Paper, Code
[IDOL] In defense of online models for video instance segmentation. ECCV 2022, Paper, Code
[FreeSOLO] FreeSOLO: Learning to segment objects without annotations. CVPR 2022, Paper, Code
[Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
[TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
[SeqFormer] SeqFormer: Sequential transformer for video instance segmentation. ECCV 2022, Paper, Code
[TeViT] Temporally efficient vision transformer for video instance segmentation. CVPR 2022, Paper, Code
[EfficientVIS] Efficient video instance segmentation via tracklet query and proposal. CVPR 2022, Paper, Code
[Seq Mask R-CNN] Video instance segmentation with a propose-reduce paradigm. ICCV 2021, Paper, Code
[IFC] Video instance segmentation using inter-frame communication transformers. NeurIPS 2021, Paper, Code
[VisTR] End-to-end video instance segmentation with transformers. CVPR 2021, Paper, Code
[CrossVIS] Crossover learning for fast online video instance segmentation. ICCV 2021, Paper, Code

Actor Action Segmentation

| Methods | Transformer-based | A2D |
|---|---|---|
| Ji et al. | - | 36.9 |
| Dang et al. | - | 38.6 |
| SSA2D | - | 39.5 |
| MED-VT | ✔️ | 52.6 |

[MED-VT] MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. CVPR 2023, Paper, Code
[SSA2D] We don’t need thousand proposals: Single shot actor action detection in videos. WACV 2021, Paper, Code
[Dang et al.] Actor-action semantic segmentation with region masks. BMVC 2018, Paper, Code
[Ji et al.] End-to-end joint semantic segmentation of actors and actions in video. ECCV 2018, Paper, Code

Video Semantic Segmentation

[TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
[CFFM] Coarse-to-fine feature mining for video semantic segmentation. CVPR 2022, Paper, Code
[Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
[SegFormer] SegFormer: Simple and efficient design for semantic segmentation with transformers. NeurIPS 2021, Paper, Code
[TCB] VSPW: A large-scale dataset for video scene parsing in the wild. CVPR 2021, Paper, Code
[STT] Video semantic segmentation via sparse temporal transformer. MM 2021, Paper, Code
[TMANet] Temporal memory attention for video semantic segmentation. Paper, Code

Video Panoptic Segmentation

[CAROQ] Context-aware relative object queries to unify video instance and panoptic segmentation. CVPR 2023, Paper, Code
[Slot-VPS] Slot-VPS: Object-centric representation learning for video panoptic segmentation. CVPR 2022, Paper, Code
[TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
[ViP-DeepLab] ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. CVPR 2021, Paper, Code
[Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
[SiamTrack] Learning to associate every segment for video panoptic segmentation. CVPR 2021, Paper, Code
[VPSNet] Video panoptic segmentation. CVPR 2020, Paper, Code

Depth Aware Video Panoptic Segmentation

[PolyphonicFormer] PolyphonicFormer: Unified query learning for depth-aware video panoptic segmentation. ECCV 2022, Paper, Code
[TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
[ViP-DeepLab] ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. CVPR 2021, Paper, Code

Panoramic Video Panoptic Segmentation

[ViP-DeepLab+] Waymo open dataset: Panoramic video panoptic segmentation. ECCV 2022, Paper, Code

Text Referring Video Object Segmentation

[ReferFormer] Language as queries for referring video object segmentation. CVPR 2022, Paper, Code
[MTTR] End-to-end referring video object segmentation with multimodal transformers. CVPR 2022, Paper, Code
[YOFO] You only infer once: Cross-modal meta-transfer for referring video object segmentation. AAAI 2022, Paper, Code
[URVOS] URVOS: Unified referring video object segmentation network with a large-scale benchmark. ECCV 2020, Paper, Code

Audio Referring Video Object Segmentation

[Wnet] Wnet: Audio-guided video object segmentation via wavelet-based cross-modal denoising networks. CVPR 2022, Paper, Code
