Awesome Video Segmentation

A list of some recent papers on different types of video segmentation tasks.

Updates

  • Tasks and datasets
  • Recent papers list
    • List of papers
    • Links to papers and codes
  • SOTA result tables on various datasets
  • Discussion on recent trends
  • Potential future directions

Contents

0. Tasks and Datasets

1. Automatic Video Object Segmentation

2. Semi Automatic Video Object Segmentation

3. Interactive Video Object Segmentation

4. Video Instance Segmentation

5. Actor Action Segmentation

6. Video Semantic Segmentation

7. Video Panoptic Segmentation

8. Depth Aware Video Panoptic Segmentation

9. Panoramic Video Panoptic Segmentation

10. Text Referring Video Object Segmentation

11. Audio Referring Video Object Segmentation

Tasks and Datasets

| Task Category | Task | Target | Instances | Tracking | Datasets |
|---|---|---|---|---|---|
| Objects | Automatic Video Object Segmentation (AVOS) | Primary moving object | - | - | DAVIS 2016, MoCA, YouTube-VOS, YouTube-Objects |
| Objects | Semi-automatic VOS (SVOS) | Mask-guided object | - | - | DAVIS 2017 |
| Objects | Interactive VOS (IVOS) | Scribble-guided object | - | - | DAVIS 2017 |
| Objects | Video Instance Segmentation (VIS) | All objects | ✔️ | ✔️ | YouTube-VIS, OVIS |
| Actor-action | Actor-action segmentation | Primary object related to actions | - | - | A2D |
| Scene | Video Semantic Segmentation / Video Scene Parsing (VSS/VSP) | All thing and stuff classes | - | - | VIPER, VSPW |
| Scene | Video Panoptic Segmentation (VPS) | All thing and stuff classes | ✔️ | ✔️ | Cityscapes-VPS, VIPER, VIPSeg |
| Scene | Depth-aware Video Panoptic Segmentation (DVPS) | All thing and stuff classes and depth | ✔️ | ✔️ | Cityscapes-DVPS, SemanticKITTI-DVPS |
| Scene | Panoramic Video Panoptic Segmentation (PVPS) | All thing and stuff classes | ✔️ | ✔️ | WOD:PVPS |
| Multimodal | Text Referring VOS / Referring-VOS (RVOS) | Text reference guided object | - | - | A2D-Sentence, RE-DAVIS, RVOS |
| Multimodal | Audio Referring VOS (ARVOS) | Audio reference guided object | - | - | AVOS |

Automatic Video Object Segmentation

  • [MED-VT] MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. CVPR 2023, Paper, Code
  • [PMN] Unsupervised video object segmentation via prototype memory network. WACV 2023, Paper
  • [TMO] Treating motion as option to reduce motion dependency in unsupervised video object segmentation. WACV 2023, Paper
  • [HFAN] Hierarchical feature alignment network for unsupervised video object segmentation. ECCV 2022, Paper, Code
  • [IMP] Iteratively selecting an easy reference frame makes unsupervised video object segmentation easier. AAAI 2022, Paper
  • [RTNet] Reciprocal transformations for unsupervised video object segmentation. CVPR 2021, Paper, Code
  • [MATNet] Motion-attentive transition for zero-shot video object segmentation. AAAI 2020, Paper, Code
  • [COSNet] See more, know more: Unsupervised video object segmentation with co-attention Siamese networks. CVPR 2019, Paper, Code

Semi Automatic Video Object Segmentation

  • [PCVOS] Per-clip video object segmentation. CVPR 2022, Paper, Code
  • [AOT] Associating objects with transformers for video object segmentation. NeurIPS 2021, Paper
  • [CFBI+] Collaborative video object segmentation by multiscale foreground-background integration. PAMI 2021, Paper, Code
  • [SST] SSTVOS: Sparse spatiotemporal transformers for video object segmentation. CVPR 2021, Paper, Code
  • [STCN] Rethinking space-time networks with improved memory coverage for efficient video object segmentation. NeurIPS 2021, Paper, Code
  • [HMMN] Hierarchical memory matching network for video object segmentation. ICCV 2021, Paper, Code
  • [KMN] Kernelized memory network for video object segmentation. ECCV 2020, Paper

Interactive Video Object Segmentation

  • [MiVOS] Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion. CVPR 2021, Paper, Code
  • [GIS] Guided interactive video object segmentation using reliability-based attention maps. CVPR 2021, Paper, Code
  • [ATNet] Interactive video object segmentation using global and local transfer modules. ECCV 2020, Paper, Code
  • [MANet] Memory aggregation networks for efficient interactive video object segmentation. CVPR 2020, Paper, Code
  • [IPNet] Fast user-guided video object segmentation by interaction-and-propagation networks. CVPR 2019, Paper, Code

Video Instance Segmentation

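| Methods | Transformer-based | YouTube-VIS-2019 | YouTube-VIS-2021 | OVIS | UVO |
|---|---|---|---|---|---|
| CrossVIS | - | 36.6 | 34.2 | 14.9 | - |
| VisTR | ✔️ | 40.1 | - | - | - |
| IFC | ✔️ | 42.6 | 35.2 | - | - |
| Seq Mask R-CNN | - | 47.6 | - | - | - |
| EfficientVIS | ✔️ | 39.8 | - | - | - |
| TeViT | ✔️ | 46.6 | 37.9 | 17.4 | - |
| SeqFormer | ✔️ | 59.3 | - | - | - |
| TubeFormer-DeepLab | ✔️ | 47.5 | 41.2 | - | - |
| Video K-Net | ✔️ | 51.4 | - | - | - |
| FreeSOLO | - | - | - | - | 4.8 |
| IDOL | ✔️ | 62.2 | 56.1 | 42.6 | - |
| VMT | ✔️ | 59.7 | - | 19.8 | - |
| MS-STS VIS | ✔️ | 61.0 | - | - | - |
| InstMove | ✔️ | - | - | 30.7 | - |
| GenVIS | ✔️ | 64.0 | 59.6 | 45.4 | - |
| CAROQ | ✔️ | 61.4 | 54.5 | 38.2 | - |
| CutLER | ✔️ | - | - | - | 10.1 |
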
  • [CutLER] Cut and learn for unsupervised object detection and instance segmentation. CVPR 2023, Paper, Code
  • [CAROQ] Context-aware relative object queries to unify video instance and panoptic segmentation. CVPR 2023, Paper, Code
  • [GenVIS] A generalized framework for video instance segmentation. CVPR 2023, Paper, Code
  • [InstMove] InstMove: Instance motion for object-centric video segmentation. CVPR 2023, Paper, Code
  • [MS-STS VIS] Video instance segmentation via multi-scale spatio-temporal split attention transformer. ECCV 2022, Paper, Code
  • [VMT] Video mask transfiner for high-quality video instance segmentation. ECCV 2022, Paper, Code
  • [IDOL] In defense of online models for video instance segmentation. ECCV 2022, Paper, Code
  • [FreeSOLO] FreeSOLO: Learning to segment objects without annotations. CVPR 2022, Paper, Code
  • [Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
  • [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
  • [SeqFormer] SeqFormer: Sequential transformer for video instance segmentation. ECCV 2022, Paper, Code
  • [TeViT] Temporally efficient vision transformer for video instance segmentation. CVPR 2022, Paper, Code
  • [EfficientVIS] Efficient video instance segmentation via tracklet query and proposal. CVPR 2022, Paper, Code
  • [Seq Mask R-CNN] Video instance segmentation with a propose-reduce paradigm. ICCV 2021, Paper, Code
  • [IFC] Video instance segmentation using inter-frame communication transformers. NeurIPS 2021, Paper, Code
  • [VisTR] End-to-end video instance segmentation with transformers. CVPR 2021, Paper, Code
  • [CrossVIS] Crossover learning for fast online video instance segmentation. ICCV 2021, Paper, Code

Actor Action Segmentation

| Methods | Transformer-based | A2D |
|---|---|---|
| Ji et al. | - | 36.9 |
| Dang et al. | - | 38.6 |
| SSA2D | - | 39.5 |
| MED-VT | ✔️ | 52.6 |

  • [MED-VT] MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. CVPR 2023, Paper, Code
  • [SSA2D] We don’t need thousand proposals: Single shot actor action detection in videos. WACV 2021, Paper, Code
  • [Dang et al.] Actor-action semantic segmentation with region masks. BMVC 2018, Paper, Code
  • [Ji et al.] End-to-end joint semantic segmentation of actors and actions in video. ECCV 2018, Paper, Code

Video Semantic Segmentation

  • [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
  • [CFFM] Coarse-to-fine feature mining for video semantic segmentation. CVPR 2022, Paper, Code
  • [Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
  • [SegFormer] SegFormer: Simple and efficient design for semantic segmentation with transformers. NeurIPS 2021, Paper, Code
  • [TCB] VSPW: A large-scale dataset for video scene parsing in the wild. CVPR 2021, Paper, Code
  • [STT] Video semantic segmentation via sparse temporal transformer. MM 2021, Paper, Code
  • [TMANet] Temporal memory attention for video semantic segmentation. Paper, Code

Video Panoptic Segmentation

  • [CAROQ] Context-aware relative object queries to unify video instance and panoptic segmentation. CVPR 2023, Paper, Code
  • [Slot-VPS] Slot-VPS: Object-centric representation learning for video panoptic segmentation. CVPR 2022, Paper, Code
  • [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
  • [ViP-DeepLab] ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. CVPR 2021, Paper, Code
  • [Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
  • [SiamTrack] Learning to associate every segment for video panoptic segmentation. CVPR 2021, Paper, Code
  • [VPSNet] Video panoptic segmentation. CVPR 2020, Paper, Code

Depth Aware Video Panoptic Segmentation

  • [PolyphonicFormer] PolyphonicFormer: Unified query learning for depth-aware video panoptic segmentation. ECCV 2022, Paper, Code
  • [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
  • [ViP-DeepLab] ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. CVPR 2021, Paper, Code

Panoramic Video Panoptic Segmentation

  • [ViP-DeepLab+] Waymo open dataset: Panoramic video panoptic segmentation. ECCV 2022, Paper, Code

Text Referring Video Object Segmentation

  • [ReferFormer] Language as queries for referring video object segmentation. CVPR 2022, Paper, Code
  • [MTTR] End-to-end referring video object segmentation with multimodal transformers. CVPR 2022, Paper, Code
  • [YOFO] You only infer once: Cross-modal meta-transfer for referring video object segmentation. AAAI 2022, Paper, Code
  • [URVOS] URVOS: Unified referring video object segmentation network with a large-scale benchmark. ECCV 2020, Paper, Code

Audio Referring Video Object Segmentation

  • [Wnet] Wnet: Audio-guided video object segmentation via wavelet-based cross-modal denoising networks. CVPR 2022, Paper, Code