A list of some recent papers on different types of video segmentation tasks.
- Tasks and datasets
- Recent papers list
- List of papers
- Links to papers and codes
- SOTA result tables on various datasets
- Discussion on recent trends
- Potential future directions
1. Automatic Video Object Segmentation
2. Semi-Automatic Video Object Segmentation
3. Interactive Video Object Segmentation
4. Video Instance Segmentation
5. Actor-Action Segmentation
6. Video Semantic Segmentation
7. Video Panoptic Segmentation
8. Depth-Aware Video Panoptic Segmentation
9. Panoramic Video Panoptic Segmentation
Task Category | Task | Target | Instances | Tracking | Datasets |
---|---|---|---|---|---|
Objects | Automatic Video Object Segmentation (AVOS) | Primary moving object | - | - | DAVIS 2016, MoCA, YouTube-VOS, YouTube-Objects |
Objects | Semi-automatic VOS (SVOS) | Mask-guided object | - | - | DAVIS 2017 |
Objects | Interactive VOS (IVOS) | Scribble-guided object | - | - | DAVIS 2017 |
Objects | Video Instance Segmentation (VIS) | All objects | ✔️ | ✔️ | YouTube-VIS, OVIS |
Actor-action | Actor-action segmentation | Primary object related to actions | - | - | A2D |
Scene | Video Semantic Segmentation / Video Scene Parsing (VSS/VSP) | All thing and stuff classes | - | - | VIPER, VSPW |
Scene | Video Panoptic Segmentation (VPS) | All thing and stuff classes | ✔️ | ✔️ | Cityscapes-VPS, VIPER, VIPSeg |
Scene | Depth-aware Video Panoptic Segmentation (DVPS) | All thing and stuff classes and depth | ✔️ | ✔️ | Cityscapes-DVPS, SemanticKITTI-DVPS |
Scene | Panoramic Video Panoptic Segmentation (PVPS) | All thing and stuff classes | ✔️ | ✔️ | WOD:PVPS |
Multimodal | Text Referring VOS/Referring-VOS (RVOS) | Text reference guided object | - | - | A2D-Sentence, RE-DAVIS, RVOS |
Multimodal | Audio Referring VOS (ARVOS) | Audio reference guided object | - | - | AVOS |
- [MED-VT] MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. CVPR 2023, Paper, Code
- [PMN] Unsupervised video object segmentation via prototype memory network. WACV 2023, Paper
- [TMO] Treating motion as option to reduce motion dependency in unsupervised video object segmentation. WACV 2023, Paper
- [HFAN] Hierarchical feature alignment network for unsupervised video object segmentation. ECCV 2022, Paper, Code
- [IMP] Iteratively selecting an easy reference frame makes unsupervised video object segmentation easier. AAAI 2022, Paper
- [RTNet] Reciprocal transformations for unsupervised video object segmentation. CVPR 2021, Paper, Code
- [MATNet] Motion-attentive transition for zero-shot video object segmentation. AAAI 2020, Paper, Code
- [COSNet] See more, know more: Unsupervised video object segmentation with co-attention Siamese networks. CVPR 2019, Paper, Code
- [PCVOS] Per-clip video object segmentation. CVPR 2022, Paper, Code
- [AOT] Associating objects with transformers for video object segmentation. NeurIPS 2021, Paper
- [CFBI+] Collaborative video object segmentation by multiscale foreground-background integration. PAMI 2021, Paper, Code
- [SST] SSTVOS: Sparse spatiotemporal transformers for video object segmentation. CVPR 2021, Paper, Code
- [STCN] Rethinking space-time networks with improved memory coverage for efficient video object segmentation. NeurIPS 2021, Paper, Code
- [HMMN] Hierarchical memory matching network for video object segmentation. ICCV 2021, Paper, Code
- [KMN] Kernelized memory network for video object segmentation. ECCV 2020, Paper
- [MiVOS] Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion. CVPR 2021, Paper, Code
- [GIS] Guided interactive video object segmentation using reliability-based attention maps. CVPR 2021, Paper, Code
- [ATNet] Interactive video object segmentation using global and local transfer modules. ECCV 2020, Paper, Code
- [MANet] Memory aggregation networks for efficient interactive video object segmentation. CVPR 2020, Paper, Code
- [IPNet] Fast user-guided video object segmentation by interaction-and-propagation networks. CVPR 2019, Paper, Code
Methods | Transformer-based | YouTube-VIS-2019 | YouTube-VIS-2021 | OVIS | UVO |
---|---|---|---|---|---|
CrossVIS | - | 36.6 | 34.2 | 14.9 | - |
VisTR | ✔️ | 40.1 | - | - | - |
IFC | ✔️ | 42.6 | 35.2 | - | - |
Seq Mask R-CNN | - | 47.6 | - | - | - |
EfficientVIS | ✔️ | 39.8 | - | - | - |
TeViT | ✔️ | 46.6 | 37.9 | 17.4 | - |
SeqFormer | ✔️ | 59.3 | - | - | - |
TubeFormer-DeepLab | ✔️ | 47.5 | 41.2 | - | - |
Video K-Net | ✔️ | 51.4 | - | - | - |
FreeSOLO | - | - | - | - | 4.8 |
IDOL | ✔️ | 62.2 | 56.1 | 42.6 | - |
VMT | ✔️ | 59.7 | - | 19.8 | - |
MS-STS VIS | ✔️ | 61.0 | - | - | - |
InstMove | ✔️ | - | - | 30.7 | - |
GenVIS | ✔️ | 64.0 | 59.6 | 45.4 | - |
CAROQ | ✔️ | 61.4 | 54.5 | 38.2 | - |
CutLER | ✔️ | - | - | - | 10.1 |
- [CutLER] Cut and learn for unsupervised object detection and instance segmentation. CVPR 2023, Paper, Code
- [CAROQ] Context-aware relative object queries to unify video instance and panoptic segmentation. CVPR 2023, Paper, Code
- [GenVIS] A generalized framework for video instance segmentation. CVPR 2023, Paper, Code
- [InstMove] InstMove: Instance motion for object-centric video segmentation. CVPR 2023, Paper, Code
- [MS-STS VIS] Video instance segmentation via multi-scale spatio-temporal split attention transformer. ECCV 2022, Paper, Code
- [VMT] Video mask transfiner for high-quality video instance segmentation. ECCV 2022, Paper, Code
- [IDOL] In defense of online models for video instance segmentation. ECCV 2022, Paper, Code
- [FreeSOLO] FreeSOLO: Learning to segment objects without annotations. CVPR 2022, Paper, Code
- [Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
- [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
- [SeqFormer] SeqFormer: Sequential transformer for video instance segmentation. ECCV 2022, Paper, Code
- [TeViT] Temporally efficient vision transformer for video instance segmentation. CVPR 2022, Paper, Code
- [EfficientVIS] Efficient video instance segmentation via tracklet query and proposal. CVPR 2022, Paper, Code
- [Seq Mask R-CNN] Video instance segmentation with a propose-reduce paradigm. ICCV 2021, Paper, Code
- [IFC] Video instance segmentation using inter-frame communication transformers. NeurIPS 2021, Paper, Code
- [VisTR] End-to-end video instance segmentation with transformers. CVPR 2021, Paper, Code
- [CrossVIS] Crossover learning for fast online video instance segmentation. ICCV 2021, Paper, Code
Methods | Transformer-based | A2D |
---|---|---|
Ji et al. | - | 36.9 |
Dang et al. | - | 38.6 |
SSA2D | - | 39.5 |
MED-VT | ✔️ | 52.6 |
- [MED-VT] MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. CVPR 2023, Paper, Code
- [SSA2D] We don’t need thousand proposals: Single shot actor action detection in videos. WACV 2021, Paper, Code
- [Dang et al.] Actor-action semantic segmentation with region masks. BMVC 2018, Paper, Code
- [Ji et al.] End-to-end joint semantic segmentation of actors and actions in video. ECCV 2018, Paper, Code
- [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
- [CFFM] Coarse-to-fine feature mining for video semantic segmentation. CVPR 2022, Paper, Code
- [Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
- [SegFormer] SegFormer: Simple and efficient design for semantic segmentation with transformers. NeurIPS 2021, Paper, Code
- [TCB] VSPW: A large-scale dataset for video scene parsing in the wild. CVPR 2021, Paper, Code
- [STT] Video semantic segmentation via sparse temporal transformer. MM 2021, Paper, Code
- [TMANet] Temporal memory attention for video semantic segmentation. Paper, Code
- [CAROQ] Context-aware relative object queries to unify video instance and panoptic segmentation. CVPR 2023, Paper, Code
- [Slot-VPS] Slot-VPS: Object-centric representation learning for video panoptic segmentation. CVPR 2022, Paper, Code
- [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
- [ViP-DeepLab] ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. CVPR 2021, Paper, Code
- [Video K-Net] Video K-Net: A simple, strong, and unified baseline for video segmentation. CVPR 2022, Paper, Code
- [SiamTrack] Learning to associate every segment for video panoptic segmentation. CVPR 2021, Paper, Code
- [VPSNet] Video panoptic segmentation. CVPR 2020, Paper, Code
- [PolyphonicFormer] PolyphonicFormer: Unified query learning for depth-aware video panoptic segmentation. ECCV 2022, Paper, Code
- [TubeFormer-DeepLab] TubeFormer-DeepLab: Video mask transformer. CVPR 2022, Paper, Code
- [ViP-DeepLab] ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. CVPR 2021, Paper, Code
- [ReferFormer] Language as queries for referring video object segmentation. CVPR 2022, Paper, Code
- [MTTR] End-to-end referring video object segmentation with multimodal transformers. CVPR 2022, Paper, Code
- [YOFO] You only infer once: Cross-modal meta-transfer for referring video object segmentation. AAAI 2022, Paper, Code
- [URVOS] URVOS: Unified referring video object segmentation network with a large-scale benchmark. ECCV 2020, Paper, Code