fiskrt/XMem

Filip's thesis @ CVL

Link to full thesis: filipskogh.com/thesis.pdf

Weakly supervised Video Object Segmentation

Video object segmentation is a fundamental problem in computer vision, used in a variety of applications across many fields. Over the past few years, video object segmentation has witnessed rapid progress, catalyzed by increasingly large datasets. These datasets, which consist of pixel-accurate masks with object association between frames, are especially labor-intensive and costly to annotate, which prohibits truly large-scale datasets. We propose a video object segmentation model that can be trained exclusively with bounding boxes, a cheaper type of annotation. To achieve this, our method employs loss functions tailored to box annotations that leverage self-supervision through color similarity and spatio-temporal coherence. We validate our approach against traditional fully-supervised methods and various other settings on YouTube-VOS and DAVIS, achieving over 90% relative performance on J & F in comparison to fully-supervised models in the box-initialization setting, while scoring around 85% in the mask-initialization setting. We also investigate practical aspects of our model, achieving a relative performance of 87% on longer-term videos with thousands of frames. We perform both quantitative and qualitative ablations, showing visually how the loss function improves fine detail, along with failure cases. Moreover, our method is practical, running at over 22 frames per second on the YouTube-VOS validation set.
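
To make the idea of box-tailored losses with color-similarity self-supervision more concrete, here is a minimal sketch in plain Python. It is not the thesis code. It assumes a BoxInst-style formulation: a projection term that compares the mask and the box along each axis, and a pairwise term that encourages neighboring pixels with similar colors to share a label. All function names and the parameters `theta`, `tau`, and `eps` are hypothetical choices for illustration, and the spatio-temporal part (neighbors across frames) is left out.

```python
import math

# Illustrative sketch only: an assumed BoxInst-style formulation,
# not the loss functions actually used in the thesis.

def box_to_mask(box, height, width):
    """Rasterize a box (x0, y0, x1, y1), inclusive pixel coordinates."""
    x0, y0, x1, y1 = box
    return [[1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0
             for x in range(width)] for y in range(height)]

def project(mask):
    """Max-project a 2D mask onto the y-axis (rows) and x-axis (columns)."""
    rows = [max(row) for row in mask]
    cols = [max(col) for col in zip(*mask)]
    return rows, cols

def dice_loss(pred, target, eps=1e-6):
    """Dice loss between two 1D profiles with values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def projection_loss(pred_mask, box):
    """Penalize masks whose axis projections disagree with the box."""
    h, w = len(pred_mask), len(pred_mask[0])
    box_rows, box_cols = project(box_to_mask(box, h, w))
    pred_rows, pred_cols = project(pred_mask)
    return dice_loss(pred_rows, box_rows) + dice_loss(pred_cols, box_cols)

def color_similarity(c1, c2, theta=2.0):
    """Similarity in (0, 1]; equals 1 for identical colors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return math.exp(-dist / theta)

def pairwise_loss(pred_mask, image, tau=0.3, eps=1e-6):
    """Ask right/down neighbors with similar colors to share a label."""
    h, w = len(pred_mask), len(pred_mask[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny >= h or nx >= w:
                    continue
                # Edges across a strong color change are not supervised.
                if color_similarity(image[y][x], image[ny][nx]) < tau:
                    continue
                p, q = pred_mask[y][x], pred_mask[ny][nx]
                same_label = p * q + (1.0 - p) * (1.0 - q)
                total += -math.log(same_label + eps)
                count += 1
    return total / max(count, 1)
```

The projection term alone cannot tell a filled box from any other shape that touches all four sides of the box, which is why a color-based term is useful for recovering object boundaries. For example, `projection_loss` is close to zero both for a mask that fills the box and for a thin diagonal that only touches its corners, while `pairwise_loss` is small only when label changes coincide with color changes.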

Base training code is from XMem

@inproceedings{cheng2022xmem,
  title={{XMem}: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model},
  author={Cheng, Ho Kei and Schwing, Alexander G.},
  booktitle={ECCV},
  year={2022}
}

About

XMem but without the mask-annotations

Languages

  • Python 95.6%
  • Cuda 2.0%
  • C++ 1.5%
  • Other 0.9%