Filip's thesis @ CVL

Link to full thesis: filipskogh.com/thesis.pdf

Weakly supervised Video Object Segmentation

Video object segmentation is a fundamental problem in computer vision, used in a variety of applications across many fields. Over the past few years, video object segmentation has witnessed rapid progress, catalyzed by increasingly large datasets. These datasets, which consist of pixel-accurate masks with object association between frames, are especially labor-intensive and costly to annotate, which prohibits truly large-scale datasets. We propose a video object segmentation model capable of being trained exclusively with bounding boxes, a cheaper type of annotation. To achieve this, our method employs loss functions tailored for box annotations that leverage self-supervision through color similarity and spatio-temporal coherence. We validate our approach against traditional fully-supervised methods and various other settings on YouTube-VOS and DAVIS, achieving over 90% relative performance on $\mathcal{J}\&\mathcal{F}$ in comparison to fully-supervised models in the box-initialization setting, while scoring around 85% in the mask-initialization setting. We also investigate practical aspects of our model, achieving a relative performance of 87% on longer-term videos with thousands of frames. In addition, we perform quantitative and qualitative ablations, showing visually how the loss function improves fine detail and illustrating failure cases. Moreover, our method is practical, running at over 22 frames per second on the YouTube-VOS validation set.
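To make the idea of box-tailored losses more concrete, here is a minimal sketch in plain Python of two terms commonly used for box supervision: a projection term that compares the axis-wise maxima of the predicted mask with those of the box, and a pairwise term that encourages neighboring pixels with similar colors to share a label. This is an illustration under stated assumptions, not the thesis implementation. The function names, the thresholds `theta` and `tau`, and the choice of a Dice-style comparison are all hypothetical; see the thesis PDF for the actual formulation.

```python
import math


def box_to_mask(box, height, width):
    """Rasterize an inclusive (x0, y0, x1, y1) box into a binary mask."""
    x0, y0, x1, y1 = box
    return [[1.0 if x0 <= x <= x1 and y0 <= y <= y1 else 0.0
             for x in range(width)]
            for y in range(height)]


def dice_loss(pred, target, eps=1e-6):
    """Dice-style loss between two 1-D sequences of values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def projection_loss(prob, box):
    """Compare row-wise and column-wise maxima of the mask and the box.

    `prob` is a 2-D list of foreground probabilities. Only the box is
    needed as supervision, never a pixel-accurate mask.
    """
    box_mask = box_to_mask(box, len(prob), len(prob[0]))
    rows = lambda m: [max(r) for r in m]        # projection onto the y-axis
    cols = lambda m: [max(c) for c in zip(*m)]  # projection onto the x-axis
    return (dice_loss(rows(prob), rows(box_mask))
            + dice_loss(cols(prob), cols(box_mask)))


def color_affinity_loss(prob, image, theta=2.0, tau=0.3):
    """Penalize label disagreement between similarly colored neighbors.

    `image[y][x]` is a color tuple (any color space; an assumption here).
    Pairs whose similarity exp(-distance / theta) falls below `tau` are
    ignored, so the term stays silent across strong color edges.
    """
    h, w = len(prob), len(prob[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny >= h or nx >= w:
                    continue
                sim = math.exp(-math.dist(image[y][x], image[ny][nx]) / theta)
                if sim < tau:
                    continue
                p, q = prob[y][x], prob[ny][nx]
                same = p * q + (1.0 - p) * (1.0 - q)  # P(both share a label)
                total += -math.log(max(same, 1e-6))
                count += 1
    return total / max(count, 1)
```

In this sketch the pairwise term only looks at neighbors within one frame. A temporal variant could, as a further assumption, apply the same comparison to pixel pairs drawn from adjacent frames.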

Base training code is from XMem

@inproceedings{cheng2022xmem,
  title={{XMem}: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model},
  author={Cheng, Ho Kei and Schwing, Alexander G.},
  booktitle={ECCV},
  year={2022}
}

Name	Last commit message	Last commit date
dataset	Added first_frame_only training to vos dataset	Aug 2, 2023
docs	fix gid	Oct 18, 2022
inference	Added MOSETestDataset for MOSE evaluation	Aug 2, 2023
model	modify loss test	Aug 9, 2023
scripts	update resize youtube to new ver	Jul 27, 2022
util	Added MOSE option, first_frame_training, original loss option	Aug 2, 2023
.gitignore	Added *.pth to gitignore	Aug 2, 2023
LICENSE	Create LICENSE	Jul 17, 2022
README.md	feat: add link to pdf	Oct 9, 2024
create_video.py	Added train and test files	Sep 14, 2023
eval.py	Fixed first frame not being predicted when using first_frame_bbox. Ad…	Aug 2, 2023
interactive_demo.py	fixed GUI on Windows; fixed mask buffering	Jul 9, 2022
losses_test.py	modify loss test	Aug 9, 2023
merge_multi_scale.py	upload all code	Jul 6, 2022
myeval.py	Added eval SLURM script	Mar 19, 2023
requirements.txt	Color-space conversion now done on GPU	Jun 26, 2023
requirements_demo.txt	fixes	Jul 6, 2022
slurm_train.py	Added eval SLURM script	Mar 19, 2023
test.sh	Added train and test files	Sep 14, 2023
train.py	Added MOSE training	Aug 2, 2023
train.sh	Added train and test files	Sep 14, 2023
