This repository provides baseline code implementations and links to the dataset proposed in the paper "Watching the News: Towards VideoQA Models that can Read" (WACV 2023).
Video Question Answering methods focus on commonsense reasoning and visual cognition of objects or persons and their interactions over time. However, current VideoQA approaches ignore the textual information present in the video. We introduce the "NewsVideoQA" dataset, which comprises more than 8,600 QA pairs on over 3,000 news videos obtained from diverse news channels around the world.
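Since the dataset consists of question-answer pairs grounded in individual videos, a common first step is loading the annotations and grouping QA pairs by video. The snippet below is only a minimal sketch: the field names (`video_id`, `question`, `answer`) are hypothetical placeholders and may not match the official NewsVideoQA annotation schema.

```python
import json

# Hypothetical annotation records -- the real NewsVideoQA schema may
# differ; the field names below are illustrative only.
sample_annotations = json.loads("""
[
  {"video_id": "news_0001", "question": "What channel logo is shown?",
   "answer": "BBC"},
  {"video_id": "news_0001", "question": "What text is on the ticker?",
   "answer": "Elections"},
  {"video_id": "news_0002", "question": "What is the headline?",
   "answer": "Weather update"}
]
""")

def questions_for_video(annotations, video_id):
    """Return all QA pairs attached to a given video id."""
    return [a for a in annotations if a["video_id"] == video_id]

qa = questions_for_video(sample_annotations, "news_0001")
print(len(qa))            # number of QA pairs for this video
print(qa[0]["answer"])
```

Grouping by `video_id` this way mirrors how the baselines consume the data: each model receives a video (or its frames and OCR tokens) together with its associated questions.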
Code for the baselines is organized as follows:

- BERT: `baselines/BERT`
- M4C: `baselines/M4C`
- SINGULARITY: `baselines/SINGULARITY`
If you find our dataset or code useful, feel free to leave a star, and please cite our paper as follows:
@inproceedings{DBLP:conf/wacv/JahagirdarMKJ23,
  author    = {Soumya Jahagirdar and
               Minesh Mathew and
               Dimosthenis Karatzas and
               C. V. Jawahar},
  title     = {Watching the News: Towards VideoQA Models that can Read},
  booktitle = {{IEEE/CVF} Winter Conference on Applications of Computer Vision, {WACV}
               2023, Waikoloa, HI, USA, January 2-7, 2023},
  pages     = {4430--4439},
  publisher = {{IEEE}},
  year      = {2023},
}
For any clarifications, comments, or suggestions, please open an issue or contact Soumya Shamarao Jahagirdar.