Live Video Captioning (LVC) involves detecting and describing dense events within video streams. Traditional dense video captioning approaches typically focus on offline solutions where the entire video is available for analysis by the captioning model. In contrast, the LVC paradigm requires models to generate captions for video streams in an online manner. This imposes significant constraints, such as working with incomplete observations of the video and the need for temporal anticipation.
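To make the online setting concrete, here is a minimal conceptual sketch of an LVC loop (not the authors' model); `stream`, `captioner`, and `delta_t` are hypothetical names for illustration:

```python
def live_video_captioning(stream, captioner, delta_t):
    """Emit captions every delta_t seconds using only frames seen so far."""
    observed, next_cast = [], delta_t
    for t, frame in stream:               # frames arrive in temporal order
        observed.append(frame)            # no access to future frames
        if t >= next_cast:                # a casting point is reached
            yield t, captioner(observed)  # caption an incomplete video
            next_cast += delta_t
```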
In this repository we release the evaluation toolkit for the LVC problem, including the scripts to compute the novel Live Score metric detailed in our [paper].
If you use any content of this repo for your work, please cite the following bib entry:
```
@article{lvc2024,
  title={Live Video Captioning},
  author={Eduardo Blanco-Fernández and Carlos Gutiérrez-Álvarez and Nadia Nasri and Saturnino Maldonado-Bascón and Roberto J. López-Sastre},
  journal={arXiv preprint arXiv:2406.14206},
  year={2024}
}
```
Dependencies:
- Python 3.9. We recommend using Anaconda to create a virtual environment with the required dependencies:
```
conda create -n lvc python=3.9
```
- Activate the environment:
```
conda activate lvc
```
- Install the Python dependencies:
```
pip install -r requirements.txt
```
- Java 1.8.0
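The METEOR scorer commonly relies on a Java backend, so it can help to verify that Java is reachable before running the toolkit. This is just a convenience check, not part of the toolkit:

```python
import shutil
import subprocess

# Sanity check: Java must be on the PATH for the Java-based scorers to run.
java = shutil.which("java")
if java is None:
    raise RuntimeError("Java not found on PATH; install Java 1.8.0 first.")
# Note: `java -version` prints its output to stderr, not stdout.
print(subprocess.run([java, "-version"], capture_output=True, text=True).stderr)
```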
Some basic instructions:
- Clone this GitHub repository.
- Unzip the file with the LVC annotations for the ActivityNet Captions dataset:
```
cd lvc/data/validation
tar -xvzf data_validation.tar.gz
```
- Unzip the file with the dense captions produced by our LVC model:
```
cd lvc/data/captions
tar -xvzf data_captions.tar.gz
```
- To obtain the Live Score metric for all the scorers used in our paper (METEOR, Bleu_4 and ROUGE_L), run the script `generate_results_lvc.py`. It computes the results of our LVC model for each scorer and saves them in the `results` folder. Then, run the script `average_scores.py` to average the per-video scores; a sketch of this averaging step is shown below.
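For intuition, here is a minimal sketch of what the averaging step could look like; the `results/*.json` layout and the key names are assumptions for illustration, not the toolkit's actual schema:

```python
import glob
import json
from collections import defaultdict

# Aggregate per-video scores (hypothetical schema) into one mean per scorer.
totals, counts = defaultdict(float), defaultdict(int)
for path in glob.glob("results/*.json"):
    with open(path) as f:
        scores = json.load(f)   # e.g. {"METEOR": 0.12, "Bleu_4": 0.03, ...}
    for scorer, value in scores.items():
        totals[scorer] += value
        counts[scorer] += 1

for scorer, total in totals.items():
    print(f"{scorer}: {total / counts[scorer]:.4f}")
```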
- To obtain the live evolution of the Live Score metric for a particular video, run the script `generate_images.py`. It generates the figures released in the paper, where one can observe the evolution of the novel metric over time for every video; a plotting sketch follows below.
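As a rough sketch of such a plot, assuming per-scorer lists of (time, score) pairs (the file name and layout below are hypothetical):

```python
import json
import matplotlib.pyplot as plt

# Plot the Live Score evolution for one video (hypothetical input layout).
with open("results/v_example_live_scores.json") as f:
    evolution = json.load(f)  # {"METEOR": [[t0, s0], [t1, s1], ...], ...}

for scorer, points in evolution.items():
    times, scores = zip(*points)
    plt.plot(times, scores, label=scorer)

plt.xlabel("Video time (s)")
plt.ylabel("Live Score")
plt.legend()
plt.savefig("live_score_evolution.png")
```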
Enjoy!
To evaluate your own LVC model:
- Save your LVC results in a JSON file, using the JSON format detailed in the ActivityNet Captions challenge (see the sketch after this list). We provide a sample JSON file in the `results` folder.
- Specify the `delta_t` parameter your LVC model uses to cast the dense video captions, and run `generate_results_lvc.py` with your JSON file.
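As a reference, here is a minimal sketch of writing results in the ActivityNet Captions challenge format; the video id, captions, and output path are placeholders:

```python
import json

# Build a minimal result file in the ActivityNet Captions challenge format.
results = {
    "version": "VERSION 1.0",
    "results": {
        "v_QOlSCBRmfWY": [  # placeholder video id
            {"sentence": "A man plays the guitar.", "timestamp": [0.0, 15.2]},
            {"sentence": "He waves at the camera.", "timestamp": [15.2, 24.8]},
        ]
    },
    "external_data": {"used": False, "details": None},
}

with open("results/my_lvc_results.json", "w") as f:
    json.dump(results, f)
```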
This repository is released under the GNU General Public License v3.0 (refer to the LICENSE file for details).