esh04/Towards-Coherent-Sequences-of-Audio-Descriptions

More than a Moment: Towards Coherent Sequences of Audio Descriptions

Eshika Khandelwal¹, Junyu Xie², Tengda Han², Max Bain², Arsha Nagrani², Andrew Zisserman², Gül Varol³, Makarand Tapaswi¹

¹ CVIT, IIIT Hyderabad
² Visual Geometry Group, University of Oxford
³ LIGM, École des Ponts ParisTech


Datasets and Results

Our framework is evaluated on the following datasets:


Coherent-AD Pipeline

1. Video Description

# Example values:
#   --dataset="cmdad"
#   --anno_path="resources/annotations/cmdad_anno_with_face_0.2_0.4.csv"
#   --charbank_path="resources/charbanks/cmdad_charbank.json"
python vlm/main.py \
--dataset={dataset} \
--video_dir={video_dir} \
--anno_path={anno_path} \
--charbank_path={charbank_path} \
--model_path={videollama2_ckpt_path} \
--output_dir={output_dir}

2. Summarisation

python llm/main.py \
--path={vlm_result_path} \
--prompt_idx=0

3. Candidate Generation

python llm/main.py \
--path={summarised_result_path} \
--prompt_idx=1
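
The three stages above can be chained from a small driver script. The following is a minimal sketch, not part of the repository: the function name `build_stage_commands` and the configuration keys are hypothetical, and the placeholder paths must be replaced with real ones. It only uses the scripts and flags shown in the commands above.

```python
import shlex
from typing import Dict, List


def build_stage_commands(cfg: Dict[str, str]) -> List[List[str]]:
    """Return argv lists for stages 1-3, using only the flags listed in this README.

    The keys of `cfg` are hypothetical names chosen for this sketch.
    """
    video_description = [
        "python", "vlm/main.py",
        f"--dataset={cfg['dataset']}",
        f"--video_dir={cfg['video_dir']}",
        f"--anno_path={cfg['anno_path']}",
        f"--charbank_path={cfg['charbank_path']}",
        f"--model_path={cfg['model_path']}",
        f"--output_dir={cfg['output_dir']}",
    ]
    # Stages 2 and 3 call the same script and differ only in --prompt_idx.
    summarisation = [
        "python", "llm/main.py",
        f"--path={cfg['vlm_result_path']}",
        "--prompt_idx=0",
    ]
    candidate_generation = [
        "python", "llm/main.py",
        f"--path={cfg['summarised_result_path']}",
        "--prompt_idx=1",
    ]
    return [video_description, summarisation, candidate_generation]


# Placeholder configuration (assumed values; adjust to your setup).
example_cfg = {
    "dataset": "cmdad",
    "video_dir": "path/to/videos",
    "anno_path": "resources/annotations/cmdad_anno_with_face_0.2_0.4.csv",
    "charbank_path": "resources/charbanks/cmdad_charbank.json",
    "model_path": "path/to/videollama2_ckpt",
    "output_dir": "path/to/output",
    "vlm_result_path": "path/to/vlm_result",
    "summarised_result_path": "path/to/summarised_result",
}

if __name__ == "__main__":
    for argv in build_stage_commands(example_cfg):
        # Print the command; replace with subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

Passing each command as an argument list avoids shell quoting issues when paths contain spaces.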

4. Candidate Scoring

The criteria that evaluate each candidate independently (without requiring context from previous intervals) are scored separately.

python candidate_scorer/independent_scoring/main.py \
--path={multiple_candidates_result_path} \
--criterion="ad"
python candidate_scorer/independent_scoring/main.py \
--path={multiple_candidates_result_path} \
--criterion="counts"

Next, the remaining context-dependent criteria are evaluated while recursively selecting the best candidate.

python candidate_scorer/main.py \
--path={multiple_candidates_result_path} \
--story --redundancy --ad --action --other --char
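
To illustrate the idea of context-dependent selection, here is a toy sketch in which the best candidate for each interval is chosen given the descriptions already selected for earlier intervals. It is not the repository's implementation: the function names are hypothetical, and the word-overlap penalty is an assumed stand-in for the actual criteria (`--story`, `--redundancy`, and so on), whose scoring is defined in `candidate_scorer/`.

```python
from typing import Callable, List, Sequence


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences (toy redundancy proxy)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def toy_context_score(candidate: str, history: Sequence[str]) -> float:
    """Hypothetical context score: penalise overlap with the previously selected AD."""
    if not history:
        return 0.0
    return -word_overlap(candidate, history[-1])


def select_sequence(
    candidates_per_interval: Sequence[Sequence[str]],
    independent_scores: Sequence[Sequence[float]],
    context_score: Callable[[str, Sequence[str]], float] = toy_context_score,
) -> List[str]:
    """Pick one candidate per interval, conditioning each choice on earlier picks."""
    selected: List[str] = []
    for candidates, scores in zip(candidates_per_interval, independent_scores):
        # Total = precomputed independent score + score that depends on the history.
        totals = [s + context_score(c, selected) for c, s in zip(candidates, scores)]
        best_index = max(range(len(candidates)), key=totals.__getitem__)
        selected.append(candidates[best_index])
    return selected


if __name__ == "__main__":
    candidates = [
        ["Anna opens the door.", "A woman stands still."],
        ["Anna opens the door.", "She walks into the kitchen."],
    ]
    independent = [[1.0, 0.5], [0.6, 0.5]]
    print(select_sequence(candidates, independent))
```

In this example the second interval avoids repeating "Anna opens the door." because the overlap penalty outweighs its slightly higher independent score.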

Note: the code also provides an additional criterion, --salience, which is not part of the original paper.


Metrics

StoryRecall

python metrics/storyrecall.py \
--path={path_to_evaluate}

Repetition Metrics

python metrics/repeat.py \
--path={path_to_evaluate}
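
This README does not define the repetition metrics; `metrics/repeat.py` is the reference. As a rough illustration of what such a measure can look like, the sketch below computes the fraction of word n-grams in a sequence of descriptions that already occurred in an earlier description. The function names and the choice of n-gram overlap are assumptions made for this example only.

```python
from typing import List, Sequence, Set, Tuple


def ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Lower-cased, whitespace-tokenised word n-grams of a sentence."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngram_rate(descriptions: Sequence[str], n: int = 2) -> float:
    """Fraction of n-grams that already appeared in an earlier description."""
    seen: Set[Tuple[str, ...]] = set()
    total = 0
    repeated = 0
    for description in descriptions:
        grams = ngrams(description, n)
        total += len(grams)
        # Compare against earlier descriptions only, then record this one.
        repeated += sum(1 for gram in grams if gram in seen)
        seen.update(grams)
    return repeated / total if total else 0.0


if __name__ == "__main__":
    ads = ["he opens the door", "he opens the window"]
    print(round(repeated_ngram_rate(ads), 3))
```

For the two descriptions above, two of the six bigrams ("he opens" and "opens the") recur, giving a rate of one third.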

Citation

If you find this repository helpful, please consider citing our work:

@article{khandelwal2025coherentad,
    title  = {More than a Moment: Towards Coherent Sequences of Audio Descriptions},
    author = {Eshika Khandelwal and Junyu Xie and Tengda Han and Max Bain and Arsha Nagrani and Andrew Zisserman and G{\"u}l Varol and Makarand Tapaswi},
    year   = {2025},
    url    = {https://arxiv.org/abs/2510.25440}
}

For any issues or questions while running this repository, please feel free to reach out.
