This is the implementation of the paper "Probability Distribution Based Frame-supervised Language-driven Action Localization" (ACM MM 2023). An arXiv preprint is also available.
This repository is based on the repository of the paper "Video Moment Retrieval from Text Queries via Single Frame Annotation".
- pytorch=1.10.0
- python=3.7
- numpy
- scipy
- pyyaml
- tqdm
You can also run the following commands to prepare the conda environment.
# preparing environment
bash conda.sh
conda activate DBFS
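After activating the environment, it can help to confirm that the listed packages are importable. The following is a minimal sketch, not part of the repository: the helper name `check_requirements` is hypothetical, and it assumes the usual import names (`torch` for pytorch, `yaml` for pyyaml).

```python
import importlib

# Import names for the packages listed above (assumed: pytorch -> torch, pyyaml -> yaml).
REQUIRED_MODULES = ["torch", "numpy", "scipy", "yaml", "tqdm"]


def check_requirements(module_names):
    """Return {module name: version string, or None if the import fails}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Not every module exposes __version__, so fall back to a placeholder.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_requirements(REQUIRED_MODULES).items():
        status = version if version is not None else "MISSING"
        print(f"{name}: {status}")
```

Any entry reported as MISSING indicates a package that still needs to be installed in the DBFS environment.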
The frame annotations we used are available in the data/charadessta/annotations and data/tacos/annotations folders.
We use I3D features for Charades-STA and C3D features for TACoS. The I3D features for Charades-STA can be downloaded from link. The C3D features for TACoS can be downloaded from link and should be extracted into individual files. Save them to the data/charadessta/features and data/tacos/features folders, respectively.
Please also download GloVe to the data/glove folder.
Our trained models are provided at link. Please download them to the ckpt/ folder.
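Before running evaluation, the folder layout described above can be checked with a short script. This is a sketch under assumptions, not part of the repository: the helper `find_missing_dirs` is hypothetical, it only uses the folder names given in this README, and it treats an empty folder as missing.

```python
from pathlib import Path

# Folders named in this README, relative to the repository root.
EXPECTED_DIRS = [
    "data/charadessta/annotations",
    "data/charadessta/features",
    "data/tacos/annotations",
    "data/tacos/features",
    "data/glove",
    "ckpt",
]


def find_missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected folders that are absent or empty under root."""
    root = Path(root)
    missing = []
    for rel in expected:
        folder = root / rel
        # An empty folder usually means the download step was skipped.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = find_missing_dirs(".")
    if missing:
        print("Missing or empty folders:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected folders are present.")
```

Run it from the repository root. It does not validate file contents, only that each folder exists and contains something.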
Run the following commands for evaluation:
# Evaluate charades
python -m src.experiment.eval --exp ckpt/charades
# Evaluate tacos
python -m src.experiment.eval --exp ckpt/tacos
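To evaluate both datasets in one go, the two commands above can be wrapped in a small driver. The sketch below is illustrative only: `build_eval_command` and `evaluate_all` are hypothetical helpers, and the only repository details they rely on are the module path and the `--exp` values shown in the commands above.

```python
import subprocess
import sys

# Experiment folders taken from the evaluation commands in this README.
EXPERIMENTS = ["ckpt/charades", "ckpt/tacos"]


def build_eval_command(exp_dir, python=sys.executable):
    """Build the argument list equivalent to the evaluation command for exp_dir."""
    return [python, "-m", "src.experiment.eval", "--exp", exp_dir]


def evaluate_all(experiments=EXPERIMENTS):
    """Run each evaluation in turn and return {experiment folder: exit code}."""
    results = {}
    for exp_dir in experiments:
        completed = subprocess.run(build_eval_command(exp_dir))
        results[exp_dir] = completed.returncode
    return results


# Usage, from the repository root with the DBFS environment active:
#   exit_codes = evaluate_all()
```

A non-zero exit code for an experiment means that evaluation run failed and its output should be inspected.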
Shuo Yang, Zirui Shang, and Xinxiao Wu. 2023. Probability Distribution Based Frame-supervised Language-driven Action Localization. In Proceedings of the 31st ACM International Conference on Multimedia (MM ’23), October 29–November 3, 2023, Ottawa, ON, Canada. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3581783.3612512