multilingual TVRetrieval

mTVR: Multilingual Moment Retrieval in Videos, ACL 2021

Jie Lei, Tamara L. Berg, Mohit Bansal

We introduce mTVR, a large-scale multilingual video moment retrieval dataset containing 218K English and Chinese queries from 21.8K TV show video clips. The dataset is collected by extending the popular TVR dataset (in English) with paired Chinese queries and subtitles. Compared to existing moment retrieval datasets, mTVR is multilingual, larger, and comes with diverse annotations. We further propose mXML, a multilingual moment retrieval model that learns and operates on data from both languages, via encoder parameter sharing and language neighborhood constraints. We demonstrate the effectiveness of mXML on the newly collected mTVR dataset, where it outperforms strong monolingual baselines while using fewer parameters. In addition, we provide detailed dataset analyses and model ablations.

Getting started

Prerequisites

  1. Clone this repository
git clone https://github.com/jayleicn/mTVRetrieval.git
cd mTVRetrieval
  2. Prepare feature files

Download mtvr_features.tar.gz (24GB). After downloading, extract it to the project directory:

tar -xf path/to/mtvr_features.tar.gz -C .

You should now see mtvr_features under your project root directory. It contains video features (ResNet, I3D) and text features (subtitle and query features, extracted with RoBERTa). Please refer to the TVR repo for details on feature extraction.
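
Optionally, as a quick sanity check that the extraction completed, you can list the directory and its size; the expectation of roughly 24GB+ of extracted data is an assumption based on the archive size above:

# optional sanity check: the directory should exist and hold roughly 24GB+ of features
ls mtvr_features
du -sh mtvr_features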

  3. Install dependencies
# 1. create conda environment
conda create -n mtvr python=3.7 -y
conda activate mtvr
# 2. install PyTorch
conda install pytorch torchvision -c pytorch
# 3. install other python packages
pip install easydict tqdm tensorboard
pip install h5py==2.9.0
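
Optionally, you can verify that PyTorch installed correctly and can see your GPU before moving on:

# optional check: prints the PyTorch version and whether a CUDA GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"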
  4. Add project root to PYTHONPATH
source setup.sh

Note that you need to do this each time you start a new session.
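
If you want to see what this step does, setup.sh presumably just puts the project root on PYTHONPATH; a manual equivalent might look like the following (an assumption about the script's contents, so check setup.sh itself):

# assumed manual equivalent of `source setup.sh`: prepend the project root to PYTHONPATH
export PYTHONPATH="$(pwd):${PYTHONPATH}"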

Training and Inference

  1. mXML training
bash baselines/mxml/scripts/train.sh 

Training with the above config stops at around epoch 60 and takes about 1 day on a single 2080Ti GPU. On the val set, for VCMR R@1, IoU=0.7, you should be able to get ~2.4 for Chinese and ~2.9 for English. The resulting model and config will be saved in a directory baselines/mxml/results/tvr-video_sub-*.
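
Since tensorboard is among the installed dependencies, you can likely monitor training curves by pointing it at the results directory; the log location below is an assumption, so adjust it to wherever your run writes event files:

# hypothetical log location: point --logdir at your run's output directory
tensorboard --logdir baselines/mxml/results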

  2. mXML inference

After training, you can run inference with the saved model on the val or test_public set:

bash baselines/mxml/scripts/inference.sh MODEL_DIR_NAME SPLIT_NAME

MODEL_DIR_NAME is the name of the directory containing the saved model, e.g., tvr-video_sub-*. SPLIT_NAME can be val or test_public. By default, this command evaluates all 3 tasks (VCMR, SVMR, VR); you can change this behavior by appending options, e.g., --tasks VCMR VR to evaluate only VCMR and VR. The generated predictions are saved in the same directory as the model; you can evaluate them by following the instructions in Evaluation and Submission below.
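
For example, a concrete call might look like this, where the model directory name is hypothetical and should be replaced with the one your training run actually produced:

# the model dir name below is hypothetical; use the actual dir under baselines/mxml/results
bash baselines/mxml/scripts/inference.sh tvr-video_sub-2021_06_01_12_00_00 val --tasks VCMR VR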

Evaluation and Submission

We only release ground truth for the train and val splits. To get results on the test_public split, please submit your predictions by following the instructions in standalone_eval/README.md.

Citations

If you find this code useful for your research, please cite our paper:

@inproceedings{lei2021mtvr,
  title={mTVR: Multilingual Moment Retrieval in Videos},
  author={Lei, Jie and Berg, Tamara L and Bansal, Mohit},
  booktitle={ACL},
  year={2021}
}

Acknowledgement

This research is supported by grants and awards from NSF, DARPA and ARO. This code is built upon TVRetrieval.

Contact

jielei [at] cs.unc.edu
