# Multi-Scale Progressive Attention Network for Video Question Answering (ACL 2021)

Zhicheng Guo, Jiaxuan Zhao, Licheng Jiao, Xu Liu, Lingling Li
- Install the Python dependency packages:

  ```bash
  pip install -r requirements.txt
  ```
- Download the TGIF-QA, MSVD-QA, and MSRVTT-QA datasets, and edit the absolute paths in `preprocess/question_features.py`, `preprocess/appearance_features.py`, and `preprocess/motion_features.py` according to where your data is located.

  For these three VideoQA datasets, `--dataset` takes one of 3 options: `tgif-qa`, `msvd-qa`, and `msrvtt-qa`.

  Depending on the dataset, `--question_type` takes one of 5 options: `none`, `action`, `count`, `frameqa`, and `transition`; see the sketch below for which pairs go together.
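  Not every pairing is valid: TGIF-QA defines the four sub-tasks, while the open-ended MSVD-QA and MSRVTT-QA use only `none`. A small illustrative helper (not part of the repository) that captures the pairing:

  ```python
  # Valid --dataset / --question_type pairs, inferred from the commands in
  # this README; this helper is illustrative and not part of the repository.
  QUESTION_TYPES = {
      'tgif-qa': ['action', 'count', 'frameqa', 'transition'],
      'msvd-qa': ['none'],
      'msrvtt-qa': ['none'],
  }

  def check_args(dataset, question_type):
      if question_type not in QUESTION_TYPES[dataset]:
          raise ValueError(f'--question_type {question_type} is invalid for {dataset}')
  ```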
- Download the GloVe 300D vectors to `preprocess/pretrained/` and process them into a pickle file:

  ```bash
  python preprocess/txt2pickle.py
  ```
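  Conceptually, this step just parses the plain-text vectors into a fast-to-load binary file. A minimal sketch, assuming the common `glove.840B.300d.txt` file name and a word-to-vector dictionary as output (both are assumptions; `preprocess/txt2pickle.py` defines the actual names and format):

  ```python
  import pickle

  import numpy as np

  # Sketch only: parse GloVe text vectors into a {word: vector} dict and
  # pickle it. File names and output layout are assumptions.
  embeddings = {}
  with open('preprocess/pretrained/glove.840B.300d.txt', encoding='utf-8') as f:
      for line in f:
          parts = line.rstrip().split(' ')
          # The last 300 fields are the vector; anything before is the token
          # (a few tokens in the 840B release contain spaces).
          embeddings[' '.join(parts[:-300])] = np.asarray(parts[-300:], dtype=np.float32)

  with open('preprocess/pretrained/glove.pkl', 'wb') as f:
      pickle.dump(embeddings, f)
  ```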
- Extract question features.

  For the TGIF-QA dataset:

  ```bash
  python preprocess/question_features.py \
      --dataset tgif-qa \
      --question_type action \
      --mode total
  python preprocess/question_features.py \
      --dataset tgif-qa \
      --question_type action \
      --mode train
  python preprocess/question_features.py \
      --dataset tgif-qa \
      --question_type action \
      --mode test
  ```

  For the MSVD-QA/MSRVTT-QA datasets:

  ```bash
  python preprocess/question_features.py \
      --dataset msvd-qa \
      --question_type none \
      --mode total
  python preprocess/question_features.py \
      --dataset msvd-qa \
      --question_type none \
      --mode train
  python preprocess/question_features.py \
      --dataset msvd-qa \
      --question_type none \
      --mode val
  python preprocess/question_features.py \
      --dataset msvd-qa \
      --question_type none \
      --mode test
  ```
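  For intuition, a question feature of this kind is typically the sequence of GloVe vectors for the question's tokens. A rough sketch (tokenization and storage details are assumptions; the script's actual pipeline may differ):

  ```python
  import pickle

  import numpy as np

  # Sketch of GloVe-based question encoding; question_features.py may
  # tokenize, pad, and store features differently.
  with open('preprocess/pretrained/glove.pkl', 'rb') as f:
      glove = pickle.load(f)

  def encode_question(question, dim=300):
      """Return a (num_tokens, dim) array of word vectors; OOV words map to zeros."""
      tokens = question.lower().rstrip('?').split()
      return np.stack([glove.get(t, np.zeros(dim, dtype=np.float32)) for t in tokens])

  features = encode_question('what is the man doing')  # shape (5, 300)
  ```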
- Download the pre-trained 3D-ResNet152 to `preprocess/pretrained/`. You can learn more about this model in the following paper: "Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?", arXiv preprint, 2020.
- Extract appearance features:

  ```bash
  python preprocess/appearance_features.py \
      --gpu_id 0 \
      --dataset tgif-qa \
      --question_type action \
      --feature_type pool5 \
      --num_frames 16
  ```
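  The usual recipe behind a script like this, sketched here with a torchvision ResNet-152 (the repository's actual backbone and I/O may differ), is to sample `--num_frames` frames uniformly and keep each frame's globally pooled (`pool5`) activation:

  ```python
  import torch
  import torchvision.models as models
  import torchvision.transforms as T

  # Sketch of pool5 appearance features with a 2D ResNet-152; treat the
  # backbone choice and preprocessing as assumptions, not the script's code.
  resnet = models.resnet152(pretrained=True)
  backbone = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()  # drop the fc layer

  preprocess = T.Compose([
      T.ToPILImage(),
      T.Resize((224, 224)),
      T.ToTensor(),
      T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
  ])

  @torch.no_grad()
  def appearance_features(frames):
      """frames: list of num_frames HxWx3 uint8 arrays sampled uniformly from a video."""
      batch = torch.stack([preprocess(f) for f in frames])
      return backbone(batch).flatten(1)  # (num_frames, 2048) pool5 vectors
  ```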
- Extract motion features:

  ```bash
  python preprocess/motion_features.py \
      --gpu_id 0 \
      --dataset tgif-qa \
      --question_type action \
      --num_frames 16
  ```
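  Unlike appearance features, motion features come from a 3D CNN that consumes the whole sampled clip at once, yielding one vector per clip rather than one per frame. A schematic sketch (checkpoint loading and preprocessing are elided; `model` is a placeholder for the 3D-ResNet152 with its classification head removed):

  ```python
  import torch

  # Schematic only; motion_features.py handles the real model construction,
  # frame sampling, and storage.
  @torch.no_grad()
  def motion_features(model, clip):
      """clip: normalized float tensor of shape (3, num_frames, 112, 112)."""
      return model(clip.unsqueeze(0)).squeeze(0)  # one motion vector per clip
  ```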
You can choose the suitable `--dataset` and `--question_type` to start training:
```bash
python train.py \
    --dataset tgif-qa \
    --question_type action \
    --T 2 \
    --K 3 \
    --num_scale 8 \
    --num_frames 16 \
    --gpu_id 0 \
    --max_epochs 30 \
    --batch_size 64 \
    --dropout 0.1 \
    --model_id 0 \
    --use_test \
    --use_train
```
Or, you can run the following command to start training:

```bash
sh train_sh/action.sh
```

You can see the training commands for all datasets and tasks under the `train_sh` folder.
You can download our pre-trained models from here.
To evaluate the trained model, run the following command:

```bash
sh test_sh/action.sh
```

You can see the evaluation commands for all datasets and tasks under the `test_sh` folder.
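For orientation, evaluation on the multiple-choice tasks reduces to accuracy over the test split. A generic sketch (the model and data loader are placeholders; the real logic lives in the repository's test scripts):

```python
import torch

# Generic accuracy loop; `model` and `test_loader` stand in for whatever
# the repository's test scripts actually construct.
@torch.no_grad()
def evaluate(model, test_loader, device='cuda:0'):
    model.eval()
    correct = total = 0
    for questions, videos, answers in test_loader:
        logits = model(questions.to(device), videos.to(device))
        correct += (logits.argmax(dim=1) == answers.to(device)).sum().item()
        total += answers.size(0)
    return correct / total
```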
To cite this work:

```bibtex
@inproceedings{guo2021multi,
  title={Multi-scale progressive attention network for video question answering},
  author={Guo, Zhicheng and Zhao, Jiaxuan and Jiao, Licheng and Liu, Xu and Li, Lingling},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  pages={973--978},
  year={2021}
}
```