Accepted at EMNLP 2022.
The reorganized How2-MCLS text data can be downloaded here [Baidu Netdisk, Passcode: 6cd9], together with the video features [Baidu Netdisk, Passcode: eqqj] derived from the original How2 dataset. The original How2 dataset for multimodal summarization is provided at https://github.com/srvk/how2-dataset.
Some demo data is placed in the "data/demo_data" folder; you can replace it with the full How2-MCLS dataset, following the same format as "data/demo_data". Then run the following command to preprocess the data.
python preprocess.py  # Modify the data storage path configuration first.
Run the following scripts to execute the training and prediction procedures of the proposed models: VDF, VDF-TS-E, and VDF-TS-V.
VDF
./run_scripts/VDF.sh
VDF-TS-E
./run_scripts/VDF-TS-E.sh
VDF-TS-V
./run_scripts/VDF-TS-V.sh
Alternatively, we provide a trained first-stage model [Baidu Netdisk, Passcode: rcqo] that you can use directly to skip the first stage of the triple-stage training framework.
The nmtpytorch library is used to evaluate models; it provides the BLEU (1, 2, 3, 4), ROUGE-L, METEOR, and CIDEr evaluation metrics.
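For readers unfamiliar with the metric, a minimal pure-Python sketch of sentence-level BLEU-4 (modified n-gram precision with a brevity penalty and simple floor smoothing) is shown below. This is only an illustration of what the metric measures, not the evaluation code nmtpytorch runs:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Illustrative smoothed sentence-level BLEU for one hypothesis/reference pair."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)      # floor-smooth zero counts
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Real toolkits (nmtpytorch, sacrebleu) additionally handle corpus-level aggregation, tokenization, and multiple references.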
As an alternative, the nlg-eval library yields the same evaluation scores as nmtpytorch.
In addition, the ROUGE evaluation library is used to compute the ROUGE-1, ROUGE-2, and ROUGE-L scores.
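ROUGE-L is defined from the longest common subsequence (LCS) between hypothesis and reference. As a hedged sketch of the definition only (the actual scores are produced by the ROUGE library, and the beta value here is an assumption), a minimal implementation:

```python
def lcs_len(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(hypothesis, reference, beta=1.2):
    """F-measure over LCS precision/recall (beta weights recall; 1.2 is a common choice)."""
    hyp, ref = hypothesis.split(), reference.split()
    lcs = lcs_len(hyp, ref)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(hyp), lcs / len(ref)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)
```

ROUGE-1 and ROUGE-2 follow the same precision/recall pattern but over unigram and bigram overlap instead of the LCS.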
Our code builds on MFN, nmtpytorch, fairseq, machine-translation, and Transformers; we are very grateful to their authors.