A pre-trained language model for sequence-to-sequence learning with a novel self-supervised objective called future n-gram prediction.
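As a quick reference, the objective can be sketched as follows (paraphrasing the ProphetNet paper; $x$ is the source sequence, $y_1,\dots,y_T$ the target, $n$ the future n-gram size, and $\alpha_j$ per-offset weights):

$$
\mathcal{L} = -\sum_{j=0}^{n-1} \alpha_j \sum_{t=1}^{T-j} \log p_\theta\left(y_{t+j} \mid y_{<t}, x\right)
$$

That is, at each decoding step the model is trained to predict the next $n$ tokens simultaneously rather than only the next one.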
- CNN/DailyMail validation data, NVIDIA V100 (16 GB)
| BatchSize       |      32       |      64       |      128       |
|:---------------:|:-------------:|:-------------:|:--------------:|
| prophetnet      | 2.4 samples/s | 2.8 samples/s |      OOM       |
| above + fastseq | 6.0 samples/s | 7.6 samples/s | 10.7 samples/s |
ProphetNet-large-160GB (fine-tuned on CNN/DailyMail for 9 epochs) link
CNN/DM validation data
```bash
$ fastseq-generate-for-fairseq \
      cnn_dm_bert/len-512.bin \
      --path prophetnet/model.pt \
      --fp16 \
      --task translation_prophetnet \
      --batch-size BATCH_SIZE \
      --beam 4 \
      --num-workers 4 \
      --min-len 55 \
      --max-len-b 140 \
      --no-repeat-ngram-size 3 \
      --lenpen 2.0 \
      --remove-bpe \
      --gen-subset valid
```
To get the baseline speed numbers without the FastSeq optimizations, replace `fastseq-generate-for-fairseq` with `fairseq-generate`.
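To reproduce all three batch-size columns in the table above, a small sweep like the one below can help (a sketch, not an official script; it reuses the exact flags and the `cnn_dm_bert/len-512.bin` and `prophetnet/model.pt` paths from the command above):

```bash
# Sketch: run the same generation command at each benchmarked batch size.
for BATCH_SIZE in 32 64 128; do
  fastseq-generate-for-fairseq \
      cnn_dm_bert/len-512.bin \
      --path prophetnet/model.pt \
      --fp16 \
      --task translation_prophetnet \
      --batch-size "$BATCH_SIZE" \
      --beam 4 \
      --num-workers 4 \
      --min-len 55 \
      --max-len-b 140 \
      --no-repeat-ngram-size 3 \
      --lenpen 2.0 \
      --remove-bpe \
      --gen-subset valid
done
```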
To prepare the binary input data, refer to the preprocessing script:

```bash
$ bash generate_binary_data_for_prophetnet.sh INPUT_DATA_DIR
```
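For example (the input directory name here is purely hypothetical; point it at wherever your raw CNN/DailyMail files live):

```bash
# Hypothetical input location; replace ./raw_cnn_dm with your own data directory.
$ bash generate_binary_data_for_prophetnet.sh ./raw_cnn_dm
```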