Official code for Findings of ACL 2022 paper: "TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge"

TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge

This repository contains the source code for the Findings of ACL 2022 paper TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge. Chao-Hong Tan, Jia-Chen Gu, Chongyang Tao, Zhen-Hua Ling, Can Xu, Huang Hu, Xiubo Geng, Daxin Jiang.

Introduction

Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated to incorporating various open-world knowledge, such as knowledge graphs or wiki pages, into pre-trained language models (PLMs). However, their ability to access and manipulate task-specific knowledge is still limited on downstream tasks, as this type of knowledge is usually not well covered in PLMs and is hard to acquire. To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively on the basis of PLMs. With the help of these two types of knowledge, our model can learn what and how to generate. Experiments on two text generation tasks, dialogue generation and question generation, and on two datasets show that our method outperforms various baseline models.
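
As a rough illustration of the dense retrieval step mentioned above, the sketch below ranks knowledge entries by the inner product between a query embedding and each entry embedding. It is a toy example, not the code in this repository: the vectors are hand-written, the `top_k` helper is a hypothetical name, and inner-product scoring is an assumption about the similarity function. A real setup would obtain the embeddings from a trained encoder.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query_vec: Sequence[float],
          entry_vecs: Sequence[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k highest-scoring entries."""
    scored = [(i, dot(query_vec, vec)) for i, vec in enumerate(entry_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made 3-dimensional embeddings (assumption: stand-ins for encoder output).
entries = ["entry A", "entry B", "entry C"]
entry_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]

hits = top_k(query_vec, entry_vecs, k=2)
for index, score in hits:
    print(entries[index], round(score, 3))
```

The selected entries would then be passed on to the generator; how they are injected at the encoding and decoding stages is described in the paper and is not shown here.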

Python environment

The required packages are listed in requirements.txt.

If you are using an NVIDIA GPU and your setup supports CUDA 11.0, you can use the following commands to create the required Python virtual environment:

conda create -n TegTok python=3.8
conda activate TegTok
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
pip install -r requirements.txt
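
After installation, it can be useful to confirm that the pinned versions were actually picked up. The following is a minimal sketch using only the standard library; the `check_versions` helper is a hypothetical name, and it assumes the conda `pytorch` package registers its metadata under the distribution name `torch`, which may not hold for every build.

```python
from importlib import metadata

# Pins copied from the conda command above
# (assumption: conda's "pytorch" package is visible to Python as "torch").
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2", "torchaudio": "0.7.2"}


def check_versions(expected):
    """Map each package name to 'ok', 'missing', or 'mismatch (<found>)'."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Some builds carry a local suffix such as "+cu110"; compare the public part only.
        public = found.split("+")[0]
        report[name] = "ok" if public == wanted else f"mismatch ({found})"
    return report


if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name}: {status}")
```

Run it inside the activated TegTok environment; any line that is not `ok` suggests the environment differs from the one described here.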

Instructions

First, download the datasets from Google Drive and put the two bz2 files into the data folder.

Then unpack the data files:

cd data
tar -jxvf reddit.tar.bz2
tar -jxvf squad_nqg.tar.bz2
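
On systems without a `tar` binary, the same unpacking can be sketched with Python's standard `tarfile` module. The `unpack_bz2_archives` helper is a hypothetical name introduced for illustration; it assumes the two archives sit directly in the data folder, and it should only be used on archives you trust, since extraction writes whatever paths the archive contains.

```python
import tarfile
from pathlib import Path


def unpack_bz2_archives(data_dir, names):
    """Extract each named .tar.bz2 archive found in data_dir into data_dir.

    Returns the names of the archives that were actually unpacked.
    """
    data_dir = Path(data_dir)
    unpacked = []
    for name in names:
        archive = data_dir / name
        if not archive.is_file():
            # Skip archives that have not been downloaded yet.
            continue
        with tarfile.open(archive, mode="r:bz2") as tar:
            tar.extractall(path=data_dir)
        unpacked.append(name)
    return unpacked


if __name__ == "__main__":
    # Run from the repository root, where the data folder lives.
    print(unpack_bz2_archives("data", ["reddit.tar.bz2", "squad_nqg.tar.bz2"]))
```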

To run the code, please refer to the shell scripts under the run_shell folder.

Evaluation

The Reddit dialogue generation task can be evaluated with the script run_shell/cal_score.sh.

The squad_nqg task can be evaluated with the tools provided by xinyadu/nqg: neural question generation for reading comprehension. First, convert the output to the xinyadu/nqg format using the code in analyse/convert_squad_nqg_output_format.py.

Update

[20230213] Evaluation metrics are updated.

[20220523] Uploaded model source code and generation results. Evaluation metrics will be updated later.

Please keep an eye on this repository if you are interested in our work. Feel free to contact us ({chtan, gujc}@mail.ustc.edu.cn) or open issues.

Cite

@inproceedings{DBLP:conf/acl/TanGTLXHGJ22,
  author    = {Chao{-}Hong Tan and
               Jia{-}Chen Gu and
               Chongyang Tao and
               Zhen{-}Hua Ling and
               Can Xu and
               Huang Hu and
               Xiubo Geng and
               Daxin Jiang},
  editor    = {Smaranda Muresan and
               Preslav Nakov and
               Aline Villavicencio},
  title     = {TegTok: Augmenting Text Generation via Task-specific and Open-world
               Knowledge},
  booktitle = {Findings of the Association for Computational Linguistics: {ACL} 2022,
               Dublin, Ireland, May 22-27, 2022},
  pages     = {1597--1609},
  publisher = {Association for Computational Linguistics},
  year      = {2022},
  url       = {https://doi.org/10.18653/v1/2022.findings-acl.125},
  doi       = {10.18653/v1/2022.findings-acl.125},
}
