Knowledge Aware Conversation Generation with Explainable Reasoning over Augmented Graphs (EMNLP 2019 version)

Abstract

Two types of knowledge, triples from knowledge graphs and texts from documents, have been studied for knowledge aware open-domain conversation generation: graph paths can narrow down vertex candidates for knowledge selection, while texts provide rich information for response generation. Fusing a knowledge graph with texts might yield mutually reinforcing advantages, but this combination has received little study. To address this challenge, we propose a knowledge aware chatting machine with three components: an augmented knowledge graph containing both triples and texts, a knowledge selector, and a knowledge aware response generator. We formulate knowledge selection on the graph as a multi-hop graph reasoning problem that effectively captures conversation flow and is more explainable and flexible than previous approaches. To fully leverage the long text information that differentiates our graph from others, we improve a state-of-the-art reasoning algorithm with machine reading comprehension (MRC) technology. We demonstrate the effectiveness of our system on two datasets in comparison with state-of-the-art models.

Datasets

EMNLP dialog dataset

This dataset contains movie chats between two participants, in which each response is explicitly generated by copying or modifying sentences from background knowledge such as IMDB facts/plots or comments about movies. We follow the original data split for training, validation, and test. Please refer to the paper (Moghe et al., 2018) for more details.

ICLR dialog dataset

This Wizard of Wikipedia dataset contains multi-turn conversations between two participants. One participant selects a starting topic, and the topic is allowed to change naturally during the conversation. The two participants are not symmetric: one plays the role of a knowledgeable expert while the other is a curious learner. Since we focus on knowledge selection and knowledge aware generation, we filter the training and test data by removing instances that do not use knowledge, keeping 30% of the instances for our study. Please refer to the paper (Dinan et al., 2019) for more details.

The download address is: https://baidu-nlp.bj.bcebos.com/emnlp2019_akgcm_datasets.tar.gz
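
For convenience, here is a minimal Python sketch (not part of the original release) for fetching and unpacking the archive; the URL is the one above, while the local paths are arbitrary choices:

```python
# Minimal download-and-extract sketch; local paths are arbitrary choices.
import tarfile
import urllib.request

URL = "https://baidu-nlp.bj.bcebos.com/emnlp2019_akgcm_datasets.tar.gz"
ARCHIVE = "emnlp2019_akgcm_datasets.tar.gz"

urllib.request.urlretrieve(URL, ARCHIVE)  # fetch the archive
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall("data")  # unpack into ./data; the archive layout is not documented here
```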

Experiments

Implementation of baselines

GTTP: an end-to-end text summarization model with a pointer-generator network (See et al., 2017), studied on the EMNLP dataset.

CCM: a state-of-the-art knowledge-graph-based conversation model (Zhou et al., 2018).

BiDAF+G: a QA model based on Bi-directional Attention Flow (BiDAF) (Seo et al., 2017) that performs best on the EMNLP dataset, paired with the same response generator as AKGCM for NLG over the predicted knowledge span.

TMemNet: a two-stage Transformer MemNet conversation system that performs best on the ICLR dataset (Dinan et al., 2019).

Implementation of AKGCM

We implement our knowledge selection model based on the MINERVA code released by Das et al. (2018). We use BiDAF as the MRC module and train it on the same training set as our knowledge selection model. We implement the knowledge aware generation model based on the GTTP code released by Moghe et al. (2018). We also implement a variant, AKGCM-5, in which the top five knowledge texts are used for generation; all other settings are unchanged.
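
As a rough illustration only, the sketch below shows one plausible way an MRC score over a vertex's attached text could re-rank path-based reasoning scores when choosing the next vertex at each hop. Every name here is hypothetical and not taken from the released MINERVA or GTTP code.

```python
# Hypothetical sketch: interpolating a path-reasoning score with an MRC
# relevance score when ranking candidate vertices at a reasoning hop.
from dataclasses import dataclass

@dataclass
class Candidate:
    vertex_id: str
    text: str          # long text attached to this vertex in the augmented graph
    path_score: float  # score from the multi-hop reasoning policy

def mrc_score(question: str, passage: str) -> float:
    """Stand-in for a BiDAF-style reader; a real module would return,
    e.g., a span-probability-based relevance score."""
    overlap = set(question.lower().split()) & set(passage.lower().split())
    return len(overlap) / (len(passage.split()) + 1)

def rank_candidates(question: str, candidates: list[Candidate],
                    alpha: float = 0.5) -> list[Candidate]:
    """Rank hop candidates by interpolating path and MRC scores; the
    interpolation itself is an assumption made purely for illustration."""
    return sorted(
        candidates,
        key=lambda c: alpha * c.path_score + (1 - alpha) * mrc_score(question, c.text),
        reverse=True,
    )
```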

Results

Metrics for automatic evaluation: following Moghe et al. (2018), we adopt BLEU-4, ROUGE-2, and ROUGE-L to evaluate how similar the generated response is to the reference text. We use Hit@1 (top-1 accuracy) to evaluate the performance of knowledge selection.
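
As a concrete reference, here is a small Python sketch of two of these metrics: Hit@1 as defined above, and corpus-level BLEU-4 via NLTK. The exact evaluation scripts used in the paper may differ, and ROUGE is omitted since it needs an extra package.

```python
# Sketch of Hit@1 and BLEU-4; NLTK's corpus_bleu defaults to uniform
# 4-gram weights (0.25, 0.25, 0.25, 0.25), i.e., BLEU-4.
from nltk.translate.bleu_score import corpus_bleu

def hit_at_1(predicted: list[str], gold: list[str]) -> float:
    """Fraction of turns where the top-ranked knowledge matches the gold knowledge."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def bleu4(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level BLEU-4 with one reference per hypothesis."""
    refs = [[r.split()] for r in references]
    hyps = [h.split() for h in hypotheses]
    return corpus_bleu(refs, hyps)
```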

Metrics for human evaluation: we resort to a web crowdsourcing service. We randomly sample 200 messages from the test set, run each model to generate responses, and conduct pair-wise comparisons between the response from AKGCM and the one from a baseline for the same message. In total, we have 1,400 pairs per dataset since there are seven baselines. For each pair, we ask five evaluators to state a preference between the two responses on two metrics: (1) appropriateness (Appr.): whether the response is appropriate in terms of relevance and logic; (2) informativeness (Infor.): whether the response provides new information and knowledge beyond the input message, rather than a generic response such as "This movie is amazing". Ties are allowed, and system identifiers are masked during evaluation.
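
For concreteness, here is one way the five votes per pair could be reduced to the Win/Tie/Lose fractions in the tables below; the majority-vote rule is our assumption, not necessarily the paper's exact procedure.

```python
# Aggregate five evaluator votes per message pair into Win/Tie/Lose
# fractions. Ties in the majority vote fall back to Counter's ordering,
# an arbitrary simplification for this sketch.
from collections import Counter

def aggregate(votes_per_message: list[list[str]]) -> dict[str, float]:
    """votes_per_message: for each sampled message, five votes in
    {'win', 'tie', 'lose'} for the baseline vs. AKGCM."""
    outcomes = [Counter(votes).most_common(1)[0][0] for votes in votes_per_message]
    counts = Counter(outcomes)
    n = len(outcomes)
    return {k: counts[k] / n for k in ("win", "tie", "lose")}
```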

EMNLP dialog dataset: Automatic evaluations

| Model | BLEU-4 | ROUGE-2 | ROUGE-L | Hit@1 |
|---------|-------|---------|---------|-------|
| Seq2seq | 1.59 | 5.73 | 14.49 | NA |
| HRED | 2.08 | 8.83 | 18.13 | NA |
| MemNet | 5.86 | 10.64 | 18.48 | NA |
| GTTP | 11.05 | 17.70 | 25.13 | NA |
| CCM | 2.40 | 4.84 | 17.70 | NA |
| BiDAF+G | 32.45 | 31.28 | 36.95 | 40.80 |
| TMemNet | 8.92 | 13.15 | 19.97 | 38.10 |
| AKGCM-5 | 13.29 | 13.12 | 21.22 | 42.04 |
| AKGCM | 30.84 | 29.29 | 34.72 | 42.04 |

EMNLP dialog dataset: Human evaluations

| Model (* vs. AKGCM) | Appr. Win/Tie/Lose | Infor. Win/Tie/Lose |
|---------|----------------|----------------|
| Seq2seq | 0.04/0.42/0.54 | 0.05/0.21/0.74 |
| HRED | 0.03/0.50/0.47 | 0.03/0.27/0.70 |
| MemNet | 0.03/0.43/0.54 | 0.03/0.23/0.74 |
| GTTP | 0.03/0.52/0.45 | 0.10/0.42/0.48 |
| CCM | 0.01/0.18/0.81 | 0.01/0.15/0.84 |
| BiDAF+G | 0.04/0.83/0.13 | 0.07/0.79/0.14 |
| TMemNet | 0.04/0.50/0.46 | 0.05/0.36/0.59 |

ICLR dialog dataset: Automatic evaluations

| Model | BLEU-4 | ROUGE-2 | ROUGE-L | Hit@1 |
|---------|-------|---------|---------|-------|
| Seq2seq | 0.17 | 1.01 | 7.02 | NA |
| HRED | 0.23 | 1.08 | 7.32 | NA |
| MemNet | 0.89 | 2.33 | 11.84 | NA |
| GTTP | 6.74 | 7.18 | 17.11 | NA |
| CCM | 0.86 | 1.68 | 12.74 | NA |
| BiDAF+G | 6.48 | 6.54 | 15.56 | 17.40 |
| TMemNet | 1.09 | 1.86 | 8.51 | 16.80 |
| AKGCM-5 | 6.94 | 7.38 | 17.02 | 18.24 |
| AKGCM | 5.52 | 6.10 | 15.46 | 18.24 |

ICLR dialog dataset: Human evaluations

| Model (* vs. AKGCM-5) | Appr. Win/Tie/Lose | Infor. Win/Tie/Lose |
|---------|----------------|----------------|
| Seq2seq | 0.00/0.10/0.90 | 0.00/0.11/0.89 |
| HRED | 0.01/0.14/0.85 | 0.01/0.14/0.85 |
| MemNet | 0.00/0.19/0.81 | 0.00/0.17/0.83 |
| GTTP | 0.07/0.73/0.20 | 0.12/0.68/0.20 |
| CCM | 0.00/0.17/0.83 | 0.00/0.16/0.84 |
| BiDAF+G | 0.04/0.61/0.35 | 0.04/0.56/0.40 |
| TMemNet | 0.01/0.25/0.74 | 0.01/0.21/0.78 |

Citation

If you find AKGCM useful in your work, please cite the following paper:

@inproceedings{liu-etal-2019-knowledge,
    title = "Knowledge Aware Conversation Generation with Explainable Reasoning over Augmented Graphs",
    author = "Liu, Zhibin  and Niu, Zheng-Yu  and Wu, Hua  and Wang, Haifeng",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1187",
    pages = "1782--1792",
}

References

[1] (See et al., 2017) Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: summarization with pointer-generator networks. In Proceedings of ACL, pages 1073–1083.

[2] (Seo et al., 2017) Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. In Proceedings of ICLR.

[3] (Zhou et al., 2018) Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018. Commonsense knowledge aware conversation generation with graph attention. In Proceedings of IJCAI-ECAI, pages 4623–4629.

[4] (Dinan et al., 2019) Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019. Wizard of wikipedia: knowledge-powered conversational agents. In Proceedings of ICLR.

[5] (Das et al., 2018) Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, and Andrew McCallum. 2018. Go for a walk and arrive at the answer: reasoning over paths in knowledge bases using reinforcement learning. In Proceedings of ICLR, pages 1–18.

[6] (Moghe et al., 2018) Nikita Moghe, Siddhartha Arora, Suman Banerjee, and Mitesh M. Khapra. 2018. Towards exploiting background knowledge for building conversation systems. In Proceedings of EMNLP, pages 2322–2332.

[7] WeChat article about the paper (in Chinese): https://mp.weixin.qq.com/s/THt88QskJUFLWtH6USftxw