- Weibo (zh): https://ai.tencent.com/ailab/nlp/dialogue/#datasets (Weibo Conversation Datasets)
- Douban (zh): https://github.com/MarkWuNLP/MultiTurnResponseSelection
- Douban-20k (zh): https://ai.tencent.com/ailab/nlp/dialogue/#datasets (Restoration-200K datasets)
- Weibo Emotional Conversation Dataset (zh): http://coai.cs.tsinghua.edu.cn/hml/challenge2017/
- Profile Consistency Dataset for Dialogue (zh): https://ai.tencent.com/ailab/nlp/en/dialogue/datasets/KvPI.zip (paper: https://arxiv.org/abs/2009.09680)
- Grayscale Dataset for Dialogue (zh): https://ai.tencent.com/ailab/nlp/en/dialogue/datasets/grayscale_data_release.zip (paper: https://arxiv.org/abs/2004.02421)
- Gender-Specific Chat (zh): https://ai.tencent.com/ailab/nlp/en/dialogue/datasets/Stylistic_Dataset.zip (paper: https://arxiv.org/abs/2004.02202)
- Twitter (en): https://github.com/Marsan-Ma-zz/chat_corpus
- DailyDialog (en): http://yanran.li/dailydialog.html
- PersonaChat (en): https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json
- OpenSubtitles (en): http://opus.nlpl.eu/OpenSubtitles.php
- MultiWOZ (en): https://www.repository.cam.ac.uk/handle/1810/294507
- Cornell (en): https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
- Topical-Chat (en): https://github.com/alexa/alexa-prize-topical-chat-dataset
- Switchboard (en): https://github.com/NathanDuran/Switchboard-Corpus
- Dialogue NLI (en): https://wellecks.github.io/dialogue_nli/
- Movie Dialog Reddit (en): https://research.fb.com/downloads/babi/
- Ubuntu Dialogue (en): https://github.com/rkadlec/ubuntu-ranking-dataset-creator
- EmpatheticDialogues (en): https://github.com/facebookresearch/EmpatheticDialogues
- Wizard of Wikipedia (en): https://parl.ai/projects/wizard_of_wikipedia/
- Commonsense Conversation (en): http://coai.cs.tsinghua.edu.cn/file/commonsense_conversation_dataset.tar.gz
- MuTual (en): https://github.com/Nealcly/MuTual
- ToTTo Dataset: https://github.com/google-research-datasets/ToTTo
- poetry (zh): https://github.com/chinese-poetry/chinese-poetry
- couplet (zh): https://github.com/wb14123/couplet-dataset
- RAMDS (cuhk): http://www.se.cuhk.edu.hk/~textmine/dataset/ra-mds/
- LCSTS (zh): http://icrc.hitsz.edu.cn/Article/show/139.html
- Gigaword (en): https://drive.google.com/file/d/0B6N7tANPyVeBNmlSX19Ld2xDU1E/view
- CNN/Daily Mail (en): https://github.com/abisee/cnn-dailymail
- scientific summarization (en): https://github.com/Santosh-Gupta/ScientificSummarizationDataSets
- Newsroom (en): https://summari.es/download/
- BigPatent (en): https://drive.google.com/uc?export=download&id=1mwH7eSh1kNci31xduR4Da_XcmTE8B8C3
- XSum (en): http://kinloch.inf.ed.ac.uk/public/XSUM-EMNLP18-Summary-Data-Original.tar.gz
- ownthink (zh): https://github.com/ownthink/KnowledgeGraphData