CLTS-plus-Dataset

CLTS+: A Chinese Long Text Summarization Dataset with Abstractive Summaries

Introduction

We previously proposed CLTS, the Chinese Long Text Summarization Dataset. However, CLTS is an extractive dataset: its reference summaries frequently borrow words and phrases from the source text, so models trained on CLTS tend to copy whole sentences from articles when generating summaries.
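
One rough way to see how extractive a reference summary is would be to measure the share of its character n-grams that never occur in the source article. The sketch below is only an illustration under that assumption: the function names `char_ngrams` and `novel_ngram_ratio` are hypothetical, the strings are toy examples rather than dataset entries, and this is not necessarily the measure used in the paper.

```python
def char_ngrams(text, n):
    """Return the list of character n-grams in text (whitespace removed)."""
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def novel_ngram_ratio(source, summary, n=2):
    """Fraction of summary n-grams that never appear in the source.

    0.0 means every n-gram is copied from the source (fully extractive);
    1.0 means no n-gram is shared with the source.
    """
    summary_grams = char_ngrams(summary, n)
    if not summary_grams:
        return 0.0
    source_grams = set(char_ngrams(source, n))
    novel = sum(1 for g in summary_grams if g not in source_grams)
    return novel / len(summary_grams)


# Toy strings for illustration only; they are not taken from CLTS or CLTS+.
source = "今天北京天气晴朗，市民纷纷外出游玩。"
extractive = "今天北京天气晴朗"   # copied verbatim from the source
paraphrased = "首都今日放晴"       # same meaning, different wording

print(novel_ngram_ratio(source, extractive))   # 0.0
print(novel_ngram_ratio(source, paraphrased))  # 1.0
```

Character n-grams are used here because Chinese text has no whitespace word boundaries; a word-level variant would need a segmenter, which this sketch deliberately avoids.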

To address this problem, we propose the CLTS+ dataset. The ground truth in CLTS+ consists of the CLTS reference summaries after paraphrasing. Paraphrasing inevitably introduces some inconsistencies; for example, the names of people and places in a paraphrased summary may no longer match those in the original CLTS reference summary. We therefore correct these factual inconsistencies to reduce noise in the dataset and improve the prediction accuracy of models.

This work has been accepted by ICANN2022. We will update the paper link as soon as it is published.

Samples

We have selected some samples from CLTS+; you can find them in samples.txt.

Download

CLTS+ is available from the link. The password is 7yvn.
