Factual Error Correction for Abstractive Summarization Models

This directory contains code necessary to replicate the training and evaluation for the EMNLP 2020 paper "Factual Error Correction for Abstractive Summarization Models" by Meng Cao, Yue Dong, Jiapeng Wu and Jackie Chi Kit Cheung.

Directory Structure

Our code is organized into four subdirectories:

build_dataset: code for building the artificial training & test dataset.
cnn-dailymail: directory for the cnn-dailymail summarization dataset.
K2019: directory for the manually annotated dataset by Kryscinski et al. (2019).
model: wrapper for the fairseq BART model for training.
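
The layout above can be verified before running anything. The following is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper name, and it assumes only that the four subdirectories listed above sit directly under the repository root.

```shell
# Hypothetical helper (not shipped with this repository).
# Checks that the four subdirectories named in this README exist
# under the given root (defaults to the current directory).
check_layout() {
  root="${1:-.}"
  missing=0
  for d in build_dataset cnn-dailymail K2019 model; do
    if [ ! -d "$root/$d" ]; then
      # Report each missing directory on stderr.
      echo "missing directory: $d" >&2
      missing=1
    fi
  done
  # Returns 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Calling `check_layout .` from the repository root returns a non-zero status if any directory is absent, which is most likely to happen for cnn-dailymail before the dataset has been downloaded.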

(1) Build Dataset

To build the training dataset, first download the processed cnn-dailymail dataset from this link. Unzip and save the downloaded files in cnn-dailymail.

Then, run the data creation script to build the training data:

cd build_dataset
sh create_data.sh
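
Because the build step depends on the unzipped dataset, it may help to guard the two commands above. This is a sketch under stated assumptions: `dir_has_files` is a hypothetical helper, and the check assumes only that cnn-dailymail should be non-empty after unzipping; it does not know which files `create_data.sh` actually reads.

```shell
# Hypothetical helper (not part of the repository): succeeds only if
# the given directory exists and contains at least one entry.
dir_has_files() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Run from the repository root. The subshell keeps the caller's
# working directory unchanged after the build finishes.
if dir_has_files cnn-dailymail; then
  (cd build_dataset && sh create_data.sh)
else
  echo "cnn-dailymail is missing or empty; download and unzip the dataset first" >&2
fi
```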

(2) Model Training

We use BART as our base model. To download and use the BART model, follow the instructions here.

(3) K2019

This directory contains the annotated cnn-dailymail test set from the Kryscinski et al. (2019) ACL paper.
