Skip to content

mcao516/Factual-Error-Correction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

76 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Factual Error Correction for Abstractive Summarization Models

This directory contains code necessary to replicate the training and evaluation for the EMNLP 2020 paper "Factual Error Correction for Abstractive Summarization Models" by Meng Cao, Yue Dong, Jiapeng Wu and Jackie Chi Kit Cheung.

Directory Structure

Our code is organized into four subdirectories:

  • build_dataset: code for building the aritificial trianing & test dataset.
  • cnn-dailymail: directory for the cnn-dailymail summarization dataset.
  • K2019: directory for the manually annotated dataset by Kryscinski et al. (2019).
  • model: wrapper for the fariseq BART model for training.

(1) Build Dataset

To build the training dataset, first download the processed cnn-dailymail dataset from this link. Unzip and save the downloaded files in cnn-dailymail.

Then, run the data creation bash to build the training data:

cd build_dataset
sh create_data.sh

(2) Model Training

We use BART as our base model. To download and use BART model, follow the instructions here.

(3) K2019

The annotated cnn-dailymail test set from Kryscinski et al. 2019 ACL paper.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published