PEGASUS library

PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models) uses the self-supervised objective Gap Sentences Generation (GSG) to train a transformer encoder-decoder model. The paper was accepted at ICML 2020 and can be found on arXiv. The original source code is available here.
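To give an intuition for the GSG objective: whole sentences are removed ("gapped") from the input document and the model is trained to generate them, so pre-training already resembles abstractive summarization. The sketch below is a toy illustration of that idea, not the actual PEGASUS implementation; it uses a crude word-overlap score as a stand-in for PEGASUS's ROUGE-based "principal" sentence selection.

```python
# Toy illustration of the Gap Sentences Generation (GSG) objective:
# the highest-scoring sentences are masked in the input and become
# the generation target. Simplified sketch, not the PEGASUS code.

def gsg_example(sentences, mask_ratio=0.3):
    """Pick the sentences with the highest word overlap against the rest
    of the document, replace them with [MASK1] in the input, and return
    (masked_input, target)."""
    n_mask = max(1, int(len(sentences) * mask_ratio))

    def score(i):
        rest = {w for j, s in enumerate(sentences) if j != i
                for w in s.lower().split()}
        words = set(sentences[i].lower().split())
        return len(words & rest) / max(1, len(words))

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    masked_ids = set(ranked[:n_mask])
    masked_input = " ".join("[MASK1]" if i in masked_ids else s
                            for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(masked_ids))
    return masked_input, target
```

In the real model the masked sentences are chosen by ROUGE score against the remaining document and the encoder-decoder is trained to reconstruct them, which is why the pre-trained checkpoints transfer well to summarization with little fine-tuning.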


Prerequisites

Python 3+
tensorflow==2.2.0
sentencepiece
numpy

Usage

To run summarization, download a pre-trained model on cnn_dailymail from here or on gigaword from here. Unzip it and put it in model/.

python scripts/summery.py --article example_article --model_dir model/ --model_name cnn_dailymail

Finetuning Dataset

Two dataset formats are supported: TensorFlow Datasets (TFDS) and TFRecords. The pn-summary dataset can be used for this purpose. pn-summary comprises numerous articles from various categories crawled from six news agency websites. Each document (article) includes the long original text as well as a human-written summary.
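For the TFRecords route, a minimal sketch of converting (article, summary) pairs into a TFRecord file is shown below. The `inputs`/`targets` feature names follow the convention the PEGASUS codebase uses for supervised TFRecord datasets; the file name and the example pairs are illustrative only.

```python
# Sketch: serialize (article, summary) pairs as tf.train.Examples with
# "inputs" and "targets" byte features, the layout expected for
# supervised PEGASUS fine-tuning data.
import tensorflow as tf

def write_tfrecord(pairs, path="finetune.tfrecord"):
    """pairs: iterable of (article_text, summary_text) tuples."""
    with tf.io.TFRecordWriter(path) as writer:
        for article, summary in pairs:
            example = tf.train.Example(features=tf.train.Features(feature={
                "inputs": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[article.encode("utf-8")])),
                "targets": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[summary.encode("utf-8")])),
            }))
            writer.write(example.SerializeToString())
```

Each pn-summary article/summary pair would map to one serialized example; the fine-tuning pipeline then reads the file back with `tf.data.TFRecordDataset`.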

To Do

  • Colab demo
  • Fine-tune on a Persian dataset

About

Abstract Text Summarization With PEGASUS
