[ICLR 2023] PyTorch code of Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees


Summarization Programs (ICLR 2023)

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

Swarnadeep Saha, Shiyue Zhang, Peter Hase, and Mohit Bansal


Installation

This repository is tested on Python 3.8.12.
We recommend installing SummarizationPrograms in a virtual environment. All dependencies can be installed as follows:

pip install -r requirements.txt

Dataset

We provide a small sample of the CNN/DM validation set in the data folder. Each line contains the source document, the gold summary, and the unigram overlap percentage of each source sentence with respect to the summary. You can also pre-process your own summarization dataset into the same format to run SP-Search.
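As a rough illustration of the overlap field described above, a sentence's unigram overlap with the summary can be computed as the fraction of its tokens that also appear in the summary. This is a minimal sketch with a hypothetical helper name and whitespace tokenization; the repository's exact tokenization and overlap definition may differ.

```python
def unigram_overlap(sentence, summary):
    """Fraction of the sentence's unigrams that also appear in the summary.

    Hypothetical helper: the repository's pre-processing may tokenize
    and normalize differently.
    """
    sent_tokens = sentence.lower().split()
    if not sent_tokens:
        return 0.0
    summary_tokens = set(summary.lower().split())
    return sum(t in summary_tokens for t in sent_tokens) / len(sent_tokens)
```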

For CNN/DM and XSum, we release the original samples and the searched programs (SP-Search) outputs here.

The documents folder contains the samples, pre-processed as discussed above. The SP-Search outputs are represented as follows.

Each line is a tab-separated entry consisting of the following:

  • Index (according to the original sample ID in the documents folder)
  • Gold summary (same as the summaries in the documents folder)
  • SP-Search summary (the searched summary that emulates the gold/human summary)
  • SP-Search program with intermediate generations (S1, S2, etc. denote document sentences; I1, I2, etc. denote intermediate generations produced by executing a neural module)
  • SP-Search program without intermediate generations (a more compact representation of the previous field; each tree is enclosed in square brackets)
  • ROUGE score between the gold and SP-Search summaries
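Since each output line is tab-separated in the field order listed above, it can be split into a small record for downstream analysis. This is a sketch under the assumption that the fields appear exactly in the documented order; the key names are our own, not the repository's.

```python
def parse_spsearch_line(line):
    """Split one SP-Search output line into its six tab-separated fields.

    Field order follows the README; the key names below are illustrative.
    """
    fields = line.rstrip("\n").split("\t")
    keys = ["index", "gold_summary", "spsearch_summary",
            "program_with_intermediates", "program_compact", "rouge"]
    return dict(zip(keys, fields))
```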

RQ1: SP-Search

To identify Summarization Programs for human summaries, run:

cd sp_search
python main.py

The pre-trained modules are available for download here. For paraphrase, we directly used the model available here. Download the other two modules and place them inside the modules directory.

Upon running the search, you will see outputs similar to those in the output folder. The sp_search.tsv file saves the Summarization Programs and the corresponding summaries. The sp_search folder saves each SP as an individual PDF for visualization.

Compute ROUGE scores for the SP-Search summaries by running

cd scripts
python compute_spsearch_rouge.py
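For intuition about what the scoring script measures, ROUGE-1 F1 is the harmonic mean of unigram precision and recall between the gold and SP-Search summaries. The function below is a simplified pure-Python stand-in, not the repository's compute_spsearch_rouge.py, which may use a standard ROUGE package with stemming and multiple variants.

```python
from collections import Counter


def rouge1_f1(gold, pred):
    """Simplified ROUGE-1 F1 with whitespace tokenization.

    Illustrative only; the repository's scoring script may differ
    (stemming, ROUGE-2/L, etc.).
    """
    g, p = Counter(gold.lower().split()), Counter(pred.lower().split())
    overlap = sum((g & p).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)
```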

RQ2: SP Generation Models

Generating summaries via SPs involves three steps.

  • First, fine-tune a BART model on the SP-Search programs using the script scripts/train_sp_gen.sh. The training file must be processed into JSON format as expected by the HuggingFace transformers library.
  • Second, run inference with this model using sp_model/eval_sp.py to generate intermediate SPs.
  • Third, execute the generated programs with the pre-trained modules via sp_model/execute_sp.py to obtain the final summaries.
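Conceptually, executing a Summarization Program means walking its tree bottom-up: leaves are document sentences (S1, S2, ...), and each internal node applies a neural module to its children's outputs to produce an intermediate generation (I1, I2, ...). The mini-interpreter below sketches that control flow under our own tuple-based tree encoding; the real sp_model/execute_sp.py parses the repository's program format and calls fine-tuned seq2seq modules rather than these placeholder functions.

```python
def execute_program(tree, sentences, modules):
    """Recursively execute a Summarization Program tree.

    tree: either "S<k>" (a 1-indexed document sentence) or a tuple
          (module_name, [child_trees]). This encoding is illustrative,
          not the repository's on-disk program format.
    """
    if isinstance(tree, str) and tree.startswith("S"):
        return sentences[int(tree[1:]) - 1]
    module, children = tree
    inputs = [execute_program(c, sentences, modules) for c in children]
    return modules[module](inputs)


# Placeholder modules: the real ones are pre-trained neural generators
# (the paper's fusion, compression, and paraphrase modules).
modules = {
    "fusion": lambda xs: " ".join(xs),
    "compression": lambda xs: xs[0].split(",")[0],
    "paraphrase": lambda xs: xs[0],
}
```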

Citation

@inproceedings{saha2023summarization,
  title={Summarization programs: Interpretable abstractive summarization with neural modular trees},
  author={Saha, Swarnadeep and Zhang, Shiyue and Hase, Peter and Bansal, Mohit},
  booktitle={ICLR},
  year={2023}
}
