# SEASS

This repository contains code for the ACL 2017 paper "Selective Encoding for Abstractive Sentence Summarization".
## About this code
The experiments in the paper were run with an in-house deep learning toolkit, so this repository provides a re-implementation as a reference.

- PyTorch version: this code requires PyTorch v0.3.x.
- Python version: this code requires Python 3.
## How to run
### Prepare the dataset and code
Make a folder for the code and data:
```bash
SEASS_HOME=~/workspace/seass
mkdir -p $SEASS_HOME/code
cd $SEASS_HOME/code
git clone --recursive https://github.com/magic282/SEASS.git
```
Put the data in the folder `$SEASS_HOME/data/giga` and organize it as:
```
seass
├── code
│   └── SEASS
│       └── seq2seq_pt
└── data
    └── giga
        ├── dev
        ├── models
        └── train
```
Since the validation set is large, you can sample a small set from it.
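An aligned sample can be taken with standard tools. The sketch below generates toy stand-in files so it is runnable as-is; the real file names depend on your copy of the Gigaword data.

```shell
set -e
# Placeholder validation files; substitute your real file names.
seq 1 10000 | sed 's/^/source sentence /' > valid.article.txt
seq 1 10000 | sed 's/^/headline /'        > valid.title.txt
# Take the same leading 2000 lines from both files so that
# source/headline pairs stay aligned.
head -n 2000 valid.article.txt > valid.article.2k.txt
head -n 2000 valid.title.txt   > valid.title.2k.txt
wc -l valid.article.2k.txt valid.title.2k.txt
```

Sampling both files over the same line range is what keeps each source sentence paired with its headline.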
Collect the vocabulary from the training data, put the resulting vocab files in the corresponding data folder, and modify `run.sh` according to your settings and file paths.
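The exact vocabulary script is project-specific, but a frequency-sorted vocab file can be sketched with standard tools. The toy input file below is an assumption; a real vocabulary would be built from the full training text and truncated to a fixed size.

```shell
set -e
# Toy training file standing in for the real source/target text.
printf 'police arrest suspect\npolice search house\n' > train.article.txt
# Count whitespace-separated tokens and sort by frequency, descending.
tr -s ' ' '\n' < train.article.txt | grep -v '^$' | sort | uniq -c \
  | sort -rn | awk '{print $2, $1}' > vocab.txt
cat vocab.txt
```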
### Set up the environment
Required Python packages: nltk, scipy, numpy, pytorch.
Warning: Older versions of NLTK have a bug in the PorterStemmer. Therefore, a fresh installation or update of NLTK is recommended.
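A minimal environment setup might look like the following; treat it as a sketch, since the exact PyTorch 0.3.1 install command depends on your platform and CUDA version.

```shell
pip install nltk scipy numpy
# PyTorch 0.3.1 is an old release; consult the pytorch.org
# previous-versions page for a wheel matching your platform.
pip install torch==0.3.1
```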
A Docker image is also provided.
```bash
docker pull magic282/pytorch:0.3.1
```
### Run

The script `run.sh` is an example; modify it according to your configuration.
```bash
bash $SEASS_HOME/code/SEASS/seq2seq_pt/run.sh $SEASS_HOME/data/giga $SEASS_HOME/code/SEASS/seq2seq_pt
```
To use the Docker image instead, start a container:

```bash
nvidia-docker run --rm -ti -v $SEASS_HOME:/workspace magic282/pytorch:0.3.1
```
Then, inside the container:

```bash
bash code/SEASS/seq2seq_pt/run.sh /workspace/data/giga /workspace/code/SEASS/seq2seq_pt
```