helmertz/querysum

Querysum

Code for the model presented in the thesis Query-Based Abstractive Summarization Using Neural Networks by Johan Hasselqvist and Niklas Helmertz.

Requirements

  • Python 2.7 or 3.5
  • TensorFlow 1.1.0

Data

Instructions for acquiring the dataset released along with this model can be found in a separate repository.

Pre-trained embeddings can be downloaded from https://nlp.stanford.edu/projects/glove. The thesis work used 100-dimensional embeddings trained on "Wikipedia 2014 + Gigaword 5".
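
To sanity-check a downloaded embeddings file before training, a minimal sketch like the following can help. It assumes the plain-text GloVe format (one token per line, followed by its space-separated vector components). The function name `load_glove` and the file name in the usage comment are illustrative and are not part of this repository.

```python
import io


def load_glove(lines):
    """Parse GloVe text-format lines into a dict mapping word -> list of floats."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


# Hypothetical usage; the file name is an assumption about the downloaded archive:
# with io.open("glove.6B.100d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
#     print(len(vectors), len(next(iter(vectors.values()))))
```

For the 100-dimensional embeddings, every vector should have 100 components.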

Training

Replacing the parts in angle brackets, the model can be trained by running:

python querysum.py \
    <path to embeddings file> \
    <path to directory containing summary_vocabulary.txt and document_vocabulary.txt> \
    --mode train \
    --logdir <path to where model data is saved> \
    --training_dir <path to training set root directory> \
    --validation_dir <path to validation set root directory> \
    --batch_size <the batch size, 30 by default>
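
When training is launched from another Python script, the same invocation can be assembled as an argument list. This is only a sketch: the helper name `build_train_command` and all paths in the usage comment are hypothetical, and only the flags shown in the command above are used.

```python
import sys


def build_train_command(embeddings, vocab_dir, logdir,
                        training_dir, validation_dir, batch_size=30):
    """Return the argument list for a training run of querysum.py."""
    return [
        sys.executable, "querysum.py",
        embeddings,   # path to the embeddings file
        vocab_dir,    # directory with summary_vocabulary.txt and document_vocabulary.txt
        "--mode", "train",
        "--logdir", logdir,
        "--training_dir", training_dir,
        "--validation_dir", validation_dir,
        "--batch_size", str(batch_size),
    ]


# Hypothetical usage (all paths are placeholders):
# import subprocess
# subprocess.check_call(build_train_command(
#     "glove.6B.100d.txt", "vocab", "logs", "data/train", "data/validation"))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.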

Progress can be monitored using TensorBoard by running:

tensorboard --logdir <path to logdir>

Generating summaries

From a trained model, summaries can be generated by running:

python querysum.py \
    <path to embeddings file> \
    <path to directory containing summary_vocabulary.txt and document_vocabulary.txt> \
    --mode decode \
    --logdir <path to logdir from a training session> \
    --decode_dir <path to dataset directory, containing documents and queries, to generate summaries for> \
    --decode_out_dir <path to directory where generated summaries are saved>
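
The decoding invocation can be scripted in the same way. The sketch below also creates the output directory beforehand; whether `querysum.py` creates it on its own is not stated here, so that step is a precaution and an assumption. The helper name `build_decode_command` is hypothetical.

```python
import os
import sys


def build_decode_command(embeddings, vocab_dir, logdir,
                         decode_dir, decode_out_dir):
    """Return the argument list for generating summaries with querysum.py."""
    # Assumption: creating the output directory up front is harmless.
    if not os.path.isdir(decode_out_dir):
        os.makedirs(decode_out_dir)
    return [
        sys.executable, "querysum.py",
        embeddings,
        vocab_dir,
        "--mode", "decode",
        "--logdir", logdir,  # logdir from a previous training session
        "--decode_dir", decode_dir,
        "--decode_out_dir", decode_out_dir,
    ]


# Hypothetical usage (all paths are placeholders):
# import subprocess
# subprocess.check_call(build_decode_command(
#     "glove.6B.100d.txt", "vocab", "logs", "data/test", "summaries"))
```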
