
Event-to-Sentence Ensemble

Code for the paper "Story Realization: Expanding Plot Events into Sentences" Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, and Mark O. Riedl


        title = "Story Realization: Expanding Plot Events into Sentences",
        author = "Ammanabrolu, Prithviraj and Tien, Ethan and Cheung, Wesley and Luo, Zhaochen and Ma, William and Martin, Lara J. and
          Riedl, Mark O.",
        archivePrefix = {arXiv},
        arxivId = {1909.03480}

Disclaimer: This code is no longer maintained.


Dataset: The full generalized sci-fi dataset can be found here, as all-sci-fi-data.txt.

Data columns are split by '|||' and the columns on each line are:

  • List of original word events
  • List of generalized events
  • Original split and pruned sentence
  • Generalized split and pruned sentence

Each story ends with an <EOS> tag and is followed by a dictionary that contains all the named entities that were generalized by category for that particular story.
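As a minimal sketch of reading this format, the function below splits one data line into the four columns listed above. The names `Record` and `parse_record` are hypothetical and not part of this repo. The event lists are kept as raw strings because their exact serialization is not described here, and the sketch assumes that `<EOS>` markers and entity dictionaries do not themselves contain four '|||'-separated fields.

```python
from typing import NamedTuple, Optional

SEPARATOR = "|||"


class Record(NamedTuple):
    """One data line, with columns in the order listed in the README."""
    original_events: str
    generalized_events: str
    original_sentence: str
    generalized_sentence: str


def parse_record(line: str) -> Optional[Record]:
    """Split a line into its four columns, or return None for other lines."""
    parts = [part.strip() for part in line.rstrip("\n").split(SEPARATOR)]
    if len(parts) != 4:
        # Assumption: <EOS> markers and entity dictionaries land here.
        return None
    return Record(*parts)
```

Lines for which `parse_record` returns None (story boundaries and entity dictionaries) can then be handled separately.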

For convenience, the data preprocessed into bitext for training our models is also included as all-sci-fi-data-{train, val, test}_{input, output}.txt, where the input and output files are aligned by line number.
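Because the bitext is aligned by line number, the two files can be paired line by line. The sketch below is illustrative only: `pair_bitext` and `read_bitext` are hypothetical helper names, and UTF-8 encoding is an assumption.

```python
from typing import Iterable, List, Tuple


def pair_bitext(input_lines: Iterable[str],
                output_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair line i of the input file with line i of the output file."""
    sources = [line.rstrip("\n") for line in input_lines]
    targets = [line.rstrip("\n") for line in output_lines]
    if len(sources) != len(targets):
        raise ValueError(
            f"bitext is misaligned: {len(sources)} input lines "
            f"vs {len(targets)} output lines"
        )
    return list(zip(sources, targets))


def read_bitext(input_path: str, output_path: str) -> List[Tuple[str, str]]:
    """Read two line-aligned files (assumed UTF-8) into (input, output) pairs."""
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, encoding="utf-8") as tgt:
        return pair_bitext(src, tgt)
```

Raising on a length mismatch makes a truncated or shifted file fail loudly rather than silently training on wrong pairs.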

Start RetEdit server:

cd E2S-Ensemble/RetEdit/
sudo bash

The server takes a few minutes to set up. Once you see Flask output (something like 'Running on'), proceed in a separate terminal.

Run Ensemble:

Edit or create a <config_file>.json (the most recently used is config_drl_sents.json). Most likely, the only thing that needs to change is the "test_src" entry under "data", which should point to the new input file. (Note: Gold standard dependencies have been removed.)
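If you prefer to script this change rather than edit the file by hand, something like the following could work. It relies only on the "test_src" key under "data" mentioned above. The helper names and the example file names in the usage note are hypothetical.

```python
import copy
import json


def set_test_source(config: dict, test_src: str) -> dict:
    """Return a copy of the config with data.test_src pointing at test_src."""
    updated = copy.deepcopy(config)
    updated.setdefault("data", {})["test_src"] = test_src
    return updated


def rewrite_config(src_path: str, dst_path: str, test_src: str) -> None:
    """Load a JSON config, update data.test_src, and write it to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(set_test_source(config, test_src), f, indent=2)
```

For example, `rewrite_config("config_drl_sents.json", "my_config.json", "my_events.txt")` would leave the original config untouched and write a new one (the last two file names are placeholders).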


cd E2S-Ensemble/
source activate e2s_ensemble
python --config <config_file>.json --sample --cuda --outf <outputfile>.txt

This will generate 3 output files:

<outputfile>.txt: Pure e2s output with ensemble thresholds

<outputfile>_verbose.txt: Each sentence is preceded by the model that generated the chosen sentence.

<outputfile>_more_verbose.tsv: Output of ALL 5 models, with their respective confidence scores (useful for quick threshold tuning)
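To illustrate how thresholds over per-model confidence scores might select a sentence, here is a sketch of a simple cascade. It is one plausible reading of "ensemble thresholds", not a copy of this repo's logic. The candidate ordering, the fallback to the last model, the meaning of the scores (higher is assumed to be more confident), and all names are assumptions. The .tsv column layout is not described here, so no parsing is shown.

```python
from typing import Dict, List, Tuple

# (model name, generated sentence, confidence score)
Candidate = Tuple[str, str, float]


def choose_sentence(candidates: List[Candidate],
                    thresholds: Dict[str, float]) -> Tuple[str, str]:
    """Return (model, sentence) for the first candidate that meets its threshold.

    Candidates are tried in the given order. If none meets its threshold,
    the last candidate is used as a fallback (an assumption of this sketch).
    Models without a threshold entry can only be chosen as the fallback.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    for model, sentence, confidence in candidates:
        if confidence >= thresholds.get(model, float("inf")):
            return model, sentence
    model, sentence, _ = candidates[-1]
    return model, sentence
```

Re-running a function like this over stored scores with different threshold values avoids regenerating sentences from all five models.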

After Ensemble

If the RetEdit Flask instance is no longer in use, press Ctrl+C to stop it, and run

sudo bash

to clean up the RetEdit docker instances in order to prevent excess memory consumption.

Slotfilling: The code is in Slotfilling; edit it to read in your data. Pass the event and the memory graph (responsible for keeping track of entities) through getAgentTurn.

Other potentially useful files:

python <outputfile>_more_verbose.tsv <reweighted_outf>.txt takes in a .tsv generated from and creates new output files based on the thresholds defined in (Generates <reweighted_outf>.txt and <reweighted_outf>_verbose.txt)

python <outputfile>_more_verbose.tsv <individual_outf>.txt takes in a .tsv generated from and creates 5 separate output files, one for each model and their respective outputs. (Generates <reweighted_outf>.txt and <reweighted_outf>_verbose.txt)

python <file>.txt calculates average number of words per sentence in the entire file.
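The computation itself is small. A sketch, assuming one sentence per line and whitespace tokenization (the function name is hypothetical and the repo's script may tokenize differently):

```python
from typing import Iterable


def average_words_per_sentence(lines: Iterable[str]) -> float:
    """Mean number of whitespace-separated tokens over non-empty lines."""
    counts = [len(line.split()) for line in lines if line.strip()]
    return sum(counts) / len(counts) if counts else 0.0
```

Since an open file iterates over its lines, a file handle can be passed directly as `lines`.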

python <outputfile>_verbose.txt takes in a <outputfile>_verbose.txt generated from and prints out the % utilization of each model in the output file.
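A sketch of the utilization computation follows. The exact layout of the verbose file is not specified here, so `labels_from_verbose` assumes that non-empty lines alternate between a model label and the sentence it produced. That layout and both function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def labels_from_verbose(lines: Iterable[str]) -> List[str]:
    """Extract model labels, assuming label and sentence lines alternate."""
    non_empty = [line.strip() for line in lines if line.strip()]
    return non_empty[0::2]


def model_utilization(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of output sentences attributed to each model."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: 100.0 * n / total for model, n in counts.items()}
```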


Main code is under, which calls the seq2seq-based decoders (Vanilla, mcts, and FSM) that have been reimplemented under and the template decoder under, and also queries the Retrieval-and-Edit Flask instance, which must already be running.

It takes an input event file and runs 5 threads, one for each model. Each model has a tqdm progress bar showing approximately how long it will take (mcts takes the longest, followed by RetEdit).
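The one-thread-per-model pattern can be sketched with the standard library as below. This is not the repo's implementation: it uses `concurrent.futures` instead of raw threads, omits the tqdm progress bars, and the `Decoder` type and `run_models` name are hypothetical stand-ins for the real model wrappers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A decoder maps a list of input events to one output sentence per event.
Decoder = Callable[[List[str]], List[str]]


def run_models(events: List[str],
               decoders: Dict[str, Decoder]) -> Dict[str, List[str]]:
    """Run every decoder over the same events, one worker thread per model."""
    with ThreadPoolExecutor(max_workers=max(1, len(decoders))) as pool:
        futures = {
            name: pool.submit(decode, events)
            for name, decode in decoders.items()
        }
        # result() blocks until that model finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

With this layout the total wall-clock time is bounded by the slowest model rather than the sum of all five, provided the models spend their time in code that releases the GIL (GPU calls, HTTP requests).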

Model files and data files have been (for the most part) omitted from this repo due to space issues, but to rebuild a working version of the ensemble from scratch, these are the steps that need to be followed:


Use e2s_ensemble.yml to create the correct conda environment by doing

conda env create -f e2s_ensemble.yml
source activate e2s_ensemble

Preparing the Retrieval-and-Edit (RetEdit) model (based on this paper)


docker / nvidia-docker

How to train

Work in folder E2S-Ensemble/RetEdit/

  1. Prepare data:
       • create a dataset folder within /datasets
       • create a tab-separated dataset: train.tsv, valid.tsv, and test.tsv
       • initialize word vectors within /word_vectors (using helper script)
       • set up a config file in /editor_code/configs/editor
  2. Run sudo bash
  3. Run export COPY_EDIT_DATA=$(pwd); export PYTHONIOENCODING=utf8; cd cond-editor-codalab; python {CONFIG-FILE.TXT}

How to use

Either modify and run or (which calls to handle requests outside of the docker container)

Preparing the Sequence-to-Sequence based models (Vanilla, FSM, mcts)

Work in the folder E2S-Ensemble/mcts. Modify or create a <config_file>.json that points to the intended training and testing sets. To run training, under the e2s_ensemble conda environment, run python --config config.json. To get output for the vanilla seq2seq model, run python --config config.json.

The FSM and mcts models need to point to this trained seq2seq model, but do not require any further training.

Preparing the Templates model

This code is adapted from the AWD-LSTM codebase. Work in the folder E2S-Ensemble/Templates.

Although the dataset is already generalized, we generalize it a bit further by removing the numbers from the named entities, in an attempt to improve training. To do this, run python input.txt output.txt to create a file in which the numbers on the named entities have been removed.
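As an illustration of this step, the sketch below strips a trailing index from entity tags. It assumes tags look like `<PERSON>0`, that is, a category in angle brackets followed immediately by digits. That format is an assumption about the data, and `strip_entity_numbers` is a hypothetical name, so adjust the pattern to match the actual files.

```python
import re

# Assumed tag shape: "<CATEGORY>" immediately followed by an index, e.g. "<PERSON>0".
ENTITY_INDEX = re.compile(r"(<[A-Za-z_]+>)\d+")


def strip_entity_numbers(line: str) -> str:
    """Drop the numeric index from every entity tag, leaving other digits alone."""
    return ENTITY_INDEX.sub(r"\1", line)
```

Only digits directly attached to a tag are removed, so ordinary numbers in the sentence are preserved.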

Then, prepare a training/validation/test split, with the files named train.txt, valid.txt, and test.txt respectively. To train the model, run python --data data/scifi/ --model BiGRU --batch_size 20 --nlayers 1 --epochs 60 --save --lr 1
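One simple way to produce such a split is sketched below. The 80/10/10 proportions and the contiguous (unshuffled) split are choices made for this example, not something the repo prescribes, and `split_lines` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def split_lines(lines: Iterable[str],
                valid_fraction: float = 0.1,
                test_fraction: float = 0.1
                ) -> Tuple[List[str], List[str], List[str]]:
    """Split lines into contiguous train/valid/test portions, in that order."""
    items = list(lines)
    n_valid = int(len(items) * valid_fraction)
    n_test = int(len(items) * test_fraction)
    n_train = len(items) - n_valid - n_test
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test
```

Each returned list can then be written to train.txt, valid.txt, and test.txt. A contiguous split keeps most sentences from the same story in the same portion, although a story that straddles a cut point will be divided.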

To incorporate for use in, update the parameters in the first few argument statements.
