Skip to content
Go to file

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

This repository contains software to do several things related to syntax - parsing, format conversion, and evaluation. For a full definition of the parsing algorithm, proofs of its properties, and results on the standard metrics, see:

If you want to understand the algorithm, my thesis provides a better explanation (the TACL paper focuses on proving properties):

If you use this code in your own work, please cite the TACL paper:

  author    = {Jonathan K. Kummerfeld and Dan Klein},
  title     = {Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation},
  journal   = {Transactions of the Association for Computational Linguistics},
  volume    = {5},
  year      = {2017},
  pages     = {},
  url       = {},
  software  = {},

What is this work about?

In the Penn Treebank, parses have (1) a core projective tree structure and (2) traces that represent control structures, wh-movement and more. However, most parsers and the standard evaluation metric (evalb) ignore the traces and all null elements, focusing entirely on the tree structure. These aspects of syntax are excluded not because of disagreements regarding theory, but rather because of the computational challenge of including them.

This work is about a new inference algorithm that efficiently finds the maximum scoring graph parse for a sentence when using a first-order model. Our approach involves (1) the algorithm, which builds upon the non-projective tree parsing algorithm of Pitler et al. (2013), and (2) a syntactic representation, similar to several prior approaches, particularly Carreras et al. (2008). For more details, see the paper and dissertation above.

This repository contains:

  • A parser that implements our algorithm
  • An evaluation for traces that processes standard PTB data, using the metric proposed by Johnson (2002)
  • An evaluation for parses that processes our representation
  • Tools to convert back-and-forth between our representation and the standard PTB representation
  • Tools to generate the inference rules for our algorithm from the templates we define in the paper

All of the scripts are in python and can be run without downloading or installing any additional resources (note, all should work with python 2.7, only some work with python 3). The parser is in scala, but can be downloaded as a jar that runs with java. For pruning we use a tagger written in C++ that depends on several libraries (we provide a makefile and information about what else needs to be installed).

For detailed information on each component, see the file in each directory.

Reproducing the results in the paper

You will need:

  • test.tok The test sentences, one tokenised sentence per line (not provided here)
  • The test parses, in Penn Treebank format (not provided here)
  • wsj23.pos.stanford Automatically generated POS tags for the test data, space separated, one sentence per line (e.g. we used output of the Stanford POS tagger with default settings. The file we used is here)
  • The parser and models: download here (494 Mb) as a single jar
  • Various scripts from this repository
  • Either (1) The output of the spine tagger (available here), or (2) the models for the spine tagger (available here), in which case the spine tagger needs to be built (see parser/nn-tagger/makefile).

Assuming you download the contents of this repo and have all of those file and put them in the same directory as this file, you will need to:

  1. Add IDs to the sentences
  2. [Only if running the tagger] Simplify the text
  3. [Only if running the tagger] Run the spine tagger
  4. Run the parser
  5. Fix parses where we failed to produce output
  6. Convert to standard PTB
  7. Evaluate

Each of these steps is one command below:

python3 parser/nn-tagger/ 200001 < test.tok > test-with-ids.tok
python3 parser/nn-tagger/ < test-with-ids.tok > test-with-ids.tok.simple
parser/nn-tagger/spine-tagger -word-dict dict.words -tag-dict dict.tags -model model.params -test test.tok -prefix wsj23.tagged
./ test.parser test-with-ids.tok wsj23.pos.stanford
python3 evaluation/ < test.parser.auto_all.localStageFinal > test.parser.shp
python format-conversion/ -i h -o o -e he < test.parser.shp > test.parser.ptb
python evaluation/ test.parser.ptb | tail -n 1
python evaluation/ test.parser.ptb --null_only | tail -n 1

The final two commands give performance on (1) traces and other null items, (2) nulls only:

all 2619 3893 3524 74.3189557321 67.2745954277 70.6215450991
all 3153 3864 3522 89.5229982964 81.599378882 85.3777416734

These numbers are (count of matching, gold total, test total, precision, recall, f-score).


If you find bugs or have questions about any of this software please either contact me ( or create an issue. Thank you!

Misc Notes

The dynamic program definitions in the TACL paper and my thesis are slightly different:

  • In response to reviewer feedback I changed the notation for parents.
  • The problematic structures in my thesis got a name (locked chain) and a formal definition.
  • The thesis discusses how to handle chains that are not locked, but I didn't include the necessary details in the dynamic program. The TACL version adds them, using the hat notation.


A range of tools related to one-endpoint crossing graphs - parsing, format conversion, and evaluation





No releases published
You can’t perform that action at this time.