PAD

PAD is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

PAD is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Goal

Dependency parsers are fast, accurate, and produce easy-to-interpret results, but phrase-structure parses are also useful and are required input for many NLP tasks.

The PAD parser produces phrases-after-dependencies. Give it the output of a dependency parser and it will produce the optimal constrained phrase-structure parse.

Installation

cd src
make

How to Use

> ./dep_parser sents.txt | ./pad -m pad.model | head
(TOP  (SINV  (CC But)   (S  (NP  (PRP you) ) )   (MD ca)   (NP  (RB n't) )   (VP  (VB dismiss)   (S  (NP  (NP  (NP  (NNP Mr.)   (NNP Stoltzman)   (POS 's) )   (NN music) )   (CC or)   (NP  (PRP$ his)   (NNS motives) ) ) )   (PP  (RB as)   (ADJP  (RB merely)   (JJ commercial)   (CC and)   (JJ lightweight) ) ) )   (. .) ) )

or, reading the dependency parses from a CoNLL file:

./pad --model model --sentences test.predicted.conll
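
The bracketed trees that pad prints can be loaded into other tools with a few lines of Python. The following is a minimal sketch, not part of PAD: the helper names `parse_tree` and `words` are hypothetical, and it assumes one well-formed tree per input string, with tokens that contain no literal parentheses.

```python
def parse_tree(text):
    """Parse one bracketed tree, e.g. "(NP (PRP you))", into (label, children)."""
    # Pad the brackets with spaces so they split off as separate tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word under a POS tag
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def words(tree):
    """Return the words of a parsed tree in left-to-right order."""
    _, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out


tree = parse_tree("(TOP (S (NP (PRP you)) (VP (VB dismiss)) (. .)))")
print(tree[0], words(tree))
```

Nested tuples keep the sketch dependency-free; a truncated or unbalanced tree raises an error instead of being silently repaired.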

> ./pad --help

PAD: Phrases After Dependencies
  USAGE: pad [options]

  Options:
  --help:              Print this message and exit.
  --model, -m:         (Required) Model file.
  --sentences, -g:     CoNLL sentence file.
  --oracle, -o:        Run in oracle mode.
  --pruning, -p:        .
  --dir_pruning:        .

How to Train

To train a new model, you'll need a grammar file and gold annotations. The file formats are described below.

> ./padt --grammar rules --model model --annotations parts --conll train.conll --epochs 5 --simple_features

PADt takes the following options.

> ./padt --help

PADt: Phrases After Dependencies trainer
USAGE: padt [options]

Options:
--help:             Print this message and exit.
--grammar, -g:      (Required) Grammar file.
--conll, -c:        (Required) CoNLL sentence file.
--model, -m:        (Required) Model file to output.
--annotations, -a   (Required) Gold phrase structure file.
--epochs[=10], -e:  Number of epochs.
--lambda[=0.0001]:  L1 Regularization constant.
--simple_features   Use simple set of features.

We also provide Python scripts for extracting a grammar and annotations from phrase-structure trees using the Collins head rules.

Please refer to python/README.md.

Cite

@InProceedings{kong-15,
  author    = {Lingpeng Kong and Alexander M. Rush and Noah A. Smith},
  title     = {Transforming Dependencies into Phrase Structures},
  booktitle = {Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  month     = jun,
  year      = {2015},
  address   = {Denver, Colorado, USA},
  publisher = {Association for Computational Linguistics},
  sbooktitle = {NAACL-HLT~2015}
}

File Formats

The grammar file has two types of lines. For unary rules:

RULE# 0 X Y 0

For binary rules:

RULE# 1 X Y Z HEAD
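
As an illustration, the two line types can be read with a short Python sketch. This is not the loader PAD itself uses. It assumes that fields are whitespace-separated, that RULE# is an integer rule index, that the second field is 0 for unary and 1 for binary rules, and that X is the parent symbol with Y (and Z) as its children. The `Rule` type and `parse_rule` function are hypothetical names, and the last field is kept as a raw string because its encoding is not specified here.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    index: int            # RULE#, assumed to be an integer index
    parent: str           # X (assumed to be the parent symbol)
    left: str             # Y
    right: Optional[str]  # Z, or None for a unary rule
    head: str             # final field, kept as written in the file


def parse_rule(line):
    """Parse one grammar line into a Rule (format assumptions noted above)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("too few fields: %r" % line)
    index, kind = int(fields[0]), int(fields[1])
    if kind == 0 and len(fields) == 5:
        # Unary rule: RULE# 0 X Y 0
        _, _, x, y, last = fields
        return Rule(index, x, y, None, last)
    if kind == 1 and len(fields) == 6:
        # Binary rule: RULE# 1 X Y Z HEAD
        _, _, x, y, z, head = fields
        return Rule(index, x, y, z, head)
    raise ValueError("unrecognized grammar line: %r" % line)


print(parse_rule("7 1 S NP VP 1"))
```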

The annotation file is only required for training. Each line is of the form:

#RULES
i j k h m r

Where i, j, k are the span of the rule, h is the head index, m is the modifier index, and r is the index of the rule from the grammar file.

There is no line break between sentences.
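
One possible reading of this format, sketched in Python below, is that each sentence begins with a line holding its number of rules, followed by that many `i j k h m r` lines, with the next sentence starting immediately afterwards. That interpretation of the `#RULES` line is an assumption, as is the hypothetical function name `read_annotations`; check it against the files produced by the scripts in python/ before relying on it.

```python
def read_annotations(lines):
    """Group annotation lines into one list of (i, j, k, h, m, r) per sentence.

    Assumes each sentence starts with a rule count (the "#RULES" line).
    """
    sentences = []
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate a stray blank line
        n_rules = int(header)
        rules = []
        for _ in range(n_rules):
            row = next(it, None)
            if row is None:
                raise ValueError("file ended inside a sentence block")
            fields = [int(x) for x in row.split()]
            if len(fields) != 6:
                raise ValueError("expected six integers: %r" % row)
            rules.append(tuple(fields))
        sentences.append(rules)
    return sentences


example = ["2", "0 1 2 1 0 5", "0 2 3 1 2 7", "1", "0 0 1 0 0 3"]
print(read_annotations(example))
```

The values in `example` are made up to show the layout and do not come from a real annotation file.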
