ltagextract

Extract a lexicalized tree-adjoining grammar (LTAG) from a treebank.

Introduction

This project extracts Tree Adjoining Grammars (TAG), with aligned semantics, from the KBGen corpus.

Software dependencies

Leiningen 2.0.0: https://github.com/technomancy/leiningen
- Included in utilities/ (lein for UNIXes, lein.bat for Windows)
- Note: we also provide an executable jar file, so Leiningen is not required unless you want to compile the code or use the REPL.
Python 2.7: http://www.python.org/download/releases/2.7/
Java 7.10: http://www.java.com/en/download/index.jsp
NLTK 2.0.4, a leading platform for building Python programs to work with human language data: http://nltk.org/
- Included in utilities/, referenced automatically by run.sh
- Earlier versions might not work!
Stanford parser 2.0.4: http://nlp.stanford.edu/software/lex-parser.shtml
- Included in utilities/, used by the parse.sh script.

Howto

To reproduce our current results, either run bin/run.sh or follow the pipeline described below:

1. Handle the conjunctions that occur in the syntactic trees (coordination aggregation).
2. Parse the sentences with the Stanford parser. We use the unlexicalized parser, with head information included in the output.
3. Normalize the syntactic trees produced in step 2.
4. Extract the TAG from the output of step 3.
5. Assign semantics to the output of step 4.
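
The pipeline can also be sketched as a small Python driver. This is only an illustration, not a replacement for bin/run.sh, and it has not been checked against that script. The commands and paths are copied from the step sections of this README; the function names build_pipeline and run_pipeline and the dry-run behaviour are assumptions made for the example.

```python
import os
import subprocess


def build_pipeline():
    """Return (label, argv, extra_env) tuples for the pipeline, in order.

    Paths are relative to the repository root and copied from this README.
    """
    return [
        ("step 1: aggregate coordination",
         ["java", "-jar", "bin/aggregation-0.1.1-SNAPSHOT-standalone.jar",
          "input/triples/", "output/aggregated/"], {}),
        ("step 2: parse",
         ["bin/parse.sh", "input/sentences/", "output/parsed/"], {}),
        ("step 3: normalize",
         ["java", "-jar", "bin/grook-0.1.0-SNAPSHOT-standalone.jar",
          "output/parsed/", "output/fixed/"], {}),
        ("steps 4 and 5: extract TAG with semantics",
         ["python2", "bin/extract/extractor.py",
          "output/fixed/", "input/alignments/", "output/final.gram",
          "--verbose", "output/grammar-verbose/"],
         {"PYTHONPATH": "utilities/nltk-2.0.4/"}),
    ]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for label, argv, extra_env in build_pipeline():
        print("%s: %s" % (label, " ".join(argv)))
        if dry_run:
            continue
        env = dict(os.environ)
        for key, value in extra_env.items():
            # Prepend our value, keeping whatever the user already had set.
            old = env.get(key)
            env[key] = value + (os.pathsep + old if old else "")
        # check_call raises CalledProcessError when a step fails,
        # so later steps never run on incomplete input.
        subprocess.check_call(argv, env=env)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

With dry_run=True the driver only prints the commands, which is a cheap way to check the paths before running the Java and Python steps for real.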

Step 1

To do the coordination aggregation, run

java -jar bin/aggregation-0.1.1-SNAPSHOT-standalone.jar \
  input/triples/ output/aggregated/

Step 2

To parse the corpus using the Stanford parser, run

bin/parse.sh input/sentences/ output/parsed/

Step 3

To normalize the syntactic tree, run

java -jar bin/grook-0.1.0-SNAPSHOT-standalone.jar \
  output/parsed/ output/fixed/

Steps 4 and 5

To extract the TAG with semantics aligned, run

PYTHONPATH="utilities/nltk-2.0.4/:$PYTHONPATH" python2 bin/extract/extractor.py \
  output/fixed/ input/alignments/ output/final.gram \
  --verbose output/grammar-verbose/

For more details, see the extractor's built-in help:

python2 extractor.py -h
usage: extractor.py [-h] [--verbose VERBOSE] corpus alignment [outfile]

positional arguments:
  corpus             corpus path which should be a directroy
  alignment          alignment path which should be a directory
  outfile            outputfile for extracted grammar

optional arguments:
  -h, --help         show this help message and exit
  --verbose VERBOSE  output raw gammar extracted for each sentence. This
                     parameter should be a directory
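
For readers who want to wrap or script the extractor, the interface in the help text above can be mirrored with Python's standard argparse module. The following is a sketch of a parser with the same arguments, not the actual code in extractor.py; the function name build_parser and the reworded help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the extractor's documented interface."""
    parser = argparse.ArgumentParser(prog="extractor.py")
    parser.add_argument("corpus",
                        help="corpus path, which should be a directory")
    parser.add_argument("alignment",
                        help="alignment path, which should be a directory")
    # nargs="?" makes the positional optional, shown as [outfile] in usage.
    parser.add_argument("outfile", nargs="?",
                        help="output file for the extracted grammar")
    parser.add_argument("--verbose",
                        help="directory for the raw grammar of each sentence")
    return parser


# Parse the same arguments that the command for steps 4 and 5 passes.
args = build_parser().parse_args(
    ["output/fixed/", "input/alignments/", "output/final.gram",
     "--verbose", "output/grammar-verbose/"])
print("%s %s %s" % (args.corpus, args.outfile, args.verbose))
```

When outfile or --verbose is omitted, the corresponding attribute is None; what the real extractor does in that case is not stated in this README.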

Other

We also provide a small tool for visualizing the TAG extracted in step 4 or step 5. To see its options, run

python2 grammarviewer.py -h
usage: grammarviewer.py [-h] [filename]

Draw the tree according to grammar file

positional arguments:
  filename    The name of grammar file, stdin will be used if left open

optional arguments:
  -h, --help  show this help message and exit

As a by-product, the package provides an s-expression parser for Python. You can use it to reconstruct an NLTK ParentedTree from the plain-text representation of a TAG.

Description of the files
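
To illustrate what s-expression parsing involves, here is a minimal, self-contained sketch. It is not the parser shipped in this package, whose API is not documented here; the function name parse_sexpr is hypothetical, and the bracketed sample string is an assumed Penn-Treebank-style tree, so the actual grammar files may use a richer notation.

```python
import re

# A token is an opening bracket, a closing bracket, or a run of other
# non-whitespace characters (a node label or a word).
_TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_sexpr(text):
    """Parse one s-expression into nested lists of strings."""
    tokens = _TOKEN.findall(text)
    pos = [0]  # mutable cursor shared with the nested function

    def read():
        if pos[0] >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos[0]]
        pos[0] += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # an atom
        items = []
        while pos[0] < len(tokens) and tokens[pos[0]] != ")":
            items.append(read())
        if pos[0] >= len(tokens):
            raise ValueError("missing ')'")
        pos[0] += 1  # consume the closing bracket
        return items

    result = read()
    if pos[0] != len(tokens):
        raise ValueError("trailing tokens after the first expression")
    return result


tree = parse_sexpr("(S (NP (DT the) (NN cell)) (VP (VBZ divides)))")
print(tree)
# ['S', ['NP', ['DT', 'the'], ['NN', 'cell']], ['VP', ['VBZ', 'divides']]]
```

In each nested list the first element is the node label and the remaining elements are its children, which is the shape needed to build a tree object from the text.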

./bin contains all runnable programs and scripts
./src contains all the source code
./output contains the intermediate results generated by the programs
./input contains the original corpus and the annotated data
- ./input/alignment contains our annotation result
- ./input/heads-fixed
- ./input/aggregation
./report contains our report
