C++ Perl Python Makefile Other
Latest commit 5901c34 Dec 9, 2016 @wammar wammar committed on GitHub Update README.md
alignment more optimization and a bug fix Apr 23, 2015
cdec-utils 1) more features. 2) better numerical stability for the matrix tree t… May 2, 2014
core more optimization and a bug fix Apr 23, 2015
doc a lot of cleaning May 1, 2013
ducttape-files Merge branch 'online_em' of github.com:ldmt-muri/alignment-with-openf… Apr 19, 2015
edmonds-alg-1.1.2 matrix tree theorem and tarjan's implementation of chu-liu-edmonds Mar 24, 2014
parsing ducttape files & other_pos in latentCrfPosTagger Jul 16, 2014
parts-of-speech testing english Nov 16, 2014
wammar-utils @ 1ff70ce Merge branch 'master' of https://github.com/ldmt-muri/alignment-with-… Nov 23, 2014
.gitmodules fix makefiles and cross-file references Mar 12, 2014
Makefile-hmmAligner simulated annealing is no longer used. remove its files and references Mar 12, 2014
Makefile-hmmPosTagger simulated annealing is no longer used. remove its files and references Mar 12, 2014
Makefile-latentCrfAligner more efficient initialization and disk IO Apr 20, 2015
Makefile-latentCrfParser 1) more features. 2) better numerical stability for the matrix tree t… May 2, 2014
Makefile-latentCrfPosTagger fix configs Nov 18, 2014
Makefile-model1 almost reimplemented model 1, and fixed a lot of stuff in the hmm ali… Feb 12, 2013
Makefile-reorderingSemiring a lot of cleaning May 1, 2013
README.md Update README.md Dec 9, 2016
TODO bugfix Jun 25, 2013
remote-build.bash printing debug information Jul 8, 2014

README.md

#disclaimer: This is a work in progress. If you encounter any problems while compiling or using it, it is likely our mistake, not yours. Please contact wammar@cs.cmu.edu with questions, comments, and suggestions.

#description: This is an implementation of the CRF autoencoder framework for four tasks:

  • bitext word alignment
  • part-of-speech tagging
  • code switching
  • dependency parsing

Our NIPS 2014 paper describes the CRF autoencoder framework as well as the bitext word alignment and part-of-speech induction tasks in detail. Details on code-switching can be found in our EMNLP shared task paper.

#dependencies:

#how to build: The instructions below assume your default compiler is gcc 4.6.3 or clang 3.1-8 (or later, "fingers crossed").

  • bitext word alignment: ``make -f Makefile-latentCrfAligner``
  • part-of-speech tagging: ``make -f Makefile-latentCrfPosTagger``
  • code switching: ``make -f Makefile-latentCrfPosTagger`` (this is not a typo)
  • dependency parsing: ``make -f Makefile-latentCrfParser`` (still in the works)

example invocations:

part-of-speech tagging:

```
  --output-prefix prefix # just a filename prefix for files generated during training
  --train-data sent-per-line-space-delimited-tokens.txt # example file below
  --feat LABEL_BIGRAM --feat PRECOMPUTED --feat EMISSION
  --feat BOUNDARY_LABELS --feat PRECOMPUTED_XIM2 --feat PRECOMPUTED_XIM1
  --feat PRECOMPUTED_XI --feat PRECOMPUTED_XIP1 --feat PRECOMPUTED_XIP2
  --feat OTHER_ALIGNERS
  --min-relative-diff 0.001
  --optimizer adagrad --minibatch-size 8000
  --max-iter-count 50
  --cache-feats true
  --wordpair-feats word-level-features
```

For a list of all options, run ``latentCrfAligner --help``.
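
The option list above can also be assembled programmatically, which helps when scripting several runs. The following is a minimal sketch, not part of this repository: the helper `build_command` is hypothetical, and the executable path `./train-latentCrfAligner` is an assumption borrowed from the `mpirun` line at the end of this README, so substitute whichever binary you actually built. Only flags that appear in the example invocation are used.

```python
import shlex

# Feature templates copied from the example invocation in this README.
FEATS = [
    "LABEL_BIGRAM", "PRECOMPUTED", "EMISSION", "BOUNDARY_LABELS",
    "PRECOMPUTED_XIM2", "PRECOMPUTED_XIM1", "PRECOMPUTED_XI",
    "PRECOMPUTED_XIP1", "PRECOMPUTED_XIP2", "OTHER_ALIGNERS",
]


def build_command(executable, output_prefix, train_data, feats, **options):
    """Return the argument list for one training run (hypothetical helper)."""
    argv = [executable,
            "--output-prefix", output_prefix,
            "--train-data", train_data]
    for feat in feats:
        # Each feature template is passed with its own --feat flag.
        argv += ["--feat", feat]
    for name, value in options.items():
        # Python keyword min_relative_diff becomes the flag --min-relative-diff.
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


argv = build_command(
    "./train-latentCrfAligner",  # assumption: path and name of the built binary
    "prefix",
    "sent-per-line-space-delimited-tokens.txt",
    FEATS,
    min_relative_diff=0.001,
    optimizer="adagrad",
    minibatch_size=8000,
    max_iter_count=50,
    cache_feats="true",
    wordpair_feats="word-level-features",
)

# Print a shell-ready command line; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids quoting problems if a path contains spaces.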

### snippet of the file ``sent-per-line-space-delimited-tokens.txt``

    Ms. Haag plays Elianti .
    Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990 .


### snippet of the file ``word-level-features``

    expects starts-with-e 1 starts-with-ex 1 ends-with-ts 1 ends-with-s 1
    plays starts-with-p 1 starts-with-pl 1 ends-with-ys 1 ends-with-s 1
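
A file in this shape can be generated from the training data with a short script. The sketch below is an illustration, not the script used by the authors. It assumes one word type per line followed by space-delimited feature/value pairs, and it reproduces only the prefix and suffix features visible in the snippet (lengths 1 and 2). The function names `affix_features`, `format_line`, and `write_features` are hypothetical.

```python
def affix_features(word):
    """Return (feature_name, value) pairs for one word type."""
    feats = []
    for n in (1, 2):
        # Prefix features: starts-with-e, starts-with-ex
        if len(word) >= n:
            feats.append(("starts-with-" + word[:n], 1))
    for n in (2, 1):
        # Suffix features: ends-with-ts, ends-with-s
        if len(word) >= n:
            feats.append(("ends-with-" + word[-n:], 1))
    return feats


def format_line(word):
    """Format one line: the word, then alternating feature names and values."""
    parts = [word]
    for name, value in affix_features(word):
        parts += [name, str(value)]
    return " ".join(parts)


def write_features(train_path, out_path):
    """Write one feature line per distinct token found in the training file."""
    seen = set()
    with open(train_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for sentence in src:
            # The training file holds one sentence per line, tokens split on spaces.
            for token in sentence.split():
                if token not in seen:
                    seen.add(token)
                    dst.write(format_line(token) + "\n")


print(format_line("expects"))
print(format_line("plays"))
```

To produce the full file, call `write_features("sent-per-line-space-delimited-tokens.txt", "word-level-features")`.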


## using multiple processes:
```mpirun -np 32 train-latentCrfAligner [options]```