latest commit aba97d5123
Waleed Ammar authored

This is a work in progress. If you encounter any problems while compiling or using it, the mistake is likely ours, not yours. Please contact us with questions, comments, and suggestions.


This is an implementation of the CRF autoencoder framework for four tasks:

  • bitext word alignment
  • part-of-speech tagging
  • code switching
  • dependency parsing

Our NIPS 2014 paper describes the CRF autoencoder framework as well as the bitext word alignment and part-of-speech induction tasks in detail. Details on code-switching can be found in our EMNLP shared task paper.


how to build

I'm assuming your default compiler is gcc 4.6.3 or clang 3.1-8 (or later, fingers crossed).

  • bitext word alignment: make -f Makefile-latentCrfAligner
  • part-of-speech tagging: make -f Makefile-latentCrfPosTagger
  • code switching: make -f Makefile-latentCrfPosTagger (this is not a typo)
  • dependency parsing: make -f Makefile-latentCrfParser (still in the works)

example invocations:

part of speech tagging:

```
  --output-prefix prefix # just a filename prefix for files generated during training
  --train-data sent-per-line-space-delimited-tokens.txt # example file below
  --min-relative-diff 0.001
  --optimizer adagrad --minibatch-size 8000
  --max-iter-count 50
  --cache-feats true
  --wordpair-feats word-level-features
```

for a list of all options: execute ``latentCrfAligner --help``

### snippet of the file ``sent-per-line-space-delimited-tokens.txt``

    Ms. Haag plays Elianti .
    Rolls-Royce Motor Cars Inc. said it expects its U.S. sales to remain steady at about 1,200 cars in 1990 .

### snippet of the file ``word-level-features``

    expects starts-with-e 1 starts-with-ex 1 ends-with-ts 1 ends-with-s 1
    plays starts-with-p 1 starts-with-pl 1 ends-with-ys 1 ends-with-s 1
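A file in this shape can be generated from the training data with a short script. The following is a minimal Python sketch, not part of this repository: the function names (`word_features`, `feature_line`, `feature_lines`) are hypothetical, and it assumes that each word type gets one line, that features are space-delimited name/value pairs, and that only prefixes and suffixes of length 1 and 2 are wanted, as in the snippet above. Check these assumptions against what the tagger actually expects before relying on it.

```python
def word_features(word, max_affix_len=2):
    """Return (feature_name, value) pairs for one word type.

    Assumption: only prefix/suffix indicator features with value 1,
    mirroring the snippet of ``word-level-features`` in this README.
    """
    feats = []
    # Prefixes from shortest to longest, e.g. starts-with-e, starts-with-ex
    for n in range(1, max_affix_len + 1):
        if len(word) >= n:
            feats.append((f"starts-with-{word[:n]}", 1))
    # Suffixes from longest to shortest, e.g. ends-with-ts, ends-with-s
    for n in range(max_affix_len, 0, -1):
        if len(word) >= n:
            feats.append((f"ends-with-{word[-n:]}", 1))
    return feats


def feature_line(word, max_affix_len=2):
    """Format one line: the word followed by 'name value' pairs."""
    parts = [word]
    for name, value in word_features(word, max_affix_len):
        parts.extend([name, str(value)])
    return " ".join(parts)


def feature_lines(sentences):
    """Yield one feature line per distinct token, in first-seen order.

    `sentences` is an iterable of space-delimited token strings,
    one sentence per item (the --train-data format).
    """
    seen = set()
    lines = []
    for sentence in sentences:
        for token in sentence.split():
            if token not in seen:
                seen.add(token)
                lines.append(feature_line(token))
    return lines


if __name__ == "__main__":
    demo = ["Ms. Haag plays Elianti ."]
    for line in feature_lines(demo):
        print(line)
```

To produce a real file, pass the lines of `sent-per-line-space-delimited-tokens.txt` to `feature_lines` and write the result to `word-level-features`.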

## using multiple processes:
```mpirun -np 32 train-latentCrfAligner [options]```
