
pycdec is a Python interface to cdec.

Installation

Build and install pycdec:

python setup.py install

Alternatively, run python setup.py build_ext --inplace and add the python/ directory to your PYTHONPATH.

To rebuild pycdec from the Cython source, modify setup.py as follows:

  • Add this import statement: from Cython.Build import cythonize
  • Change the source file from cdec/_cdec.cpp to cdec/_cdec.pyx
  • Add language='c++' as a keyword argument to the Extension in ext_modules (e.g. right after extra_link_args)
  • In the final setup block, change ext_modules=ext_modules to ext_modules=cythonize(ext_modules)

Then just build and install normally, as described above.

Grammar extractor

Compile a parallel corpus and a word alignment into a suffix array representation:

python -m cdec.sa.compile -f f.txt -e e.txt -a a.txt -o output/ -c extract.ini

Or, if your parallel corpus is in a single-file format (each line holds a source sentence and its translation, separated by a triple pipe |||), use:

python -m cdec.sa.compile -b f-e.txt -a a.txt -o output/ -c extract.ini
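Both layouts carry the same data: line n of each input describes sentence pair n. The sketch below is not part of pycdec; it checks that separate inputs line up and converts them into the single-file format. It assumes space-separated tokens, alignments written as space-separated 0-based i-j links (source index, then target index), and single-file fields joined by ' ||| ' with one space on each side. The helper names parse_alignment, check_corpus and to_single_file are hypothetical.

```python
def parse_alignment(line):
    """Parse an alignment line such as '0-0 1-2' into (source, target) index pairs."""
    pairs = []
    for link in line.split():
        i, j = link.split('-')
        pairs.append((int(i), int(j)))
    return pairs


def check_corpus(f_lines, e_lines, a_lines):
    """Return a list of problems in a line-aligned corpus (empty if none found).

    Assumes 0-based 'i-j' links: i indexes source (f) tokens, j target (e) tokens.
    """
    problems = []
    if not (len(f_lines) == len(e_lines) == len(a_lines)):
        problems.append(f'line counts differ: f={len(f_lines)} '
                        f'e={len(e_lines)} a={len(a_lines)}')
    # zip stops at the shortest input, so only complete triples are checked.
    for n, (f, e, a) in enumerate(zip(f_lines, e_lines, a_lines)):
        f_len, e_len = len(f.split()), len(e.split())
        for i, j in parse_alignment(a):
            if not (0 <= i < f_len and 0 <= j < e_len):
                problems.append(f'line {n}: link {i}-{j} out of range')
    return problems


def to_single_file(f_lines, e_lines, sep=' ||| '):
    """Join source and target sentences into 'source ||| target' lines."""
    if len(f_lines) != len(e_lines):
        raise ValueError('source and target have different line counts')
    return [f.strip() + sep + e.strip() for f, e in zip(f_lines, e_lines)]
```

Passing the result of readlines() for f.txt, e.txt and a.txt to check_corpus should return an empty list before you run the compiler; writing the lines returned by to_single_file to f-e.txt, one per line, gives input in the layout expected by -b.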

Extract grammar rules from the compiled corpus:

cat input.txt | python -m cdec.sa.extract -c extract.ini -g grammars/ -z

This will create per-sentence grammar files in the grammars directory and output annotated input suitable for translation with cdec.
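If you post-process the annotated input yourself, each line has to be mapped back to its grammar file. This sketch assumes every sentence is wrapped in a <seg> element of the form <seg grammar="PATH" id="N">SENTENCE</seg>; treat that exact layout, including the attribute order, as an assumption and check it against your own output. parse_seg is a hypothetical helper, not a pycdec function.

```python
import re

# Assumed layout of one annotated line (attribute order included):
#   <seg grammar="PATH" id="N">SENTENCE</seg>
SEG_RE = re.compile(r'<seg grammar="([^"]*)" id="(\d+)">(.*?)</seg>')


def parse_seg(line):
    """Return (sentence_id, grammar_path, sentence) for one annotated line."""
    m = SEG_RE.search(line)
    if m is None:
        raise ValueError(f'not an annotated <seg> line: {line!r}')
    grammar_path, seg_id, sentence = m.groups()
    return int(seg_id), grammar_path, sentence
```

Calling parse_seg on each output line yields the sentence ids and grammar paths in order, which is enough to pair every sentence with its per-sentence grammar.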

Extract rules in stream mode:

python -m cdec.sa.extract -c extract.ini -t -z

This enables interaction over standard input and output, one command per line, using the following line types:

Extract grammar:

context ||| sentence ||| grammar_file

Learn (online mode, specify context name):

context ||| sentence ||| reference ||| alignment

Drop (online mode, specify context name):

context ||| drop
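When scripting stream mode, it can be convenient to build these lines programmatically. The helpers below are hypothetical and only format strings: they assume fields are joined by ' ||| ' with one space on each side, and they reject any field that itself contains |||, since that would shift the field boundaries.

```python
SEP = ' ||| '  # assumed field separator for stream-mode lines


def _line(*fields):
    """Join fields into one stream-mode line, refusing fields that contain '|||'."""
    for field in fields:
        if '|||' in field:
            raise ValueError(f'field contains the separator: {field!r}')
    return SEP.join(field.strip() for field in fields)


def extract_line(context, sentence, grammar_file):
    """Request a grammar for `sentence`, to be written to `grammar_file`."""
    return _line(context, sentence, grammar_file)


def learn_line(context, sentence, reference, alignment):
    """Online mode: learn from a sentence, its reference and their alignment."""
    return _line(context, sentence, reference, alignment)


def drop_line(context):
    """Online mode: drop the named context."""
    return _line(context, 'drop')
```

Each returned string is written to the extractor's standard input followed by a newline. If the process is started with subprocess.Popen(['python', '-m', 'cdec.sa.extract', '-c', 'extract.ini', '-t', '-z'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True), flush stdin after each write so the extractor sees the command immediately.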

Library usage

A basic demo of pycdec's features is available in examples/test.py. Other examples are given in the paper describing pycdec.

More documentation will come as the API becomes stable.


pycdec was contributed by Victor Chahuneau.