Hadoop-based tool for extraction of large-scale synchronous grammars for paraphrasing and machine translation

README

Thrax uses Apache Hadoop (an open-source implementation of MapReduce) to
efficiently extract a synchronous context-free grammar translation model
for use in modern machine translation systems.

Thrax currently has support for both Hiero-style grammars (with a single
non-terminal symbol) and SAMT-style grammars (where non-terminal symbols are
calculated by projecting onto the span from a target-side parse tree).
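
As a rough illustration, extracted rules are typically written one per line in
the '|||'-separated format used by Joshua-style decoders (the rules below are
invented for illustration, and the feature fields depend on your
configuration). A Hiero-style rule uses the single label [X], while an
SAMT-style rule carries a label projected from the target-side parse:

    [X]  ||| la [X,1] bleue  ||| the blue [X,1]  ||| <feature scores>
    [NP] ||| la [NN,1] bleue ||| the blue [NN,1] ||| <feature scores>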

COMPILING:

First, you need to set two environment variables:
$HADOOP should point to the directory where Hadoop is installed.
$AWS_SDK should point to the directory where the Amazon Web Services SDK
is installed.
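
For example, on a typical Linux setup you might add lines like the following
to your shell profile (the paths below are placeholders; use the actual
install locations on your machine):

    export HADOOP=/usr/local/hadoop
    export AWS_SDK=/usr/local/aws-java-sdk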

To compile, type

    ant

This will compile all classes and package them into a jar for use on a 
Hadoop cluster.

At the end of the compilation, ant should report that the build was successful.

RUNNING THRAX:
Thrax can be invoked with

    hadoop jar $THRAX/bin/thrax.jar <configuration file>

Some example configuration files have been included with this distribution:

    example/hiero.conf
    example/samt.conf
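
For example, to extract a Hiero-style grammar using the sample configuration
(assuming $THRAX points to the directory where Thrax was built, as in the
command above):

    hadoop jar $THRAX/bin/thrax.jar example/hiero.conf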

COPYRIGHT AND LICENSE:
Copyright (c) 2010-13 by the Thrax team:
    Jonny Weese <jonny@cs.jhu.edu>
    Juri Ganitkevitch <juri@cs.jhu.edu>

See LICENSE.txt (included with this distribution) for the complete terms.