Skip to content


Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Joshua Statistical Machine Translation Toolkit

Fetching latest commit…

Cannot retrieve the latest commit at this time


Welcome to Joshua

Joshua is a statistical machine translation toolkit for both
phrase-based (new in version 6.0) and syntax-based decoding. It can be
run with pre-built language packs available for download, and can also
be used to build models for new language pairs. Among the many features of
Joshua are:

- Support for both phrase-based and syntax-based decoding models
- Translation of weighted input lattices
- [Thrax]( a Hadoop-based, scalable
  grammar extractor
- A [sparse feature architecture](
  supporting an arbitrary number of features

The latest release of Joshua is 6.0, released in January of 2014.

New in 6.0

Joshua 6.0 includes the following new features:

- A fast phrase-based decoder with the ability to read [Moses]( 
  phrase tables
- Large speed improvements compared to the previous syntax-based decoder
- Special input handling
- A host of bugfixes and stability improvements

Working with "language packs"

Joshua includes a number of "language packs", which are pre-built models that
allow you to use the translation system as a black box, without worrying too
much about how machine translation works. You can browse the models available
for download on the [Joshua

Building new models

Joshua includes a pipeline script that allows you to build new models, provided
you have training data.  This pipeline can be run (more or less) by invoking a
single command, which handles data preparation, alignment, phrase-table or
grammar construction, and tuning of the model parameters. See [the
for a walkthrough and more information about the many available options.

Quick start

To run the decoder in any form requires setting a few basic environment
variables: `$JAVA_HOME`, `$JOSHUA`, and potentially `$MOSES`.

    export JAVA_HOME=/path/to/java  # maybe /usr/java/home
    export JOSHUA=/path/to/joshua

You might also find it helpful to set these:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8

Then, compile Joshua by typing:

    cd $JOSHUA

The basic method for invoking the decoder looks like this:


Some example usage scenarios and scripts can be found in the `examples/`
Something went wrong with that request. Please try again.