Advanced Configurations

The advanced configuration of an MMT engine can be manually set through the XML file engine.xconf located in <your_mmt_home>/engines/<your_engine_name>/engine.xconf.

The sections below explain how to configure an engine through the engine.xconf file. Alternatively, you can skip ahead to the Interesting Examples section.

Configuration

The engine.xconf file is automatically generated during the engine creation and training (launched with the ./mmt create command). By default it looks like this and it already provides a valid configuration.

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" />
</node>

It is possible to configure properties by adding XML elements and attributes under the element “node”. Here is an example of a configured file:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" name="my_engine">
        <decoder threads="5" />
    </engine>
    <network>
         <api port="8000"/>
         <join>
              <member host="16.51.99.2" port="5015" />
         </join> 
    </network>
    <datastream embedded="false" host="16.51.99.2"/>
    <db embedded="false" host="16.51.99.2" />
</node>

Customizable properties include:

Engine Configuration
Network Configuration
Data Stream Configuration
Database Configuration

Some of these settings can also be passed as command line arguments; in case of conflicts the command line arguments are considered to have higher priority. In case a property is defined neither in the configuration file nor by command line, its default value will be employed.

Please note that if you run the ./mmt create command again for an existing engine, the configuration of that engine will be overwritten with the basic default one.

Engine Configuration

The general features of the engine are described in the auto-generated "engine" XML element, child to the "node" XML element.

Here is an example of a fully configured "engine" XML element:

  <node>
    ...
    <engine source-language="en" target-language="it">
        <decoder enabled="true" gpus="1,3,5" />
        <aligner enabled="false" />
    </engine>
    ...
  </node>

Attribute name	Description
source-language	The language to translate from. NOTE: this attribute must not be used if the <languages> child is present
target-language	The language to translate to

Languages

ModernMT supports multilingual engines, meaning that a single engine can handle multiple (unidirectional) language pairs.

If you want to enable multiple language pairs, you need to:

remove the default source-language and target-language attributes from your <engine> node;
add a new <languages> child to your <engine> node;
add to <languages> as many <pair> nodes as the language pairs you want to enable, and set their source and target attributes to the corresponding language tags.

Pair

Each <pair> child under the <languages> node of <engine> represents a language pair to enable for this engine. Valid pair attributes include:

Attribute name	Description
source	The language tag of the source language of this pair
target	The language tag of the target language of this pair
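
As a minimal sketch of the steps above, the following configuration enables two language pairs in a single engine. The language tags en, it and fr are only illustrative choices; the element and attribute names are the ones described in this section.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <!-- No source-language/target-language attributes: the pairs are listed below -->
    <engine>
        <languages>
            <!-- Pairs are unidirectional: each direction needs its own entry -->
            <pair source="en" target="it" />
            <pair source="en" target="fr" />
        </languages>
    </engine>
</node>
```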

Decoder

Adding a "decoder" element lets you set the features of the translation decoder to use. Valid decoder attributes include:

Attribute name	Description	Valid Values	Default value
enabled	It defines whether the engine should use a decoder or not	true or false	true
threads	The number of threads the decoder uses when running on CPU. NOTE: to make the decoder run on CPU, it is also mandatory to set the `gpus` attribute to "none".	--	(run on GPUs)
gpus	Comma-separated list of the ids of the GPUs that the neural decoder will use. Example: 1,3,5	A single GPU id or a comma-separated list of GPU ids; 'none' if no GPUs must be used	All the available GPUs
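
For illustration, a CPU-only decoder could be configured as in the sketch below. The thread count of 4 and the en/it language pair are arbitrary assumptions; the point is that `threads` is combined with `gpus="none"`, as the note in the table requires.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it">
        <!-- gpus="none" requests CPU execution; threads sets the CPU thread count -->
        <decoder threads="4" gpus="none" />
    </engine>
</node>
```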

Aligner

Adding an "aligner" element lets you set the features of the aligner component to use. Valid aligner attributes include:

Attribute name	Description	Valid Values	Default value
enabled	It defines whether the engine should use an aligner for the Tag Projection API.	true or false	false

Network Configuration

To define the network behaviour of the engine, add a "network" XML element under "node".

Here is an example of a fully configured "network" element.

<node>
    ...
    <network  host="10.5.10.237" port="5000" interface="eth0">
         <api port="8888" root="test" />
         <join>
              <member host="31.41.59.1" port="5015"/>
              <member host="31.41.59.2" port="5016"/>
              <member host="31.41.59.3" port="5017"/>
         </join> 
    </network>
    ...
</node>

The attributes in the "network" node can be used to specify the general network settings:

Attribute name	Description	Valid Values	Default value
host	The IP address at which this machine must be reachable by the other cluster nodes	--	The IPv4 address of this machine
port	The port used for cluster communication	--	5016
interface	The network interface where this machine will listen to cluster communication messages	--	null

More specific network settings, such as the REST APIs and the cluster joining configurations, require the definition of specific XML elements under "network":

REST APIs server

The configuration of the REST Server used to expose APIs can be set in a new "api" XML element under "network". Valid attributes for element "api" include:

Attribute name	Description	Valid Values	Default value
enabled	It defines whether the engine should expose REST APIs or not	true: launch the REST server and expose APIs; false: do not expose any REST APIs	true
port	The REST APIs port	--	8045
root	The path in the host where REST APIs must be exposed	--	None
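
As a small sketch based on the table above, a node that should not expose any REST APIs (for example, a cluster member that is only reached through other nodes, which is an assumed scenario) could set `enabled` to false. The en/it language pair is illustrative.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" />
    <network>
        <!-- The REST server is not launched on this node -->
        <api enabled="false" />
    </network>
</node>
```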

Join

Adding a "join" XML element under "network" allows the configuration of an MMT cluster. In "join" it is possible to specify a series of "member" child elements. Each member is a potential entry point to the cluster: this engine will contact them in order until one of them answers. Each "member" element requires two attributes:

Attribute name	Description	Valid Values	Default Value
host	the current member IP address or hostname	--	--
port	the cluster communication port	--	--

Data Stream Configuration

To define the way the engine should connect to a data stream, add a "datastream" XML element, child to the "node" XML element.

Here is an example of a fully configured "datastream" element:

<node>
   ...
   <datastream enabled="true" embedded="false" host="31.41.59.1" port="9999"/>
   ...
</node>

Valid "datastream" attributes are:

Attribute name	Description	Valid Values	Default value
enabled	It defines whether this engine should use a data stream	true: the engine launches or connects to a data stream; false: the engine does not use any data streams	true
embedded	It defines whether the data stream belongs to an MMT engine or is a separate process	true: the data stream server is embedded in an engine. If host is localhost, this engine will launch the data stream itself; false: this engine will connect to a running, separate data stream server	true
host	The data stream host IP address or hostname	--	localhost
port	The data stream port	--	9092
name	The name of the data stream this engine should interact with	--	If embedded is true, the default name is an empty string `""`; if embedded is false, the name is mandatory
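
Since the name is mandatory when `embedded` is false, a connection to a separate data stream server might look like the sketch below. The host address, the en/it language pair and the name "my_data_stream" are hypothetical placeholders, not values defined by MMT.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" />
    <!-- External data stream: name is required because embedded="false" -->
    <datastream embedded="false" host="31.41.59.1" port="9092" name="my_data_stream" />
</node>
```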

Database Configuration

To define the way the engine should connect to the Database, add a "db" XML element, child to the "node" XML element.

Here is an example of a fully configured "db" element:

<node>
   ...
   <db enabled="true" embedded="false" host="31.41.59.1" port="9444" type="mysql" name="mmtDB"/>
   ...
</node>

Valid "db" attributes are:

Attribute name	Description	Valid Values	Default Value
enabled	It defines whether the engine should connect to a DB or not	true: this engine must try to use a DB; false: this engine must not try to use a DB	true
host	The database host IP address or hostname	--	localhost
port	The database port	--	9042
name	The name of the database this engine should interact with	--	If embedded is true, the default name is `"default"`; if embedded is false, the database name is mandatory
embedded	It defines whether the database belongs to an MMT engine or is a separate process	true: the database is embedded in an engine of the cluster. If `host` is localhost, this engine will launch the database itself. Only Cassandra DBs can be embedded; false: this engine will connect to a running, separate database server	true
type	The type of the DB to interact with	mysql: try to connect to a MySQL DB. MySQL interaction is only allowed for external DBs, so if `type` is "mysql", `embedded` must be false; cassandra: try to connect to a Cassandra DB	cassandra
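
To complement the MySQL example above, here is a sketch of a connection to a separate, already running Cassandra server. The host address, the en/it language pair and the database name "my_database" are hypothetical placeholders; the name is set because it is mandatory when `embedded` is false.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" />
    <!-- External Cassandra DB: type defaults to cassandra but is spelled out here -->
    <db embedded="false" type="cassandra" host="31.41.59.1" port="9042" name="my_database" />
</node>
```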

Interesting Examples

Here are some examples of how engine.xconf files can be used to configure nodes for various scenarios.

Example 1: Single Node

This is a sample configuration for an MMT engine named 'default' working alone and exposing its REST APIs on port 8045. During the execution of ./mmt start, the engine itself launches the database process on port 9042 and the data stream process on port 9092 (the default ports listed in the tables above); during the execution of ./mmt stop, these processes are stopped as well.

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
</node>

As an alternative, you may want your MMT engine to use database and data stream instances that are already running on your machine.
To do so, set the database and data stream as not embedded; you can also specify their ports, which may differ from the default ones:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
    <datastream embedded="false" port="<your_data_stream_port>" />
    <db embedded="false" port="<your_db_port>" />
</node>

That's it! Of course, if a service is running as not embedded, it will not be stopped by the ./mmt stop command.

Example 2: Leader-Followers Cluster

In an MMT cluster with a Leader-Followers style:

the Leader is a node that hosts, in addition to an engine, both the database process and the data stream process.
the Followers join the cluster using any of its members as an entry point, and connect directly to the Leader's database and data stream.

Using an MMT cluster lets nodes propagate translation knowledge and jobs, leading to better scalability and fault-tolerance. Separate engine instances should run on different machines.

The Leader may be configured as shown in Example 1, and it should be started first to make sure the database and data stream processes are running when the Followers try to connect.

The Followers, on the contrary, require a slightly different configuration. For the sake of simplicity, let's say that all nodes use default ports and names, and that the Followers will use the Leader as their entry point (this is not mandatory: they may use any node that is already a cluster member):

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
    <network>
         <join>
              <member host="31.41.59.3" port="5016" />
         </join> 
    </network>
    <datastream host="31.41.59.3" />
    <db host="31.41.59.3" />
</node>

Note that the database and data stream, which are embedded in the Leader node, are considered embedded by the Followers too (and since the default "embedded" value is "true", it is not necessary to add that attribute).

As an alternative to the previous configuration, Followers can keep the default engine.xconf configuration and be started with the --join-leader option set to the Leader host: ./mmt start --join-leader 31.41.59.3

Example 3: Peer-to-Peer Cluster

In an MMT cluster with a Peer-to-Peer style, the database and the data stream processes do not run in a cluster member, but on separate machines. Therefore all nodes have the same role, and the single point of failure represented by the Leader is avoided. Moreover, the specified database and data stream hosts may hide replication and load balancing techniques, improving the fault-tolerance of the system.

As before, for the sake of simplicity let's use default ports and names; moreover, let's consider node 31.41.59.1 as the first node to start and the one that every other node tries to join.

Note that when the first node is started there are no cluster members to join. As a consequence, the first node does not need any "member" entries, and all the other nodes may use the first one as an entry point to the cluster (again, this is not mandatory: they may use any node that is already a cluster member).

Here is the configuration of any node but the first:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
    <network>
         <join>
              <member host="31.41.59.1" port="5016" />
         </join> 
    </network>
    <datastream embedded="false" host="27.18.28.2" name="<your_data_stream_name>"/>
    <db embedded="false" host="27.18.28.1" name="<your_database_name>"/>
</node>

Note that, unlike in Example 2, the data stream and database are now specified as not embedded. It is thus necessary to set their names too.
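
For completeness, the first node (31.41.59.1 in this example) could be configured as in the sketch below. This is an assumption derived from the description above rather than a configuration taken from MMT itself: it has no "join" element because there is no cluster to join yet, while the data stream and database settings match those of the other nodes. The en/it language pair and the names "my_data_stream" and "my_database" are hypothetical placeholders.

```xml
<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" />
    <!-- No <network>/<join> section: this node starts the cluster -->
    <datastream embedded="false" host="27.18.28.2" name="my_data_stream" />
    <db embedded="false" host="27.18.28.1" name="my_database" />
</node>
```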
