
Advanced Configurations

Georgy Chirkov edited this page Aug 19, 2019 · 42 revisions

The advanced configuration of an MMT engine can be set manually in the XML file engine.xconf, located at <your_mmt_home>/engines/<your_engine_name>/engine.xconf.

The sections below explain how to configure an engine through the engine.xconf file. Alternatively, you can skip straight to the Interesting Examples section.

Configuration

The engine.xconf file is generated automatically when the engine is created and trained (with the ./mmt create command). By default it looks like the following, which is already a valid configuration:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" />
</node>

You can configure further properties by adding XML elements and attributes under the "node" element. Here is an example of a configured file:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="en" target-language="it" name="my_engine">
        <decoder threads="5" />
    </engine>
    <network>
         <api port="8000"/>
         <join>
              <member host="16.51.99.2" port="5015" />
         </join> 
    </network>
    <datastream embedded="false" host="16.51.99.2" name="<your_data_stream_name>"/>
    <db embedded="false" host="16.51.99.2" name="<your_database_name>"/>
</node>

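Customizable properties include:

  • Engine configuration (the "engine" element)
  • Network configuration (the "network" element)
  • Data stream configuration (the "datastream" element)
  • Database configuration (the "db" element)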

Some of these settings can also be passed as command-line arguments; in case of conflict, the command-line arguments take precedence. If a property is defined neither in the configuration file nor on the command line, its default value is used.

Please note that if you run the ./mmt create command again for an existing engine, its current configuration will be overwritten with the basic default one.
 

Engine Configuration

The general features of the engine are described in the auto-generated "engine" XML element, a child of the "node" XML element.

Here is an example of a fully configured "engine" XML element:

  <node>
    ...
    <engine source-language="en" target-language="it">
        <decoder enabled="true" gpus="1,3,5" />
        <aligner enabled="false" />
    </engine>
    ...
  </node>
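Attribute name | Description
source-language | The original language to translate from. NOTE: this attribute must not be used if the <languages> child is present.
target-language | The language to translate to. NOTE: this attribute must not be used if the <languages> child is present.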

Languages

ModernMT supports multilingual engines, meaning that a single engine can handle multiple (unidirectional) language pairs.

If you want to enable multiple language pairs, you need to:

  • erase the default source-language and target-language attributes in your <engine> node;
  • add a new <languages> child to your <engine> node;
  • add to <languages> as many <pair> nodes as the language pairs you want to enable, and set their source and target attributes to the corresponding language tags.
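As an illustration, the steps above could produce an "engine" element like the following. This is a sketch: the two pairs (English to Italian and English to French) are arbitrary choices, and any other children of <engine> stay where they were.

<node>
    ...
    <engine>
        <!-- no source-language/target-language attributes when <languages> is used -->
        <languages>
            <pair source="en" target="it" />
            <pair source="en" target="fr" />
        </languages>
    </engine>
    ...
</node>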

Pair

Each <pair> child under the <languages> node of <engine> represents a language pair that must be enabled for this engine. Valid pair attributes include:

Attribute name | Description
source | The language tag of the source language of this pair
target | The language tag of the target language of this pair
 

Decoder

Adding a "decoder" element lets you set the features of the translation decoder. Valid decoder attributes include:

Attribute name | Description | Valid values | Default value
enabled | Whether the engine should use a decoder | true or false | true
threads | The number of threads the decoder uses when running on CPU. NOTE: to run on CPUs, the gpus attribute must also be set to "none". | -- | (runs on GPUs)
gpus | Comma-separated list of the ids of the GPUs the neural decoder will use (example: 1,3,5) | A single GPU id or a comma-separated list of GPU ids; "none" if no GPUs must be used | All the available GPUs
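For example, running the decoder on CPU only requires, per the NOTE above, setting gpus to "none" together with threads. A minimal sketch, where the thread count of 8 is an arbitrary value chosen for illustration:

<node>
    ...
    <engine source-language="en" target-language="it">
        <!-- gpus="none" forces CPU decoding; threads sets the CPU thread count (8 is arbitrary) -->
        <decoder enabled="true" gpus="none" threads="8" />
    </engine>
    ...
</node>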
 

Aligner

Adding an "aligner" element lets you set the features of the aligner component. Valid aligner attributes include:

Attribute name | Description | Valid values | Default value
enabled | Whether the engine should use an aligner for the Tag Projection API | true or false | false
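Since the aligner is disabled by default, an engine that must serve the Tag Projection API needs it switched on explicitly. A minimal sketch, reusing the en-it engine from the examples above:

<node>
    ...
    <engine source-language="en" target-language="it">
        <!-- enable the aligner so the engine can serve the Tag Projection API -->
        <aligner enabled="true" />
    </engine>
    ...
</node>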
 

Network Configuration

To define the network behaviour of the engine, add a "network" XML element under "node".

Here is an example of a fully configured "network" element.

<node>
    ...
    <network  host="10.5.10.237" port="5000" interface="eth0">
         <api port="8888" root="test" />
         <join>
              <member host="31.41.59.1" port="5015"/>
              <member host="31.41.59.2" port="5016"/>
              <member host="31.41.59.3" port="5017"/>
         </join> 
    </network>
    ...
</node>

The attributes in the "network" node can be used to specify the general network settings:

Attribute name | Description | Valid values | Default value
host | The IP address at which the other cluster nodes must be able to reach this machine | -- | The IPv4 address of this machine
port | The port used for cluster communication | -- | 5016
interface | The network interface on which this machine listens for cluster communication messages | -- | null

 

More specific network settings, such as the REST APIs and the cluster joining configurations, require the definition of specific XML elements under "network":  

REST APIs server

The configuration of the REST Server used to expose APIs can be set in a new "api" XML element under "network". Valid attributes for element "api" include:

Attribute name | Description | Valid values | Default value
enabled | Whether the engine should expose REST APIs | true: launch the REST server and expose the APIs; false: do not expose any REST APIs | true
port | The REST APIs port | -- | 8045
root | The path on the host under which the REST APIs are exposed | -- | None
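Conversely, setting enabled to false stops the node from exposing any REST APIs; in that case the port and root attributes are presumably ignored. A minimal sketch:

<node>
    ...
    <network>
         <!-- no REST server is launched on this node -->
         <api enabled="false" />
    </network>
    ...
</node>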

Join

Adding a "join" XML element under "network" allows the configuration of an MMT cluster. In "join" you can specify a series of "member" child elements. Each member is a potential entry point to the cluster: this engine contacts them in order until one of them answers. Each "member" element requires two attributes:

Attribute name | Description | Valid values | Default value
host | The member's IP address or hostname | -- | --
port | The member's cluster communication port | -- | --
 

Data Stream Configuration

To define how the engine should connect to a data stream, add a "datastream" XML element as a child of the "node" XML element.

Here is an example of a fully configured "datastream" element:

<node>
   ...
   <datastream enabled="true" embedded="false" host="31.41.59.1" port="9999" name="<your_data_stream_name>"/>
   ...
</node>

Valid "datastream" attributes are:

Attribute name | Description | Valid values | Default value
enabled | Whether this engine should use a data stream | true: the engine launches or connects to a data stream; false: the engine does not use any data stream | true
embedded | Whether the data stream belongs to an MMT engine or is a separate process | true: the data stream server is embedded in an engine; if host is localhost, this engine launches the data stream itself; false: this engine connects to a running, separate data stream server | true
host | The data stream host IP address or hostname | -- | localhost
port | The data stream port | -- | 9092
name | The name of the data stream this engine should interact with | -- | If embedded is true, the empty string ""; if embedded is false, the name is mandatory
 

Database Configuration

To define how the engine should connect to the database, add a "db" XML element as a child of the "node" XML element.

Here is an example of a fully configured "db" element:

<node>
   ...
   <db enabled="true" embedded="false" host="31.41.59.1" port="9444" type="mysql" name="mmtDB"/>
   ...
</node>

Valid "db" attributes are:

Attribute name | Description | Valid values | Default value
enabled | Whether the engine should connect to a DB | true: this engine must try to use a DB; false: this engine must not try to use a DB | true
host | The database host IP address or hostname | -- | localhost
port | The database port | -- | 9042
name | The name of the database this engine should interact with | -- | If embedded is true, "default"; if embedded is false, the database name is mandatory
embedded | Whether the database belongs to an MMT engine or is a separate process | true: the database is embedded in an engine of the cluster; if host is localhost, this engine launches the database itself (only Cassandra DBs can be embedded); false: this engine connects to a running, separate database server | true
type | The type of DB to interact with | mysql: connect to a MySQL DB (only allowed for external DBs: if type is "mysql", embedded must be false); cassandra: connect to a Cassandra DB | cassandra
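The example at the top of this section connects to an external MySQL server. Connecting to an external Cassandra cluster instead would look roughly as follows; the host and name values are placeholders, and port 9042 is simply the default listed above:

<node>
   ...
   <!-- external Cassandra: embedded="false" makes the name attribute mandatory -->
   <db enabled="true" embedded="false" type="cassandra" host="<your_cassandra_host>" port="9042" name="<your_database_name>"/>
   ...
</node>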
     

Interesting Examples

Here are some examples of how engine.xconf files can be used to configure nodes for various scenarios.

 

Example 1: Single Node

This is a sample configuration for an MMT engine named 'default' working alone and exposing its REST APIs on port 8045. When ./mmt start is executed, the engine itself launches the database process on port 9042 and the data stream process on port 9092 (their default ports); when ./mmt stop is executed, these processes are stopped as well.

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
</node>

 

Alternatively, you may want your MMT engine to use database and data stream instances that are already running on your machine.
In that case, set the database and data stream as not embedded and, if they differ from the defaults, specify their ports:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
    <datastream embedded="false" port="<your_data_stream_port>" name="<your_data_stream_name>"/>
    <db embedded="false" port="<your_db_port>" name="<your_database_name>"/>
</node>

That's it! Of course, a service that is not embedded will not be stopped by the ./mmt stop command.

 

Example 2: Leader-Followers Cluster

In an MMT cluster with a Leader-Followers style:

  • the Leader is a node that hosts, in addition to an engine, both the database process and the data stream process.
  • the Followers join the cluster using any of its members as an entry point, and connect directly to the Leader's database and data stream.

Using an MMT cluster lets nodes propagate translation knowledge and jobs, leading to better scalability and fault-tolerance. Separate engine instances should run on different machines.

The Leader may be configured as shown in Example 1, and it should be started first, so that the database and data stream processes are already running when the Followers try to connect.

The Followers, on the other hand, require a slightly different configuration. For the sake of simplicity, let's say that all nodes use default ports and names, and that the Followers use the Leader as their entry point (this is not mandatory: they may use any node that is already a cluster member):

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
    <network>
         <join>
              <member host="31.41.59.3" port="5016" />
         </join> 
    </network>
    <datastream host="31.41.59.3" />
    <db host="31.41.59.3" />
</node>

 

Note that the database and data stream, which are embedded in the Leader node, are considered embedded by the Followers too (since the default "embedded" value is "true", it is not necessary to add that attribute).

As an alternative to the previous configuration, Followers can keep the default engine.xconf configuration and be started with the --join-leader option set to the Leader host: ./mmt start --join-leader 31.41.59.3

 

Example 3: Peer-to-Peer Cluster

In an MMT cluster with a Peer-to-Peer style, the database and data stream processes do not run on a cluster member but on separate machines. Therefore all nodes have the same role, and the Leader's single point of failure is avoided. Moreover, the specified database and data stream hosts may hide replication and load-balancing techniques, ensuring the fault tolerance of the system.

As before, for the sake of simplicity, let's use default ports and names; moreover, let's consider node 31.41.59.1 as the first node to start and the one that every other node joins.

Note that when the first node is started there are no cluster members to join. As a consequence, the first node does not need to list any members to join, and all the other nodes may use it as their entry point to the cluster (again, this is not mandatory: they may use any node that is already a cluster member).

Here is the configuration of every node except the first:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
    <network>
         <join>
              <member host="31.41.59.1" port="5016" />
         </join> 
    </network>
    <datastream embedded="false" host="27.18.28.2" name="<your_data_stream_name>"/>
    <db embedded="false" host="27.18.28.1" name="<your_database_name>"/>
</node>

 

Note that, unlike in Example 2, the datastream and database are now specified as not embedded, so their names must be set too.
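The configuration of the first node is not listed above. Following the text, it would presumably be the same as the followers' configuration minus the "join" element, since there is no cluster to join when it starts. A sketch under that assumption:

<node xsi:schemaLocation="http://www.modernmt.eu/schema/config mmt-config-1.0.xsd"
      xmlns="http://www.modernmt.eu/schema/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <engine source-language="<your_src_lang>" target-language="<your_trg_lang>" />
    <!-- first node (31.41.59.1): no <network><join> element, since it starts the cluster -->
    <datastream embedded="false" host="27.18.28.2" name="<your_data_stream_name>"/>
    <db embedded="false" host="27.18.28.1" name="<your_database_name>"/>
</node>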
