Transformer Language training

This document has instructions for running Transformer Language training using Intel-optimized TensorFlow.

Datasets

Follow the instructions to download and preprocess the WMT English-German dataset, then set DATASET_DIR to point to the location of the dataset directory.
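A quick sanity check before training can save a failed run. The sketch below assumes a POSIX shell and defines a hypothetical `check_dataset_dir` helper (not part of the AI Reference Models repo). It only confirms that `DATASET_DIR` is set and points to an existing directory. It does not verify the preprocessed file names, because those depend on the preprocessing instructions.

```shell
# Hypothetical helper (not from the repo): fail early if DATASET_DIR is unusable.
check_dataset_dir() {
  # Reject an unset or empty DATASET_DIR.
  if [ -z "${DATASET_DIR:-}" ]; then
    echo "DATASET_DIR is not set" >&2
    return 1
  fi
  # Reject a path that does not exist or is not a directory.
  if [ ! -d "$DATASET_DIR" ]; then
    echo "DATASET_DIR does not exist or is not a directory: $DATASET_DIR" >&2
    return 1
  fi
  echo "Using dataset directory: $DATASET_DIR"
}
```

For example, `export DATASET_DIR=<path to the dataset>` followed by `check_dataset_dir` prints the directory in use, or an error if the path is wrong.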

Quick Start Scripts

| Script name | Description |
|---|---|
| `training_demo.sh` | Runs 100 training steps. The script runs in single-instance mode by default; for multi-instance mode, set `MPI_NUM_PROCESSES`. |
| `training.sh` | Uses `mpirun` to execute 2 processes, with 1 process per socket, and a batch size of 5120 for the specified precision (fp32 or bfloat16). Logs for each instance are saved to the output directory. |
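Both scripts assume a per-socket layout: `training.sh` runs 1 process per socket, and multi-instance runs of `training_demo.sh` are sized with `MPI_NUM_PROCESSES`. As a sketch, assuming `lscpu` from util-linux is installed and prints a `Socket(s):` line, the hypothetical `count_sockets` helper below reads `lscpu` output on stdin and prints the socket count.

```shell
# Hypothetical helper (not from the repo): parse `lscpu` output on stdin
# and print the value of the "Socket(s):" field with whitespace removed.
count_sockets() {
  awk -F: '/^Socket\(s\):/ { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}
```

Usage: `export MPI_NUM_PROCESSES="$(LC_ALL=C lscpu | count_sockets)"`. Setting `LC_ALL=C` keeps the `Socket(s):` label in English so the pattern matches.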

Run the model

Set up your environment using the instructions below, depending on whether you are using AI Tools:


To run using AI Tools you will need:

numactl
gcc-8 (for multi-instance only)
g++-8 (for multi-instance only)
libopenmpi-dev (for multi-instance only)
openmpi (for multi-instance only)
openssh (for multi-instance only)
horovod (for multi-instance only)
Activate the `tensorflow` conda environment:
```
conda activate tensorflow
```
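To confirm that the activation step took effect, the sketch below relies on `conda activate` setting `CONDA_DEFAULT_ENV` to the name of the active environment (if an environment is activated by path rather than by name, the variable holds the path instead). `check_conda_env` is a hypothetical helper, not part of AI Tools.

```shell
# Hypothetical helper (not from AI Tools): check which conda env is active.
# Defaults to the `tensorflow` environment named in the step above.
check_conda_env() {
  expected="${1:-tensorflow}"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "Expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
  echo "conda env '$expected' is active"
}
```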

To run without AI Tools you will need:

Python 3
intel-tensorflow>=2.5.0
git
numactl
gcc-8 (for multi-instance only)
g++-8 (for multi-instance only)
libopenmpi-dev (for multi-instance only)
openmpi (for multi-instance only)
openssh (for multi-instance only)
horovod (for multi-instance only)

A clone of the AI Reference Models repo:

```
git clone https://github.com/IntelAI/models.git
```
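The prerequisites above can be checked roughly with a script. This is a sketch that assumes GNU coreutils (for `sort -V`). `version_ge` and `missing_cmds` are hypothetical helpers, and they only check what you pass them.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# `sort -V` orders version strings; if $2 sorts first (or ties), $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print each given command that is not found on PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
```

For example, `missing_cmds python3 git numactl mpirun` prints any of those tools that are missing (`mpirun` stands in for the OpenMPI install and is only needed for multi-instance runs). `version_ge "$(python3 -c 'import tensorflow as tf; print(tf.__version__)')" 2.5.0` succeeds when the installed TensorFlow meets the `intel-tensorflow>=2.5.0` requirement.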

After you've completed the setup, export environment variables for DATASET_DIR, PRECISION, and an OUTPUT_DIR where log files will be written, then run a quick start script from your AI Reference Models directory. To run multi-instance training, set the MPI_NUM_PROCESSES variable to the number of instances to run (for example, export MPI_NUM_PROCESSES=2 for 2 instances).

```
# cd to your AI Reference Models directory
cd models

# Set required environment variables
export DATASET_DIR=<path to the dataset>
export PRECISION=<set precision to fp32 or bfloat16>
export OUTPUT_DIR=<path to the directory where log files will be written>

# Optional: Specify the number of sockets to use for multi-instance training
export MPI_NUM_PROCESSES=<number of sockets to use>

# Run a quickstart script
./quickstart/language_translation/tensorflow/transformer_mlperf/training/cpu/<script name>.sh
```
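Because the quick start scripts read these variables from the environment, a pre-flight check can catch a misspelled precision or an unset path before a long run starts. The hypothetical `validate_quickstart_env` function below is a sketch, not part of the repo. It assumes fp32 and bfloat16 are the only valid `PRECISION` values (the two listed above) and creates `OUTPUT_DIR` if it is missing, which may be redundant if the script already does so.

```shell
# Hypothetical pre-flight check (not from the repo) for the quick start variables.
validate_quickstart_env() {
  # DATASET_DIR, OUTPUT_DIR and PRECISION are all required.
  if [ -z "${DATASET_DIR:-}" ] || [ -z "${OUTPUT_DIR:-}" ] || [ -z "${PRECISION:-}" ]; then
    echo "DATASET_DIR, OUTPUT_DIR and PRECISION must all be set" >&2
    return 1
  fi
  # Assumption: only the two precisions named in this README are accepted.
  case "$PRECISION" in
    fp32|bfloat16) ;;
    *) echo "PRECISION must be fp32 or bfloat16, got '$PRECISION'" >&2; return 1 ;;
  esac
  # MPI_NUM_PROCESSES is optional; when set, require a positive integer
  # without leading zeros.
  if [ -n "${MPI_NUM_PROCESSES:-}" ]; then
    case "$MPI_NUM_PROCESSES" in
      *[!0-9]*|0*) echo "MPI_NUM_PROCESSES must be a positive integer" >&2; return 1 ;;
    esac
  fi
  # Assumption: creating OUTPUT_DIR up front is harmless if the script also creates it.
  mkdir -p "$OUTPUT_DIR"
}
```

Usage: `validate_quickstart_env && ./quickstart/language_translation/tensorflow/transformer_mlperf/training/cpu/<script name>.sh`.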

Additional Resources

To run more advanced use cases, see the instructions for the available precisions (FP32 and BFloat16) on calling the launch_benchmark.py script directly.
To run the model using Docker, see the Intel® Developer Catalog workload container:
https://www.intel.com/content/www/us/en/developer/articles/containers/transformer-lt-mlperf-fp32-training-tensorflow-container.html.
