This document has instructions for running Transformer Language training using Intel-optimized TensorFlow.

Follow the instructions to download and preprocess the WMT English-German dataset, then set `DATASET_DIR` to point to the location of the dataset directory.
| Script name | Description |
|---|---|
| `training_demo.sh` | Runs 100 training steps. The script runs in single-instance mode by default; for multi-instance mode, set `MPI_NUM_PROCESSES`. |
| `training.sh` | Uses mpirun to execute 2 processes (1 process per socket) with a batch size of 5120 for the specified precision (fp32 or bfloat16). Logs for each instance are saved to the output directory. |
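The single- versus multi-instance behavior described in the table can be sketched as a simple shell check (illustrative only; this is not the actual quickstart script logic):

```shell
# Sketch: the scripts run single-instance by default and switch to
# multi-instance mode when MPI_NUM_PROCESSES is set (hypothetical logic).
if [ -z "${MPI_NUM_PROCESSES:-}" ]; then
  echo "single-instance run"
else
  echo "multi-instance run with ${MPI_NUM_PROCESSES} processes"
fi
```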
Set up your environment using the instructions below, depending on whether you are using AI Tools:

| Setup using AI Tools | Setup without AI Tools |
|---|---|
| To run using AI Tools you will need: | To run without AI Tools you will need: |
After you've completed the setup, export environment variables for the `DATASET_DIR` and an `OUTPUT_DIR` where log files will be written, then run a quickstart script from your AI Reference Models directory. To run multi-instance training, set the `MPI_NUM_PROCESSES` var to specify the number of instances to run (for example, `export MPI_NUM_PROCESSES=2` for 2 instances).
```shell
# cd to your AI Reference Models directory
cd models

# Set required environment variables
export DATASET_DIR=<path to the dataset>
export PRECISION=<set precision to fp32 or bfloat16>
export OUTPUT_DIR=<path to the directory where log files will be written>

# Optional: Specify the number of sockets to use for multi-instance training
export MPI_NUM_PROCESSES=<number of sockets to use>

# Run a quickstart script
./quickstart/language_translation/tensorflow/transformer_mlperf/training/cpu/<script name>.sh
```
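For example, a complete invocation with the placeholders filled in might look like the following. The paths here are hypothetical; substitute your own dataset and output locations:

```shell
# Hypothetical example values -- replace the paths with your own locations.
export DATASET_DIR=$HOME/datasets/wmt_en_de
export PRECISION=bfloat16
export OUTPUT_DIR=$HOME/transformer_mlperf_logs
export MPI_NUM_PROCESSES=2   # optional: 2 instances, one per socket

mkdir -p "$OUTPUT_DIR"
echo "Running $PRECISION training with $MPI_NUM_PROCESSES instance(s); logs in $OUTPUT_DIR"
# Then, from the AI Reference Models directory:
# ./quickstart/language_translation/tensorflow/transformer_mlperf/training/cpu/training.sh
```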
- To run more advanced use cases, see the instructions for the available precisions (FP32, BFloat16) for calling the `launch_benchmark.py` script directly.
- To run the model using docker, please see the Intel® Developer Catalog workload container: https://www.intel.com/content/www/us/en/developer/articles/containers/transformer-lt-mlperf-fp32-training-tensorflow-container.html.