Transformer Language training

This document has instructions for running Transformer Language training using Intel-optimized TensorFlow.

Datasets

Follow the instructions to download and preprocess the WMT English-German dataset. Set DATASET_DIR to point to the location of the dataset directory.

Quick Start Scripts

Script name | Description
training_demo.sh | Runs 100 training steps. The script runs in single-instance mode by default; for multi-instance mode, set MPI_NUM_PROCESSES.
training.sh | Uses mpirun to execute 2 processes, with 1 process per socket and a batch size of 5120, for the specified precision (fp32 or bfloat16). Logs for each instance are saved to the output directory.
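Because training.sh runs one process per socket, it can help to know how many sockets the machine has before choosing a value for MPI_NUM_PROCESSES. The following is a minimal sketch, assuming a Linux host where lscpu is available and prints an English "Socket(s):" line; the helper name sockets_from_lscpu is hypothetical and not part of the repo.

```shell
# Hypothetical helper (not part of the AI Reference Models repo):
# reads lscpu-style text on stdin and prints the value of the "Socket(s):" line.
sockets_from_lscpu() {
  awk -F: '/^Socket\(s\):/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Only query the machine if lscpu is installed; otherwise do nothing.
if command -v lscpu >/dev/null 2>&1; then
  sockets="$(lscpu | sockets_from_lscpu)"
  echo "Detected sockets: ${sockets:-unknown}"
fi
```

If the detected count is 2, for example, exporting MPI_NUM_PROCESSES=2 would match the one-process-per-socket layout described for training.sh.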

Run the model

Set up your environment using the instructions below, depending on whether you are using AI Tools:


To run using AI Tools you will need:

  • numactl
  • gcc-8 (for multi-instance only)
  • g++-8 (for multi-instance only)
  • libopenmpi-dev (for multi-instance only)
  • openmpi (for multi-instance only)
  • openssh (for multi-instance only)
  • horovod (for multi-instance only)
  • Activate the `tensorflow` conda environment
    conda activate tensorflow

To run without AI Tools you will need:

  • Python 3
  • intel-tensorflow>=2.5.0
  • git
  • numactl
  • gcc-8 (for multi-instance only)
  • g++-8 (for multi-instance only)
  • libopenmpi-dev (for multi-instance only)
  • openmpi (for multi-instance only)
  • openssh (for multi-instance only)
  • horovod (for multi-instance only)
  • A clone of the AI Reference Models repo
    git clone https://github.com/IntelAI/models.git
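The prerequisites listed above can be checked before running anything. The sketch below is an assumption-laden convenience, not part of the repo: the helper missing_tools is hypothetical, and the mapping from package names to command names (Python 3 to python3, openmpi to mpirun) is assumed and may differ on your distribution. It does not check for horovod or the compiler versions.

```shell
# Hypothetical helper (not part of the AI Reference Models repo):
# prints each command name from its arguments that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Commands assumed to be needed for every run.
required="python3 git numactl"

# mpirun is assumed to be needed only for multi-instance runs.
if [ -n "${MPI_NUM_PROCESSES:-}" ]; then
  required="$required mpirun"
fi

# $required is left unquoted on purpose so it splits into separate arguments.
missing="$(missing_tools $required)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
else
  echo "All required tools found"
fi
```

The check only reports what is missing; installing the packages is still done with your system's package manager.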

After completing the setup, export the DATASET_DIR, PRECISION, and OUTPUT_DIR environment variables (OUTPUT_DIR is the directory where log files will be written), then run a quickstart script from your AI Reference Models directory. To run multi-instance training, also set the MPI_NUM_PROCESSES variable to the number of instances to run (for example, export MPI_NUM_PROCESSES=2 for 2 instances).

# cd to your AI Reference Models directory
cd models

# Set required environment variables
export DATASET_DIR=<path to the dataset>
export PRECISION=<set precision to fp32 or bfloat16>
export OUTPUT_DIR=<path to the directory where log files will be written>

# Optional: Specify the number of sockets to use for multi-instance training
export MPI_NUM_PROCESSES=<number of sockets to use>

# Run a quickstart script
./quickstart/language_translation/tensorflow/transformer_mlperf/training/cpu/<script name>.sh
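Since the quickstart scripts depend on these variables, a small pre-flight check can catch a wrong path or a mistyped precision before a long run starts. This is only a sketch: validate_precision and check_env are hypothetical helpers, not part of the repo, and the quickstart scripts may perform their own validation.

```shell
# Hypothetical helpers (not part of the AI Reference Models repo).

# Succeeds only for the two precisions this document lists.
validate_precision() {
  case "$1" in
    fp32|bfloat16) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: check_env <dataset_dir> <precision> <output_dir>
# Returns non-zero and prints a message on stderr if a value looks wrong.
check_env() {
  if [ ! -d "$1" ]; then
    echo "DATASET_DIR is not a directory: $1" >&2
    return 1
  fi
  if ! validate_precision "$2"; then
    echo "PRECISION must be fp32 or bfloat16, got: $2" >&2
    return 1
  fi
  if [ -z "$3" ]; then
    echo "OUTPUT_DIR is not set" >&2
    return 1
  fi
  return 0
}

# Example use before launching a quickstart script (left commented out here):
# check_env "${DATASET_DIR:-}" "${PRECISION:-}" "${OUTPUT_DIR:-}" && \
#   ./quickstart/language_translation/tensorflow/transformer_mlperf/training/cpu/training_demo.sh
```

Passing the values as arguments rather than reading the environment directly keeps the check easy to try with different inputs.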

Additional Resources