
This is the code used to produce the TensorFlow benchmarks on this website.


Tested Environment:

  • OS: Ubuntu 18.04
  • TensorFlow version: 1.15.4 or 2.3.1
  • CUDA Version 10.0
  • CUDNN Version 7.6.5

You can use Lambda Stack, which installs the above software stack system-wide. If you already have CUDA 10.0 installed, you can instead create a Python virtual environment by following these steps:

virtualenv -p /usr/bin/python3.6 venv
. venv/bin/activate

pip install matplotlib

# TensorFlow 1.15.4
pip install tensorflow-gpu==1.15.4

# TensorFlow 2.3.1
pip install tensorflow-gpu==2.3.1

Step One: Clone benchmark repo

git clone --recursive

Step Two: Run benchmark with thermal profiler

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
./ \
min_num_gpus max_num_gpus \
num_runs num_batches_per_run \
thermal_sampling_frequency \

Note that if min_num_gpus differs from max_num_gpus, the benchmark is launched multiple times, once for each GPU count from min_num_gpus to max_num_gpus.
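The expansion of min_num_gpus/max_num_gpus into individual runs can be sketched like this (an illustrative helper, not part of the repo's scripts, which handle this internally):

```python
def gpu_counts(min_num_gpus, max_num_gpus):
    """Return the GPU counts the benchmark would iterate over.

    Illustrative only: the launcher script performs this expansion itself.
    """
    return list(range(min_num_gpus, max_num_gpus + 1))

# gpu_counts(1, 4) covers runs with 1, 2, 3 and 4 GPUs;
# gpu_counts(4, 4) produces a single 4-GPU run.
```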

This is an example of benchmarking 4 GPUs (min_num_gpus=4 and max_num_gpus=4) for a single run (num_runs=1) of 100 batches (num_batches_per_run=100), measuring temperature every 2 seconds (thermal_sampling_frequency=2), using the config file config/config_resnet50_replicated_fp32_train_syn:

TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
./ 4 4 \
1 100 \
2 \

The config file sets up a training throughput test for ResNet50, using replicated mode for parameter updates, fp32 precision, and synthetic (syn) data.
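As a rough sketch, config_resnet50_replicated_fp32_train_syn might contain shell-style variables along these lines. Only DATA_MODE is confirmed by this README; the other variable names are assumptions, so check the actual file in the config folder:

```shell
# Hypothetical sketch of a benchmark config; only DATA_MODE is
# confirmed by this README, the other names are illustrative.
MODEL="resnet50"              # model to benchmark
VARIABLE_UPDATE="replicated"  # parameter update mode
PRECISION="fp32"              # numeric precision
DATA_MODE="syn"               # synthetic data, no I/O overhead
```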


You can find more examples of configurations in the config folder.

Step Three: Report Results

These are the commands to gather the results in the logs folder into CSV files:

python tools/ --precision fp32 
python tools/ --precision fp16

The gathered results are saved in tf-train-throughput-fp16.csv, tf-train-throughput-fp32.csv, tf-train-bs-fp16.csv and tf-train-bs-fp32.csv.
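Once the CSVs are generated, a small script like the following can summarize them. The "throughput" column name here is an assumption about the CSV layout; adjust it to match the actual header:

```python
import csv
import statistics

def mean_column(csv_path, column="throughput"):
    """Average a numeric column of a gathered results CSV.

    The column name "throughput" is an assumption about the file
    layout, not something verified against the repo's output.
    """
    with open(csv_path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return statistics.mean(values)
```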

Add your own logs to the list_system dictionary in tools/, so they are included in the generated CSVs.

You can also display the throughput vs. time and GPU temperature vs. time graphs using this command:

python path-to-thermal.log --thermal_threshold

For example, this is the command to display the graphs of a ResNet50 training using 8x2080Ti:

python tools/ \
logs/Gold_6230-GeForce_RTX_2080_Ti_XLA_trt_TF2_2.logs/syn-replicated-fp16-8gpus/resnet50-128/thermal/1 \
--thermal_threshold 89
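The on-disk format of the thermal log is not documented here, so this sketch works on already-parsed (time, temperature) samples; it only illustrates how a --thermal_threshold of 89 would flag samples:

```python
def over_threshold(samples, thermal_threshold):
    """Return the timestamps whose GPU temperature reaches the threshold.

    samples: iterable of (time_seconds, temp_celsius) pairs, assumed to
    be what the thermal log parses into; the real format may differ.
    """
    return [t for t, temp in samples if temp >= thermal_threshold]

# With thermal_threshold=89, samples at or above 89 C are flagged.
```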

Synthetic Data vs. Real Data

Setting DATA_MODE="syn" in the config file makes the benchmarks use synthetic data. In that case, images with random pixel values are generated directly in GPU memory, avoiding overheads such as I/O and data augmentation.

You can also benchmark with real data. To do so, simply set DATA_MODE="real" in the config file. You also need ImageNet tfrecords. For the purpose of benchmarking training throughput, you can download and unzip this mini portion of ImageNet (1.3 GB) into your home directory.
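Conceptually, a synthetic image is just random pixel values; here is a minimal CPU-side sketch (the benchmark generates these directly in GPU memory, so this helper is illustrative only):

```python
import random

def synthetic_image(height, width, channels=3, seed=None):
    # Each pixel channel gets a random 8-bit value, standing in
    # for real image data.
    rng = random.Random(seed)
    return [[[rng.randrange(256) for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]
```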


To run the benchmark on AMD GPUs with ROCm, follow the guidance here

alias drun='sudo docker run \
      -it \
      --network=host \
      --device=/dev/kfd \
      --device=/dev/dri \
      --ipc=host \
      --shm-size 16G \
      --group-add video \
      --cap-add=SYS_PTRACE \
      --security-opt seccomp=unconfined \
      -v $HOME/dockerx:/dockerx'

drun rocm/tensorflow:rocm3.5-tf2.1-dev

# install these two in the container

cd /home/dockerx
git clone --recursive

# Run a quick resnet50 test in FP32
./ 1 1 1 100 2 config_resnet50_replicated_fp32_train_syn

# Run full test for all models, FP32 and FP16, training and inference
./ 1 1 1 100 2 config_all