Benchmarks

512-GPU Benchmark

The above benchmark was run on 128 servers, each with 4 Pascal GPUs, connected by a RoCE-capable 25 Gbit/s network. Horovod achieves 90% scaling efficiency for both Inception V3 and ResNet-101, and 68% scaling efficiency for VGG-16.
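As a rough illustration of what a scaling efficiency figure means, the sketch below assumes the usual definition: measured throughput on N GPUs divided by N times the single-GPU throughput. The function name and the sample numbers are hypothetical and are not measurements from the benchmark above.

```python
def scaling_efficiency(total_images_per_sec, num_gpus, single_gpu_images_per_sec):
    """Return the fraction of ideal linear speedup that was achieved.

    Assumed definition: measured throughput / (num_gpus * single-GPU throughput).
    """
    if num_gpus <= 0 or single_gpu_images_per_sec <= 0:
        raise ValueError("num_gpus and single_gpu_images_per_sec must be positive")
    ideal_images_per_sec = num_gpus * single_gpu_images_per_sec
    return total_images_per_sec / ideal_images_per_sec


# Hypothetical numbers: 16 GPUs reach 1600 images/sec in total,
# while a single GPU alone reaches 125 images/sec.
efficiency = scaling_efficiency(1600.0, 16, 125.0)
print(f"scaling efficiency: {efficiency:.0%}")
```

Under this definition, 90% efficiency on 512 GPUs would mean the cluster delivers about 0.9 × 512 times the throughput of one GPU.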

To reproduce the benchmarks:

  1. Install Horovod using the instructions provided on the Horovod on GPU page.

  2. Clone https://github.com/tensorflow/benchmarks:

    $ git clone https://github.com/tensorflow/benchmarks
    $ cd benchmarks

  3. Run the benchmark. Examples below are for Open MPI.

    $ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 \
        python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
            --model resnet101 \
            --batch_size 64 \
            --variable_update horovod

  4. At the end of the run, you will see the number of images processed per second:

total images/sec: 1656.82
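If you capture the benchmark output in a log, the throughput line can be pulled out with a few lines of Python. This is a minimal sketch that assumes the line has exactly the `total images/sec: <number>` form shown above; the function name is hypothetical, and when the line appears more than once the last occurrence is used.

```python
import re

# Matches the summary line in the form shown above, e.g. "total images/sec: 1656.82".
_TOTAL_RE = re.compile(r"total images/sec:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_total_images_per_sec(log_text):
    """Return the last 'total images/sec' value found in log_text as a float."""
    matches = _TOTAL_RE.findall(log_text)
    if not matches:
        raise ValueError("no 'total images/sec' line found")
    return float(matches[-1])


sample_log = "total images/sec: 1656.82\n"
total = parse_total_images_per_sec(sample_log)

# The example command above launches 16 processes (-np 16).
print(f"total: {total:.2f} images/sec, per process: {total / 16:.2f} images/sec")
```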

Real data benchmarks

The benchmark instructions above are for the synthetic data benchmark.

To run the benchmark on real data, download the ImageNet dataset and convert it using the TFRecord preprocessing script.

Now, simply add --data_dir /path/to/imagenet/tfrecords --data_name imagenet --num_batches=2000 to your training command:

$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 \
    python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
        --model resnet101 \
        --batch_size 64 \
        --variable_update horovod \
        --data_dir /path/to/imagenet/tfrecords \
        --data_name imagenet \
        --num_batches=2000
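When sweeping over cluster sizes, it can help to generate the command instead of editing it by hand. The sketch below only assembles the argument list using the flags shown above; the helper name and its parameters are hypothetical, and it assumes `-np` should equal the sum of the per-host slot counts, as in the examples (4 hosts × 4 slots = 16).

```python
import shlex


def build_benchmark_command(hosts, model="resnet101", batch_size=64,
                            data_dir=None, num_batches=None):
    """Build the horovodrun argument list for tf_cnn_benchmarks.

    hosts maps a host name to its number of slots, e.g. {"server1": 4}.
    """
    total_processes = sum(hosts.values())
    host_arg = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    cmd = [
        "horovodrun", "-np", str(total_processes), "-H", host_arg,
        "python", "scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py",
        "--model", model,
        "--batch_size", str(batch_size),
        "--variable_update", "horovod",
    ]
    # Real-data flags are added only when a data directory is given;
    # otherwise the command matches the synthetic data example.
    if data_dir is not None:
        cmd += ["--data_dir", data_dir, "--data_name", "imagenet"]
    if num_batches is not None:
        cmd.append(f"--num_batches={num_batches}")
    return cmd


hosts = {"server1": 4, "server2": 4, "server3": 4, "server4": 4}
cmd = build_benchmark_command(hosts, data_dir="/path/to/imagenet/tfrecords",
                              num_batches=2000)
print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

The resulting list can be passed to `subprocess.run(cmd)` from the `benchmarks` directory, which avoids shell quoting issues.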