add test and docs for -hostfile option
Signed-off-by: Lin Yuan <apeforest@gmail.com>
apeforest committed Jul 22, 2019
1 parent de4ec80 commit 2cc6356
Showing 2 changed files with 23 additions and 0 deletions.
7 changes: 7 additions & 0 deletions .buildkite/gen-pipeline.sh
@@ -135,6 +135,13 @@ run_all() {
":muscle: Test MXNet MNIST (${test})" \
"bash -c \"OMP_NUM_THREADS=1 \\\$(cat /mpirun_command) python /horovod/examples/mxnet_mnist.py\""

if [[ ${test} == *"openmpi"* ]]; then
run_test "${test}" "${queue}" \
":muscle: Test Horovodrun (${test})" \
"echo 'localhost slots=2' > hostfile" \
"horovodrun -np 2 -hostfile hostfile python /horovod/examples/mxnet_mnist.py"
fi

# tests that should be executed only with the latest release since they don't test
# a framework-specific functionality
if [[ ${test} == *"tf1_14_0"* ]]; then
16 changes: 16 additions & 0 deletions docs/running.rst
@@ -22,6 +22,22 @@ To run on 4 machines with 4 GPUs each:
$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
Host nodes can also be specified in a hostfile. For example:

$ cat myhostfile
aa slots=2
bb slots=2
cc slots=2

Here, we list both the host names (aa, bb, and cc) and how many "slots" each one has.
Slots indicate how many processes can potentially execute on a node.
This format is the same as the one used by the ``mpirun`` command;
see `this page <https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php#toc6>`_.
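
As a rough sanity check on the slot counts, the sketch below writes the same three-host hostfile to a temporary file and sums its ``slots=`` values. The temporary file and the ``total_slots`` variable are illustrative assumptions, not part of Horovod; the total (6) happens to match the ``-np 6`` used in the hostfile example.

```shell
#!/usr/bin/env bash
# Illustrative only: write the example hostfile to a temporary file.
hostfile=$(mktemp)
cat > "${hostfile}" <<'EOF'
aa slots=2
bb slots=2
cc slots=2
EOF

# Sum every slots=N field to get the total number of process slots listed.
total_slots=$(awk '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); sum += $i }
} END { print sum + 0 }' "${hostfile}")

echo "total slots: ${total_slots}"
rm -f "${hostfile}"
```

Comparing this total with the value passed to ``-np`` is one way to catch a hostfile that lists fewer slots than the job intends to use.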

To run on hosts specified in a hostfile:

.. code-block:: bash

    $ horovodrun -np 6 -hostfile myhostfile python train.py

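
If a job already uses a ``-H`` host list, the same information could be written out in hostfile form. The following is a minimal sketch that assumes the two forms are interchangeable, as the text above suggests; the ``hosts`` and ``hostfile_body`` variable names are hypothetical, and the host names are the placeholders from the ``-H`` example.

```shell
#!/usr/bin/env bash
# Host list in the host:slots form used with -H (placeholder names).
hosts="server1:4,server2:4,server3:4,server4:4"

# Split on commas, then rewrite each "host:N" entry as "host slots=N".
hostfile_body=$(printf '%s\n' "${hosts}" \
  | tr ',' '\n' \
  | sed 's/:\([0-9][0-9]*\)$/ slots=\1/')

# Print the hostfile lines; redirect to a file to use them with -hostfile.
printf '%s\n' "${hostfile_body}"
```

Redirecting the output to a file (for example ``> myhostfile``) would give a hostfile in the ``slots=`` format described above.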
Failures due to SSH issues
~~~~~~~~~~~~~~~~~~~~~~~~~~