Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
1 contributor

Users who have contributed to this file

32 lines (18 sloc) 908 Bytes

Horovod in LSF

This page includes examples for running Horovod in a LSF cluster. horovodrun will automatically detect the host names and GPUs of your LSF job. If the LSF cluster supports jsrun, horovodrun will use it as launcher otherwise it will default to mpirun.

Inside a LSF batch file or in an interactive session, you just need to use:

horovodrun python train.py

Here, Horovod will start a process per GPU on all the hosts of the LSF job.

You can also limit the run to a subset of the job resources. For example, using only 6 GPUs:

horovodrun -np 6 python train.py

You can still pass extra arguments to horovodrun. For example, to trigger CUDA-Aware MPI:

horovodrun --mpi-args="-gpu" python train.py
You can’t perform that action at this time.