Update documentation #10

alsrgv · 2017-08-21T17:48:15Z

Add motivation in the beginning.
Clarify how processes are assigned GPUs with visible_devices_list.
Add quick guide to install Open MPI. Add missing mpicxx in PATH to troubleshooting.
Add -x LD_LIBRARY_PATH and other useful env vars to multi-node mpirun example
Add quick guide to install NCCL, link to nv_peer_mem for GPUDirect, /etc/init.d/nv_peer_mem start. Add missing NCCL error to troubleshooting.
Add Travis CI & license links.
Open MPI & IB should use -mca btl_openib_receive_queues P,128,32:P,2048,32:P,12288,32:P,131072,32, greatly improves performance.
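
For reference, the multi-node flags listed above can be assembled as in the sketch below. This is not taken from the README itself. The hostnames, the script name `train.py`, and the helper `build_mpirun_command` are hypothetical, and the `host:slots` form of `-H` is assumed to be accepted by the installed Open MPI release. Only `-x LD_LIBRARY_PATH` and the `btl_openib_receive_queues` value come from this PR.

```python
import shlex

# Receive-queue value quoted in this PR for Open MPI over InfiniBand.
IB_RECEIVE_QUEUES = "P,128,32:P,2048,32:P,12288,32:P,131072,32"


def build_mpirun_command(hosts, script, use_infiniband=True):
    """Build an mpirun argument list for a multi-node run.

    `hosts` maps a hostname to the number of processes to start there
    (typically one per GPU). Hostnames and script are placeholders.
    """
    total = sum(hosts.values())
    host_list = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    cmd = ["mpirun", "-np", str(total), "-H", host_list]
    # Export the local LD_LIBRARY_PATH to the launched processes.
    cmd += ["-x", "LD_LIBRARY_PATH"]
    if use_infiniband:
        cmd += ["-mca", "btl_openib_receive_queues", IB_RECEIVE_QUEUES]
    cmd += ["python", script]
    return cmd


if __name__ == "__main__":
    args = build_mpirun_command({"server1": 4, "server2": 4}, "train.py")
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(a) for a in args))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting.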

sblotner · 2017-08-21T18:06:16Z

README.md

 Horovod is a distributed training framework for TensorFlow. The goal of Horovod is to make distributed Deep Learning
 fast and easy to use.

+# Why not traditional Distributed TensorFlow?
+
+The primary motivation for this project is to make it easy to take single GPU TensorFlow program and successfully train


take a single GPU

sblotner · 2017-08-21T18:06:29Z

README.md

+The primary motivation for this project is to make it easy to take single GPU TensorFlow program and successfully train
+it on many GPUs faster. This has two aspects:
+
+1. How much modifications does one have to make to program to make it distributed, and how easy is it to run it.


to a program

sblotner · 2017-08-21T18:06:43Z

README.md

+1. How much modifications does one have to make to program to make it distributed, and how easy is it to run it.
+2. How much faster would it run in distributed mode?
+
+Internally at Uber we found that it's much easier for people to understand MPI model that requires minimal changes to


an MPI model

sblotner · 2017-08-21T18:07:34Z

README.md

+To give some perspective on that, [this commit](https://github.com/alsrgv/benchmarks/commit/86bf2f9269dbefb4e57a8b66ed260c8fab84d6c7) 
+into our fork of TF Benchmarks shows how much code can be removed if one doesn't need to worry about towers and manually
+averaging gradients across them, `tf.Server()`, `tf.ClusterSpec()`, `tf.train.SyncReplicasOptimizer()`, 
+`tf.train.replicas_device_setter()` and etc. If none of this things makes sense to you - don't worry, you don't have to 


replace "etc." with "so on".

Also, "If none of these things"

sblotner · 2017-08-21T18:08:10Z

README.md

+learn them if you use Horovod.
+
+While installing MPI itself may seem like an extra hassle, it only needs to be done once and by one group of people,
+while everyone else in the company who are building the models can enjoy simplicity of training them at scale.


"who are building" ==> "who builds"

sblotner · 2017-08-21T18:08:41Z

README.md

@@ -53,7 +91,8 @@ To use Horovod, make the following additions to your program:
 1. Run `hvd.init()`.

 2. Pin a server GPU to be used by this process using `config.gpu_options.visible_device_list`.
-    With the typical setup of one GPU per process, this can be set to *local rank*.
+    With the typical setup of one GPU per process, this can be set to *local rank*. In that case, first process on the


the first process

sblotner · 2017-08-21T18:09:23Z

README.md

@@ -53,7 +91,8 @@ To use Horovod, make the following additions to your program:
 1. Run `hvd.init()`.

 2. Pin a server GPU to be used by this process using `config.gpu_options.visible_device_list`.
-    With the typical setup of one GPU per process, this can be set to *local rank*.
+    With the typical setup of one GPU per process, this can be set to *local rank*. In that case, first process on the
+    server will be allocated first GPU, second process will be allocated second GPU and so forth.


allocated the first GPU, the second process will be allocated the second GPU, and so forth.
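
The allocation described in this hunk can be illustrated with a small sketch. The helper `gpu_assignments` is hypothetical and only models the mapping. In a real Horovod program each process would set the value itself, as shown in the trailing comment, which assumes the TensorFlow 1.x `tf.ConfigProto` session config.

```python
def gpu_assignments(processes_per_server):
    """Map each local rank on one server to its visible_device_list value.

    With one GPU per process, local rank 0 gets GPU "0", local rank 1
    gets GPU "1", and so forth.
    """
    return {local_rank: str(local_rank)
            for local_rank in range(processes_per_server)}


if __name__ == "__main__":
    for local_rank, device in gpu_assignments(4).items():
        print(f"local rank {local_rank} -> visible_device_list = {device!r}")

# Inside a Horovod training script, each process would apply this roughly as:
#   config = tf.ConfigProto()
#   config.gpu_options.visible_device_list = str(hvd.local_rank())
```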

sblotner · 2017-08-21T18:10:12Z

README.md

+
+1. Is MPI in PATH?
+
+If you see error message below, it means `mpicxx` was not found in PATH. Typically `mpicxx` is located in the same


see the error

sblotner · 2017-08-21T18:10:24Z

README.md

+1. Is MPI in PATH?
+
+If you see error message below, it means `mpicxx` was not found in PATH. Typically `mpicxx` is located in the same
+directory as `mpirun`. Please add directory containing `mpicxx` to PATH before installing Horovod.


Please add a directory
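
A quick way to check this before installing is sketched below, using only the Python standard library. The helper names `find_mpicxx` and `path_with_mpi` are hypothetical, and the directory you pass in depends on where Open MPI was installed.

```python
import os
import shutil


def find_mpicxx(search_path=None):
    """Return the full path to mpicxx, or None if it is not found.

    With search_path=None the current PATH environment variable is used.
    """
    return shutil.which("mpicxx", path=search_path)


def path_with_mpi(mpi_bin_dir, current_path):
    """Return a PATH string with the MPI bin directory prepended."""
    if not current_path:
        return mpi_bin_dir
    return mpi_bin_dir + os.pathsep + current_path


if __name__ == "__main__":
    location = find_mpicxx()
    if location is None:
        print("mpicxx not found in PATH; add its directory before installing Horovod")
    else:
        print(f"mpicxx found at {location}")
```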

sblotner · 2017-08-21T18:10:33Z

README.md

+
+### NCCL 2 is not found
+
+If you see error message below, it means NCCL 2 was not found in standard libraries location. If you have directory


see the error

If you have a directory

alsrgv requested review from thepaulm and sblotner August 21, 2017 17:48

alsrgv self-assigned this Aug 21, 2017

sblotner approved these changes Aug 21, 2017

alsrgv added 2 commits August 21, 2017 11:23

Grammar fixes

3ce97c6

Add --no-cache-dir to make sure horovod is rebuilt

43024b5

alsrgv merged commit 1a43074 into master Aug 21, 2017

alsrgv deleted the update-readme branch August 21, 2017 18:46

alsrgv mentioned this pull request Aug 21, 2017

motivation for this project #9

Closed

heliangliang91 mentioned this pull request Mar 10, 2018

Segmentation fault (11) in the worker with rank=0 #107

Closed

wangzhimingchn mentioned this pull request Feb 26, 2019

horovod with pytorch produces seg fault #761

Closed

PiseyYou mentioned this pull request Sep 24, 2019

Mismatched ALLREDUCE CPU/GPU #748

Closed

johnkim126 mentioned this pull request Nov 6, 2019

OpenMPI 3.0.0 hangs initialize step in SGE #1500

Closed

anweshpanda mentioned this pull request Jul 15, 2020

Error while trying to use gradient compression #2108

Closed

dingdingbin mentioned this pull request Aug 20, 2020

When I used Horovod with Pytorch to distribute train DLRM on CPU nodes(two nodes), the result shown 100x slower than single node Pytorch #2192

Closed

vanillar7 mentioned this pull request Dec 16, 2020

the meaning of the log prefix #2527

Closed

weberxie mentioned this pull request May 28, 2021

Horovod will hang forever when run it with data parallel model (one process multiple GPUs) #2944

Open
