Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs migration to Sphinx #1088

Merged
merged 12 commits into from May 29, 2019

Conversation

3 participants
@IgorWilbert
Copy link
Contributor

commented May 19, 2019

The bloated README really needs to be divided, but right now we have links on it that don't work because they point to non .rst files. To fix it, I'll update them to the .rst syntax.

Roadmap

  • Concepts.md

  • Benchmarks.md

  • Docker.md

  • Gpus.md

  • Inference.md

  • Running.md

  • Spark.md

  • Tensor-fusion.md

  • Timeline.md

  • Troubleshooting.md

@IgorWilbert IgorWilbert force-pushed the IgorWilbert:documentation-docs-migration-rst branch 2 times, most recently from 7c54924 to d8ca675 May 19, 2019

@alsrgv
Copy link
Collaborator

left a comment

Thanks for the PR! Sorry, I am out on a business trip and didn't have a chance to look before.

@sblotner, could you take a look as well?

README.rst Outdated
@@ -69,7 +69,7 @@ servers with 4 Pascal GPUs each connected by RoCE-capable 25 Gbit/s network:
:alt: 512-GPU Benchmark

Horovod achieves 90% scaling efficiency for both Inception V3 and ResNet-101, and 68% scaling efficiency for VGG-16.
See the `Benchmarks <docs/benchmarks.md>`_ page to find out how to reproduce these numbers.
See the `Benchmarks <benchmarks_include.html>`_ page to find out how to reproduce these numbers.

This comment has been minimized.

Copy link
@alsrgv

alsrgv May 24, 2019

Collaborator

These .html links break GitHub user experience. What if you point to docs/benchmarks.rst file?

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Yes, pointing to benchmarks.rst should work (same recommendation for the .html links below)

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Pointing to <benchmarks_include.html>breaks Github user experience, while pointing to <docs/benchmarks.rst> breaks the documentation (Page not found). I couldn't find a way how to put two addresses on the same anchor or have some sort of fallback, so I decided to point to https://github.com/horovod/horovod/blob/master/docs/benchmarks.rst, which is also used in Petastorm. I tested with existing equivalent links, so they should work just fine once merged.

To run the benchmark on a real data, you need to download the `ImageNet dataset <http://image-net.org/download-images>`__
and convert it using the TFRecord `preprocessing script <https://github.com/tensorflow/models/blob/master/research/inception/inception/data/download_and_preprocess_imagenet.sh>`__.

Now, simply add `--data_dir /path/to/imagenet/tfrecords --data_name imagenet --num_batches=2000` to your training command:

This comment has been minimized.

Copy link
@alsrgv

alsrgv May 24, 2019

Collaborator

Turn single back-ticks to double back-ticks?

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Yes, in .rst double back ticks is monospace, single back ticks is for italic

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

I think I got them all, thanks!


**Building**

Before building, you can modify `Dockerfile` to your liking, e.g. select a different CUDA, TensorFlow or Python version.

This comment has been minimized.

Copy link
@alsrgv

alsrgv May 24, 2019

Collaborator

Replace all single backticks with double backticks (except in links).

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

I think I got them all, thanks!


.. code-block:: bash
host1$ nvidia-docker run -it --network=host -v /mnt/share/ssh:/root/.ssh horovod:latest

This comment has been minimized.

Copy link
@alsrgv

alsrgv May 24, 2019

Collaborator

In the previous refactoring, you have removed xxx$ from bash excerpts. Since it looks like it's often useful, can you bring them back in other .. code-block:: bash?

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

My bad, I was trying random things to make it work. Now it is fixed, thanks!

@alsrgv alsrgv requested a review from sblotner May 24, 2019

@sblotner
Copy link
Collaborator

left a comment

Thank you, Igor!! Added some comments about .rst syntax and a few edits.

Regarding links, I recommend you check that they open as expected (ex: not opening benchmarks instead benchmark_include.rst)

README.rst Outdated
@@ -69,7 +69,7 @@ servers with 4 Pascal GPUs each connected by RoCE-capable 25 Gbit/s network:
:alt: 512-GPU Benchmark

Horovod achieves 90% scaling efficiency for both Inception V3 and ResNet-101, and 68% scaling efficiency for VGG-16.
See the `Benchmarks <docs/benchmarks.md>`_ page to find out how to reproduce these numbers.
See the `Benchmarks <benchmarks_include.html>`_ page to find out how to reproduce these numbers.

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Yes, pointing to benchmarks.rst should work (same recommendation for the .html links below)

:alt: 512-GPU Benchmark


The above benchmark was done on 128 servers with 4 Pascal GPUs each connected by RoCE-capable 25 Gbit/s network. Horovod

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

connected by --> connected by a

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Done, thanks!


To reproduce the benchmarks:

1. Install Horovod using the instructions provided on the `Horovod on GPU <gpus.md>`__ page.

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

gpus.md --> gpus.rst

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Made it a little different, I hope it works. Thanks!

To run the benchmark on a real data, you need to download the `ImageNet dataset <http://image-net.org/download-images>`__
and convert it using the TFRecord `preprocessing script <https://github.com/tensorflow/models/blob/master/research/inception/inception/data/download_and_preprocess_imagenet.sh>`__.

Now, simply add `--data_dir /path/to/imagenet/tfrecords --data_name imagenet --num_batches=2000` to your training command:

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Yes, in .rst double back ticks is monospace, single back ticks is for italic

=================

To streamline the installation process on GPU machines, we have published the reference `Dockerfile <../Dockerfile>`__ so

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

remove blank line

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Done, thanks!

2. The second part defines a Keras model and performs a distributed training
of the model using Horovod in Spark.
3. The third part performs prediction using the best model and creates
a submission file.

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

same as above

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Done, thanks!

a submission file.
To run the example, please install the following dependencies:
* `pyspark`

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Add a blank line before the list

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Yes, this was missing to make it work. Thank you!


1. **Negotiation** - a phase when all workers send to rank 0 signal that they're ready to reduce the given tensor.

* Each worker reporting readiness is represented by a tick under the *NEGOTIATE_ALLREDUCE* bar, so you can see which workers were early and which were late.

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Indent 3 spaces

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

I changed many things, so I'm not sure if it is fixed. Right now it renders as below. Let me know if I should change anything else, thanks!
Github:
sample2
Docs:
sample


* Each worker reporting readiness is represented by a tick under the *NEGOTIATE_ALLREDUCE* bar, so you can see which workers were early and which were late.

* Immediately after negotiation, rank 0 sends all other workers signal to start reducing the tensor.

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Indent 3 spaces

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Same as above.


2. **Processing** - a phase when the operation actually happens. It is further subdivided into multiple sub-phases:

* *WAIT_FOR_DATA* indicates time taken to wait for GPU to finish computing input to the *allreduce*, *allgather*, or *broadcast* operations. This happens because TensorFlow tries to smartly interleave scheduling and GPU computation. This is only applicable to situations where the Horovod operation is placed on GPU.

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

Indent 3 spaces

This comment has been minimized.

Copy link
@sblotner

sblotner May 24, 2019

Collaborator

same for all bullets below

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 26, 2019

Author Contributor

Same as above.

@IgorWilbert IgorWilbert force-pushed the IgorWilbert:documentation-docs-migration-rst branch 3 times, most recently from 55eee51 to d2e1893 May 26, 2019

@IgorWilbert

This comment has been minimized.

Copy link
Contributor Author

commented May 26, 2019

No problem, thank you for the reviews @alsrgv and @sblotner ! The major change was the links, that now point to github files from master. Using Sphinx, I could not find a way how to make a link that navigates to other page in the docs, if you are using the docs, or navigates to other github file, if you are on github. I don't think it is worth to duplicate the files just because of this. If you have any suggestion or want more changes, please let me know. Thanks!

@IgorWilbert IgorWilbert force-pushed the IgorWilbert:documentation-docs-migration-rst branch from d2e1893 to 20b50bb May 26, 2019

@alsrgv

This comment has been minimized.

Copy link
Collaborator

commented May 26, 2019

Thanks, @IgorWilbert !

Could you look at this thread esp8266/Arduino#4409? It seems they found a way to make GitHub and Sphinx links compatible.

Also, could you rebase on master to solve buildkite failure?

@IgorWilbert

This comment has been minimized.

Copy link
Contributor Author

commented May 27, 2019

@alsrgv thank you for the link! I did some tests and something like Docker <https://github.com/horovod/horovod/blob/master/docs/docker.rst>__ or Docker <docs/docker.rst>__ works for Github. Docker <docs/docker.rst>__ should also work for Sphinx, but while running local it does not work because it is fetching the page from docs/_build/html/docker_include.html and not docs/docker.rst. Based on the issue you provided I guess Docker <docs/docker.rst>__ will also work for Sphinx once it is hosted somewhere. Which format do you want me to use, Docker <https://github.com/horovod/horovod/blob/master/docs/docker.rst>__ or Docker <docs/docker.rst>__ ?

@IgorWilbert IgorWilbert force-pushed the IgorWilbert:documentation-docs-migration-rst branch from 20b50bb to 0809e76 May 27, 2019

@alsrgv

This comment has been minimized.

Copy link
Collaborator

commented May 27, 2019

I think the suggestion in that thread is to install nbsphinx extension which rewrites links in html. So you could have link to rst file in .rst, which will get mapped to html link in html output. Can you try that?

@IgorWilbert

This comment has been minimized.

Copy link
Contributor Author

commented May 27, 2019

Now I understood, thanks! I installed nbsphinx, made some tests in my last commit and the link rewrite works, but now I have a different problem. Because README.rst is outside the docs folder, if I use Benchmarks <benchmarks.rst>__ it works for Sphinx (mapping to /horovod/docs/_build/html/benchmarks.html) but not for Github (mapping to https://github.com/IgorWilbert/horovod/blob/documentation-docs-migration-rst/benchmarks.rst). However, if I use Benchmarks <docs/benchmarks.rst>__ , it works for Github (mapping to https://github.com/IgorWilbert/horovod/blob/documentation-docs-migration-rst/docs/benchmarks.rst) but not for Sphinx (mapping to /horovod/docs/_build/html/docs/benchmarks.rst). Do you know if there is any Sphinx config file where I can fix this?
If it gets too complicated, we could leave README.rst where it is with the right links for Github and have a copy of it with a different name inside docs folder with the right links for Sphinx. Then in future refactors we would reduce the content of README for Sphinx by moving its content to other files, until we eliminate it altogether. What do you think? I say this because the project from that thread does not have a README inside docs.

@alsrgv

This comment has been minimized.

Copy link
Collaborator

commented May 27, 2019

I think in the long term we'll have different README on Github and landing page in the ReadTheDocs.

README on Github should just be a teaser and point to ReadTheDocs for actual info.

With that in mind, let's finish building out ReadTheDocs site by copying Readme, and then we can drastically reduce top level README used on Github.

@IgorWilbert IgorWilbert force-pushed the IgorWilbert:documentation-docs-migration-rst branch 4 times, most recently from 69b0a12 to c979432 May 28, 2019

IgorWilbert added some commits May 19, 2019

rst migration: concepts
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: running horovod
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: benchmarks
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: docker
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: gpus
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: inference
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: spark
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: tensor-fusion
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: timeline
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
rst migration: troubleshooting
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
Changes requested in review 1
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

@IgorWilbert IgorWilbert force-pushed the IgorWilbert:documentation-docs-migration-rst branch from c979432 to a8bd2a2 May 28, 2019

@IgorWilbert

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2019

@alsrgv I separated "README" for Github from "README" for Sphinx, fixed the links on both and did a rebase. Let me know if you need any other change, thanks!

@alsrgv

This comment has been minimized.

Copy link
Collaborator

commented May 29, 2019

Changes requested in review 2
Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

@IgorWilbert IgorWilbert force-pushed the IgorWilbert:documentation-docs-migration-rst branch from a8bd2a2 to f827d1b May 29, 2019

@IgorWilbert

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

@alsrgv you are right, thank you! Let me know if something else is missing.

@alsrgv

alsrgv approved these changes May 29, 2019

Copy link
Collaborator

left a comment

LGTM, thanks!

@alsrgv alsrgv merged commit 6f40001 into horovod:master May 29, 2019

2 checks passed

DCO DCO
Details
buildkite/horovod/pr Build #383 passed (44 minutes, 29 seconds)
Details
total images/sec: 1656.82
**Real data benchmarks**

This comment has been minimized.

Copy link
@alsrgv

alsrgv May 30, 2019

Collaborator

I just noticed that all these bold titles don't produce links that could be referenced. Can you replace them with actual subsections (~~~)?

This comment has been minimized.

Copy link
@IgorWilbert

IgorWilbert May 30, 2019

Author Contributor

Sure. I suppose you want me to create a new Pull Request for this?

This comment has been minimized.

Copy link
@alsrgv

alsrgv May 30, 2019

Collaborator

Yes, please :-)

shirosankaku added a commit to SmorkalovME/horovod that referenced this pull request May 30, 2019

Docs migration to Sphinx (horovod#1088)
* rst migration: concepts

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: running horovod

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: benchmarks

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: docker

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: gpus

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: inference

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: spark

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: tensor-fusion

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: timeline

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: troubleshooting

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* Changes requested in review 1

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* Changes requested in review 2

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
Signed-off-by: Yana Shchyokotova <yana.shchyokotova@intel.com>

zsh-thu added a commit to zsh-thu/horovod that referenced this pull request Jun 3, 2019

Docs migration to Sphinx (horovod#1088)
* rst migration: concepts

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: running horovod

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: benchmarks

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: docker

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: gpus

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: inference

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: spark

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: tensor-fusion

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: timeline

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* rst migration: troubleshooting

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* Changes requested in review 1

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>

* Changes requested in review 2

Signed-off-by: Igor Wilbert <igor.wilbert@gmail.com>
Signed-off-by: Sihan Zeng <zsh@uber.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.