Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix develop/editable install mode #3074

Merged
merged 6 commits into from Aug 6, 2021
Merged

Conversation

nvcastet
Copy link
Collaborator

@nvcastet nvcastet commented Aug 3, 2021

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

Fixes develop/editable mode to support direct changes in the source code and incremental C++/CUDA builds.

Note: On the first rebuild, I have noticed the CUDA code being rebuilt when not modified. It is potentially an issue with the add_cuda_library cmake code that generates the dependency file. On the second rebuild, the issue disappears.
This issue may disappear with using cudatoolkit (newer version of cmake) instead of cuda cmake module. Something we could try.
@romerojosh FYI ^^
Fixes #1520.

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>
Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>

.. code-block:: bash

$ python setup.py develop -u
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this relate to pip install .[dev,test] above? Should we mention either or both?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EnricoMi this uninstall method is when you install the package in dev mode -e since the standard pip uninstall does not work in that case.

The part about running pip install .[dev,test] is a bit redundant. Should we just remove it and modify line 56 with:

HOROVOD_WITH_PYTORCH=1 HOROVOD_WITH_TENSORFLOW=1 pip install -v -e .[dev,test]

So when installing in dev mode, it checks for dev and test dependencies?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misread your comment in the changes, python setup.py develop -u is to uninstall an earlier dev installation. What is wrong about pip uninstall horovod?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [dev] adds a specific version for all ML frameworks to the dependencies and [test] adds test-specific dependencies as well. So either users install the frameworks in the desired version or simply use [dev] to get recommended versions.

Copy link
Collaborator Author

@nvcastet nvcastet Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pip uninstall horovod did not work with the editable mode. But i just tried it using a virtual environment (instead of just the system environment), and it works. I will update the doc since we recommend a venv.
I can mention the dependencies and remove the first call to pip install.

@github-actions
Copy link

github-actions bot commented Aug 3, 2021

Unit Test Results

     703 files  ±0       703 suites  ±0   6h 13m 52s ⏱️ ±0s
     652 tests ±0       610 ✔️ ±0       42 💤 ±0  0 ❌ ±0 
14 867 runs  ±0  11 720 ✔️ ±0  3 147 💤 ±0  0 ❌ ±0 

Results for commit cf9f5dd. ± Comparison against base commit cf9f5dd.

♻️ This comment has been updated with latest results.

Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>
Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>
Copy link
Collaborator

@maxhgerlach maxhgerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this. This is going to remove a lot of time from the edit-build-debug loop during development.

Default (release) builds went fine for me, but I faced some trouble trying to get an editable debug install.

docs/contributors.rst Outdated Show resolved Hide resolved
docs/contributors.rst Outdated Show resolved Hide resolved
Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>
Copy link
Collaborator

@maxhgerlach maxhgerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

docs/contributors.rst Show resolved Hide resolved

.. code-block:: bash
Set ``HOROVOD_GPU_OPERATIONS=NCCL`` to run on GPUs with NCCL.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't reference any env var other than HOROVOD_WITH[OUT]_[FRAMEWORK] and HOROVOD_DEBUG here, because there are so many. Rather than mentioning the HOROVOD_GPU_OPERATIONS env var here, what about adding a link to https://github.com/horovod/horovod/blob/master/docs/install.rst#environment-variables?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that.

Signed-off-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com>
@nvcastet nvcastet requested a review from EnricoMi August 6, 2021 15:10
@@ -226,6 +226,7 @@ Optional environment variables that can be set to configure the installation pro

Possible values are given in curly brackets: {}.

* ``HOROVOD_DEBUG`` - {1}. Install a debug build of Horovod with checked assertions, disabled compiler optimizations etc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the {1}. a copy-paste artefact?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EnricoMi No, see the line above and the other environmental variables. This is the possible value for the variable.


In develop mode, you can edit the Horovod source directly in the repo folder. For Python code, the changes will take effect
immediately. For **C++/CUDA code**, the ``... pip install -v -e .`` command needs to be invoked again to perform an incremental build.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't pip skip the installation saying it is already installed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for dev mode.

@nvcastet nvcastet requested a review from EnricoMi August 6, 2021 15:32
Copy link
Collaborator

@EnricoMi EnricoMi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nvcastet nvcastet merged commit cf9f5dd into horovod:master Aug 6, 2021
@nvcastet nvcastet deleted the dev_mode branch August 6, 2021 18:43
@github-actions
Copy link

github-actions bot commented Aug 7, 2021

Unit Test Results (with flaky tests)

     751 files  ±0       751 suites  ±0   6h 21m 55s ⏱️ ±0s
     652 tests ±0       610 ✔️ ±0       42 💤 ±0  0 ❌ ±0 
15 971 runs  ±0  12 506 ✔️ ±0  3 465 💤 ±0  0 ❌ ±0 

Results for commit cf9f5dd. ± Comparison against base commit cf9f5dd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Incremental build support
3 participants