Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added HOROVOD_GPU_OPERATIONS installation variable #1960

Merged
merged 7 commits into from May 18, 2020
Merged

Conversation

tgaddair
Copy link
Collaborator

This simplifies enabling NCCL support for all GPU operations.

Signed-off-by: Travis Addair taddair@uber.com

…ng NCCL support for all GPU operations

Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
setup.py Outdated Show resolved Hide resolved
setup.py Outdated
if gpu_operations and gpu_operation not in options:
options_str = ', '.join(options)
raise DistutilsError(f'{op_variable_name}={gpu_operation} is invalid, '
f'supported values are: {{{options_str}}}.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why braces in supported values are: {NCCL, MPI}?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind I was thinking set notation, but removed since it's unclear.

Signed-off-by: Travis Addair <taddair@uber.com>
setup.py Outdated Show resolved Hide resolved
Signed-off-by: Travis Addair <taddair@uber.com>
Copy link
Collaborator

@nvcastet nvcastet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, that simplifies a lot the build command line.


The full documentation and an API reference are published at https://horovod.readthedocs.io/en/latest.
- `stable <https://horovod.readthedocs.io/en/stable>`_
- `master <https://horovod.readthedocs.io/en/latest>`_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to call this "latest"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just have the 1 link (not sure when someone would need to see the /stable version)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, I think we actually want to direct users to /stable first (stable is the latest released version, correct?). In a lot of cases, users get confused because they pip install horovod, but the docs do not match with the version they have installed (because latest is always the unreleased master branch).

Ideally, I think it would be good if we could have specific versioned sets of documentation like /0.19.2 to make it more clear for users. For now, stable seems to be the closest proxy to that, but the name isn't very informative. Maybe "released" or "production".

What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, definitely want to point to the right version to reduce that confusion. Looking into how it works, I think /stable is the recommended version, as it's kept up to date with the highest release version (for tagged releases): https://docs.readthedocs.io/en/stable/versions.html

"Latest release" could work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Changed to "Latest Release" instead.

Copy link
Collaborator

@EnricoMi EnricoMi May 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, latest is a synonym for stable or latest release, master is a synonym for git head or latest source. Ideally, documentation of each released version should be available at horovod.readthedocs.io. master is a very versioning-specific term.

@@ -0,0 +1,10 @@
55,58d54
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be included?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, because the summary page is already part of Read The Docs, so it seems it would be kind of circular. Do you feel it makes sense to include it there?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that there is the automatic check to have the README and Summary page stay in sync (https://github.com/horovod/horovod/blob/master/docs/summary.rst.patch)

I agree that this content would be an exception and wouldn't need to be exactly the same between the two pages, but think the automatic sync check is useful in general, so maybe best to leave it in for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, definitely want to keep the check. This just adds these lines as exceptions.

Signed-off-by: Travis Addair <taddair@uber.com>
@tgaddair tgaddair merged commit ffd9310 into master May 18, 2020
@tgaddair tgaddair deleted the nccl-fix branch May 18, 2020 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants