New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added HOROVOD_GPU_OPERATIONS installation variable #1960
Conversation
…ng NCCL support for all GPU operations Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
setup.py
Outdated
if gpu_operations and gpu_operation not in options: | ||
options_str = ', '.join(options) | ||
raise DistutilsError(f'{op_variable_name}={gpu_operation} is invalid, ' | ||
f'supported values are: {{{options_str}}}.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why braces in supported values are: {NCCL, MPI}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind I was thinking set notation, but removed since it's unclear.
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, that simplifies a lot the build command line.
|
||
The full documentation and an API reference are published at https://horovod.readthedocs.io/en/latest. | ||
- `stable <https://horovod.readthedocs.io/en/stable>`_ | ||
- `master <https://horovod.readthedocs.io/en/latest>`_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to call this "latest"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just have the 1 link (not sure when someone would need to see the /stable version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, I think we actually want to direct users to /stable first (stable is the latest released version, correct?). In a lot of cases, users get confused because they pip install horovod
, but the docs do not match with the version they have installed (because latest is always the unreleased master branch).
Ideally, I think it would be good if we could have specific versioned sets of documentation like /0.19.2 to make it more clear for users. For now, stable seems to be the closest proxy to that, but the name isn't very informative. Maybe "released" or "production".
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, definitely want to point to the right version to reduce that confusion. Looking into how it works, I think /stable is the recommended version, as it's kept up to date with the highest release version (for tagged releases): https://docs.readthedocs.io/en/stable/versions.html
"Latest release" could work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Changed to "Latest Release" instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, latest is a synonym for stable or latest release, master is a synonym for git head or latest source. Ideally, documentation of each released version should be available at horovod.readthedocs.io. master is a very versioning-specific term.
docs/summary.rst.patch
Outdated
@@ -0,0 +1,10 @@ | |||
55,58d54 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, because the summary page is already part of Read The Docs, so it seems it would be kind of circular. Do you feel it makes sense to include it there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot that there is the automatic check to have the README and Summary page stay in sync (https://github.com/horovod/horovod/blob/master/docs/summary.rst.patch)
I agree that this content would be an exception and wouldn't need to be exactly the same between the two pages, but think the automatic sync check is useful in general, so maybe best to leave it in for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, definitely want to keep the check. This just adds these lines as exceptions.
Signed-off-by: Travis Addair <taddair@uber.com>
This simplifies enabling NCCL support for all GPU operations.
Signed-off-by: Travis Addair taddair@uber.com