Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rccl p2p variable #3654

Merged
merged 1 commit into from Aug 16, 2022
Merged

add rccl p2p variable #3654

merged 1 commit into from Aug 16, 2022

Conversation

fsx950223
Copy link
Contributor

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

Fixes # (issue).

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

@maxhgerlach
Copy link
Collaborator

maxhgerlach commented Aug 15, 2022

Hi @fsx950223,

thanks for your contribution! Could you please sign off your commit for DCO? See comments in https://github.com/horovod/horovod/pull/3654/checks?check_run_id=7836563465

If this enables hvd.alltoall() for you, we should be able to merge it quickly. As it stands, we do not have any automated tests with non-Nvida GPUs.

Signed-off-by: FuAsZero <Gary.Fu@amd.com>
Copy link
Collaborator

@maxhgerlach maxhgerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I am going to merge immediately.

@maxhgerlach maxhgerlach merged commit ed32fdc into horovod:master Aug 16, 2022
@romerojosh
Copy link
Collaborator

@fsx950223 Do all versions of RCCL support P2P? Shouldn't rccl.h define some equivalent to NCCL_P2P_SUPPORTED that we can depend on rather than defining this directly within Horovod?

@maxhgerlach
Copy link
Collaborator

For context: NCCL_P2P_SUPPORTED is defined in Horovod's nccl_operations.h:

#if HAVE_CUDA
#include <nccl.h>
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 7, 0)
#define NCCL_P2P_SUPPORTED
#endif
#elif HAVE_ROCM
// ..

We could add a similar version check for RCCL if Horovod should be compatible with older versions without P2P support.

@romerojosh
Copy link
Collaborator

Ah that is right, NCCL_P2P_SUPPORTED is our own define, just under a version check. I agree that a similar version check should be added for RCCL assuming there are versions that don’t support P2P operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants