
Refactor operations into separate components by framework #826

Merged
tgaddair merged 15 commits into master from op_refactor on Mar 9, 2019

Conversation

4 participants
tgaddair (Collaborator) commented Feb 13, 2019

This change pulls the various framework-specific preprocessor conditional logic out into a class hierarchy that selects the appropriate operations (CUDA, NCCL, DDL, Hierarchical) based on available libraries and parameter selections.

Additionally, it abstracts much of the MPI-specific concepts into a CommunicationContext interface that can be substituted for a different framework for orchestration as desired. This PR still retains coupling between operations.cc and MPI, but subsequent changes in the roadmap will further decouple operations.cc to use the CommunicationContext interface.
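To make the intended shape concrete, here is a minimal sketch of the pattern, with simplified stand-in names (the real classes and signatures in this PR differ):

```cpp
#include <memory>
#include <vector>

// Common interface for one collective; framework-specific subclasses
// (MPI, NCCL, DDL, ...) override the virtuals.
class AllreduceOp {
public:
  virtual ~AllreduceOp() = default;
  virtual bool Enabled() const = 0;  // can this implementation run right now?
  virtual void Execute() = 0;
};

class MPIAllreduce : public AllreduceOp {
public:
  bool Enabled() const override { return true; }  // MPI is always built in
  void Execute() override { /* call MPI_Allreduce here */ }
};

// Preprocessor checks are confined to the one place the ops are built;
// list order encodes priority (the first enabled op wins).
std::vector<std::unique_ptr<AllreduceOp>> CreateAllreduceOps() {
  std::vector<std::unique_ptr<AllreduceOp>> ops;
#if HAVE_NCCL
  ops.push_back(std::make_unique<NCCLAllreduce>());  // hypothetical subclass,
                                                     // defined only when built
                                                     // with NCCL
#endif
  ops.push_back(std::make_unique<MPIAllreduce>());
  return ops;
}
```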

tgaddair requested a review from alsrgv on Feb 13, 2019

Resolved review threads (outdated):
horovod/common/message/mpi_message.cc
horovod/common/ops/cuda_operations.cc (4 threads)
horovod/common/ops/mpi_operations.h
horovod/common/ops/nccl_operations.cc

tgaddair force-pushed the op_refactor branch 2 times, most recently from a45e2e8 to 8a6226e, on Feb 18, 2019

Resolved review threads (outdated):
horovod/common/operations.cc
horovod/common/ops/nccl_operations.cc
horovod/common/message.cc
horovod/common/ops/mpi_operations.h (2 threads)
alsrgv (Collaborator) left a comment

It may also help to introduce this change more gradually via a series of pull requests. E.g., first rename mpi_message.cc and its MPI* classes to message.cc, then replace #define XXX_CHECK and other macros with an exception mechanism and/or functions, then extract certain complicated pieces (e.g. fusion buffering, offset computations for allgather, etc.) into helper classes or functions. That way it's easier to check correctness along the way.
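For the macro step, something along these lines could work (a sketch only; the names and error text are illustrative, not the actual CHECK macros in the codebase):

```cpp
#include <mpi.h>
#include <stdexcept>
#include <string>

// Throwing helper that replaces an early-return CHECK-style macro.
inline void CheckMPIResult(int rc, const std::string& op_name) {
  if (rc != MPI_SUCCESS) {
    throw std::logic_error(op_name + " failed, MPI error code " +
                           std::to_string(rc));
  }
}

// Call site:
//   CheckMPIResult(MPI_Barrier(MPI_COMM_WORLD), "MPI_Barrier");
```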

tgaddair force-pushed the op_refactor branch 3 times, most recently from e94c2b2 to 55be58a, on Feb 19, 2019

alsrgv (Collaborator) commented Feb 23, 2019

cc @karakusc @romerojosh @apeforest @yuxihu, can you guys review this PR as well and provide feedback? We want to make it easier to contribute enhancements to Horovod core operations.

}
}

void MPI_Allgatherv(MPIContext& ctx, const void* sendbuf, int sendcount, DataType sendtype,

alsrgv (Collaborator) commented Feb 23, 2019

Add to utility files instead of *_operations.cc/h.

romerojosh commented Feb 25, 2019

On a related note, I find the MPI wrappers in this file to be a bit confusing generally. Are these wrappers intended to be used only in the files defining the various collective operation implementations, and not intended to be used to wrap MPI elsewhere (i.e. for coordination in operations.cc)? Additionally, leaving the wrapper names unmodified from the existing unwrapped MPI API calls leads to some confusion as well.

Could you clear up what the intent/expected use of these wrappers is?

karakusc (Contributor) commented Feb 26, 2019

I agree about the confusion related to the MPI wrappers. What is the motivation for the wrappers: is it to clean up the op implementations from the MPI error handling logic, or is it simply to standardize access to the MPI communicators through MPIContext?

I think if the wrappers are used, the exposed API should be identical to the underlying core MPI API (with perhaps a standard modification, e.g. MPI_Bcast_(...) instead of MPI_Bcast(...)), to make things more consistent.
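For example, a wrapper along these lines (MPIContext fields simplified; the helper itself is hypothetical):

```cpp
#include <mpi.h>

struct MPIContext {
  MPI_Comm comm = MPI_COMM_WORLD;  // simplified; the real context holds more
};

// Identical argument list to MPI_Bcast, except the communicator comes from
// the context; only the trailing underscore marks it as a wrapper.
inline int MPI_Bcast_(MPIContext& ctx, void* buffer, int count,
                      MPI_Datatype datatype, int root) {
  return MPI_Bcast(buffer, count, datatype, root, ctx.comm);
}
```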

tgaddair (Author, Collaborator) commented Mar 2, 2019

I went ahead and removed the wrappers.

For context, the wrappers were originally added as an abstraction above MPI, in case we wanted to sub in another framework that supports similar collective primitives (e.g., NCCL). However, the design evolved so that NCCL became coupled to MPI for now, so the wrappers no longer have much use beyond the convenience of not having to convert an error code into an exception.

Given that they provide little benefit in the current design, and that most devs working in this codebase are accustomed to using MPI functions directly, I went ahead and removed them.

return; \
}
OperationManager* CreateOperationManager(HorovodGlobalState& state) {
// Order of these operations is very important. Operations will be checked sequentially from the first

karakusc (Contributor) commented Feb 26, 2019

Would it make sense to have an explicit decision function to choose an implementation for each op dynamically? I feel like the sequential ordering makes the decision a bit implicit, and might make it harder to write more complex logic later, for example dynamically choosing an algorithm on a per-tensor basis.

tgaddair (Author, Collaborator) commented Mar 2, 2019

Personally, I think the sequencing approach reduces the complexity (but perhaps I'm biased from writing so many behavior trees over the years).

One complication with an explicit rule is that some of the ops are available dynamically based on what frameworks you have installed (NCCL, DDL, etc.), others are available given some parameter state, and still others (as you mention) are available on a per-tensor basis (which the Enabled function sees via the Response parameter).

Using an explicit rule, the operation manager becomes very complicated with preprocessor code, needing to couple itself to what frameworks are installed (because certain ops will not exist at compile time). Using the sequential approach, all of that is isolated to a single function in operations.cc, which I think makes the separation between "what ops are available at compile time" and "what ops are available at runtime" more clear.
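To illustrate, a simplified sketch of the sequential dispatch (signatures simplified from the real code):

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

struct Response {};  // stand-in for the per-tensor response metadata

class AllreduceOp {
public:
  virtual ~AllreduceOp() = default;
  virtual bool Enabled(const Response& response) const = 0;
  virtual void Execute(const Response& response) = 0;
};

class OperationManager {
public:
  explicit OperationManager(std::vector<std::unique_ptr<AllreduceOp>> ops)
      : allreduce_ops_(std::move(ops)) {}

  // Walk the list in priority order; the first op whose Enabled() check
  // passes handles the request.
  void ExecuteAllreduce(const Response& response) {
    for (auto& op : allreduce_ops_) {
      if (op->Enabled(response)) {
        op->Execute(response);
        return;
      }
    }
    throw std::logic_error("no enabled allreduce operation found");
  }

private:
  std::vector<std::unique_ptr<AllreduceOp>> allreduce_ops_;
};
```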

karakusc (Contributor) commented Mar 5, 2019

Makes sense. Would it be easy to override the decision logic implied by this list if needed? Is it sufficient to call execute() for the desired op in OperationManager::ExecuteAllreduce, or would it be more complicated?

tgaddair (Author, Collaborator) commented Mar 5, 2019

Yes, it should be a pretty simple change to enforce the use of a certain operation under certain conditions. It would just require giving the OperationManager a direct pointer to the operation, or keeping a map (right now we don't reference the operations directly, we only store them in the list).
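A rough sketch of how the map variant could look (hypothetical; not part of this PR):

```cpp
#include <map>
#include <memory>
#include <string>

struct Response {};

class AllreduceOp {
public:
  virtual ~AllreduceOp() = default;
  virtual void Execute(const Response& response) = 0;
};

// Name-keyed registry kept alongside the priority list, letting a caller
// force a specific implementation when conditions demand it.
class OpRegistry {
public:
  void Register(const std::string& name, std::unique_ptr<AllreduceOp> op) {
    ops_by_name_[name] = std::move(op);
  }
  void ExecuteWith(const std::string& name, const Response& response) {
    ops_by_name_.at(name)->Execute(response);  // throws if name is unknown
  }

private:
  std::map<std::string, std::unique_ptr<AllreduceOp>> ops_by_name_;
};
```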

Resolved review thread (outdated): horovod/common/ops/mpi_operations.h

Resolved review threads (outdated):
horovod/common/message.h
horovod/common/ops/collective_operations.cc (2 threads)
horovod/common/ops/cuda_operations.cc
horovod/common/timeline.cc
horovod/common/ops/mpi_operations.cc
horovod/common/ops/nccl_operations.cc

tgaddair added some commits Feb 20, 2019

Refactored operations into separate components by framework
Signed-off-by: Travis Addair <taddair@uber.com>
Restored messages
Signed-off-by: Travis Addair <taddair@uber.com>
Removed channel concept
Signed-off-by: Travis Addair <taddair@uber.com>
Fixed docs
Signed-off-by: Travis Addair <taddair@uber.com>
Rearchitected ops to define their own execution strategy and only rely on helper functions from parents

Signed-off-by: Travis Addair <taddair@uber.com>
Allgather and Broadcast refactor
Signed-off-by: Travis Addair <taddair@uber.com>
Removed MPI wrappers
Signed-off-by: Travis Addair <taddair@uber.com>
Moved MPIContext into separate file from operations
Signed-off-by: Travis Addair <taddair@uber.com>
Moved allgather fusion code into parent class
Signed-off-by: Travis Addair <taddair@uber.com>
Added comments explaining CUDA queue
Signed-off-by: Travis Addair <taddair@uber.com>
Addressed comments
Signed-off-by: Travis Addair <taddair@uber.com>
Put the MPI CUDA ops into mpi_cuda_operations
Signed-off-by: Travis Addair <taddair@uber.com>

tgaddair force-pushed the op_refactor branch from a55df08 to c85f1e6 on Mar 7, 2019

alsrgv self-requested a review on Mar 7, 2019

alsrgv (Collaborator) approved these changes Mar 7, 2019

LGTM once tests pass

tgaddair added some commits Mar 8, 2019

Fixed bug in allgather
Signed-off-by: Travis Addair <taddair@uber.com>
Fixed NCCL
Signed-off-by: Travis Addair <taddair@uber.com>
Fixed DDL
Signed-off-by: Travis Addair <taddair@uber.com>

tgaddair merged commit aa605d6 into master on Mar 9, 2019

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)

tgaddair deleted the op_refactor branch on Mar 9, 2019

alsrgv (Collaborator) commented Mar 9, 2019

🥳
