Initial support for ROCm: rename CUDA to GPU et al, pimpl for GPU details #1632
Conversation
Force-pushed from ec5614e to 72d0c4a
It's not clear to me whether the current test failures are related to the changes in this PR.
Hey @jeffdaily, thanks for the new PR. Overall, I like the approach to abstracting GPU operations in this way. One thing I noticed is that it looks like the DDL codepaths haven't been updated to use the new abstraction. There was also a PR landed yesterday that added NCCL broadcast support, which will require a rebase.

For the unit tests, it looks like at least one test was failing because the Horovod MXNet plugin failed to compile, so we may need to dig into why that is. I'm rerunning some of the other tests to see if it was a transient error.

I'll try and put together inline feedback as well. I think it would be good to have @nvcastet and @romerojosh take a look at this PR if they have the bandwidth.
Rebased to pick up NCCL broadcast change. Also updated the DDL sources, though I did not test them.
This PR looks good to me.
@tgaddair are the last two test failures transient or do they indicate a real issue? Also, this PR currently does not integrate any kind of CI build for ROCm yet. What are your acceptance criteria?
@jeffdaily one of the errors appears to be real. The MXNet test for mixed install is consistently failing because the Horovod MXNet plugin did not compile successfully. Once we figure that out, I think we should be safe to land.
@tgaddair I tried building the docker container for the mixed install so I could see how horovod is failing to build the mxnet extension, and I'm seeing the following error:
@tgaddair disregard previous comment concerning the dockerfile build. I forgot to fetch submodules prior to attempting the dockerfile build. I was able to reproduce the mxnet build failure. Fix will be ready shortly.
@tgaddair fixed the MXNet test for mixed install. Other errors have popped up now in tests that were passing in the previous attempt.
Hey @jeffdaily, looks like test failures may be transient. Rerunning them now.
Hey @jeffdaily, took a closer look over the PR and discussed it with some other Horovod contributors. Overall, I think the abstraction layer is good.

The one thing I think we should still address is the possibility for the ROCm/RCCL and CUDA/NCCL APIs to diverge in the future. With the pimpl pattern, we're locked into the API exposed by the GPUContext. In this design, how would a contributor add functionality specific to one of these frameworks but not the other?

Also, are you familiar with / can you comment on the current state of support for GPU MPI with AMD hardware?
I am rather at a loss to suggest how to address a future ROCm/RCCL and CUDA/NCCL API divergence. With respect to HIP (ROCm) versus CUDA APIs, parity is intended by design. I am not privy to the CUDA development roadmap, so I can't predict new or deprecated APIs. Are there any plans for Horovod to use less common CUDA APIs?

Concerning RCCL versus NCCL, again parity is by design in order to make it easier to support AMD hardware through the same interface. I think if there ever exists some CUDA or NCCL feature that isn't reflected by HIP or RCCL, we have the option of separating implementation details within the same source file using preprocessor guards.

GPUDirect, or ROCmRDMA, is very well supported with AMD hardware.
Rebased again to resolve conflicts from d4d54bc.
@jeffdaily I'm fine punting on the API divergence thing for now. I think the right long-term solution is to create separate NCCL vs RCCL derived classes with separate CUDA and ROCm implementations of the GPUContext.
@jeffdaily @tgaddair Using derived classes would also cause issues because even if we use the base type throughout the code, we will still need to construct the object with the derived type, and to avoid compilation errors we would have to have a bunch of "ifdef"s, which is not really clean code in modern C++. Were you thinking of another way to do that?
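For illustration, the derived-class alternative under discussion might look roughly like the sketch below. The class names, the factory function, and the `HAVE_CUDA` guard are assumptions made for the example, not the actual Horovod sources; the point is that the concrete backend still has to be selected with at least one preprocessor guard somewhere.

```cpp
// Illustrative sketch of the derived-class alternative (not actual Horovod code).
#include <memory>

class GPUContext {                         // backend-agnostic base used everywhere else
public:
  virtual ~GPUContext() = default;
  virtual void StreamSynchronize() = 0;
};

class CUDAContext : public GPUContext {    // CUDA/NCCL backend
public:
  void StreamSynchronize() override { /* cudaStreamSynchronize(...) */ }
};

class ROCmContext : public GPUContext {    // ROCm/RCCL backend
public:
  void StreamSynchronize() override { /* hipStreamSynchronize(...) */ }
};

// Even with virtual dispatch, the concrete type must be chosen somewhere,
// which is the "ifdef" nvcastet is pointing at.
std::unique_ptr<GPUContext> CreateGPUContext() {
#if HAVE_CUDA
  return std::unique_ptr<GPUContext>(new CUDAContext());
#else
  return std::unique_ptr<GPUContext>(new ROCmContext());
#endif
}
```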
@nvcastet unfortunately, there is not one main design pattern for supporting AMD hardware in open source projects. As one might expect, long-established open source projects tend to resist adding HIP APIs, and certainly resist fully converting their code bases to use HIP alone, even though HIP can compile to CUDA. That said, I've been on the teams to support TensorFlow, PyTorch, and now Horovod -- and each project maintainer has dictated a different approach.

TensorFlow already had a device abstraction layer to differentiate between CPU and GPU devices. The GPU was itself further abstracted as a stream executor. That was all done prior to our involvement, but it made the most sense to integrate our ROCm stack as an implementation of the stream executor abstract interface. Even so, there are plenty of cases where APIs or device features are not the same. For example, CUDNN has either different behavior or a different API interface compared to our MIOpen, and so such code must be protected with preprocessor guards.

PyTorch has mandated that the code base is "hipified" during the build, so a "hipify" python script was developed and maintained as part of the PyTorch code base. There are very few pure-HIP source files; the rest of the PyTorch source looks like any other CUDA-based project. But macros can still be used to protect CUDA-only code and features. We have further been mandated to use this hipify script for building all ROCm PyTorch extensions. Those changes are prepared in a separate branch that will become a new PR if this current PR is accepted.
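As a rough illustration of the hipify step described above, the script rewrites CUDA runtime calls into their HIP equivalents. The snippet below is a generic, hypothetical example rather than code taken from PyTorch or Horovod.

```cpp
// Before hipify: a plain CUDA call site. After hipify, the equivalents noted
// in the comments would be substituted (hip/hip_runtime.h, hip* calls).
#include <cstddef>
#include <cuda_runtime.h>          // after: #include <hip/hip_runtime.h>

void copy_to_host(void* dst, const void* src, std::size_t bytes) {
  cudaStream_t stream;             // after: hipStream_t
  cudaStreamCreate(&stream);       // after: hipStreamCreate(&stream)
  cudaMemcpyAsync(dst, src, bytes,
                  cudaMemcpyDeviceToHost, stream);  // after: hipMemcpyAsync(..., hipMemcpyDeviceToHost, ...)
  cudaStreamSynchronize(stream);   // after: hipStreamSynchronize(stream)
  cudaStreamDestroy(stream);       // after: hipStreamDestroy(stream)
}
```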
@nvcastet Currently we use preprocessor macros to determine which set of ops to use here. Otherwise, the code is isolated into its own classes / files by framework so it's easier for developers to reason about the code paths. I think that if the APIs ever diverged, we would necessarily need to either do something similar, or mix the preprocessor macros into the GPUContext implementation.

So long as the APIs are identical, I agree the pimpl pattern is cleaner. And since there isn't any indication that they will diverge at the moment, I also agree it's the better solution for now.
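A simplified sketch of what selecting ops via preprocessor macros can look like; the `HAVE_NCCL` / `HAVE_DDL` guard names follow Horovod's build options, but the op types here are self-contained stand-ins rather than the real Horovod classes.

```cpp
// Simplified, self-contained sketch of op selection via preprocessor macros.
#include <memory>
#include <vector>

struct AllreduceOp { virtual ~AllreduceOp() = default; };
struct NCCLAllreduce : AllreduceOp {};   // stand-in for the NCCL-backed op
struct DDLAllreduce  : AllreduceOp {};   // stand-in for the DDL-backed op
struct MPIAllreduce  : AllreduceOp {};   // stand-in for the CPU/MPI fallback

std::vector<std::shared_ptr<AllreduceOp>> BuildAllreduceOps() {
  std::vector<std::shared_ptr<AllreduceOp>> ops;
#if HAVE_NCCL
  ops.push_back(std::make_shared<NCCLAllreduce>());  // only when built with NCCL
#endif
#if HAVE_DDL
  ops.push_back(std::make_shared<DDLAllreduce>());   // only when built with DDL
#endif
  ops.push_back(std::make_shared<MPIAllreduce>());   // always available
  return ops;
}
```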
- add ROCm and ROCm TensorFlow support to setup.py
- Renamed symbols and files
  - CUDA -> GPU
  - Cuda -> Gpu
  - cuda -> gpu
  - HAVE_CUDA -> HAVE_GPU --- iff referring generically to any GPU
- CUDAContext becomes GPUContext and uses pimpl idiom to decouple CUDA and HIP implementations of GPU interface.
- New GPUContext methods
  - StreamCreate
  - StreamSynchronize
  - GetDevice, SetDevice
  - MemcpyAsyncD2D, MemcpyAsyncH2D, MemcpyAsyncD2H

Signed-off-by: Jeff Daily <jeff.daily@amd.com>
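A minimal sketch of the pimpl arrangement the commit message describes: the public `GPUContext` header stays free of CUDA or HIP types, and each backend supplies its own `impl`. The method names follow the commit message; the signatures, member names, and file layout are assumptions for illustration only.

```cpp
// Illustrative pimpl sketch only -- not the actual Horovod sources.
#include <cstddef>
#include <memory>

// Public interface (would live in a backend-neutral header).
class GPUContext {
public:
  GPUContext();
  ~GPUContext();

  void SetDevice(int device);
  void StreamSynchronize();
  void MemcpyAsyncD2H(void* dst, const void* src, std::size_t bytes);

private:
  class impl;                    // one impl per backend, chosen at build time
  std::unique_ptr<impl> pimpl_;
};

// A CUDA translation unit would define GPUContext::impl in terms of the CUDA
// runtime; a HIP translation unit would mirror it with hip* calls.
class GPUContext::impl {
public:
  void SetDevice(int device) { /* cudaSetDevice(device) or hipSetDevice(device) */ }
  void StreamSynchronize()   { /* cudaStreamSynchronize(...) or hipStreamSynchronize(...) */ }
  void MemcpyAsyncD2H(void* dst, const void* src, std::size_t bytes) {
    /* cudaMemcpyAsync(...) or hipMemcpyAsync(...) */
  }
};

GPUContext::GPUContext() : pimpl_(new impl()) {}
GPUContext::~GPUContext() = default;
void GPUContext::SetDevice(int device) { pimpl_->SetDevice(device); }
void GPUContext::StreamSynchronize() { pimpl_->StreamSynchronize(); }
void GPUContext::MemcpyAsyncD2H(void* dst, const void* src, std::size_t bytes) {
  pimpl_->MemcpyAsyncD2H(dst, src, bytes);
}
```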
LGTM, thanks @jeffdaily! Will land once tests pass.