[Caffe2] Enabling AMD GPU Backend for Caffe2 #7566

Merged on May 23, 2018 (61 commits)

@petrex (Contributor) commented May 15, 2018

The goal of this PR is to enable AMD GPU backend for Caffe2.

Major changes include:

  • Add an AMD GPU device to the protocol buffer definitions
  • Makefile scaffolding for the AMD software stack (ROCm)
  • Implement Caffe2 core and tests for the AMD GPU backend
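
For readers less familiar with how an optional GPU backend is typically wired into a CMake build, here is a minimal sketch. It is not the code from this PR. The option name `USE_HIP` and the `find_package(HIP 1.0)` call do appear in the diff fragments in this thread; the project name, the warning text, and the fall-back-to-OFF behavior are assumptions made for illustration.

```cmake
cmake_minimum_required(VERSION 3.5)
project(hip_option_sketch NONE)  # hypothetical project name; no compiler needed

# Switch for the AMD GPU backend (an option with this name appears in the diff).
option(USE_HIP "Use HIP" ON)

if(USE_HIP)
  # No REQUIRED keyword, so a missing HIP installation does not abort configuration.
  find_package(HIP 1.0 QUIET)
  if(NOT HIP_FOUND)
    # Assumption for this sketch: degrade to a build without the HIP backend.
    message(WARNING "HIP not found; building without the AMD GPU backend")
    set(USE_HIP OFF)
  endif()
endif()

# Report the effective setting, similar in spirit to a configuration summary.
message(STATUS "USE_HIP: ${USE_HIP}")
```

Dropping `REQUIRED` is what lets machines without ROCm still configure the project; everything HIP-specific can then be guarded by the resulting variable.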

@bddppq bddppq self-requested a review May 15, 2018 06:18
@ezyang (Contributor) commented May 15, 2018

CC @bddppq @Jorghi12

@Jorghi12 Jorghi12 self-requested a review May 15, 2018 20:40
@@ -59,6 +59,7 @@ cmake_dependent_option(
USE_GLOO "Use Gloo" ON
"BUILD_CAFFE2" OFF)
option(USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed support" OFF) # New option
option(USE_HIP "Use HIP" ON)

@dzhulgakov dzhulgakov requested a review from ajtulloch May 16, 2018 07:02
FIND_PACKAGE(HIP 1.0 REQUIRED)
FIND_PACKAGE(HIP 1.0)

IF(HIP_FOUND)

@@ -1,42 +1,86 @@
set(PYTORCH_FOUND_HIP FALSE)

set(Caffe2_HIP_INCLUDES
${hip_INCLUDE_DIRS} ${rocrand_INCLUDE_DIRS} ${hiprand_INCLUDE_DIRS} ${rocblas_INCLUDE_DIRS} ${miopen_INCLUDE_DIRS} ${Caffe2_HIP_INCLUDES} ${thrust_INCLUDE_DIRS})
set(Caffe2_HIP_DEPENDENCY_LIBS
${rocrand_LIBRARIES} ${hiprand_LIBRARIES} ${PYTORCH_HIP_HCC_LIBRARIES} ${PYTORCH_MIOPEN_LIBRARIES})

@@ -121,6 +121,10 @@ ENDIF()
# Find the HIP package, set the HIP paths, load the HIP CMake.
IF(WITH_ROCM)
include(LoadHIP)
if (NOT PYTORCH_FOUND_HIP)

bddppq and others added 2 commits May 21, 2018 12:12
…e2_core_hip

* 'caffe2_core_hip' of github.com:petrex/pytorch: (40 commits)
  [auto] Update onnx to 52f7528 - add more shape inference tests (onnx/onnx#971) onnx/onnx@52f7528
  JIT cleanup (pytorch#7631)
  fix to build sleef when using cmake 3.11.1 (pytorch#7679)
  Fix typo in document (pytorch#7725)
  [auto] Update onnx to 6f4b1b1 - Tests for Gemm operator (onnx/onnx#885) onnx/onnx@6f4b1b1
  [auto] Update onnx to c6c6aad - Enhance the 1-element broadcast case (onnx/onnx#902) onnx/onnx@c6c6aad
  serialization for torch.device (pytorch#7713)
  Fix compile flags for MSVC (pytorch#7703)
  Fix exporting Sum to onnx (pytorch#7685)
  Renanme ZFNet to ZFNet512 (pytorch#7723)
  Implement __reduce__ for torch.dtype (pytorch#7699)
  Remove unnecessary include in vec256_float.h (pytorch#7711)
  Update from facebook (pytorch#7696)
  fix for cuda 9.2 builds (pytorch#7709)
  make BatchSampler subclass of Sampler, and expose (pytorch#7707)
  Dont emit warning for ABI incompatibility when PyTorch was built from source (pytorch#7681)
  remove index from python bindings (fixes: pytorch#7639) (pytorch#7690)
  Update _torch_docs.py (pytorch#7700)
  Fix the wrong usage of environment variables detection in cmake
  Changes from D7881937 and D7963936 plus an edit (pytorch#7605)
  ...
@bddppq (Contributor) commented May 22, 2018

@Jorghi12 Do my explanations make sense to you?

@soumith @ezyang Since I have changed two cmake files outside of the caffe2 subdirectories, I need a stamp from you.

@ezyang (Contributor) commented May 23, 2018

If we're working around a bug in the upstream HIP files, we should say so in the code that is implementing the workaround, so that when HIP fixes their cmake we know what to eliminate.

@bddppq (Contributor) commented May 23, 2018

@ezyang @Jorghi12 OK, let me explain again here. PYTORCH_HIP_HCC_LIBRARIES and PYTORCH_MIOPEN_LIBRARIES work around bugs in the upstream cmake files, and I have put two TODO comments with explanations at the bottom of cmake/public/LoadHIP.cmake. PYTORCH_FOUND_HIP is not a workaround: we have extra logic (and there will be more in the future) on top of the native HIP find, so it is worth giving it its own name.
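
The pattern described in this comment can be sketched roughly as follows. This is an illustration only and not the contents of cmake/public/LoadHIP.cmake: the variable names `PYTORCH_FOUND_HIP` and `PYTORCH_HIP_HCC_LIBRARIES` come from the thread, while `EXAMPLE_HIP_ROOT`, the library path, and the wording of the TODO are hypothetical.

```cmake
# Sketch of a wrapper around the native HIP lookup (illustrative, not the real file).
set(PYTORCH_FOUND_HIP FALSE)

# Hypothetical install prefix, used only to make the example concrete.
set(EXAMPLE_HIP_ROOT "/opt/rocm/hip")

find_package(HIP 1.0 QUIET)

if(HIP_FOUND)
  # Project-specific checks can be layered here on top of the native result,
  # which is the stated reason for exposing a separately named variable.
  set(PYTORCH_FOUND_HIP TRUE)

  # TODO: workaround for a bug in the upstream HIP cmake files; remove once
  # upstream is fixed. (Hypothetical path, for illustration only.)
  set(PYTORCH_HIP_HCC_LIBRARIES "${EXAMPLE_HIP_ROOT}/lib/libhip_hcc.so")
endif()
```

Keeping the TODO next to the workaround variable is what makes it easy to find and delete once the upstream files are fixed, which is the concern raised earlier in the review.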

@soumith (Member) left a comment

stamping approval

@bddppq (Contributor) commented May 23, 2018

@petrex Let's first get this initial version in so we can parallelize the work of polishing the core and adding HIP ops.

…e2_core_hip

* 'caffe2_core_hip' of github.com:petrex/pytorch: (24 commits)
  Allow empty storage for the 'Edge' class. (pytorch#7595)
  Process group base class and Gloo implementation (pytorch#7628)
  _LRSchedulers getstate include optimizer info (pytorch#7757)
  [PyTorch] [gradcheck] change backward() to grad() (pytorch#7710)
  Update test_nn.py (pytorch#7787)
  Define general default scheduler for TBB and fix ppc64le bug (pytorch#7761)
  Add support for accepting Tensor as input in clip_grad_*  functions. (pytorch#7769)
  [Easy] Remove unused code (pytorch#7782)
  Update tbb (pytorch#7734)
  Add @generated annotation (pytorch#7780)
  fix legacy comment after variable tensor merge (pytorch#7771)
  Revert pytorch#7750 and pytorch#7762 to fix Windows CI on master (pytorch#7772)
  Temporarily disable build env check (pytorch#7768)
  Add missing brace (pytorch#7762)
  [C++ API] Add backward() to Tensor and Variable  (pytorch#7750)
  [auto] Update onnx to d43b550 - Fix .gitignore and add missing files (onnx/onnx#1005) onnx/onnx@d43b550
  [auto] Update onnx to ea1aa13 - add tests for reduce ops (onnx/onnx#675) onnx/onnx@ea1aa13
  include cudnn_h (pytorch#7749)
  [C++ API] Using new registration mechanism (pytorch#7663)
  [auto] Update onnx to 5dd68e6 - Add a util function: polish_model (onnx/onnx#1000) onnx/onnx@5dd68e6
  ...
@petrex (Contributor, Author) commented May 23, 2018

@bddppq I just reverted the changes for the operators. Let's keep this PR for Caffe2 core and CI only.

@bddppq bddppq merged commit 2ebcf4b into pytorch:master May 23, 2018
bddppq added a commit that referenced this pull request May 24, 2018
orionr pushed a commit that referenced this pull request May 24, 2018
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) onnx/onnx@4898c9e"

This reverts commit 9c679da.

* Revert "Add BiasCHW fallback for GPU (#7738)"

This reverts commit 14ad2e7.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

This reverts commit 2ebcf4b.
petrex added a commit to petrex/pytorch that referenced this pull request May 31, 2018
* origin:
  [Caffe2] Enabling AMD GPU Backend for Caffe2 (pytorch#7566)
  Call grad_mode.py context managers as decorators (pytorch#7737)
  catch CPU tensors in checkSameGPU (fixes pytorch#7689) (pytorch#7767)
  Mark stack as non-executable in NNPACK (pytorch#7752)
  small fixes in fusion_compiler (pytorch#7776)
  Run clang-format on c10d (pytorch#7791)
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4.
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) onnx/onnx@4898c9e"

This reverts commit 9c679da.

* Revert "Add BiasCHW fallback for GPU (pytorch#7738)"

This reverts commit 14ad2e7.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (pytorch#7566)"

This reverts commit 2ebcf4b.