[ROCm] Added support for pytorch extensions to use HIP #32669
Conversation
This fixes runtime errors caused by _join_rocm_home() not being defined before it is used.
@zou3519 Do you want to review this? You made major changes to cpp extensions recently.
torch/utils/cpp_extension.py
Outdated
CUDA_HOME = _find_cuda_home()
CUDNN_HOME = os.environ.get('CUDNN_HOME') or os.environ.get('CUDNN_PATH')
ROCM_HOME = _find_rocm_home()
CUDA_HOME = (ROCM_HOME if ROCM_HOME else _find_cuda_home())
Can we please use torch.version.hip to check if we want to use ROCm?
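For reference, a minimal sketch of the kind of check being suggested, assuming the _find_rocm_home()/_find_cuda_home() helpers from the diff above; the IS_HIP_EXTENSION name is only illustrative, not necessarily what the PR ends up using:

import torch

# torch.version.hip is a version string on ROCm builds of PyTorch and None on
# CUDA builds, so it signals at import time which toolchain the extension
# should target, instead of probing the filesystem first.
IS_HIP_EXTENSION = torch.version.hip is not None  # illustrative name

if IS_HIP_EXTENSION:
    ROCM_HOME = _find_rocm_home()   # helper from the diff above
    CUDA_HOME = ROCM_HOME           # keep downstream CUDA_HOME users working
else:
    ROCM_HOME = None
    CUDA_HOME = _find_cuda_home()   # existing helper in cpp_extension.py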
Awesome! I think the detection of when to use HIP instead of CUDA needs fixing, but other than that it looks reasonable at first sight.
We should try to add some tests if possible
torch/utils/cpp_extension.py
Outdated
"'-fPIC'"] + cflags + _get_cuda_arch_flags(cflags) | ||
elif ROCM_HOME: |
If possible we should enable some tests in test_cpp_extension to run in our ROCM CI (I assume they don't run right now).
@zou3519 - I just removed test_cpp_extensions_aot_no_ninja from ROCM_BLACKLIST. However, the CI still skipped it. Is there anywhere else I should change as well?
I don't know how the mechanism works. @iotamudelta do you know how we can trigger the rocm tests on this PR?
This is a deficiency (well, probably on purpose, but not good here) in the definition of exclude_test that causes prefixes in the blacklist to match:
Line 404 in 44af8ee
if test.startswith(exclude_test):
A quick fix could be to either move the blacklist to regular expressions or manually allow "foo$" to mean "== 'foo'" instead of "startswith(foo)".
Or one could rename the tests: make the no_ninja variant the plain name and give the ninja one a suffix...
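To make the prefix issue concrete, a rough sketch (not the actual run_test.py logic) of how a blacklist entry swallows the _no_ninja variant under startswith matching, versus exact matching:

ROCM_BLACKLIST = ['test_cpp_extensions_aot']

def excluded_by_prefix(test, blacklist):
    # Current behaviour (roughly): the entry above also excludes
    # 'test_cpp_extensions_aot_no_ninja', because it shares the prefix.
    return any(test.startswith(entry) for entry in blacklist)

def excluded_exactly(test, blacklist):
    # Exact matching would only exclude the test that is literally listed.
    return test in blacklist

print(excluded_by_prefix('test_cpp_extensions_aot_no_ninja', ROCM_BLACKLIST))  # True
print(excluded_exactly('test_cpp_extensions_aot_no_ninja', ROCM_BLACKLIST))    # False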
@t-vi, that would be a change with implications throughout the PyTorch test suite. Would you recommend not touching it in this pull request and looking at it separately?
Upon further inspection, I'm not so sure whether the prefix matching even is the intended behaviour (e.g. non-x64 Windows blacklists both cpp_extension_aot and cpp_extension_aot_no_ninja).
I think that:
- we really do want the test,
- renaming the ninja-enabled test with a _ninja suffix, so that the plain name is the prefix, is probably best, and I would say it's legitimate to do it in this PR. Changing the overall behaviour, less so.
Note that renaming the test also requires updating the Windows blacklist, etc.
Oh, so that's the problem. I agree with @t-vi that renaming the ninja-enabled test is the easiest thing to do for now. Long-term we should file an issue and figure out whether the prefix matching behavior is intended or not; afaict it isn't documented in the test runner.
Thank you for the direction. Renamed the tests; the CI run should now have the cpp_extensions test enabled for ROCm.
I'm not done reading through this yet, but I think it might be easier if we merge #32495 (which is still WIP by me) in first, because that does a refactor of how cpp extensions do building. I'm happy to help fix up this PR to match the style of the refactor after that.
I see that @t-vi has kindly already answered some of the questions. In general, the difference between CUDA and HIP/ROCm is kept as small as possible: the frontend is shared, and most of the backend is shared as well. Concerning testing: ROCm test targets are accessible via pytorchbot.
NOTE: run_test.py's test_cpp_extensions_aot_no_ninja target
also runs this test case, but with ninja disabled. If you are debugging
NOTE: run_test.py's test_cpp_extensions_aot_ninja target
also runs this test case, but with ninja enabled. If you are debugging
Ditto, this should either read neutrally or read as if "test_cpp_extensions_aot (with ninja)" is the default: developers of features for cpp extensions should run test_cpp_extensions_aot (with ninja) because it is faster.
torch/utils/cpp_extension.py
Outdated
cflags = COMMON_HIPCC_FLAGS + cflags + _get_rocm_arch_flags(cflags)
else:
cflags = unix_cuda_flags(cflags)
elif is_hip_extension:
nit: I think this is slightly easier to read if we have the following:
if _is_cuda_file(src):
    ...
elif isinstance(cflags, dict):
    cflags = cflags['cxx']
if is_hip_extension:
    cflags = COMMON_HIPCC_FLAGS + cflags
so that we don't have to duplicate the isinstance(cflags, dict) check logic.
Simplified the condition
The contents of this look good to me. I think we should rename the test files so that there is no "bias" toward which test (test_cpp_extension_aot vs test_cpp_extension_aot_ninja) is the "default" one.
One last thing, otherwise lgtm
@@ -259,8 +308,8 @@ def build_extensions(self):
self._define_torch_extension_name(extension)
self._add_gnu_cpp_abi_flag(extension)

# Register .cu and .cuh as valid source extensions.
self.compiler.src_extensions += ['.cu', '.cuh']
# Register .cu, .cuh and .hip as valid source extensions.
(Putting this here because GitHub won't let me put it elsewhere.)
We should toss a check for use_ninja into BuildExtension.__init__. If use_ninja is True and we are building a ROCm extension, we should error out gracefully so that ROCm users aren't confused about the extension not building. (I am not sure when we'll be able to build ROCm extensions with ninja; I know that's next on your list, but this check is just in case we don't get to that by the next release.)
Thank you for the catch. Just added a fallback in the __init__, similar to the behavior when ninja is not present on the system.
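For illustration, a sketch of such a fallback, assuming a torch.version.hip-based flag and the use_ninja keyword discussed above; this is not the exact code added by the PR:

import warnings
import torch
from setuptools.command.build_ext import build_ext

IS_HIP_EXTENSION = torch.version.hip is not None  # assumed ROCm-build flag

class BuildExtension(build_ext):
    def __init__(self, *args, **kwargs):
        # Pop our custom keyword before handing the rest to setuptools.
        self.use_ninja = kwargs.pop('use_ninja', True)
        super().__init__(*args, **kwargs)
        if self.use_ninja and IS_HIP_EXTENSION:
            # Mirror the "ninja not installed" behaviour: warn and fall back
            # to the plain distutils backend instead of failing mid-build.
            warnings.warn('ninja is not yet supported for ROCm extensions; '
                          'falling back to the default distutils backend.')
            self.use_ninja = False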
@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This pull request has changes for: 1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py 2. Fixes for hipify module to be able to be used by a torch extension cc: ezyang iotamudelta jeffdaily Pull Request resolved: pytorch#32669 Differential Revision: D20033893 Pulled By: zou3519 fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for the hipify module so that it can be used by a torch extension

cc: @ezyang @iotamudelta @jeffdaily