Conversation

@jithunnair-amd
Collaborator

  1. Set torch._C.has_cudnn to True for ROCm
  2. Make MIOpen invocations respect value of cudnn_enabled or at::globalContext().userEnabledCuDNN()
  3. torch/backends/cudnn/__init__.py: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff)

@jithunnair-amd
Collaborator Author

cc @iotamudelta

@ezyang
Contributor

ezyang commented Feb 10, 2020

A high level question... unlike most of the hipified source code, miopen has bindings directly written (not generated by hipification). So, by analogy, shouldn't it have its own backends file, torch/backends/miopen?

@iotamudelta
Contributor

@ezyang thanks for looking at this. I think this is a legitimate and very good question. As you know, the frontend in PyTorch (i.e., the Python layer) assumes cuda == rocm when on ROCm. That is why we decided to have this PR also override the respective cuDNN functionality. It also means that scripts and functionality relying on the cudnn syntax will mostly "just work (TM)" on ROCm.

Now, you are absolutely right that MIOpen's interface and feature set differ somewhat from cuDNN's, so there is no perfect 1:1 mapping. I think this is also reflected in this PR: some things we support on ROCm, some things we don't (yet?).

Ultimately, I believe both choices (MIOpen == cuDNN or MIOpen != cuDNN) are valid from an engineering PoV and at least for me there is no clear winner.

So, what do you think? :-)

@cpuhrsch cpuhrsch added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 11, 2020
Contributor

@bddppq bddppq left a comment


I like the idea of allowing the user to enable/disable MIOpen at runtime. And the interface for it should be cudnn, since at the Python level we want user code to be able to "magically" switch to using AMD GPUs without any code changes.
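The "magic" switch described above can be sketched as follows. The `cudnn` object here is a small stand-in for `torch.backends.cudnn` on a ROCm build, and `conv2d_backend` is a hypothetical dispatcher, not PyTorch's real one; the point is that the cudnn name is kept so existing CUDA scripts work unchanged:

```python
import types

# Stand-in for torch.backends.cudnn on a ROCm build: the "cudnn" name
# is preserved so CUDA-oriented user code needs no edits.
cudnn = types.SimpleNamespace(enabled=True)

def conv2d_backend(device_type):
    """Hypothetical dispatcher: pick the vendor library only when the
    tensor is on the GPU and the user has not disabled the flag."""
    if device_type == "cuda" and cudnn.enabled:
        # On NVIDIA this selects cuDNN; on ROCm the same flag gates MIOpen.
        return "miopen_or_cudnn"
    return "fallback"

cudnn.enabled = False          # the exact line a CUDA script would use
print(conv2d_backend("cuda"))  # fallback
cudnn.enabled = True
print(conv2d_backend("cuda"))  # miopen_or_cudnn
```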

@jithunnair-amd
Collaborator Author

@bddppq Please review my responses and let me know which changes I need to make.

@ezyang ezyang removed their request for review February 14, 2020 04:42
@ezyang
Contributor

ezyang commented Feb 14, 2020

I'm gonna let bddppq shepherd this one through

@dr-ci

dr-ci bot commented Feb 14, 2020

💊 CircleCI build failures summary and remediations

As of commit f4894cd:

  • 1/1 failures introduced in this PR

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakage:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | pattern match details)

RuntimeError: test_jit_fuser failed!
 
---------------------------------------------------------------------- 
Ran 46 tests in 11.883s 
 
FAILED (errors=4, skipped=10) 
Traceback (most recent call last): 
  File "run_test.py", line 486, in <module> 
    main() 
  File "run_test.py", line 479, in main 
    raise RuntimeError(message) 
RuntimeError: test_jit_fuser failed! 
 
(base) circleci@PACKER-5E29F737 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

This comment was automatically generated by Dr. CI.


@bddppq bddppq added the module: rocm AMD GPU support for Pytorch label Feb 14, 2020
@jithunnair-amd jithunnair-amd force-pushed the toggle_miopen_support_at_runtime branch from f7993d5 to f305976 Compare February 17, 2020 23:11
Contributor

@bddppq bddppq left a comment


This looks good now. Thanks for your patience in addressing all the review comments.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@facebook-github-bot
Contributor

@bddppq merged this pull request in 718c538.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary:
1. Set `torch._C.has_cudnn` to `True` for ROCm
2. Make MIOpen invocations respect value of `cudnn_enabled` or `at::globalContext().userEnabledCuDNN()`
3. `torch/backends/cudnn/__init__.py`: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff)
Pull Request resolved: pytorch#33118

Differential Revision: D19977719

Pulled By: bddppq

fbshipit-source-id: 64d4dd1d78afcf96201360d85b8be5950f96dfad
facebook-github-bot pushed a commit that referenced this pull request May 2, 2020
…33573)

Summary:
Test needs ability to toggle cuDNN/MIOpen at runtime (enabled in PR #33118)
Pull Request resolved: #33573

Differential Revision: D21360260

Pulled By: mrshenli

fbshipit-source-id: 6e26edc0932efb5d278c2ffc919979b8eb089216
