Conversation

@jithunnair-amd
Collaborator

  1. Set torch._C.has_cudnn to True for ROCm
  2. Make MIOpen invocations respect value of cudnn_enabled or at::globalContext().userEnabledCuDNN()
  3. torch/backends/cudnn/__init__.py: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff)

@jithunnair-amd
Collaborator Author

cc @iotamudelta

@ezyang
Contributor

ezyang commented Feb 10, 2020

A high level question... unlike most of the hipified source code, miopen has bindings directly written (not generated by hipification). So, by analogy, shouldn't it have its own backends file, torch/backends/miopen?

@iotamudelta
Contributor

@ezyang thanks for looking at this. I think this is a legitimate and very good question. As you know, the frontend in PyTorch (i.e., the Python layer) assumes cuda == rocm when on ROCm. That is why we decided to have this PR also override the respective cuDNN functionality. It also means that scripts and functionality relying on the cudnn syntax will mostly "just work (TM)" on ROCm.

Now, you are absolutely right that MIOpen's interface and feature set differ somewhat from cuDNN's, so there is no perfect 1:1 mapping. I think this is also reflected in this PR: some things we support on ROCm, some things we don't (yet?).

Ultimately, I believe both choices (MIOpen == cuDNN or MIOpen != cuDNN) are valid from an engineering PoV and at least for me there is no clear winner.

So, what do you think? :-)

@cpuhrsch cpuhrsch added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 11, 2020
Contributor

@bddppq bddppq left a comment


I like the idea of allowing the user to enable/disable MIOpen at runtime. And the interface for it should be cudnn, since at the Python level we want user code to be able to "magically" switch to using AMD GPUs without any code changes.
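The "magic" switch described above can be sketched as follows. The `cudnn` object here is a small stand-in for `torch.backends.cudnn` on a ROCm build, and `conv2d_backend` is a hypothetical dispatcher, not PyTorch's real one; the point is that the cudnn name is kept so existing CUDA scripts work unchanged:

```python
import types

# Stand-in for torch.backends.cudnn on a ROCm build: the "cudnn" name
# is preserved so CUDA-oriented user code needs no edits.
cudnn = types.SimpleNamespace(enabled=True)

def conv2d_backend(device_type):
    """Hypothetical dispatcher: pick the vendor library only when the
    tensor is on the GPU and the user has not disabled the flag."""
    if device_type == "cuda" and cudnn.enabled:
        # On NVIDIA this selects cuDNN; on ROCm the same flag gates MIOpen.
        return "miopen_or_cudnn"
    return "fallback"

cudnn.enabled = False          # the exact line a CUDA script would use
print(conv2d_backend("cuda"))  # fallback
cudnn.enabled = True
print(conv2d_backend("cuda"))  # miopen_or_cudnn
```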

@jithunnair-amd
Collaborator Author

@bddppq Please review my responses and let me know which changes I need to make.

@ezyang ezyang removed their request for review February 14, 2020 04:42
@ezyang
Contributor

ezyang commented Feb 14, 2020

I'm gonna let bddppq shepherd this one through

@dr-ci

dr-ci bot commented Feb 14, 2020

💊 CircleCI build failures summary and remediations

As of commit f4894cd:

  • 1/1 failures introduced in this PR

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakage:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | pattern match details)

RuntimeError: test_jit_fuser failed!
 
---------------------------------------------------------------------- 
Ran 46 tests in 11.883s 
 
FAILED (errors=4, skipped=10) 
Traceback (most recent call last): 
  File "run_test.py", line 486, in <module> 
    main() 
  File "run_test.py", line 479, in main 
    raise RuntimeError(message) 
RuntimeError: test_jit_fuser failed! 
 
(base) circleci@PACKER-5E29F737 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

This comment was automatically generated by Dr. CI.


@bddppq bddppq added the module: rocm AMD GPU support for Pytorch label Feb 14, 2020
@jithunnair-amd jithunnair-amd force-pushed the toggle_miopen_support_at_runtime branch from f7993d5 to f305976 Compare February 17, 2020 23:11
Contributor

@bddppq bddppq left a comment


This looks good now. Thanks for your patience in addressing all the review comments.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@facebook-github-bot
Contributor

@bddppq merged this pull request in 718c538.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary:
1. Set `torch._C.has_cudnn` to `True` for ROCm
2. Make MIOpen invocations respect value of `cudnn_enabled` or `at::globalContext().userEnabledCuDNN()`
3. `torch/backends/cudnn/__init__.py`: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff)
Pull Request resolved: pytorch#33118

Differential Revision: D19977719

Pulled By: bddppq

fbshipit-source-id: 64d4dd1d78afcf96201360d85b8be5950f96dfad
facebook-github-bot pushed a commit that referenced this pull request May 2, 2020
…33573)

Summary:
Test needs ability to toggle cuDNN/MIOpen at runtime (enabled in PR #33118)
Pull Request resolved: #33573

Differential Revision: D21360260

Pulled By: mrshenli

fbshipit-source-id: 6e26edc0932efb5d278c2ffc919979b8eb089216
