Initial integration of ZenDNN as backend into PyTorch #76242
Conversation
Dr. CI: ✅ No failures (0 pending) as of commit 67503f7. This comment was automatically generated by Dr. CI.
@ratan-prasad can you please share some performance numbers in the PR description? Also, what is the expected binary footprint increase as a result of this change?
Also, why is ZenDNN not just part of the rocm backend integration?
PyTorch v1.11 Latency(ms) BS:1, Milan 64C, NPS4
Wheel file size comparison with and without ZenDNN build as below
@soumith ZenDNN is a CPU inference library, and the two have different release cycles based on customer needs, as well as different CPU vs. GPU product release cycles and the corresponding need for timing-aligned software releases. This is why ZenDNN is not part of the rocm integration.
@ratan-prasad Are the performance numbers exactly the same for batch size 1 and batch size 640? Have you tried other batch sizes?
Hi @huiguoo Sorry for the late reply. We have benchmarked other batch sizes as well; there was a typo for batch size 1 in the previous reply. We also observed that the conda installation of PyTorch v1.11.0 performs better than the pip-based installation for oneDNN, so we have regenerated the benchmarking data. Please find the updated improvement info below.
Please find the benchmarking details below:
PyTorch v1.11 Throughput(images/sec) BS:640, Milan 64C, NPS4
PyTorch v1.11 Throughput(images/sec) BS:128, Milan 64C, NPS4
PyTorch v1.11 Throughput(images/sec) BS:1, Milan 64C, NPS4
The proposed solution makes ZenDNN and BLIS mandatory submodules of PyTorch, even though the dependency itself will be used very sporadically (there is currently no CI/CD support for it).
Can we please solve the mutual-exclusivity issue, or modify the PR to make these optional dependencies (checked out only if USE_ZENDNN is set to true) and reuse the same DNNCPU dispatch key?
CMakeLists.txt
Outdated
@@ -128,9 +128,11 @@ endif()

set(CPU_AARCH64 OFF)
set(CPU_INTEL OFF)
set(CPU_AMD OFF)
Can you please explain what the purpose of CPU_AMD is here? You can replace CPU_INTEL with CPU_X64 if you want.
Thanks for the suggestion. We wanted to keep the existing code untouched as much as possible.
We will submit this correction in another patch on the same pull request.
CMakeLists.txt
Outdated
cmake_dependent_option(
  USE_ZENDNN "Use ZENDNN. Only available on x86 and x86_64." "${CPU_AMD}"
It should not be gated on the new CPU_AMD flag, but rather on the generic x86 architecture.
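A minimal CMake sketch of the gating the reviewer suggests, assuming a generic CPU_X64 detection variable in place of CPU_AMD (the processor regex and variable name are illustrative, not the actual PyTorch build code):

```cmake
# Illustrative sketch only: gate USE_ZENDNN on a generic x86 check
# (CPU_X64) instead of a vendor-specific CPU_AMD flag.
include(CMakeDependentOption)

if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|AMD64|amd64|i[3-6]86)$")
  set(CPU_X64 ON)
else()
  set(CPU_X64 OFF)
endif()

# Default USE_ZENDNN to ON on x86/x86_64 hosts; force it OFF elsewhere.
cmake_dependent_option(
  USE_ZENDNN "Use ZENDNN. Only available on x86 and x86_64." ON
  "CPU_X64" OFF)
```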
Thanks for the suggestion. We will submit this correction in another patch on the same pull request.
MkldnnCPU,
ZendnnCPU,
Hmm, as those are mutually exclusive right now, what's the harm of reusing the same dispatch key? (Rename it to a neutral DNNCPU.)
If we use the same dispatch key, we anticipate some challenges in the ZenDNN integration.
E.g.
How do we update “aten/src/ATen/native/native_functions.yaml” for both the mkldnn and zendnn integrations?
Currently a relu function entry in “aten/src/ATen/native/native_functions.yaml” looks as follows:

- func: relu(Tensor self) -> Tensor
  device_check: NoCheck   # TensorIterator
  variants: function, method
  dispatch:
    CPU, CUDA: relu
    MkldnnCPU: mkldnn_relu
    ZendnnCPU: zendnn_relu
    QuantizedCPU: relu_quantized_cpu
How do we make the same DNNCPU dispatch key point to mkldnn_relu or zendnn_relu based on CMake-level flags? Can you please point us to an example of this?
Note: ATen calls mkldnn through the IDEEP API and the IDEEP library. ATen will call ZenDNN directly using the ZenDNN API, without the IDEEP API/library.
@malfet I think one issue with sharing the same key is that we would have to build separate packages for AMD and Intel, which isn't ideal. Ideally we could use the same package and pick a different backend at runtime.
@ratan-prasad curious about the mutually exclusive part: why is that the case? Is it symbol conflict or something?
@xw285cornell Yes, ZenDNN uses AMD-BLIS (a CBLAS API library) for optimal matrix operations on AMD hardware. AMD-BLIS has symbol clashes with MKL. That's why we are planning for mutually exclusive builds.
Can you help us with sample code or existing similar integrations where the same dispatch key is shared between two different libraries?
constexpr auto sparse_and_sparsecsr_and_mkldnn_ks_and_zendnn_ks =
    c10::sparse_ks | c10::sparse_csr_ks | c10::mkldnn_ks | c10::zendnn_ks;
if (!key_set_.has_any(sparse_and_sparsecsr_and_mkldnn_ks_and_zendnn_ks)) {
Same as above: if there were a single shared key, there would be no need to add an additional check here.
Same answer as previous.
Should we modify “CMakeLists.txt” or “cmake/Dependencies.cmake” so that the ZenDNN and BLIS repositories are downloaded during setup.py execution (when USE_ZENDNN=1 is exported)? Note: we are also looking forward to enabling CI runs for ZenDNN builds.
This PR covers the following:
- Added ZenDNN as third-party library
  - ZenDNN is a submodule for pytorch
  - modified pytorch build infra to build with ZenDNN when USE_ZENDNN is set to 1
  - pytorch build reverts to default behavior if the USE_ZENDNN environment variable is set to zero or unset
- Added blis (dependency of ZenDNN) as third-party library
  - blis is a submodule for pytorch
  - modified pytorch build infra to build with blis when BLAS is set to BLIS
  - blis is built if USE_ZENDNN is set to 1
- Added ZenDNN as backend to pytorch
  - added support for user-level control to disable or enable the zendnn backend
- Added the following changes:
  - added zendnn layout and ZendnnCPU dispatch key
  - tensor conversion between dense and zendnn layouts
  - zendnn convolution as a new ConvBackend
  - when built with zendnn (USE_ZENDNN=1) and gradient calculations are disabled (torch.no_grad), convolution is computed using the ZenDNN library
  - backward propagation is not supported
  - support is only added for the fp32 data type
- Added a few unit tests for pytorch + zendnn

Signed-off-by: Ratan Prasad <ratan.prasad2@amd.com>
Fixing the following CI errors:
- lint errors and warnings
- failure in test_autograd.py
- failure in test_mkldnn.py

Signed-off-by: ratan-prasad <ratan.prasad2@amd.com>

Fixed the following CI issues:
- failure in test_public_bindings.py
- failure in test_overrides.py
- failure in test_sparse_csr.py
- failure in documentation building

Signed-off-by: ratan-prasad <ratan.prasad2@amd.com>
Signed-off-by: Naveen Kumar T <NAVEEN.THANGUDU@amd.com>

Fixed the following:
- Fixed android build issue
- ZenDNN uses the AOCL-blis library as a dependency; AOCL-blis is used by ZenDNN independent of the BLAS setting
- BLAS=BLIS build changes restored as system default
- fixed failure in test_profiler.py for the USE_ZENDNN=1 build

Signed-off-by: ratan-prasad <ratan.prasad2@amd.com>
Is there an RFC for this? If not, we should probably proceed with the review, but for the next big change like this we should have an RFC first. If there is an RFC that has already been reviewed and approved, please link it.
Hi @b0noI,
/easycla As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details. This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.
Very excited to see this going on so rapidly. Just saw the Zen 4 EPYC release today and the huge performance boost along with it; can't wait to get one myself to play around with.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark it as Stale.
Hi @xw285cornell, @malfet, @bdhirsh, @robieta, can you please help us remove the Stale tag?
@ratan-prasad I'm not sure why it should be removed. The PR in its current form has reached a stalemate and as such cannot be merged.
I took a quick look at the ZenDNN code: https://github.com/amd/ZenDNN. It seems a lot of the code is just a clone of oneDNN (a.k.a. MKLDNN). Have you considered contributing the optimizations to oneDNN directly? That would make the PyTorch side simpler: you wouldn't have to clone the oneDNN integration again here. cc @vpirogov
Can you please let us know what can be done to break this stalemate?
What would definitely remove the stalemate is upstreaming your patches to oneDNN. If you wish to continue maintaining the fork because of company dynamics, etc., then as Nikita stated above, this PR needs to:
@malfet @soumith Thank you for the inputs on avoiding a stalemate on the ZenDNN integration.
With the substitution approach, the integration changes will be different compared to this PR. Hence, we are planning to raise a new PR.
Can we use the CMake API (“ExternalProject_Add”) to download the ZenDNN repos?
We are currently working on a POC of this approach. Should I create an RFC for it? Note: PyTorch doesn't call oneDNN directly; it calls oneDNN through another library, ideep.
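A minimal sketch of what such an optional fetch might look like with ExternalProject_Add, assuming the upstream ZenDNN repository URL; the tag placeholder and build wiring are illustrative, not the actual PyTorch build code:

```cmake
# Hypothetical sketch: fetch ZenDNN only when USE_ZENDNN is enabled,
# rather than registering it as a mandatory git submodule.
if(USE_ZENDNN)
  include(ExternalProject)
  ExternalProject_Add(zendnn_external
    GIT_REPOSITORY https://github.com/amd/ZenDNN.git
    GIT_TAG        <pinned-release-tag>   # pin a release for reproducible builds
    SOURCE_DIR     ${CMAKE_BINARY_DIR}/third_party/ZenDNN
    CONFIGURE_COMMAND ""
    BUILD_COMMAND     ""   # build would be wired into the main tree instead
    INSTALL_COMMAND   ""
  )
endif()
```

This keeps the checkout gated on USE_ZENDNN, addressing the earlier request that ZenDNN and BLIS not be mandatory submodules.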
I think a POC and an RFC is a great idea for something of this magnitude/potential complexity.
See pytorch/rfcs#52.
Fixes #ISSUE_NUMBER
cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel