
Update oneDNN submodule to v3.3.2 #112700

Closed
wants to merge 8 commits from the update_onednn_to_v3.3 branch

Conversation

Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Nov 2, 2023

Update oneDNN submodule to v3.3.2.
Add a macro to check the version of third_party/ideep.
Since we have versioning now, the changes won't break any pipeline even if third_party/ideep is not updated at the same time.

The commit bf74e92 fixes a versioning issue with older oneDNN. It separates the versioning of oneDNN and ideep, so that even if oneDNN is downgraded, the version checks still guard the conditions we expect.
Test plan for this commit:
Without this fix, the qconv tests fail if third_party/ideep (and oneDNN) is downgraded. So the test plan is to (a shell sketch of these steps follows the list):

  • Go to third_party/ideep and downgrade it to pytorch-rls-v3.1.1
  • run git submodule update inside third_party/ideep
  • Return to the PyTorch source root directory and rebuild PyTorch
  • Run python test/test_quantization.py
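
A rough shell sketch of these steps (the build command shown is one common option and may differ in your environment):

```bash
# Downgrade ideep (and the oneDNN it pins), rebuild, and run the quantization tests.
cd third_party/ideep
git checkout pytorch-rls-v3.1.1     # older ideep release tag named above
git submodule update --init         # fetch the matching oneDNN under ideep
cd ../..                            # back to the PyTorch source root
python setup.py develop             # rebuild PyTorch (adjust to your build setup)
python test/test_quantization.py    # run the quantization tests
```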

Performance benchmark

  • Inference throughput after vs. before oneDNN upgrade
    (image attached showing the inference throughput comparison)

  • FP32 training after vs. before oneDNN upgrade
    eager mode throughput geomean ratio: 0.988

Test scope: torchbench
Test platform: c7i.16xlarge instance

Notes about what this oneDNN update addresses for PyTorch:

  1. Fixed a crash bug in SDPA backward (Memory corruption in test_transformers #115253).
  2. Fixed a performance regression (yielding a ~40% speedup) for the Inductor workload doctr_det_predictor on x86 (see [inductor][cpu] Perf regression #108324 (comment)).
  3. A few improvements for ARM CPUs: ACL fixed-format kernels (allowing in-place weight updates), weight pre-packing optimizations for torch.compile(), and support for depthwise convolutions and dilated tensors.

Known issues with this update (will be addressed in the next patch release):

  1. A performance regression (5.8% perf drop) for pytorch_stargan-train (see V2 Performance Signal Detected by TorchBench CI on '2.2.0.dev20231205+cu118', pytorch/benchmark#2076 (comment))
  2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see [inductor][cpu]rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))
  3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see [inductor][cpu]rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen

@pytorch-bot pytorch-bot bot added the release notes: quantization release notes category label Nov 2, 2023

pytorch-bot bot commented Nov 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112700

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bf74e92 with merge base 624f202:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 2, 2023
@@ -1473,7 +1473,24 @@ static at::Tensor _quantized_convolution_onednn(
// Scales of ONEDNN and PyTorch are reciprocal
const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0 / act_scale);

-#if defined(IDEEP_VERSION_MAJOR) && IDEEP_VERSION_MAJOR>=3 && defined(IDEEP_VERSION_REVISION) && IDEEP_VERSION_REVISION == 0
+#if defined(IDEEP_VERSION_MAJOR) && IDEEP_PREREQ(3, 1, 0, 1)
Collaborator

Since IDEEP_PREREQ uses more than just IDEEP_VERSION_MAJOR, I suggest checking all the needed macros. Maybe you can just check these macros inside IDEEP_PREREQ?

Collaborator Author

Thanks for the suggestion. The checks have been moved there.
Could you please also review the related PR in ideep: intel/ideep#253? Currently, this PR does not include the oneDNN update. After the ideep PR is merged, this PR will include the ideep & oneDNN updates and be marked as ready for review. Thanks.

@@ -5,6 +5,11 @@

#if AT_MKLDNN_ENABLED()
#include <ideep.hpp>
#ifndef IDEEP_PREREQ
#define IDEEP_PREREQ(major, minor, patch, revision) \
(((IDEEP_VERSION_MAJOR << 32) + (IDEEP_VERSION_MINOR << 16) + (IDEEP_VERSION_PATCH << 8) + (DNNL_VERSION_PATCH)) \
Collaborator

Can we add a comment here explaining the semantics of IDEEP_VERSION_MAJOR, IDEEP_VERSION_MINOR, IDEEP_VERSION_PATCH, and DNNL_VERSION_PATCH? Should IDEEP_VERSION_MAJOR and IDEEP_VERSION_MINOR be the same as the oneDNN major and minor versions?

Collaborator Author

Thanks for the comment. The definitions of these macros are here: https://github.com/intel/ideep/blob/ideep_pytorch/include/ideep.hpp

Collaborator

From the above link, it looks like IDEEP_VERSION_PATCH is the same as DNNL_VERSION_PATCH?

Collaborator Author

Yes. We detect ideep API changes via IDEEP_VERSION_REVISION. A comment has been added, and a typo has also been fixed there.
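
For readers following along, here is a minimal, self-contained sketch of the packing pattern such a prerequisite macro relies on. The EXAMPLE_* names and values are placeholders for illustration only; the real IDEEP_PREREQ definition lives in ideep.hpp (linked above).

```cpp
// Illustrative only: a version-prerequisite macro in the spirit of IDEEP_PREREQ.
// It packs (major, minor, patch, revision) into one integer so a single >=
// comparison orders versions lexicographically.
#include <cstdio>

#define EXAMPLE_VERSION_MAJOR 3
#define EXAMPLE_VERSION_MINOR 3
#define EXAMPLE_VERSION_PATCH 2
#define EXAMPLE_VERSION_REVISION 1

#define EXAMPLE_PACK(major, minor, patch, revision) \
  (((major) * 1000000L) + ((minor) * 10000L) + ((patch) * 100L) + (revision))

#define EXAMPLE_PREREQ(major, minor, patch, revision)               \
  (EXAMPLE_PACK(EXAMPLE_VERSION_MAJOR, EXAMPLE_VERSION_MINOR,       \
                EXAMPLE_VERSION_PATCH, EXAMPLE_VERSION_REVISION) >= \
   EXAMPLE_PACK(major, minor, patch, revision))

int main() {
#if EXAMPLE_PREREQ(3, 1, 0, 1)
  std::printf("version >= 3.1.0.1: take the new code path\n");
#else
  std::printf("version < 3.1.0.1: take the legacy code path\n");
#endif
  return 0;
}
```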

@Xia-Weiwen Xia-Weiwen force-pushed the update_onednn_to_v3.3 branch 2 times, most recently from 305eb78 to 4879d80 Compare November 6, 2023 05:32
@github-actions github-actions bot added the module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration label Nov 7, 2023
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review November 7, 2023 09:47
@ezyang
Contributor

ezyang commented Nov 9, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 9, 2023
@pytorchmergebot
Collaborator

This PR updates submodules third_party/ideep

If those updates are intentional, please add "submodule" keyword to PR title/description.

@soulitzer soulitzer added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 9, 2023
@ezyang ezyang changed the title Update oneDNN to v3.3 Update oneDNN submodule to v3.3 Nov 9, 2023
@ezyang
Contributor

ezyang commented Nov 9, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following:
yulin0077, satgera, pvtuan10, 842974287, digantdesai, ...

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@Xia-Weiwen Xia-Weiwen marked this pull request as draft November 9, 2023 22:37
@atalman
Contributor

atalman commented Dec 4, 2023

Hi @Xia-Weiwen, it looks like multiple quantization tests are failing in test_quantization:

@malfet
Contributor

malfet commented Dec 4, 2023

@Xia-Weiwen have you tested that PyTorch works with both the previous and current versions of oneDNN?
From the failures, it looks like the following check was not executed as planned with the older oneDNN:

#if IDEEP_PREREQ(3, 1, 0, 1)
wgt_scales = ideep::scale_t(1, weight.q_scale());
#elif IDEEP_PREREQ(3, 1, 0, 0)
wgt_scales = ideep::scale_t(1, 1.0/weight.q_scale()); // Scales of ONEDNN and PyTorch are reciprocal
#else
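
For context, a minimal standalone illustration (not PyTorch code; the scale value is made up) of what the two branches quoted above hand to ideep:

```cpp
// Illustrative only: the newer branch (IDEEP_PREREQ(3, 1, 0, 1)) passes
// PyTorch's q_scale to ideep directly, while the older branch
// (IDEEP_PREREQ(3, 1, 0, 0)) passes its reciprocal, because that ideep/oneDNN
// revision expected scales reciprocal to PyTorch's.
#include <cstdio>
#include <vector>

int main() {
  const double q_scale = 0.05;  // hypothetical per-tensor weight scale

  // Mirrors ideep::scale_t(1, weight.q_scale()) -- newer branch.
  std::vector<float> wgt_scales_direct(1, static_cast<float>(q_scale));

  // Mirrors ideep::scale_t(1, 1.0 / weight.q_scale()) -- older branch.
  std::vector<float> wgt_scales_reciprocal(1, static_cast<float>(1.0 / q_scale));

  std::printf("direct: %f, reciprocal: %f\n",
              wgt_scales_direct[0], wgt_scales_reciprocal[0]);
  return 0;
}
```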

@Xia-Weiwen Xia-Weiwen restored the update_onednn_to_v3.3 branch December 5, 2023 01:50
@Xia-Weiwen
Collaborator Author

Hi @atalman @malfet, thanks for the feedback. May I know which version of oneDNN you were using? For the checks, we assume oneDNN >= 3.1.0.

@Xia-Weiwen Xia-Weiwen reopened this Dec 5, 2023
@Xia-Weiwen
Collaborator Author

Hi @atalman @malfet I have updated the PR to fix the issue with older versions of oneDNN. Could you please try again? Thanks!

@facebook-github-bot
Contributor

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman
Contributor

atalman commented Dec 5, 2023

@pytorchbot merge -f "like internal test_quantization are now passing"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@atalman
Contributor

atalman commented Dec 5, 2023

@Xia-Weiwen could you please include the details of the last commit ("Fix versioning issue with older versions of oneDNN") in the PR description?
Also, please include a test plan describing how exactly the versioning change was validated on your end.

@Xia-Weiwen
Collaborator Author

> @Xia-Weiwen could you please include the details of the last commit in PR description: Fix versioning issue with older versions of oneDNN Also please include test plan, how exactly was the versioning change validated on your end.

Hi @atalman I have updated the description at the top #112700 (comment). Could you please review? Thanks!

dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
Update oneDNN submodule to v3.3.2.
Add a macro to check the version of `third_party/ideep`.
Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time.

Pull Request resolved: pytorch#112700
Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
This reverts commit afbaa0c.

Reverted pytorch#112700 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](pytorch#112700 (comment)))
pytorchmergebot pushed a commit that referenced this pull request Mar 11, 2024
This upgrade contains fixes for the known issues introduced by oneDNN v3.3.2, including issues #115346, #120211 and #120406 and those listed in PR #112700.

Issue #115346 (perf regression) was fixed by oneDNN v3.3.4. No new regression was found with v3.3.5. The detailed results of v3.3.4 are given below and compared with v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).
1. A performance regression with 5.8% perf drop from `pytorch_stargan-train` (see pytorch/benchmark#2076 (comment))
Validation results with this patch: Latency increased by 0.60%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
metrics-1484287.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 418.851717
    }
}
oneDNN v3.3.4
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 421.381313
    }
}
```

2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see #115346 (comment))
Validation results with this patch: Latency reduced by 3.23%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
(inductor speedup over eager mode) 2.876x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0

oneDNN v3.3.4
(inductor speedup over eager mode) 3.003x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0
```

3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see #115346 (comment))
Validation results with this patch: Latency reduced by 0.85%
```
Tested on an AWS spr metal instance
oneDNN v3.1.1
(inductor speedup over eager mode) 1.120x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4

oneDNN v3.3.4
(inductor speedup over eager mode) 1.134x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4
```

The following functionality issues are fixed by this upgrade. Test cases have also been added for them.
- #120211
- #120406
- #120547

-----

Below are detailed data from the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No regression in performance or functionality was found.
I.  *torchbench CPU userbenchmark test*
Suite | Speedup
-- | --
eager_throughtput_bf16_infer | 1.001848
eager_throughtput_fp32_infer | 1.000257
eager_throughtput_fx_int8 | 1.003069
jit_llga_throughtput_amp_bf16 | 1.000682
jit_llga_throughtput_fp32 | 1.000313
eager_throughtput_bf16_train | 0.998222
eager_throughtput_fp32_train | 1.003384

II. *Inductor FP32/AMP inference tests*
i.  FP32 static default
suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.09
timm_models | tinynet_a | multiple | 128 | 1.14

ii.  FP32 dynamic default

suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | alexnet | multiple | 128 | 1.08
torchbench | basic_gnn_edgecnn | multiple | 1 | 0.98
torchbench | timm_efficientnet | multiple | 64 | 1.08

iii. AMP static default

suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | hf_distil_whisper | multiple | 1 | 1.18
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | BartForConditionalGeneration | multiple | 2 | 1.19
timm_models | eca_halonext26ts | multiple | 128 | 1.13
timm_models | nfnet_l0 | multiple | 128 | 1.13
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | spnasnet_100 | multiple | 128 | 1.15
timm_models | tf_efficientnet_b0 | multiple | 128 | 1.22
timm_models | tinynet_a | multiple | 128 | 1.49
torchbench | hf_Bert_large | single | 1 | 1.16
huggingface | XLNetLMHeadModel | single | 1 | 1.07

iv.  AMP dynamic default

suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | PLBartForConditionalGeneration | multiple | 4 | 1.14
timm_models | nfnet_l0 | multiple | 128 | 1.15
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | tinynet_a | multiple | 128 | 1.34
huggingface | XLNetLMHeadModel | single | 1 | 1.09

-----

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #120767
Approved by: https://github.com/chuanqi129, https://github.com/jgong5, https://github.com/atalman