Update oneDNN submodule to v3.3.2 #112700
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112700
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit bf74e92 with merge base 624f202.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -1473,7 +1473,24 @@ static at::Tensor _quantized_convolution_onednn(
   // Scales of ONEDNN and PyTorch are reciprocal
   const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0 / act_scale);

-#if defined(IDEEP_VERSION_MAJOR) && IDEEP_VERSION_MAJOR>=3 && defined(IDEEP_VERSION_REVISION) && IDEEP_VERSION_REVISION == 0
+#if defined(IDEEP_VERSION_MAJOR) && IDEEP_PREREQ(3, 1, 0, 1)
```
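One plausible reading of the "Scales of ONEDNN and PyTorch are reciprocal" comment in the hunk above: PyTorch's scale is a dequantization multiplier (`real = scale * (q - zero_point)`), while oneDNN takes the multiplier in the other direction, hence `1.0 / act_scale`. A small illustrative sketch with hypothetical values (not the actual kernel code):

```python
# Illustrative only: hypothetical scale/zero-point values, not PyTorch's
# actual quantized-conv kernel code.
act_scale = 0.05   # PyTorch convention: real = act_scale * (q - zero_point)
zero_point = 0

def pytorch_dequantize(q: int) -> float:
    """PyTorch-style dequantization: multiply by the scale."""
    return act_scale * (q - zero_point)

# Mirrors `ideep::scale_t(1, 1.0 / act_scale)` in the hunk above:
# oneDNN's scale is the reciprocal of PyTorch's.
onednn_src_scale = 1.0 / act_scale

def onednn_quantize(x: float) -> int:
    """oneDNN-style: multiply the real value by the reciprocal scale."""
    return round(x * onednn_src_scale + zero_point)

q = 37
assert onednn_quantize(pytorch_dequantize(q)) == q  # the conventions round-trip
```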
Since `IDEEP_PREREQ` uses not only `IDEEP_VERSION_MAJOR`, I suggest checking all the needed macros. Maybe you can just check these macros inside `IDEEP_PREREQ`?
Thanks for the suggestion. The checks have been moved there.
Could you please also review the related PR in ideep: intel/ideep#253? Currently this PR does not include the oneDNN update. After the ideep PR is merged, this PR will include the ideep & oneDNN update and be marked as ready for review. Thanks.
```diff
@@ -5,6 +5,11 @@

 #if AT_MKLDNN_ENABLED()
 #include <ideep.hpp>
+#ifndef IDEEP_PREREQ
+#define IDEEP_PREREQ(major, minor, patch, revision) \
+  (((IDEEP_VERSION_MAJOR << 32) + (IDEEP_VERSION_MINOR << 16) + (IDEEP_VERSION_PATCH << 8) + (DNNL_VERSION_PATCH)) \
```
Can we add a comment here for the semantics of `IDEEP_VERSION_MAJOR`, `IDEEP_VERSION_MINOR`, `IDEEP_VERSION_PATCH` and `DNNL_VERSION_PATCH`? Should `IDEEP_VERSION_MAJOR` and `IDEEP_VERSION_MINOR` be the same as the oneDNN major and minor versions?
Thanks for the comment. The definitions of these macros are here: https://github.com/intel/ideep/blob/ideep_pytorch/include/ideep.hpp
From the above link, it looks like `IDEEP_VERSION_PATCH` is the same as `DNNL_VERSION_PATCH`?
Yes. We check for ideep API changes via `IDEEP_VERSION_REVISION`. A comment has been added and a typo has also been fixed there.
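For readers unfamiliar with packed-version checks, here is a minimal sketch in Python of the idea behind a macro like `IDEEP_PREREQ` (illustrative field widths and a hypothetical current version, not the actual ideep definitions): pack the four components into one integer so that a single comparison yields lexicographic version ordering.

```python
def pack_version(major: int, minor: int, patch: int, revision: int) -> int:
    """Pack four version components into one integer so that ordinary
    integer comparison gives lexicographic version ordering.
    Field widths here are illustrative; the real IDEEP_PREREQ macro packs
    the components with C preprocessor shift/add expressions."""
    return (major << 48) | (minor << 32) | (patch << 16) | revision

def version_prereq(current: tuple, required: tuple) -> bool:
    """True if the current (major, minor, patch, revision) satisfies the
    required minimum -- the same question IDEEP_PREREQ(3, 1, 0, 1) asks."""
    return pack_version(*current) >= pack_version(*required)

current = (3, 1, 0, 1)  # hypothetical build-time version
assert version_prereq(current, (3, 1, 0, 1))      # exact match passes
assert version_prereq(current, (2, 9, 9, 9))      # older requirement passes
assert not version_prereq(current, (3, 1, 0, 2))  # newer revision needed: fails
```

In C, the same trick requires care with shift widths: shifting a 32-bit `int` by 32, as in the `<< 32` of the hunk above, is undefined behavior unless the operand is widened first, which may be related to the typo fix mentioned in this thread.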
Force-pushed from 305eb78 to 4879d80.
@pytorchbot merge
This PR updates submodule third_party/ideep. If those updates are intentional, please add the "submodule" keyword to the PR title/description.
@pytorchbot merge
Merge failed. Reason: Approval needed from one of the following:
Hi @Xia-Weiwen, it looks like multiple quantization tests are failing in test_quantization:
@Xia-Weiwen have you tested that PyTorch works with both previous and current versions of oneDNN?
pytorch/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp Lines 382 to 386 in afbaa0c
@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f "like internal test_quantization are now passing"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@Xia-Weiwen could you please include the details of the last commit in the PR description: "Fix versioning issue with older versions of oneDNN"?
Hi @atalman I have updated the description at the top #112700 (comment). Could you please review? Thanks!
Update oneDNN submodule to v3.3.2. Add a macro to check the version of `third_party/ideep`. Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time. Pull Request resolved: pytorch#112700 Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman
This reverts commit afbaa0c. Reverted pytorch#112700 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](pytorch#112700 (comment)))
This upgrade contains the fixes to the known issues brought by oneDNN v3.3.2, including issues #115346, #120211 and #120406 and those listed in PR #112700. Issue #115346 (perf regression) was fixed by oneDNN v3.3.4. No new regression was found with v3.3.5. The detailed results of v3.3.4 are given below, compared with v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).

1. A performance regression with 5.8% perf drop from `pytorch_stargan-train` (see pytorch/benchmark#2076 (comment))

Validation results with this patch: latency increased by 0.60%

```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)

oneDNN v3.1.1
metrics-1484287.json
{ "name": "cpu", "environ": { "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0" }, "metrics": { "latency": 418.851717 } }

oneDNN v3.3.4
{ "name": "cpu", "environ": { "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0" }, "metrics": { "latency": 421.381313 } }
```

2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see #115346 (comment))

Validation results with this patch: latency reduced by 3.23%

```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)

oneDNN v3.1.1 (inductor speedup over eager mode) 2.876x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0

oneDNN v3.3.4 (inductor speedup over eager mode) 3.003x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0
```

3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see #115346 (comment))

Validation results with this patch: latency reduced by 0.85%

```
Tested on an AWS spr metal instance

oneDNN v3.1.1 (inductor speedup over eager mode) 1.120x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4

oneDNN v3.3.4 (inductor speedup over eager mode) 1.134x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4
```

The following functionality issues are fixed by this upgrade. Test cases are also added for these issues.

- #120211
- #120406
- #120547

-----

Below are detailed data of the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No regression of perf or functionality was found.

I. *torchbench CPU userbenchmark test*

Suite | Speedup
-- | --
eager_throughtput_bf16_infer | 1.001848
eager_throughtput_fp32_infer | 1.000257
eager_throughtput_fx_int8 | 1.003069
jit_llga_throughtput_amp_bf16 | 1.000682
jit_llga_throughtput_fp32 | 1.000313
eager_throughtput_bf16_train | 0.998222
eager_throughtput_fp32_train | 1.003384

II. *Inductor FP32/AMP inference tests*

i. FP32 static default

suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.09
timm_models | tinynet_a | multiple | 128 | 1.14

ii. FP32 dynamic default

suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | alexnet | multiple | 128 | 1.08
torchbench | basic_gnn_edgecnn | multiple | 1 | 0.98
torchbench | timm_efficientnet | multiple | 64 | 1.08

iii. AMP static default

suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | hf_distil_whisper | multiple | 1 | 1.18
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | BartForConditionalGeneration | multiple | 2 | 1.19
timm_models | eca_halonext26ts | multiple | 128 | 1.13
timm_models | nfnet_l0 | multiple | 128 | 1.13
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | spnasnet_100 | multiple | 128 | 1.15
timm_models | tf_efficientnet_b0 | multiple | 128 | 1.22
timm_models | tinynet_a | multiple | 128 | 1.49
torchbench | hf_Bert_large | single | 1 | 1.16
huggingface | XLNetLMHeadModel | single | 1 | 1.07

iv. AMP dynamic default

suite | name | thread | batch size | Ratio Speedup(New/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | PLBartForConditionalGeneration | multiple | 4 | 1.14
timm_models | nfnet_l0 | multiple | 128 | 1.15
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | tinynet_a | multiple | 128 | 1.34
huggingface | XLNetLMHeadModel | single | 1 | 1.09

-----

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #120767
Approved by: https://github.com/chuanqi129, https://github.com/jgong5, https://github.com/atalman
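The latency deltas quoted in the summaries above can be cross-checked against the raw CSV rows. A small illustrative script (not part of the PR) parsing the rexnet_100 rows from the benchmark output:

```python
import csv
import io

# CSV header and rows as emitted by the inductor benchmark runs quoted above.
HEADER = ("dev,name,batch_size,speedup,abs_latency,compilation_latency,"
          "compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,"
          "unique_graphs,graph_breaks,unique_graph_breaks")
OLD = "cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0"
NEW = "cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0"

def parse_row(header: str, row: str) -> dict:
    """Parse one benchmark CSV row into a {column: value} dict."""
    reader = csv.DictReader(io.StringIO(header + "\n" + row))
    return next(reader)

old, new = parse_row(HEADER, OLD), parse_row(HEADER, NEW)
change = (float(new["abs_latency"]) - float(old["abs_latency"])) / float(old["abs_latency"])
print(f"{new['name']}: latency change {change:+.2%}")  # matches the -3.23% in the summary
```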
Update oneDNN submodule to v3.3.2.
Add a macro to check the version of `third_party/ideep`. Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time.
The commit bf74e92 fixes a versioning issue with older versions of oneDNN. It separates the versioning of oneDNN and ideep, so that even if oneDNN is downgraded, the versioning checks still guard the conditions we expect.
Test plan for this commit:
Without this fix, tests about qconv will fail if `third_party/ideep` (and oneDNN) is downgraded. So, the test plan is:
1. Downgrade `third_party/ideep` to `pytorch-rls-v3.1.1`
2. Run `git submodule update` inside `third_party/ideep`
3. Run `python test/test_quantization.py`
Performance benchmark
Inference throughput after vs. before oneDNN upgrade
FP32 training after vs. before oneDNN upgrade
eager mode throughput geomean ratio: 0.988
Test scope: torchbench
Test platform: c7i.16xlarge instance
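The geomean ratio quoted above is a geometric mean of per-model throughput ratios (after/before). A short sketch with made-up ratios, since the per-model data is not included here:

```python
import math

def geomean(ratios):
    """Geometric mean: exp of the average log, the usual aggregate for
    ratio data because it is symmetric in speedups and slowdowns."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-model throughput ratios; the 0.988 in the summary is the
# geomean over the real torchbench training models.
sample = [1.02, 0.95, 0.99, 0.99]
print(round(geomean(sample), 3))  # → 0.987
```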
Notes about what this oneDNN update addresses for PyTorch:
Known issues with this update (will be addressed in the next patch release):
- `pytorch_stargan-train` (see "V2 Performance Signal Detected by TorchBench CI on '2.2.0.dev20231205+cu118'", pytorch/benchmark#2076 (comment))

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen