
Upgrade submodule oneDNN to v3.4 #122472

Closed
wants to merge 6 commits

Conversation

@Xia-Weiwen (Collaborator) commented Mar 22, 2024

Improvements

This upgrade fixes the following issues:

- #120982

This upgrade brings the following new features:

- Introduced the memory descriptor serialization API, which is needed to support freezing on CPU in AOT Inductor (#114450); a minimal sketch of the API is shown below.
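A minimal sketch of the serialization round-trip, assuming the oneDNN v3.4 C++ API shape (`dnnl::memory::desc::get_blob()` and the matching blob constructor); this is illustrative only, not the exact code used in PyTorch:

```cpp
#include <cstdint>
#include <vector>

#include "dnnl.hpp"  // oneDNN >= v3.4

int main() {
    // Describe a 2x3 fp32 tensor. Real opaque layouts come from primitive
    // descriptors; a plain format tag keeps the example self-contained.
    dnnl::memory::desc md({2, 3}, dnnl::memory::data_type::f32,
                          dnnl::memory::format_tag::ab);

    // Serialize the descriptor into an opaque byte blob (new in v3.4).
    std::vector<std::uint8_t> blob = md.get_blob();

    // ...the blob can be persisted, e.g. embedded in a compiled .so...

    // Deserialize: reconstruct an equivalent descriptor from the blob.
    dnnl::memory::desc restored(blob);
    return md == restored ? 0 : 1;  // round-trip preserves the descriptor
}
```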

Validation results on CPU

No regression was found.

1. NLP models accuracy/inference/training

| Model Name | Mode | Precision | New | Baseline | New/Baseline |
| --- | --- | --- | --- | --- | --- |
| bert-large | accuracy | fp32 | 93.15325 | 93.15325 | 100.00% |
| bert-large | accuracy | bf16 | 93.20125 | 93.20125 | 100.00% |
| bert-large | accuracy | int8 | 92.66641 | 92.66641 | 100.00% |
| LCM | accuracy | fp32 | 44.11152 | 44.11154 | 100.00% |
| LCM | accuracy | bf16 | 43.57667 | 43.65096 | 100.17% |
| ViT | accuracy | fp32 | 0.8033 | 0.8033 | 100.00% |
| ViT | accuracy | bf16 | 0.8031 | 0.8031 | 100.00% |
| ViT | accuracy | int8 | 0.7985 | 0.7985 | 100.00% |
| yolov7 | accuracy | fp32 | 0.512 | 0.512 | 100.00% |
| yolov7 | accuracy | bf16 | 0.504 | 0.504 | 100.00% |
| yolov7 | accuracy | int8 | 0.507 | 0.507 | 100.00% |
| bert-large | realtime | fp32 | 37.433 | 39.136 | 95.65% |
| bert-large | realtime | bf16 | 166.592 | 160.134 | 104.03% |
| bert-large | realtime | int8 | 230.876 | 222.594 | 103.72% |
| ViT | realtime | fp32 | 288.19 | 282.05 | 102.18% |
| ViT | realtime | bf16 | 755.42 | 741.1 | 101.93% |
| ViT | realtime | int8 | 1060.94 | 1092.47 | 97.11% |
| yolov7 | realtime | fp32 | 17.06927 | 16.47995 | 103.58% |
| yolov7 | realtime | bf16 | 54.68561 | 54.00723 | 101.26% |
| yolov7 | realtime | int8 | 78.38271 | 77.63214 | 100.97% |
| bert-large | throughput | fp32 | 47.142 | 47.341 | 99.58% |
| bert-large | throughput | bf16 | 200.365 | 200.806 | 99.78% |
| bert-large | throughput | int8 | 144.999 | 145.295 | 99.80% |
| LCM | throughput | fp32 | 0.54913 | 0.54897 | 100.03% |
| LCM | throughput | bf16 | 1.062417 | 1.07772 | 98.58% |
| stable-diffusion | throughput | fp32 | 0.03301 | 0.0331 | 99.73% |
| stable-diffusion | throughput | bf16 | 0.08773 | 0.08849 | 99.14% |
| stable-diffusion | throughput | int8 | 0.0491 | 0.05024 | 97.73% |
| ViT | throughput | fp32 | 342.55 | 346.47 | 98.87% |
| ViT | throughput | bf16 | 1263.4 | 1268.32 | 99.61% |
| ViT | throughput | int8 | 1331.3 | 1345.32 | 98.96% |
| yolov7 | throughput | fp32 | 115.313 | 115.612 | 99.74% |
| yolov7 | throughput | bf16 | 323.364 | 323.747 | 99.88% |
| yolov7 | throughput | int8 | 388.137 | 384.236 | 101.02% |
| bert-large | train_phase1 | fp32 | 34.223 | 34.309 | 99.75% |
| bert-large | train_phase1 | bf16 | 90.372 | 88.453 | 102.17% |
| bert-large | train_phase2 | fp32 | 7.307 | 7.318 | 99.85% |

| Data Type | Geomean |
| --- | --- |
| fp32 | 99.88% |
| bf16 | 100.70% |
| int8 | 99.88% |
| all | 100.16% |
2. Torchbench cpu userbenchmark inference & training

| Test suite | Geomean Ratio (New/Baseline) |
| --- | --- |
| eager_throughtput_bf16_infer | 1.00x |
| eager_throughtput_fp32_infer | 1.00x |
| jit_llga_throughtput_amp_bf16 | 0.99x |
| jit_llga_throughtput_fp32 | 1.01x |
| eager_throughtput_fx_int8 | 1.00x |
| eager_throughtput_bf16_train | 1.00x |
| eager_throughtput_fp32_train | 1.00x |
3. Inductor quantization (static & dynamic) accuracy & performance

| Config | Performance geomean ratio (New/Baseline) | Accuracy ratio (New/Baseline) |
| --- | --- | --- |
| Static quant PTQ | 0.99x | 1.00x |
| Static quant PTQ_CPP_WRAPPER | 0.98x | 1.00x |
| Static quant QAT | 0.99x | 1.00x |
| Dynamic quant PTQ | 1.00x | 1.00x |
4. Dynamo benchmarks

| Precision | Shape | Wrapper | Thread | Eager ratio (old/new, geomean) | Inductor ratio (old/new, geomean) |
| --- | --- | --- | --- | --- | --- |
| Float32 | Static | Default | Multiple | 0.998776 | 1.002091 |
| Float32 | Static | Default | Single | 1.014086 | 1.01054 |
| Float32 | Dynamic | Default | Multiple | 1.00386 | 1.005975 |
| Float32 | Dynamic | Default | Single | 1.011036 | 1.008317 |
| AMP | Static | Default | Multiple | 0.996965 | 1.005117 |
| AMP | Static | Default | Single | 1.00092 | 0.995666 |
| AMP | Dynamic | Default | Multiple | 0.9959 | 0.995048 |
| AMP | Dynamic | Default | Single | 1.002569 | 0.994085 |
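For reference, the geomean ratios reported in these tables are presumably the geometric mean of the per-model (or per-test) New/Baseline ratios:

$$\mathrm{Geomean}=\left(\prod_{i=1}^{n} r_i\right)^{1/n},\qquad r_i=\frac{\mathrm{New}_i}{\mathrm{Baseline}_i}$$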

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @snadampal


pytorch-bot bot commented Mar 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122472

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit ecaf0d0 with merge base 1c3fe84:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the module: mkldnn and topic: not user facing labels Mar 22, 2024
@Xia-Weiwen added the intel label Mar 22, 2024
@sanchitintel (Collaborator)

Hi @Xia-Weiwen, can we wait for a couple of days, and choose another commit instead? Thanks!

@Xia-Weiwen (Collaborator, Author)

> Hi @Xia-Weiwen, can we wait for a couple of days, and choose another commit instead? Thanks!

No problem, and don't worry. This PR is for testing for now. We need to land #122164 first.

@Guobing-Chen (Collaborator)

@milpuz01, I suppose you've also kicked off some tests for ARM platforms, as per our previous email communication. Could you share any test results, if available?

@Xia-Weiwen (Collaborator, Author) commented Apr 10, 2024

Hi @jerryzh168. This upgrade is necessary to support freezing models on CPU for AOT Inductor. Additional PRs are still needed after the oneDNN upgrade.

@Xia-Weiwen marked this pull request as draft April 11, 2024 06:03
@Xia-Weiwen marked this pull request as ready for review April 11, 2024 06:17
chunyuan-w added a commit that referenced this pull request May 15, 2024
…ize API"


This PR requires #122472 to land first.
The failure of "Check mergeability of ghstack PR" in the CI is because #122472 has been reverted on main. This mergeability issue will be resolved once it is relanded.

Upgrade ideep to include the new serialize/deserialize API in oneDNN v3.4.1. More details can be found in this ideep PR: intel/ideep#305.
These APIs are needed to fix #114450.
Since new APIs are added in ideep, we increased the ideep version so that this upgrade won't break any pipeline even if third_party/ideep is not updated at the same time.

cc gujinghui PenghuiCheng XiaobingSuper jianyuh jgong5 mingfeima sanchitintel ashokei jingxu10 min-jean-cho yanbing-j Guobing-Chen Xia-Weiwen snadampal voznesenskym penguinwu EikanWang zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
chunyuan-w added a commit that referenced this pull request May 15, 2024
## Description
Fixes #114450. This PR builds upon the work from @imzhuhl in #114451.

This PR requires #122472 to land first.
The failure of "Check mergeability of ghstack PR" in the CI is because #122472 has been reverted on main. This mergeability issue will be resolved once it is relanded.

We leverage the serialization and deserialization API from oneDNN v3.4.1 to save the opaque MKLDNN tensor during the compilation and restore the opaque tensor when loading the compiled .so.
The ideep version is updated so that this upgrade won't break any pipeline even if third_party/ideep is not updated at the same time.
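(As a rough illustration of that flow, and not the actual PyTorch implementation: the compile step can persist both the serialized layout blob and the raw payload bytes of the opaque tensor, and the load step can rebuild a `dnnl::memory` from them. The helper names below are hypothetical and assume the v3.4 blob API.)

```cpp
#include <cstdint>
#include <utility>
#include <vector>

#include "dnnl.hpp"  // oneDNN >= v3.4

// Hypothetical sketch of the save/restore idea for an opaque tensor.
// Compile time: capture the serialized layout plus the payload bytes.
std::pair<std::vector<std::uint8_t>, std::vector<std::uint8_t>>
save_opaque(const dnnl::memory &m) {
    dnnl::memory::desc md = m.get_desc();
    std::vector<std::uint8_t> layout = md.get_blob();  // new in v3.4
    const auto *data = static_cast<const std::uint8_t *>(m.get_data_handle());
    std::vector<std::uint8_t> payload(data, data + md.get_size());
    return {std::move(layout), std::move(payload)};
}

// Load time: rebuild the descriptor from the blob and wrap the payload.
dnnl::memory restore_opaque(const std::vector<std::uint8_t> &layout,
                            std::vector<std::uint8_t> &payload,
                            const dnnl::engine &eng) {
    dnnl::memory::desc md(layout);  // deserialize
    return dnnl::memory(md, eng, payload.data());
}
```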

### Test plan:
```sh
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_conv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_deconv_freezing_non_abi_compatible_cpu
python -u test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCpu.test_linear_freezing_non_abi_compatible_cpu
```

### TODOs in follow-up PRs
1. We found that using `AOTI_TORCH_CHECK` causes a performance drop on several models (`DistillGPT2`, `MBartForConditionalGeneration`, `T5ForConditionalGeneration`, `T5Small`) compared with JIT Inductor, which uses `TORCH_CHECK`. How to address this may need further discussion (`AOTI_TORCH_CHECK` was introduced in #119220).
2. Freezing in non-ABI-compatible mode will work with the support in this PR. For ABI-compatible mode, we first need to address this issue: `AssertionError: None, i.e. optional output is not supported`.
https://github.com/pytorch/pytorch/blob/6c4f43f82675b5fcfe8cf3e5983d0c0f326408aa/torch/_inductor/codegen/cpp_wrapper_cpu.py#L2023-L2024

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request May 16, 2024
Reopen of #122472

## Improvements
This upgrade fixes the following issues:
- #120982

This upgrade brings the following new features:
- Introduced memory descriptor serialization API. This API is needed to support freezing on CPU in AOTInductor (#114450)

## Validation results on CPU
Original results with oneDNN v3.4.1 are here: #122472 (comment)

Need to rerun validation and update results.

Co-authored-by: Sunita Nadampalli <nadampal@amazon.com>
Pull Request resolved: #126137
Approved by: https://github.com/jgong5, https://github.com/snadampal, https://github.com/atalman
@Xia-Weiwen closed this May 21, 2024
@Xia-Weiwen deleted the upgrade_onednn_v3.4 branch May 21, 2024 02:49
pytorchmergebot pushed a commit that referenced this pull request May 24, 2024
Pull Request resolved: #124350
Approved by: https://github.com/jgong5, https://github.com/desertfire
Labels
ciflow/linux-aarch64, ciflow/trunk, ciflow/xpu, intel, Merged, module: mkldnn, open source, Reverted, topic: not user facing