[Reopen] Upgrade submodule oneDNN to v3.4.2 #126137

Closed
wants to merge 2 commits

Conversation

@Xia-Weiwen (Collaborator) commented May 14, 2024

Reopen of #122472

Improvements

This upgrade fixes the following issues:

- #120982

This upgrade brings the following new features:

- Introduced memory descriptor serialization API, which is needed to support freezing on CPU in AOTInductor (#114450); a hedged sketch of its usage follows below.
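
The serialization round trip this enables looks roughly like the following. This is a minimal sketch, not code from this PR: the `get_blob()` method and the blob-taking `memory::desc` constructor are assumed names for the new oneDNN v3.4 API and should be verified against the oneDNN documentation.

```cpp
// Sketch of serializing and restoring a oneDNN memory descriptor.
// Assumption: oneDNN v3.4 exposes memory::desc::get_blob() and a
// memory::desc(blob) constructor; verify exact names against the docs.
#include <cstdint>
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

int main() {
    using namespace dnnl;

    // Describe a 2x3x8x8 f32 tensor in nchw layout.
    memory::desc md({2, 3, 8, 8}, memory::data_type::f32, memory::format_tag::nchw);

    // Serialize to an opaque byte blob (e.g., to embed in an AOT-compiled artifact).
    std::vector<uint8_t> blob = md.get_blob();   // assumed API

    // Reconstruct an equivalent descriptor from the blob later.
    memory::desc restored(blob);                 // assumed API

    return md == restored ? 0 : 1;               // 0 if the round trip preserved the descriptor
}
```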

Validation results on CPU

Original results with oneDNN v3.4.1 are here: #122472 (comment)

Need to rerun validation and update results.
Results of reruns

1. Torchbench cpu userbenchmark inference & training

| Perf_Geomean | Ratio (oneDNN v3.4.2/baseline) |
|---|---|
| eager_throughtput_bf16_infer | 1.00x |
| eager_throughtput_fp32_infer | 1.00x |
| jit_llga_throughtput_amp_bf16 | 0.99x |
| jit_llga_throughtput_fp32 | 1.00x |
| eager_throughtput_fx_int8 | 1.01x |
| eager_throughtput_bf16_train | 0.99x |
| eager_throughtput_fp32_train | 0.99x |
2. Dynamo benchmarks

| Precision | Shape | Wrapper | Thread | Eager Ratio old/new GEOMEAN | Inductor Ratio old/new GEOMEAN |
|---|---|---|---|---|---|
| Float32 | Static | Default | Multiple | 1.010737 | 1.019399 |
| Float32 | Static | Default | Single | 1.002681 | 1.001732 |
| Float32 | Dynamic | Default | Multiple | 1.006492 | 1.007722 |
| Float32 | Dynamic | Default | Single | 1.003639 | 1.002965 |
| AMP | Static | Default | Multiple | 1.001126 | 1.005045 |
| AMP | Static | Default | Single | 0.995729 | 0.994613 |
| AMP | Dynamic | Default | Multiple | 0.995276 | 0.998698 |
| AMP | Dynamic | Default | Single | 0.997459 | 0.993833 |

The geomean ratio across all tests is good, but an AMP training performance drop was detected for 1 model with oneDNN v3.4.2, as shown below. The issue has been reported to the oneDNN team.

| model_name | oneDNN v3.4.2 | baseline | throughput ratio (new/old) |
|---|---|---|---|
| pytorch_stargan-train_throughput | 69.759 | 85.378 | 0.8171 |
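
The GEOMEAN values above are geometric means of per-model throughput ratios. A minimal sketch of that aggregation is below; the ratios in the code are made up for illustration and are not the per-model data from this run.

```cpp
// Geometric mean of per-model throughput ratios (new/old).
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical per-model ratios, not actual results from this validation run.
    std::vector<double> ratios = {1.01, 0.99, 1.00, 1.02};

    // Accumulate in log space to avoid overflow/underflow for long ratio lists.
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);

    const double geomean = std::exp(log_sum / static_cast<double>(ratios.size()));
    std::printf("geomean ratio: %.4f\n", geomean);  // prints ~1.0049 for the sample data
    return 0;
}
```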

Dynamo benchmarks

| Precision | Shape | Wrapper | Thread | Eager Ratio old/new GEOMEAN | Inductor Ratio old/new GEOMEAN |
|---|---|---|---|---|---|
| Float32 | Static | Default | Multiple | 1.010736824 | 1.019399 |
| Float32 | Static | Default | Single | 1.002680965 | 1.001732 |
| Float32 | Dynamic | Default | Multiple | 1.006492275 | 1.007722 |
| Float32 | Dynamic | Default | Single | 1.003638601 | 1.002965 |
| AMP | Static | Default | Multiple | 1.001125822 | 1.005045 |
| AMP | Static | Default | Single | 0.99572873 | 0.994613 |
| AMP | Dynamic | Default | Multiple | 0.995275854 | 0.998698 |
| AMP | Dynamic | Default | Single | 0.99745925 | 0.993833 |

Inductor benchmarks

| Dtype | Throughput ratio new/old Geomean |
|---|---|
| all | 99.60% |
| fp32 | 99.84% |
| bf16 | 99.55% |
| int8 | 99.08% |
| fp16 | 99.86% |

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @snadampal

pytorch-bot bot commented May 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126137

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1995054 with merge base 3c4058c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/linux-aarch64 (linux aarch64 CI workflow), module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), and topic: not user facing (topic category) labels on May 14, 2024
@Xia-Weiwen added the ciflow/trunk (Trigger trunk jobs on your pull request), intel (This tag is for PR from Intel), and ciflow/xpu (Run XPU CI tasks) labels on May 14, 2024
@EikanWang removed the ciflow/xpu (Run XPU CI tasks) label on May 14, 2024
@Xia-Weiwen marked this pull request as ready for review on May 14, 2024 03:25
@Xia-Weiwen (Collaborator, Author)

Hi @snadampal, please help check whether the aarch64 failure is related to the oneDNN upgrade. If it is, please help investigate. If not, we will start validation on x86 CPU.

@milpuz01 (Contributor)

I think the problem here is that the CI wheel build uses ACL 23.08 (is this the script used to build the linux-jammy-aarch64-py3.10 wheel: https://github.com/pytorch/builder/blob/ead2e21b24038ed9117060bb2c00bc79c4e91d85/aarch64_linux/aarch64_wheel_ci_build.py#L81, or something else?). In ACL 23.08 the definition of Conv2dInfo takes experimental::PostOpList as one of its arguments; that argument was removed in ACL 23.11+, and oneDNN 3.4.2 was updated accordingly so it no longer passes it: https://github.com/oneapi-src/oneDNN/blob/1137e04ec0b5251ca2b4400a4fd3c667ce843d67/src/cpu/aarch64/acl_indirect_gemm_convolution.hpp#L49.

@snadampal is this right? Should the CI build script be updated to use ACL 23.11+ to build with oneDNN 3.4+?
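
For illustration only, here is a minimal, self-contained sketch of the kind of API break described above. The types are hypothetical stand-ins, not the real ACL or oneDNN declarations; the point is that a call site which stops passing the experimental post-op list only compiles against headers where that parameter no longer exists.

```cpp
// Hypothetical simplified types illustrating the ACL 23.08 vs 23.11+ mismatch.
// These are NOT the actual arm_compute/oneDNN declarations.

namespace acl_23_08 {                 // old headers: extra experimental argument
struct PostOpList {};
struct Conv2dInfo {
    Conv2dInfo(int stride, int pad, const PostOpList& post_ops) {
        (void)stride; (void)pad; (void)post_ops;
    }
};
}  // namespace acl_23_08

namespace acl_23_11 {                 // new headers: experimental argument removed
struct Conv2dInfo {
    Conv2dInfo(int stride, int pad) { (void)stride; (void)pad; }
};
}  // namespace acl_23_11

int main() {
    // A oneDNN v3.4.2-style call site no longer passes a post-op list.
    acl_23_11::Conv2dInfo ok(1, 0);     // compiles against the 23.11+-style header
    (void)ok;
    // acl_23_08::Conv2dInfo bad(1, 0); // would fail: the 23.08-style ctor requires PostOpList
    return 0;
}
```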

@facebook-github-bot (Contributor)

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@snadampal (Collaborator)

@Xia-Weiwen, the CI failure is because oneDNN 3.4 requires an ACL version update; here is the PR: #125549. Both PRs are inter-dependent.

Hi @atalman, could you please suggest how we can handle this dependency? One option is to merge the aarch64-linux CI PR (#125549) first, so that the CI will be ready to test oneDNN 3.4 on aarch64-linux.

@snadampal (Collaborator)

Yes, @milpuz01, it was the ACL version; I added the details above.

@atalman (Contributor) commented May 14, 2024

Hi @snadampal and @Xia-Weiwen, merging both PRs sounds like a good solution. @Xia-Weiwen, could you please add this commit on top of this PR: 091f84e?

This is required for the oneDNN 3.4.x version.
@Xia-Weiwen (Collaborator, Author)

Hi @atalman @snadampal @milpuz01, I have added commit 091f84e on top of this PR.

@facebook-github-bot (Contributor)

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman (Contributor) commented May 16, 2024

@pytorchmergebot merge -f "All required workflows are green"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@chuanqi129 (Collaborator)

Update oneDNN v3.4.2 acceptance test reports

1. Torchbench cpu userbenchmark inference & training

| Perf_Geomean | Ratio (oneDNN v3.4.2/baseline) |
|---|---|
| eager_throughtput_bf16_infer | 1.00x |
| eager_throughtput_fp32_infer | 1.00x |
| jit_llga_throughtput_amp_bf16 | 0.99x |
| jit_llga_throughtput_fp32 | 1.00x |
| eager_throughtput_fx_int8 | 1.01x |
| eager_throughtput_bf16_train | 0.99x |
| eager_throughtput_fp32_train | 0.99x |

2. Dynamo benchmarks

| Precision | Shape | Wrapper | Thread | Eager Ratio old/new GEOMEAN | Inductor Ratio old/new GEOMEAN |
|---|---|---|---|---|---|
| Float32 | Static | Default | Multiple | 1.010737 | 1.019399 |
| Float32 | Static | Default | Single | 1.002681 | 1.001732 |
| Float32 | Dynamic | Default | Multiple | 1.006492 | 1.007722 |
| Float32 | Dynamic | Default | Single | 1.003639 | 1.002965 |
| AMP | Static | Default | Multiple | 1.001126 | 1.005045 |
| AMP | Static | Default | Single | 0.995729 | 0.994613 |
| AMP | Dynamic | Default | Multiple | 0.995276 | 0.998698 |
| AMP | Dynamic | Default | Single | 0.997459 | 0.993833 |

The geomean ratio across all tests is good, but an AMP training performance drop was detected for 1 model with oneDNN v3.4.2, as shown below. The issue has been reported to the oneDNN team. @atalman, any comments on it?

| model_name | oneDNN v3.4.2 | baseline | throughput ratio (new/old) |
|---|---|---|---|
| pytorch_stargan-train_throughput | 69.759 | 85.378 | 0.8171 |

@Guobing-Chen (Collaborator)

> [Quoting @chuanqi129's oneDNN v3.4.2 acceptance test report above.]

Thanks to Chuanqi for the testing results. @vpirogov, could you also help comment on the potential impact of this outlier?

ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
Reopen of pytorch#122472

## Improvements
This upgrade fixes the following issues:
- pytorch#120982

This upgrade brings the following new features:
- Introduced memory descriptor serialization API. This API is needed to support freezing on CPU in AOTInductor (pytorch#114450)

## Validation results on CPU
Original results with oneDNN v3.4.1 are here: pytorch#122472 (comment)

Need to rerun validation and update results.

Co-authored-by: Sunita Nadampalli <nadampal@amazon.com>
Pull Request resolved: pytorch#126137
Approved by: https://github.com/jgong5, https://github.com/snadampal, https://github.com/atalman
Labels: ciflow/linux-aarch64 (linux aarch64 CI workflow), ciflow/trunk (Trigger trunk jobs on your pull request), intel (This tag is for PR from Intel), Merged, module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), open source, topic: not user facing (topic category)