
Upgrade submodule onednn to v3.3.5 #120767

Closed · wants to merge 6 commits

Conversation

Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Feb 28, 2024

This upgrade contains fixes for the known issues introduced by oneDNN v3.3.2, including issues #115346, #120211 and #120406, and those listed in PR #112700.

Issue #115346 (a perf regression) was fixed by oneDNN v3.3.4, and no new regressions were found with v3.3.5. The detailed results for v3.3.4 are given below, compared against v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).

  1. A performance regression with a 5.8% perf drop in pytorch_stargan-train (see V2 Performance Signal Detected by TorchBench CI on '2.2.0.dev20231205+cu118' benchmark#2076 (comment))
    Validation results with this patch: latency increased by 0.60% (a sketch of how this figure is derived follows this list)
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)

oneDNN v3.1.1 (metrics-1484287.json):

```json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 418.851717
    }
}
```

oneDNN v3.3.4:

```json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 421.381313
    }
}
```
  2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shapes, multiple threads (see [inductor][cpu] rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))
    Validation results with this patch: latency reduced by 3.23%
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)

oneDNN v3.1.1 (Inductor speedup over eager mode: 2.876x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0
```

oneDNN v3.3.4 (Inductor speedup over eager mode: 3.003x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0
```
  3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shapes, multiple threads (see [inductor][cpu] rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))
    Validation results with this patch: latency reduced by 0.85%
Tested on an AWS Sapphire Rapids (SPR) bare-metal instance

oneDNN v3.1.1 (Inductor speedup over eager mode: 1.120x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4
```

oneDNN v3.3.4 (Inductor speedup over eager mode: 1.134x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4
```
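For reference, the 0.60% latency delta reported in item 1 can be reproduced directly from the two metrics dumps above. A minimal sketch, assuming the dumps are saved locally (the file names below are placeholders for illustration, not files from this PR):

```python
import json

def latency_ms(path):
    # Read the "latency" metric from a torchbench metrics dump
    # shaped like the JSON blobs above.
    with open(path) as f:
        return json.load(f)["metrics"]["latency"]

old = latency_ms("metrics-onednn-v3.1.1.json")  # 418.851717 in the dump above
new = latency_ms("metrics-onednn-v3.3.4.json")  # 421.381313 in the dump above

# (421.381313 - 418.851717) / 418.851717 * 100 ≈ +0.60%
print(f"latency change: {(new - old) / old * 100:+.2f}%")
```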

The following functionality issues (#120211 and #120406, noted above) are fixed by this upgrade. Test cases for them are added in this PR; an illustrative sketch of such a test follows.
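The actual test cases are in the PR diff. For illustration only, regression tests for oneDNN-backed ops typically compare an accelerated path against a reference path; the op, layout, and tolerance below are hypothetical stand-ins, not the tests added by this PR:

```python
import torch

def test_conv2d_channels_last_matches_contiguous():
    # Hypothetical sketch: run the same convolution through the
    # channels-last (oneDNN-friendly) layout and the default contiguous
    # layout, and require numerically close results.
    torch.manual_seed(0)
    conv = torch.nn.Conv2d(8, 8, kernel_size=3, padding=1)
    x = torch.randn(2, 8, 16, 16)
    ref = conv(x)
    out = conv(x.to(memory_format=torch.channels_last))
    torch.testing.assert_close(out, ref)

test_conv2d_channels_last_matches_contiguous()
```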
Below are detailed data from the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No performance or functionality regressions were found.
I. torchbench CPU userbenchmark test

| Suite | Speedup |
| --- | --- |
| eager_throughtput_bf16_infer | 1.001848 |
| eager_throughtput_fp32_infer | 1.000257 |
| eager_throughtput_fx_int8 | 1.003069 |
| jit_llga_throughtput_amp_bf16 | 1.000682 |
| jit_llga_throughtput_fp32 | 1.000313 |
| eager_throughtput_bf16_train | 0.998222 |
| eager_throughtput_fp32_train | 1.003384 |
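A single geometric-mean summary of these per-suite speedups (the statistic requested later in this thread) can be computed in a few lines; a minimal sketch over the numbers in the table above:

```python
import math

# Per-suite speedups copied from the table above (new/old).
speedups = [1.001848, 1.000257, 1.003069, 1.000682,
            1.000313, 0.998222, 1.003384]

# Geometric mean: the n-th root of the product of the ratios.
geomean = math.prod(speedups) ** (1 / len(speedups))
print(f"geomean speedup: {geomean:.6f}")  # ≈ 1.0011
```

The geometric mean is the usual choice for aggregating ratios, since it weights slowdowns and speedups symmetrically.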

II. Inductor FP32/AMP inference tests
i. FP32 static default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | timm_efficientnet | multiple | 64 | 1.09 |
| timm_models | tinynet_a | multiple | 128 | 1.14 |

ii. FP32 dynamic default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | alexnet | multiple | 128 | 1.08 |
| torchbench | basic_gnn_edgecnn | multiple | 1 | 0.98 |
| torchbench | timm_efficientnet | multiple | 64 | 1.08 |

iii. AMP static default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | hf_distil_whisper | multiple | 1 | 1.18 |
| torchbench | timm_efficientnet | multiple | 64 | 1.32 |
| huggingface | BartForConditionalGeneration | multiple | 2 | 1.19 |
| timm_models | eca_halonext26ts | multiple | 128 | 1.13 |
| timm_models | nfnet_l0 | multiple | 128 | 1.13 |
| timm_models | rexnet_100 | multiple | 128 | 1.45 |
| timm_models | spnasnet_100 | multiple | 128 | 1.15 |
| timm_models | tf_efficientnet_b0 | multiple | 128 | 1.22 |
| timm_models | tinynet_a | multiple | 128 | 1.49 |
| torchbench | hf_Bert_large | single | 1 | 1.16 |
| huggingface | XLNetLMHeadModel | single | 1 | 1.07 |

iv. AMP dynamic default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | timm_efficientnet | multiple | 64 | 1.32 |
| huggingface | PLBartForConditionalGeneration | multiple | 4 | 1.14 |
| timm_models | nfnet_l0 | multiple | 128 | 1.15 |
| timm_models | rexnet_100 | multiple | 128 | 1.45 |
| timm_models | tinynet_a | multiple | 128 | 1.34 |
| huggingface | XLNetLMHeadModel | single | 1 | 1.09 |

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen

@pytorch-bot pytorch-bot bot added the `topic: not user facing` label Feb 28, 2024

pytorch-bot bot commented Feb 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120767

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0873e19 with merge base f72eb5a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the `module: mkldnn` label Feb 28, 2024
@Xia-Weiwen Xia-Weiwen changed the title from "[Not for land] Update onednn to latest rls v3.3" to "Upgrade submodule onednn to v3.3.5" Mar 1, 2024
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review March 1, 2024 03:53
@facebook-github-bot
Contributor

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman
Contributor

atalman commented Mar 4, 2024

@Xia-Weiwen @jgong5 Please note: here is the current list of oneDNN regressions: #121150

cc @malfet @albanD

@malfet
Contributor

malfet commented Mar 4, 2024

@Xia-Weiwen unrelated to the above, can you please use markup instead of screenshots to represent perf gains, i.e. use something like:

| Test name | Geomean speedup |
| --- | --- |
| eager_throughput_bf16_eager | 1.0018 |
| eager_throughput_fp32_eager | 1.00025 |

...

@Xia-Weiwen
Collaborator Author

> @Xia-Weiwen unrelated to the above, can you please use markup instead of screenshots to represent perf gains, i.e. use something like:
>
> | Test name | Geomean speedup |
> | --- | --- |
> | eager_throughput_bf16_eager | 1.0018 |
> | eager_throughput_fp32_eager | 1.00025 |
>
> ...

Sure. I have updated.

Contributor

@malfet malfet left a comment


This update likely fixes a number of regressions, but it completely lacks any testing, so nothing prevents one from re-introducing the same regressions during the next update. Please add tests for the functional regressions to this PR

@Xia-Weiwen
Collaborator Author

> This update likely fixes a number of regressions, but it completely lacks any testing, so nothing prevents one from re-introducing the same regressions during the next update. Please add tests for the functional regressions to this PR

Thanks. I have added test cases for the functionality issues listed above.

@Xia-Weiwen Xia-Weiwen requested a review from malfet March 6, 2024 08:40
@facebook-github-bot
Contributor

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@malfet
Contributor

malfet commented Mar 13, 2024

I have a question about pytorch/builder#1716 and pytorch/builder#1717: why are those changes not part of the release?
[Edit] I see that oneapi-src/oneDNN#1768 was accepted as part of 3.4, but not 3.3.5, which makes total sense; question removed.

And for oneapi-src/oneDNN#1773, it looks like a PR against oneDNN was never posted...

@snadampal
Collaborator

snadampal commented Mar 13, 2024

Hi @Xia-Weiwen, could you please cherry-pick this oneDNN PR from oneDNN 3.4 into the oneDNN 3.3.5 branch and update the PyTorch git submodule commit?
oneapi-src/oneDNN#1768

Some context and justification for this cherry-pick:
The PR is required so that torch.compile() performance on aarch64 does not regress compared to eager mode.
The PyTorch changes (#115037) were merged into main on Dec 7th; we could not enable the feature in PyTorch 2.2 because the oneDNN PR had not been merged by then.

The oneDNN PR (oneapi-src/oneDNN#1768) was merged into oneDNN 3.4 on Feb 7th.
Today we are still not able to enable this feature in PyTorch 2.3 (even after it has been ready for 3 months) because PyTorch 2.3 is going with oneDNN 3.3.5.
If we miss this window, we need to wait till July to release this feature.
BTW, I raised this ideep issue last week for the same reason: intel/ideep#288

cc: @jgong5, @malfet

To avoid cherry-picks going forward, it would be great if you could publish the oneDNN version for PyTorch upfront, so that we can all plan our PRs accordingly.

@vpirogov

@snadampal, please open a PR with the cherry-picked changes in the oneDNN repo targeting the rls-v3.3 branch. I'll have v3.3.6 tagged as soon as all the changes required for PyTorch land. Anything else we need besides oneapi-src/oneDNN#1768?

@malfet
Contributor

malfet commented Mar 13, 2024

@vpirogov thank you for the prompt reply. Let me create a cherry-pick for the xbyak change (another ARM-related one), but first check whether those changes are already in trunk.

@snadampal
Collaborator

@vpirogov, thank you, I will create it right away.

@atalman
Contributor

atalman commented Mar 13, 2024

@vpirogov @malfet Should we update the 3.3.2 oneDNN release with this patch:
oneapi-src/oneDNN@bbaec14
to fix this issue:
#120547

We are having exactly the same situation in patch release 2.2.2. Please see this comment: #120547 (comment)

@malfet
Contributor

malfet commented Mar 13, 2024

> Should we update the 3.3.2 oneDNN release with this patch: oneapi-src/oneDNN@bbaec14 to fix this issue: #120547

@atalman alas, there are no separate 3.3.2 and 3.3.5 branches, just a single rls-v3.3, so no, it's not possible; we need to do the patching on our end to fix that issue.

@vpirogov

@malfet, rls-v3.3 is used to tag patch releases and includes only bug fixes. This should be a safe mechanism for picking up bug fixes when updating the oneDNN version in PyTorch.

@atalman
Contributor

atalman commented Mar 13, 2024

@vpirogov @malfet In this case, should we consider moving to oneDNN 3.3.5 with all the fixes for the 2.2.2 release? I see a difference of about 20 cherry-picks between the 3.3.5 and 3.3.2 releases. This way we can avoid having oneDNN patches like the one above entirely.

@vpirogov

@atalman, oneDNN 3.3.5 is already tagged and will not change. We should consider adding all the necessary fixes to the rls-v3.3 branch and updating PyTorch to oneDNN v3.3.6.

@snadampal
Collaborator

@vpirogov, here is the cherry-pick PR for rls-v3.3: oneapi-src/oneDNN#1831

@atalman
Contributor

atalman commented Mar 13, 2024

@vpirogov This is clear. Hence we will have to:

  • Land it in main
  • Cherry-pick it into the release 2.3 branch

I am proposing to cherry-pick it into the 2.2 branch at the same time.

@vpirogov

@atalman, I'm not intimately familiar with PyTorch branch management, but it sounds reasonable.

oneapi-src/oneDNN#1831 has landed. If anything else is needed in oneDNN v3.3.6, please open PRs against rls-v3.3. Otherwise I'm ready to tag the patch release.

@Xia-Weiwen
Collaborator Author

Hi @snadampal @atalman @malfet I am trying to figure out what we need to do next on our side. Looks like we need to

  • wait for all patches related to the aarch64 issues to land in oneDNN rls-v3.3
  • wait for the oneDNN v3.3.6 tag & release
  • upgrade oneDNN in PyTorch main to v3.3.6

Is that correct?

@snadampal
Collaborator

Hi @Xia-Weiwen, yes, that's my understanding. My PR got merged to rls-v3.3.
@malfet was planning to check whether the xbyak change can be cherry-picked too.

@Xia-Weiwen
Collaborator Author

@snadampal How many patches/PRs are still pending and not yet landed in oneDNN rls-v3.3? And regarding the xbyak change you mentioned, will it be cherry-picked into oneDNN or into PyTorch itself?

@snadampal
Collaborator

I was tracking only onednn#1768, which landed via onednn#1831. Maybe @malfet or @atalman can comment on the other.

@Xia-Weiwen
Collaborator Author

Ok. Thanks.
Hi @atalman @malfet Since the PT 2.3 branch cut is already done, how shall we upgrade oneDNN for PT 2.3 and PT 2.4? We have planned to upgrade oneDNN to v3.4 for PT 2.4, and we hope the upgrade to v3.4 can be done in the PyTorch main branch within this month. Looks like we need to (1) upgrade oneDNN to v3.3.6 in the main branch; then (2) cherry-pick it to the 2.3 release branch; and finally (3) upgrade oneDNN to v3.4 in main. Do you have any suggestions or comments? Thanks!

@atalman
Contributor

atalman commented Mar 14, 2024

@Xia-Weiwen
Yes, this is correct:
(1) upgrade oneDNN to v3.3.6 in main branch; then
(2) cherry-pick it to 2.3 release branch; and finally
(3) upgrade oneDNN to v3.4 in main.

@Xia-Weiwen
Collaborator Author

@atalman @malfet Got it. Thanks.
I am wondering when oneDNN v3.3.6 will be ready. Looks like there are two PRs that are needed, and one of them (oneapi-src/oneDNN#1831) has landed. What about the other (oneapi-src/oneDNN#1773)?

@Xia-Weiwen
Collaborator Author

Hi @snadampal Looks like you are the owner of the second PR (oneapi-src/oneDNN#1773). Will you cherry-pick it to the oneDNN rls-v3.3 branch?

@snadampal
Collaborator

Yes, I raised oneapi-src/oneDNN#1773 for the xbyak sysfs issue, but later I saw a similar fix, fujitsu/xbyak_aarch64#96, from @malfet, where he fixed it in the xbyak repo instead of oneDNN. That's why I closed mine.

Hi @Xia-Weiwen, is an xbyak upgrade in oneDNN possible now?

If it is not a trivial change for the oneDNN 3.3.x branch, I will cherry-pick my oneDNN PR, oneapi-src/oneDNN#1773, into the oneDNN 3.3 and 3.4 branches.

@Guobing-Chen
Collaborator

Guobing-Chen commented Mar 15, 2024

> Yes, I raised oneapi-src/oneDNN#1773 for the xbyak sysfs issue, but later I saw a similar fix, fujitsu/xbyak_aarch64#96, from @malfet, where he fixed it in the xbyak repo instead of oneDNN. That's why I closed mine.
>
> Hi @Xia-Weiwen, is an xbyak upgrade in oneDNN possible now?
>
> If it is not a trivial change for the oneDNN 3.3.x branch, I will cherry-pick my oneDNN PR, oneapi-src/oneDNN#1773, into the oneDNN 3.3 and 3.4 branches.

@vpirogov, could you help with @snadampal's question about the xbyak upgrade in oneDNN?

@vpirogov

vpirogov commented Mar 15, 2024

> If it is not a trivial change for the oneDNN 3.3.x branch, I will cherry-pick my oneDNN PR, oneapi-src/oneDNN#1773, into the oneDNN 3.3 and 3.4 branches.

@snadampal, I would avoid upgrading xbyak in a patch release as it adds unnecessary risk. I suggest cherry-picking fujitsu/xbyak_aarch64#96 into the rls-v3.3 branch. Once that lands I'll have oneDNN v3.3.6 tagged.

@snadampal
Collaborator

Hi @vpirogov, makes sense. Here are the cherry-pick PRs I created for both the rls-v3.3 and rls-v3.4 branches:

oneapi-src/oneDNN#1834
oneapi-src/oneDNN#1835

@vpirogov

Thanks, @snadampal.

oneDNN v3.3.6 is posted.

@Guobing-Chen
Collaborator

> Thanks, @snadampal.
>
> oneDNN v3.3.6 is posted.

@vpirogov, there seem to be 6 more commits since v3.3.5, 4 related to AArch64 and 2 related to x64. Just to confirm: has the oneDNN team run the release tests for this patch release? And do the tests cover the previously escaped regression cases, especially the AArch64-related ones?

For our part, we plan to:

  1. Submit a new PR for this v3.3.6 version
  2. Do a round of testing with the same scope as we previously used for the oneDNN v3.3.5 upgrade
  3. Presumably we will also need help from @milpuz01 to validate on AArch64

Any comments? @malfet and @atalman

@Xia-Weiwen
Collaborator Author

Hi. FYI, the PR for oneDNN v3.3.6 upgrade is here: #122164 (under validation)

@Guobing-Chen
Collaborator

@malfet and @atalman, there is another issue marked by @albanD for PT 2.3: #120982. The oneDNN team seems to have fixed this issue on the main branch. Do you suggest having this fix included in PT 2.3? If yes, we may need the oneDNN team to backport the fix into the v3.3 branch and tag a new version.

@vpirogov

> @vpirogov, there seem to be 6 more commits since v3.3.5, 4 related to AArch64 and 2 related to x64. Just to confirm: has the oneDNN team run the release tests for this patch release? And do the tests cover the previously escaped regression cases, especially the AArch64-related ones?

Validation on x64 platforms is complete, no regressions identified. Validation of AArch64 changes was done by @snadampal.
