
Upgrade submodule onednn to v3.3.5 #120767

Closed · wants to merge 6 commits

Conversation

Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Feb 28, 2024

This upgrade contains fixes for the known issues introduced by oneDNN v3.3.2, including issues #115346, #120211 and #120406, and those listed in PR #112700.

Issue #115346 (a perf regression) was fixed by oneDNN v3.3.4, and no new regressions were found with v3.3.5. The detailed results for v3.3.4 are given below, compared against v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).

  1. A performance regression with a 5.8% perf drop in pytorch_stargan-train (see V2 Performance Signal Detected by TorchBench CI on '2.2.0.dev20231205+cu118' benchmark#2076 (comment))
    Validation results with this patch: latency increased by 0.60% (a sketch of how this figure is derived follows this list)
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)

oneDNN v3.1.1 (metrics-1484287.json):

```json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 418.851717
    }
}
```

oneDNN v3.3.4:

```json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 421.381313
    }
}
```
  2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shapes, multiple threads (see [inductor][cpu] rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))
    Validation results with this patch: latency reduced by 3.23%
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)

oneDNN v3.1.1 (Inductor speedup over eager mode: 2.876x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0
```

oneDNN v3.3.4 (Inductor speedup over eager mode: 3.003x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0
```
  3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shapes, multiple threads (see [inductor][cpu] rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))
    Validation results with this patch: latency reduced by 0.85%
Tested on an AWS Sapphire Rapids (SPR) bare-metal instance

oneDNN v3.1.1 (Inductor speedup over eager mode: 1.120x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4
```

oneDNN v3.3.4 (Inductor speedup over eager mode: 1.134x):

```csv
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4
```
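For reference, the 0.60% latency delta reported in item 1 can be reproduced directly from the two metrics dumps above. A minimal sketch, assuming the dumps are saved locally (the file names below are placeholders for illustration, not files from this PR):

```python
import json

def latency_ms(path):
    # Read the "latency" metric from a torchbench metrics dump
    # shaped like the JSON blobs above.
    with open(path) as f:
        return json.load(f)["metrics"]["latency"]

old = latency_ms("metrics-onednn-v3.1.1.json")  # 418.851717 in the dump above
new = latency_ms("metrics-onednn-v3.3.4.json")  # 421.381313 in the dump above

# (421.381313 - 418.851717) / 418.851717 * 100 ≈ +0.60%
print(f"latency change: {(new - old) / old * 100:+.2f}%")
```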

The following functionality issues (#120211 and #120406, noted above) are fixed by this upgrade. Test cases for them are added in this PR; an illustrative sketch of such a test follows.
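The actual test cases are in the PR diff. For illustration only, regression tests for oneDNN-backed ops typically compare an accelerated path against a reference path; the op, layout, and tolerance below are hypothetical stand-ins, not the tests added by this PR:

```python
import torch

def test_conv2d_channels_last_matches_contiguous():
    # Hypothetical sketch: run the same convolution through the
    # channels-last (oneDNN-friendly) layout and the default contiguous
    # layout, and require numerically close results.
    torch.manual_seed(0)
    conv = torch.nn.Conv2d(8, 8, kernel_size=3, padding=1)
    x = torch.randn(2, 8, 16, 16)
    ref = conv(x)
    out = conv(x.to(memory_format=torch.channels_last))
    torch.testing.assert_close(out, ref)

test_conv2d_channels_last_matches_contiguous()
```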
Below are detailed data from the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No performance or functionality regressions were found.
I. torchbench CPU userbenchmark test

| Suite | Speedup |
| --- | --- |
| eager_throughtput_bf16_infer | 1.001848 |
| eager_throughtput_fp32_infer | 1.000257 |
| eager_throughtput_fx_int8 | 1.003069 |
| jit_llga_throughtput_amp_bf16 | 1.000682 |
| jit_llga_throughtput_fp32 | 1.000313 |
| eager_throughtput_bf16_train | 0.998222 |
| eager_throughtput_fp32_train | 1.003384 |
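A single geometric-mean summary of these per-suite speedups (the statistic requested later in this thread) can be computed in a few lines; a minimal sketch over the numbers in the table above:

```python
import math

# Per-suite speedups copied from the table above (new/old).
speedups = [1.001848, 1.000257, 1.003069, 1.000682,
            1.000313, 0.998222, 1.003384]

# Geometric mean: the n-th root of the product of the ratios.
geomean = math.prod(speedups) ** (1 / len(speedups))
print(f"geomean speedup: {geomean:.6f}")  # ≈ 1.0011
```

The geometric mean is the usual choice for aggregating ratios, since it weights slowdowns and speedups symmetrically.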

II. Inductor FP32/AMP inference tests
i. FP32 static default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | timm_efficientnet | multiple | 64 | 1.09 |
| timm_models | tinynet_a | multiple | 128 | 1.14 |

ii. FP32 dynamic default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | alexnet | multiple | 128 | 1.08 |
| torchbench | basic_gnn_edgecnn | multiple | 1 | 0.98 |
| torchbench | timm_efficientnet | multiple | 64 | 1.08 |

iii. AMP static default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | hf_distil_whisper | multiple | 1 | 1.18 |
| torchbench | timm_efficientnet | multiple | 64 | 1.32 |
| huggingface | BartForConditionalGeneration | multiple | 2 | 1.19 |
| timm_models | eca_halonext26ts | multiple | 128 | 1.13 |
| timm_models | nfnet_l0 | multiple | 128 | 1.13 |
| timm_models | rexnet_100 | multiple | 128 | 1.45 |
| timm_models | spnasnet_100 | multiple | 128 | 1.15 |
| timm_models | tf_efficientnet_b0 | multiple | 128 | 1.22 |
| timm_models | tinynet_a | multiple | 128 | 1.49 |
| torchbench | hf_Bert_large | single | 1 | 1.16 |
| huggingface | XLNetLMHeadModel | single | 1 | 1.07 |

iv. AMP dynamic default

| suite | name | thread | batch size | Speedup ratio (new/old) |
| --- | --- | --- | --- | --- |
| torchbench | timm_efficientnet | multiple | 64 | 1.32 |
| huggingface | PLBartForConditionalGeneration | multiple | 4 | 1.14 |
| timm_models | nfnet_l0 | multiple | 128 | 1.15 |
| timm_models | rexnet_100 | multiple | 128 | 1.45 |
| timm_models | tinynet_a | multiple | 128 | 1.34 |
| huggingface | XLNetLMHeadModel | single | 1 | 1.09 |

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen

@pytorch-bot pytorch-bot bot added the `topic: not user facing` label Feb 28, 2024

pytorch-bot bot commented Feb 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120767

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0873e19 with merge base f72eb5a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the `module: mkldnn` label Feb 28, 2024
@Xia-Weiwen Xia-Weiwen changed the title from "[Not for land] Update onednn to latest rls v3.3" to "Upgrade submodule onednn to v3.3.5" Mar 1, 2024
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review March 1, 2024 03:53
@facebook-github-bot
Contributor

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman
Contributor

atalman commented Mar 4, 2024

@Xia-Weiwen @jgong5 Please note: here is the current list of oneDNN regressions: #121150

cc @malfet @albanD

@malfet
Contributor

malfet commented Mar 4, 2024

@Xia-Weiwen unrelated to the above, can you please use markup instead of screenshots to represent perf gains, i.e. use something like:

| Test name | Geomean speedup |
| --- | --- |
| eager_throughput_bf16_eager | 1.0018 |
| eager_throughput_fp32_eager | 1.00025 |

...

@Xia-Weiwen
Collaborator Author

> @Xia-Weiwen unrelated to the above, can you please use markup instead of screenshots to represent perf gains, i.e. use something like:
>
> | Test name | Geomean speedup |
> | --- | --- |
> | eager_throughput_bf16_eager | 1.0018 |
> | eager_throughput_fp32_eager | 1.00025 |
>
> ...

Sure. I have updated.

Contributor

@malfet malfet left a comment


This update likely fixes a number of regressions, but it completely lacks any testing, so nothing prevents one from re-introducing the same regressions during the next update. Please add tests for the functional regressions to this PR

@Xia-Weiwen
Collaborator Author

> This update likely fixes a number of regressions, but it completely lacks any testing, so nothing prevents one from re-introducing the same regressions during the next update. Please add tests for the functional regressions to this PR

Thanks. I have added test cases for the functionality issues listed above.

@Xia-Weiwen Xia-Weiwen requested a review from malfet March 6, 2024 08:40
@facebook-github-bot
Contributor

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@malfet
Contributor

malfet commented Mar 13, 2024

I have a question about pytorch/builder#1716 and pytorch/builder#1717: why are those changes not part of the release?
[Edit] I see that oneapi-src/oneDNN#1768 was accepted as part of 3.4, but not 3.3.5, which makes total sense; question removed.

And for oneapi-src/oneDNN#1773, it looks like a PR against oneDNN was never posted...

@snadampal
Collaborator

snadampal commented Mar 13, 2024

Hi @Xia-Weiwen, could you please cherry-pick this oneDNN PR from oneDNN 3.4 into the oneDNN 3.3.5 branch and update the PyTorch git submodule commit?
oneapi-src/oneDNN#1768

Some context and justification for this cherry-pick:
The PR is required so that torch.compile() performance on aarch64 does not regress compared to eager mode.
The PyTorch changes (#115037) were merged into main on Dec 7th; we could not enable the feature in PyTorch 2.2 because the oneDNN PR had not been merged by then.

The oneDNN PR (oneapi-src/oneDNN#1768) was merged into oneDNN 3.4 on Feb 7th.
Today we are still not able to enable this feature in PyTorch 2.3 (even after it has been ready for 3 months) because PyTorch 2.3 is going with oneDNN 3.3.5.
If we miss this window, we need to wait till July to release this feature.
BTW, I raised this ideep issue last week for the same reason: intel/ideep#288

cc: @jgong5, @malfet

To avoid cherry-picks going forward, it would be great if you could publish the oneDNN version for PyTorch upfront, so that we can all plan our PRs accordingly.

@vpirogov

@snadampal, please open a PR with the cherry-picked changes in the oneDNN repo targeting the rls-v3.3 branch. I'll have v3.3.6 tagged as soon as all the changes required for PyTorch land. Anything else we need besides oneapi-src/oneDNN#1768?

@malfet
Contributor

malfet commented Mar 13, 2024

@vpirogov thank you for the prompt reply. Let me create a cherry-pick for the xbyak change (another ARM-related one), but first check whether those changes are already in trunk.

@snadampal
Collaborator

@vpirogov, thank you, I will create it right away.

@atalman
Contributor

atalman commented Mar 13, 2024

@vpirogov @malfet Should we update the 3.3.2 oneDNN release with this patch:
oneapi-src/oneDNN@bbaec14
to fix this issue:
#120547

We are having exactly the same situation in patch release 2.2.2. Please see this comment: #120547 (comment)

@malfet
Contributor

malfet commented Mar 13, 2024

> Should we update the 3.3.2 oneDNN release with this patch: oneapi-src/oneDNN@bbaec14 to fix this issue: #120547

@atalman alas, there are no separate 3.3.2 and 3.3.5 branches, just a single rls-v3.3, so no, it's not possible; we need to do the patching on our end to fix that issue.

@vpirogov

@malfet, rls-v3.3 is used to tag patch releases and includes only bug fixes. This should be a safe mechanism for picking up bug fixes when updating the oneDNN version in PyTorch.

@atalman
Contributor

atalman commented Mar 13, 2024

@vpirogov @malfet In this case, should we consider moving to oneDNN 3.3.5 with all the fixes for the 2.2.2 release? I see a difference of about 20 cherry-picks between the 3.3.5 and 3.3.2 releases. This way we can avoid having oneDNN patches like the one above entirely.

@vpirogov

@atalman, oneDNN 3.3.5 is already tagged and will not change. We should consider adding all the necessary fixes to the rls-v3.3 branch and updating PyTorch to oneDNN v3.3.6.

@snadampal
Collaborator

@vpirogov, here is the cherry-pick PR for rls-v3.3: oneapi-src/oneDNN#1831

@atalman
Contributor

atalman commented Mar 13, 2024

@vpirogov This is clear. Hence we will have to:

  • Land it in main
  • Cherry-pick it into the release 2.3 branch

I am proposing to cherry-pick it into the 2.2 branch at the same time.

@vpirogov

@atalman, I'm not intimately familiar with PyTorch branch management, but it sounds reasonable.

oneapi-src/oneDNN#1831 has landed. If anything else is needed in oneDNN v3.3.6, please open PRs against rls-v3.3. Otherwise I'm ready to tag the patch release.

@Xia-Weiwen
Collaborator Author

Hi @snadampal @atalman @malfet I am trying to figure out what we need to do next on our side. Looks like we need to

  • wait for all patches related to the aarch64 issues to land in oneDNN rls-v3.3
  • wait for the oneDNN v3.3.6 tag & release
  • upgrade oneDNN in PyTorch main to v3.3.6

Is that correct?

@snadampal
Collaborator

Hi @Xia-Weiwen, yes, that's my understanding. My PR got merged to rls-v3.3.
@malfet was planning to check whether the xbyak change can be cherry-picked too.

@Xia-Weiwen
Collaborator Author

@snadampal How many patches/PRs are still pending and not yet landed in oneDNN rls-v3.3? And regarding the xbyak change you mentioned, will it be cherry-picked into oneDNN or into PyTorch itself?

@snadampal
Collaborator

I was tracking only onednn#1768, which landed via onednn#1831. Maybe @malfet or @atalman can comment on the other.

@Xia-Weiwen
Collaborator Author

Ok. Thanks.
Hi @atalman @malfet Since the PT 2.3 branch cut is already done, how shall we upgrade oneDNN for PT 2.3 and PT 2.4? We have planned to upgrade oneDNN to v3.4 for PT 2.4, and we hope the upgrade to v3.4 can be done in the PyTorch main branch within this month. Looks like we need to (1) upgrade oneDNN to v3.3.6 in the main branch; then (2) cherry-pick it to the 2.3 release branch; and finally (3) upgrade oneDNN to v3.4 in main. Do you have any suggestions or comments? Thanks!

@atalman
Contributor

atalman commented Mar 14, 2024

@Xia-Weiwen
Yes, this is correct:
(1) upgrade oneDNN to v3.3.6 in main branch; then
(2) cherry-pick it to 2.3 release branch; and finally
(3) upgrade oneDNN to v3.4 in main.

@Xia-Weiwen
Collaborator Author

@atalman @malfet Got it. Thanks.
I am wondering when oneDNN v3.3.6 will be ready. Looks like there are two PRs that are needed, and one of them (oneapi-src/oneDNN#1831) has landed. What about the other (oneapi-src/oneDNN#1773)?

@Xia-Weiwen
Collaborator Author

Hi @snadampal Looks like you are the owner of the second PR (oneapi-src/oneDNN#1773). Will you cherry-pick it to the oneDNN rls-v3.3 branch?

@snadampal
Collaborator

Yes, I raised oneapi-src/oneDNN#1773 for the xbyak sysfs issue, but later I saw a similar fix, fujitsu/xbyak_aarch64#96, from @malfet, where he fixed it in the xbyak repo instead of oneDNN. That's why I closed mine.

Hi @Xia-Weiwen, is an xbyak upgrade in oneDNN possible now?

If it is not a trivial change for the oneDNN 3.3.x branch, I will cherry-pick my oneDNN PR, oneapi-src/oneDNN#1773, into the oneDNN 3.3 and 3.4 branches.

@Guobing-Chen
Collaborator

Guobing-Chen commented Mar 15, 2024

> Yes, I raised oneapi-src/oneDNN#1773 for the xbyak sysfs issue, but later I saw a similar fix, fujitsu/xbyak_aarch64#96, from @malfet, where he fixed it in the xbyak repo instead of oneDNN. That's why I closed mine.
>
> Hi @Xia-Weiwen, is an xbyak upgrade in oneDNN possible now?
>
> If it is not a trivial change for the oneDNN 3.3.x branch, I will cherry-pick my oneDNN PR, oneapi-src/oneDNN#1773, into the oneDNN 3.3 and 3.4 branches.

@vpirogov, could you help with @snadampal's question about the xbyak upgrade in oneDNN?

@vpirogov

vpirogov commented Mar 15, 2024

> If it is not a trivial change for the oneDNN 3.3.x branch, I will cherry-pick my oneDNN PR, oneapi-src/oneDNN#1773, into the oneDNN 3.3 and 3.4 branches.

@snadampal, I would avoid upgrading xbyak in a patch release as it adds unnecessary risk. I suggest cherry-picking fujitsu/xbyak_aarch64#96 into the rls-v3.3 branch. Once that lands I'll have oneDNN v3.3.6 tagged.

@snadampal
Collaborator

Hi @vpirogov, makes sense. Here are the cherry-pick PRs I created for both the rls-v3.3 and rls-v3.4 branches:

oneapi-src/oneDNN#1834
oneapi-src/oneDNN#1835

@vpirogov

Thanks, @snadampal.

oneDNN v3.3.6 is posted.

@Guobing-Chen
Collaborator

> Thanks, @snadampal.
>
> oneDNN v3.3.6 is posted.

@vpirogov, there seem to be 6 more commits since v3.3.5, 4 related to AArch64 and 2 related to x64. Just to confirm: has the oneDNN team run the release tests for this patch release? And do the tests cover the previously escaped regression cases, especially the AArch64-related ones?

For our part, we plan to:

  1. Submit a new PR for this v3.3.6 version
  2. Do a round of testing with the same scope as we previously used for the oneDNN v3.3.5 upgrade
  3. Presumably we will also need help from @milpuz01 to validate on AArch64

Any comments? @malfet and @atalman

@Xia-Weiwen
Collaborator Author

Hi. FYI, the PR for oneDNN v3.3.6 upgrade is here: #122164 (under validation)

@Guobing-Chen
Collaborator

@malfet and @atalman, there is another issue marked by @albanD for PT 2.3: #120982. The oneDNN team seems to have fixed this issue on the main branch. Do you suggest having this fix included in PT 2.3? If yes, we may need the oneDNN team to backport the fix into the v3.3 branch and tag a new version.

@vpirogov

> @vpirogov, there seem to be 6 more commits since v3.3.5, 4 related to AArch64 and 2 related to x64. Just to confirm: has the oneDNN team run the release tests for this patch release? And do the tests cover the previously escaped regression cases, especially the AArch64-related ones?

Validation on x64 platforms is complete, no regressions identified. Validation of AArch64 changes was done by @snadampal.
