Upgrade submodule onednn to v3.3.5 #120767
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120767. Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 0873e19 with merge base f72eb5a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@Xia-Weiwen @jgong5 Please note, here is the current list of oneDNN regressions: #121150
@Xia-Weiwen unrelated to the above, can you please use markup instead of screenshots to represent perf gains, i.e. use something like:
...
Sure. I have updated it.
This update likely fixes a number of regressions, but it completely lacks any testing, so nothing prevents the same regressions from being re-introduced during the next update. Please add tests for the functional regressions to this PR.
Thanks. I have added test cases for the functionality issues listed above.
@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I have a question about pytorch/builder#1716 and pytorch/builder#1717. And for oneapi-src/oneDNN#1773, it looks like the PR against oneDNN was never posted...
Hi @Xia-Weiwen, could you please cherry-pick this oneDNN PR from oneDNN 3.4 into the oneDNN 3.3.5 branch and update the PyTorch git submodule commit? Some context and justification for this cherry-pick: the oneDNN PR oneapi-src/oneDNN#1768 was merged into oneDNN 3.4 on Feb 7th. To avoid such cherry-picks going forward, it would be great if you could publish the oneDNN version for PyTorch upfront, so that we can all plan our PRs accordingly.
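For reference, the submodule bump itself is mechanical. Below is a minimal sketch of the kind of update involved, assuming a checkout of the superproject; the submodule path `third_party/onednn` and the helper name are illustrative placeholders, not PyTorch's actual layout:

```python
import subprocess

# Hypothetical sketch of pinning a oneDNN submodule to a new tag.
# The path and tag below are placeholders for illustration only.
def bump_onednn_submodule(path="third_party/onednn", ref="v3.3.5"):
    # Fetch tags and check out the desired release inside the submodule.
    subprocess.run(["git", "-C", path, "fetch", "--tags", "origin"], check=True)
    subprocess.run(["git", "-C", path, "checkout", ref], check=True)
    # Record the new submodule commit in the superproject.
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Upgrade submodule onednn to {ref}"], check=True
    )

if __name__ == "__main__":
    bump_onednn_submodule()
```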
@snadampal, please open a PR with the cherry-picked changes in the oneDNN repo targeting the rls-v3.3 branch. I'll have v3.3.6 tagged as soon as all changes required for PyTorch land. Anything else we need besides oneapi-src/oneDNN#1768?
@vpirogov thank you for the prompt reply. Let me create a cherry-pick for the xbyak change (another ARM-related one), but first I'll check whether those changes are already in trunk.
@vpirogov, thank you, will create it right away.
@vpirogov @malfet Should we update the oneDNN 3.3.2 release with this patch in the same way? We are having exactly the same situation in patch release 2.2.2. Please see this comment: #120547 (comment)
@atalman alas, there are no separate 3.3.2 vs 3.3.5 branches, but rather a single rls-v3.3 branch.
@malfet, rls-v3.3 is used to tag patch releases and includes only bug fixes. This should be a safe mechanism to pick up bug fixes when updating the oneDNN version in PyTorch.
@atalman, oneDNN 3.3.5 is already tagged and will not change. We should consider adding all the necessary fixes to the rls-v3.3 branch and updating PyTorch to oneDNN v3.3.6.
@vpirogov, here is the cherry-pick PR for rls-v3.3: oneapi-src/oneDNN#1831
@vpirogov This is clear. Hence we will have to:
@atalman, I'm not intimately familiar with PyTorch branch management, but it sounds reasonable. oneapi-src/oneDNN#1831 has landed. If anything else is needed in oneDNN v3.3.6, please open PRs to rls-v3.3. Otherwise I'm ready to tag the patch.
Hi @snadampal @atalman @malfet, I am trying to figure out what we need to do next on our side. Looks like we need to:

Is that correct?
Hi @Xia-Weiwen, yes, that's my understanding. My PR got merged to rls-v3.3.
@snadampal How many patches/PRs are still pending and not yet landed in oneDNN rls-v3.3? And regarding the xbyak change you mentioned, will it be cherry-picked into oneDNN or into PyTorch itself?
Ok. Thanks.
@Xia-Weiwen |
@atalman @malfet Got it. Thanks.
Hi @snadampal Looks like you are the owner of the second PR (oneapi-src/oneDNN#1773). Will you cherry-pick this to the oneDNN rls-v3.3 branch?
Yes, I raised oneapi-src/oneDNN#1773 for the xbyak sysfs issue, but later I saw a similar fix, fujitsu/xbyak_aarch64#96, from @malfet, where he fixed it in the xbyak repo instead of oneDNN. That's why I closed mine. Hi @Xia-Weiwen, is an xbyak upgrade in oneDNN possible now? If it's not a trivial change for the oneDNN 3.3.x branch, I will cherry-pick my oneDNN PR, oneapi-src/oneDNN#1773, into the oneDNN 3.3 and 3.4 branches.
@vpirogov, could you help with @snadampal's question about the xbyak upgrade in oneDNN?
@snadampal, I would avoid upgrading xbyak in the patch release, as it adds unnecessary risk. I suggest cherry-picking fujitsu/xbyak_aarch64#96 into the rls-v3.3 branch. Once this lands, I'll have oneDNN v3.3.6 tagged.
Hi @vpirogov, makes sense. Here are the cherry-picked PRs I created for both the rls-v3.3 and rls-v3.4 branches.
Thanks, @snadampal. oneDNN v3.3.6 is posted.
@vpirogov, there seem to be 6 more commits since v3.3.5, 4 related to AArch64 and 2 related to x64. Just to confirm: has the oneDNN team run release tests for this patch release version? And do the tests cover the previously escaped regression cases, especially the AArch64-related ones? On our part, we plan to:
Hi. FYI, the PR for the oneDNN v3.3.6 upgrade is here: #122164 (under validation)
Validation on x64 platforms is complete, no regressions identified. Validation of AArch64 changes was done by @snadampal.
This upgrade contains fixes for the known issues introduced by oneDNN v3.3.2, including issues #115346, #120211, and #120406, and those listed in PR #112700.
Issue #115346 (a perf regression) was fixed by oneDNN v3.3.4, and no new regression was found with v3.3.5. The detailed results for v3.3.4 are given below, compared against v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).
pytorch_stargan-train (see V2 Performance Signal Detected by TorchBench CI on '2.2.0.dev20231205+cu118', benchmark#2076 (comment))
Validation results with this patch: latency increased by 0.60%

Validation results with this patch: latency reduced by 3.23%

Validation results with this patch: latency reduced by 0.85%
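For context, latency comparisons like those above are typically collected by timing repeated runs. Here is a minimal sketch using `torch.utils.benchmark`; the model and input shapes are stand-ins, not the actual TorchBench workloads:

```python
import torch
import torch.utils.benchmark as benchmark

# Placeholder model and input; the real comparisons above use TorchBench suites.
model = torch.nn.Conv2d(3, 64, kernel_size=7, padding=3).eval()
x = torch.randn(16, 3, 128, 128)

timer = benchmark.Timer(
    stmt="model(x)",
    globals={"model": model, "x": x},
)
# Mean latency over 100 runs; compare this number across oneDNN versions.
print(timer.timeit(100))
```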
The following functionality issues are fixed by this upgrade. Test cases are also added for these issues (a rough sketch of such tests is shown below):

- `permute` + `conv2d` crash/memory corruption on CPU in torch 2.2 (#120211)
- `F.conv3d` function (#120406)
- `RuntimeError: cannot create std::vector larger than max_size()` after transpose of input tensor (#120547)

Below are detailed data from the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No regression in performance or functionality was found.
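Before the detailed data, here is a rough illustration of what regression test cases for the issues above might look like (a hedged sketch with made-up shapes, not the actual tests added in this PR):

```python
import torch
import torch.nn.functional as F

# In the spirit of #120211: conv2d on a permuted (non-contiguous) CPU input
# should run without crashing or corrupting memory.
def check_permute_conv2d():
    x = torch.randn(2, 16, 8, 8)
    x = x.permute(0, 2, 3, 1).permute(0, 3, 1, 2)  # round-trip permute -> non-contiguous
    conv = torch.nn.Conv2d(16, 32, kernel_size=3, padding=1)
    y = conv(x)
    assert y.shape == (2, 32, 8, 8)

# In the spirit of #120547: conv2d on a transposed input should not raise
# "cannot create std::vector larger than max_size()".
def check_transposed_conv2d():
    x = torch.randn(8, 8, 3, 16).transpose(0, 3)  # -> shape (16, 8, 3, 8)
    w = torch.randn(4, 8, 3, 3)
    y = F.conv2d(x, w, padding=1)
    assert y.shape == (16, 4, 3, 8)

check_permute_conv2d()
check_transposed_conv2d()
```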
I. torchbench CPU userbenchmark test
II. Inductor FP32/AMP inference tests
i. FP32 static default
ii. FP32 dynamic default
iii. AMP static default
iv. AMP dynamic default
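For readers unfamiliar with these configurations, here is a hedged sketch of what an Inductor AMP inference run looks like on CPU; the model and shapes are illustrative placeholders, not the benchmark suite itself:

```python
import torch

# Tiny placeholder model; Inductor is the default torch.compile backend.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval()
compiled = torch.compile(model)

x = torch.randn(1, 3, 64, 64)
# AMP static-shape inference; the FP32 runs simply skip the autocast context.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = compiled(x)
```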
cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen