Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade submodule oneDNN to v3.3.6 #122164

Closed
wants to merge 1 commit into from

Conversation

Xia-Weiwen
Copy link
Collaborator

@Xia-Weiwen Xia-Weiwen commented Mar 19, 2024

As the title. Including issue fixes for aarch64:


Validation results

(on Intel CPU + Linux)
Static quantization with Inductor on CV models

Quant method Geomean throughput ratio (v3.3.6/baseline)
ptq 0.982937
ptq (cpp wrapper) 0.978384
qat 0.978828

Torchbench cpu userbenchmark with Inductor

Items Perf Geomean Ratio (v3.3.6/baseline)
eager_throughtput_bf16_infer 1.00x
eager_throughtput_fp32_infer 1.00x
jit_llga_throughtput_amp_bf16 1.01x
jit_llga_throughtput_fp32 1.00x
eager_throughtput_fx_int8 1.00x
eager_throughtput_bf16_train 1.46x
eager_throughtput_fp32_train 1.41x

Dynamo benchmarks tests

Precision Shape Wrapper Thread Eager old/new GEOMEAN Inductor old/new GEOMEAN
Float32 Static Default Multiple 1.003836812 1.003425
Float32 Static Default Single 1.000181451 0.999611
Float32 Dynamic Default Multiple 1.003980183 1.006563
Float32 Dynamic Default Single 1.000076939 0.999969
AMP Static Default Multiple 0.996824772 0.998715
AMP Static Default Single 0.996402574 1.001483
AMP Dynamic Default Multiple 0.994919866 1.000467
AMP Dynamic Default Single 0.9962054 1.000767

(on Aarch64)
#122164 (comment)


cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen

Copy link

pytorch-bot bot commented Mar 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122164

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit d637dae with merge base ae983d2 (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration topic: not user facing topic category labels Mar 19, 2024
@Xia-Weiwen Xia-Weiwen added the intel This tag is for PR from Intel label Mar 19, 2024
Copy link
Collaborator

@jgong5 jgong5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to provide more details (e.g., links to the issues) on the motivation of this upgrade.

@Xia-Weiwen
Copy link
Collaborator Author

Suggest to provide more details (e.g., links to the issues) on the motivation of this upgrade.

Thanks. I have added links.

@Xia-Weiwen Xia-Weiwen requested a review from jgong5 March 19, 2024 01:57
@snadampal
Copy link
Collaborator

I have tested aarch64 linux for torch.compile and #115482
btw, I did source build with the commits updated for ideep and mkl-dnn for v3.3.6..
The changes look good to me.

@Xia-Weiwen
Copy link
Collaborator Author

I have tested aarch64 linux for torch.compile and #115482 btw, I did source build with the commits updated for ideep and mkl-dnn for v3.3.6.. The changes look good to me.

@snadampal Thanks for the results. Are you also able to validate on Apple Silicon? We don't have such platforms.

@snadampal
Copy link
Collaborator

@Xia-Weiwen , no, i don't have the platform either.

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review March 25, 2024 06:18
@Xia-Weiwen
Copy link
Collaborator Author

Xia-Weiwen commented Mar 25, 2024

Hi @atalman @malfet We have done validation on our side and results are given at the top. It is now ready for review. Could you please review this PR? Thanks!

@Guobing-Chen
Copy link
Collaborator

Guobing-Chen commented Mar 25, 2024

@milpuz01, as you will also do some test on ARM platforms, could you help share your test result when available? Then we can have broader coverage to verify this upgrade.

@Guobing-Chen
Copy link
Collaborator

Guobing-Chen commented Mar 25, 2024

@malfet and @atalman, we've finished the upgrade test with test scope and result pasted in the PR description. And @milpuz01 from ARM will also help test from ARM platform perspective to verify the upgrade. Could you review and see whether the test scope is good enough for you? (One missing part is that we don't have Apple platform coverage due to hardware not available)

And there is another issue marked by @albanD as for PT 2.3 -- #120982. OneDNN team seems have fixed this issue at main branch. Do you suggest to have this fix included for PT 2.3 or not? If yes, we need OneDNN team to backport this fix into v3.3 branch and tag a new version again, and a new round of test to verify the new version, which will consume more additional weeks, not sure it will be a good fit for PT 2.3 schedule.

@malfet
Copy link
Contributor

malfet commented Mar 28, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 28, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: .github/workflows/trunk.yml / macos-12-py3-arm64-mps / test (mps, 1, 1, macos-m1-stable)

Details for Dev Infra team Raised by workflow job

@snadampal
Copy link
Collaborator

Looks like the macos-arm64 failure is not related to this PR, present on the base as well.

@atalman
Copy link
Contributor

atalman commented Mar 28, 2024

@pytorchbot merge -f "All required changes are passing"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@snadampal
Copy link
Collaborator

thanks for merging it @atalman .
Hi @Xia-Weiwen , would you be able to raise cherrypick for pytorch 2.3 release branch?

Xia-Weiwen added a commit to Xia-Weiwen/pytorch that referenced this pull request Mar 29, 2024
As the title. Including issue fixes for aarch64:
- oneapi-src/oneDNN#1831
- oneapi-src/oneDNN#1834

---

## Validation results
(on Intel CPU + Linux)
**Static quantization with Inductor on CV models**

Quant method | Geomean throughput ratio (v3.3.6/baseline)
-- | --
ptq | 0.982937
ptq (cpp wrapper) | 0.978384
qat | 0.978828

**Torchbench cpu userbenchmark with Inductor**

Items | Perf Geomean Ratio (v3.3.6/baseline)
-- | --
eager_throughtput_bf16_infer | 1.00x
eager_throughtput_fp32_infer | 1.00x
jit_llga_throughtput_amp_bf16 | 1.01x
jit_llga_throughtput_fp32 | 1.00x
eager_throughtput_fx_int8 | 1.00x
eager_throughtput_bf16_train | 1.46x
eager_throughtput_fp32_train | 1.41x

**Dynamo benchmarks tests**
Precision | Shape | Wrapper | Thread | Eager old/new GEOMEAN | Inductor old/new GEOMEAN
-- | -- | -- | -- | -- | --
Float32 | Static | Default | Multiple | 1.003836812 | 1.003425
Float32 | Static | Default | Single | 1.000181451 | 0.999611
Float32 | Dynamic | Default | Multiple | 1.003980183 | 1.006563
Float32 | Dynamic | Default | Single | 1.000076939 | 0.999969
AMP | Static | Default | Multiple | 0.996824772 | 0.998715
AMP | Static | Default | Single | 0.996402574 | 1.001483
AMP | Dynamic | Default | Multiple | 0.994919866 | 1.000467
AMP | Dynamic | Default | Single | 0.9962054 | 1.000767

(on Aarch64)
pytorch#122164 (comment)

---

Pull Request resolved: pytorch#122164
Approved by: https://github.com/snadampal, https://github.com/malfet, https://github.com/atalman
@Xia-Weiwen
Copy link
Collaborator Author

Thank you all for helping land this PR. I have cherry-picked it for release/2.3 here: #122930

atalman pushed a commit that referenced this pull request Apr 2, 2024
As the title. Including issue fixes for aarch64:
- oneapi-src/oneDNN#1831
- oneapi-src/oneDNN#1834

---

## Validation results
(on Intel CPU + Linux)
**Static quantization with Inductor on CV models**

Quant method | Geomean throughput ratio (v3.3.6/baseline)
-- | --
ptq | 0.982937
ptq (cpp wrapper) | 0.978384
qat | 0.978828

**Torchbench cpu userbenchmark with Inductor**

Items | Perf Geomean Ratio (v3.3.6/baseline)
-- | --
eager_throughtput_bf16_infer | 1.00x
eager_throughtput_fp32_infer | 1.00x
jit_llga_throughtput_amp_bf16 | 1.01x
jit_llga_throughtput_fp32 | 1.00x
eager_throughtput_fx_int8 | 1.00x
eager_throughtput_bf16_train | 1.46x
eager_throughtput_fp32_train | 1.41x

**Dynamo benchmarks tests**
Precision | Shape | Wrapper | Thread | Eager old/new GEOMEAN | Inductor old/new GEOMEAN
-- | -- | -- | -- | -- | --
Float32 | Static | Default | Multiple | 1.003836812 | 1.003425
Float32 | Static | Default | Single | 1.000181451 | 0.999611
Float32 | Dynamic | Default | Multiple | 1.003980183 | 1.006563
Float32 | Dynamic | Default | Single | 1.000076939 | 0.999969
AMP | Static | Default | Multiple | 0.996824772 | 0.998715
AMP | Static | Default | Single | 0.996402574 | 1.001483
AMP | Dynamic | Default | Multiple | 0.994919866 | 1.000467
AMP | Dynamic | Default | Single | 0.9962054 | 1.000767

(on Aarch64)
#122164 (comment)

---

Pull Request resolved: #122164
Approved by: https://github.com/snadampal, https://github.com/malfet, https://github.com/atalman
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
As the title. Including issue fixes for aarch64:
- oneapi-src/oneDNN#1831
- oneapi-src/oneDNN#1834

---

## Validation results
(on Intel CPU + Linux)
**Static quantization with Inductor on CV models**

Quant method | Geomean throughput ratio (v3.3.6/baseline)
-- | --
ptq | 0.982937
ptq (cpp wrapper) | 0.978384
qat | 0.978828

**Torchbench cpu userbenchmark with Inductor**

Items | Perf Geomean Ratio (v3.3.6/baseline)
-- | --
eager_throughtput_bf16_infer | 1.00x
eager_throughtput_fp32_infer | 1.00x
jit_llga_throughtput_amp_bf16 | 1.01x
jit_llga_throughtput_fp32 | 1.00x
eager_throughtput_fx_int8 | 1.00x
eager_throughtput_bf16_train | 1.46x
eager_throughtput_fp32_train | 1.41x

**Dynamo benchmarks tests**
Precision | Shape | Wrapper | Thread | Eager old/new GEOMEAN | Inductor old/new GEOMEAN
-- | -- | -- | -- | -- | --
Float32 | Static | Default | Multiple | 1.003836812 | 1.003425
Float32 | Static | Default | Single | 1.000181451 | 0.999611
Float32 | Dynamic | Default | Multiple | 1.003980183 | 1.006563
Float32 | Dynamic | Default | Single | 1.000076939 | 0.999969
AMP | Static | Default | Multiple | 0.996824772 | 0.998715
AMP | Static | Default | Single | 0.996402574 | 1.001483
AMP | Dynamic | Default | Multiple | 0.994919866 | 1.000467
AMP | Dynamic | Default | Single | 0.9962054 | 1.000767

(on Aarch64)
pytorch#122164 (comment)

---

Pull Request resolved: pytorch#122164
Approved by: https://github.com/snadampal, https://github.com/malfet, https://github.com/atalman
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request intel This tag is for PR from Intel Merged module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration open source topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants