[Reopen] Upgrade submodule oneDNN to v3.4.2 #126137

Closed
wants to merge 2 commits

Conversation

@Xia-Weiwen (Collaborator) commented May 14, 2024

Reopen of #122472

Improvements

This upgrade fixes the following issues:

- #120982

This upgrade brings the following new features:

- Introduced memory descriptor serialization API, which is needed to support freezing on CPU in AOTInductor (#114450); a hedged sketch of its usage follows below.
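
The serialization round trip this enables looks roughly like the following. This is a minimal sketch, not code from this PR: the `get_blob()` method and the blob-taking `memory::desc` constructor are assumed names for the new oneDNN v3.4 API and should be verified against the oneDNN documentation.

```cpp
// Sketch of serializing and restoring a oneDNN memory descriptor.
// Assumption: oneDNN v3.4 exposes memory::desc::get_blob() and a
// memory::desc(blob) constructor; verify exact names against the docs.
#include <cstdint>
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

int main() {
    using namespace dnnl;

    // Describe a 2x3x8x8 f32 tensor in nchw layout.
    memory::desc md({2, 3, 8, 8}, memory::data_type::f32, memory::format_tag::nchw);

    // Serialize to an opaque byte blob (e.g., to embed in an AOT-compiled artifact).
    std::vector<uint8_t> blob = md.get_blob();   // assumed API

    // Reconstruct an equivalent descriptor from the blob later.
    memory::desc restored(blob);                 // assumed API

    return md == restored ? 0 : 1;               // 0 if the round trip preserved the descriptor
}
```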

Validation results on CPU

Original results with oneDNN v3.4.1 are here: #122472 (comment)

Need to rerun validation and update results.
Results of reruns

1. Torchbench cpu userbenchmark inference & training

| Perf_Geomean | Ratio (oneDNN v3.4.2/baseline) |
|---|---|
| eager_throughtput_bf16_infer | 1.00x |
| eager_throughtput_fp32_infer | 1.00x |
| jit_llga_throughtput_amp_bf16 | 0.99x |
| jit_llga_throughtput_fp32 | 1.00x |
| eager_throughtput_fx_int8 | 1.01x |
| eager_throughtput_bf16_train | 0.99x |
| eager_throughtput_fp32_train | 0.99x |
2. Dynamo benchmarks

| Precision | Shape | Wrapper | Thread | Eager Ratio old/new GEOMEAN | Inductor Ratio old/new GEOMEAN |
|---|---|---|---|---|---|
| Float32 | Static | Default | Multiple | 1.010737 | 1.019399 |
| Float32 | Static | Default | Single | 1.002681 | 1.001732 |
| Float32 | Dynamic | Default | Multiple | 1.006492 | 1.007722 |
| Float32 | Dynamic | Default | Single | 1.003639 | 1.002965 |
| AMP | Static | Default | Multiple | 1.001126 | 1.005045 |
| AMP | Static | Default | Single | 0.995729 | 0.994613 |
| AMP | Dynamic | Default | Multiple | 0.995276 | 0.998698 |
| AMP | Dynamic | Default | Single | 0.997459 | 0.993833 |

The geomean ratio across all tests is good, but an AMP training performance drop was detected for 1 model with oneDNN v3.4.2, as shown below. The issue has been reported to the oneDNN team.

| model_name | oneDNN v3.4.2 | baseline | throughput ratio (new/old) |
|---|---|---|---|
| pytorch_stargan-train_throughput | 69.759 | 85.378 | 0.8171 |
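
The GEOMEAN values above are geometric means of per-model throughput ratios. A minimal sketch of that aggregation is below; the ratios in the code are made up for illustration and are not the per-model data from this run.

```cpp
// Geometric mean of per-model throughput ratios (new/old).
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical per-model ratios, not actual results from this validation run.
    std::vector<double> ratios = {1.01, 0.99, 1.00, 1.02};

    // Accumulate in log space to avoid overflow/underflow for long ratio lists.
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);

    const double geomean = std::exp(log_sum / static_cast<double>(ratios.size()));
    std::printf("geomean ratio: %.4f\n", geomean);  // prints ~1.0049 for the sample data
    return 0;
}
```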

Dynamo benchmarks

| Precision | Shape | Wrapper | Thread | Eager Ratio old/new GEOMEAN | Inductor Ratio old/new GEOMEAN |
|---|---|---|---|---|---|
| Float32 | Static | Default | Multiple | 1.010736824 | 1.019399 |
| Float32 | Static | Default | Single | 1.002680965 | 1.001732 |
| Float32 | Dynamic | Default | Multiple | 1.006492275 | 1.007722 |
| Float32 | Dynamic | Default | Single | 1.003638601 | 1.002965 |
| AMP | Static | Default | Multiple | 1.001125822 | 1.005045 |
| AMP | Static | Default | Single | 0.99572873 | 0.994613 |
| AMP | Dynamic | Default | Multiple | 0.995275854 | 0.998698 |
| AMP | Dynamic | Default | Single | 0.99745925 | 0.993833 |

Inductor benchmarks

| Dtype | Throughput ratio new/old Geomean |
|---|---|
| all | 99.60% |
| fp32 | 99.84% |
| bf16 | 99.55% |
| int8 | 99.08% |
| fp16 | 99.86% |

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @snadampal

pytorch-bot bot commented May 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126137

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1995054 with merge base 3c4058c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/linux-aarch64 (linux aarch64 CI workflow), module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), and topic: not user facing (topic category) labels on May 14, 2024
@Xia-Weiwen added the ciflow/trunk (Trigger trunk jobs on your pull request), intel (This tag is for PR from Intel), and ciflow/xpu (Run XPU CI tasks) labels on May 14, 2024
@EikanWang removed the ciflow/xpu (Run XPU CI tasks) label on May 14, 2024
@Xia-Weiwen marked this pull request as ready for review on May 14, 2024 03:25
@Xia-Weiwen (Collaborator, Author)

Hi @snadampal, please help check whether the aarch64 failure is related to the oneDNN upgrade. If it is, please help investigate. If not, we will start validation on x86 CPU.

@milpuz01 (Contributor)

I think the problem here is that the CI wheel build uses ACL 23.08 (is this the script used to build the linux-jammy-aarch64-py3.10 wheel: https://github.com/pytorch/builder/blob/ead2e21b24038ed9117060bb2c00bc79c4e91d85/aarch64_linux/aarch64_wheel_ci_build.py#L81, or something else?). In ACL 23.08 the definition of Conv2dInfo takes experimental::PostOpList as one of its arguments; that argument was removed in ACL 23.11+, and oneDNN 3.4.2 was updated accordingly so it no longer passes it: https://github.com/oneapi-src/oneDNN/blob/1137e04ec0b5251ca2b4400a4fd3c667ce843d67/src/cpu/aarch64/acl_indirect_gemm_convolution.hpp#L49.

@snadampal is this right? Should the CI build script be updated to use ACL 23.11+ to build with oneDNN 3.4+?
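
For illustration only, here is a minimal, self-contained sketch of the kind of API break described above. The types are hypothetical stand-ins, not the real ACL or oneDNN declarations; the point is that a call site which stops passing the experimental post-op list only compiles against headers where that parameter no longer exists.

```cpp
// Hypothetical simplified types illustrating the ACL 23.08 vs 23.11+ mismatch.
// These are NOT the actual arm_compute/oneDNN declarations.

namespace acl_23_08 {                 // old headers: extra experimental argument
struct PostOpList {};
struct Conv2dInfo {
    Conv2dInfo(int stride, int pad, const PostOpList& post_ops) {
        (void)stride; (void)pad; (void)post_ops;
    }
};
}  // namespace acl_23_08

namespace acl_23_11 {                 // new headers: experimental argument removed
struct Conv2dInfo {
    Conv2dInfo(int stride, int pad) { (void)stride; (void)pad; }
};
}  // namespace acl_23_11

int main() {
    // A oneDNN v3.4.2-style call site no longer passes a post-op list.
    acl_23_11::Conv2dInfo ok(1, 0);     // compiles against the 23.11+-style header
    (void)ok;
    // acl_23_08::Conv2dInfo bad(1, 0); // would fail: the 23.08-style ctor requires PostOpList
    return 0;
}
```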

@facebook-github-bot (Contributor)

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@snadampal (Collaborator)

@Xia-Weiwen, the CI failure is because oneDNN 3.4 requires an ACL version update; here is the PR: #125549. Both PRs are inter-dependent.

Hi @atalman, could you please suggest how we can handle this dependency? One option is to merge the aarch64-linux CI PR (#125549) first, so that the CI will be ready to test oneDNN 3.4 on aarch64-linux.

@snadampal (Collaborator)

Yes, @milpuz01, it was the ACL version; I added the details above.

@atalman (Contributor) commented May 14, 2024

Hi @snadampal and @Xia-Weiwen, merging both PRs sounds like a good solution. @Xia-Weiwen, could you please add this commit on top of this PR: 091f84e?

This is required for the oneDNN 3.4.x version.
@Xia-Weiwen (Collaborator, Author)

Hi @atalman @snadampal @milpuz01, I have added commit 091f84e on top of this PR.

@facebook-github-bot (Contributor)

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman (Contributor) commented May 16, 2024

@pytorchmergebot merge -f "All required workflows are green"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@chuanqi129 (Collaborator)

Update oneDNN v3.4.2 acceptance test reports

1. Torchbench cpu userbenchmark inference & training

| Perf_Geomean | Ratio (oneDNN v3.4.2/baseline) |
|---|---|
| eager_throughtput_bf16_infer | 1.00x |
| eager_throughtput_fp32_infer | 1.00x |
| jit_llga_throughtput_amp_bf16 | 0.99x |
| jit_llga_throughtput_fp32 | 1.00x |
| eager_throughtput_fx_int8 | 1.01x |
| eager_throughtput_bf16_train | 0.99x |
| eager_throughtput_fp32_train | 0.99x |

2. Dynamo benchmarks

| Precision | Shape | Wrapper | Thread | Eager Ratio old/new GEOMEAN | Inductor Ratio old/new GEOMEAN |
|---|---|---|---|---|---|
| Float32 | Static | Default | Multiple | 1.010737 | 1.019399 |
| Float32 | Static | Default | Single | 1.002681 | 1.001732 |
| Float32 | Dynamic | Default | Multiple | 1.006492 | 1.007722 |
| Float32 | Dynamic | Default | Single | 1.003639 | 1.002965 |
| AMP | Static | Default | Multiple | 1.001126 | 1.005045 |
| AMP | Static | Default | Single | 0.995729 | 0.994613 |
| AMP | Dynamic | Default | Multiple | 0.995276 | 0.998698 |
| AMP | Dynamic | Default | Single | 0.997459 | 0.993833 |

The geomean ratio across all tests is good, but an AMP training performance drop was detected for 1 model with oneDNN v3.4.2, as shown below. The issue has been reported to the oneDNN team. @atalman, any comments on it?

| model_name | oneDNN v3.4.2 | baseline | throughput ratio (new/old) |
|---|---|---|---|
| pytorch_stargan-train_throughput | 69.759 | 85.378 | 0.8171 |

@Guobing-Chen (Collaborator)

> [Quoting @chuanqi129's oneDNN v3.4.2 acceptance test report above.]

Thanks to Chuanqi for the testing results. @vpirogov, could you also help comment on the potential impact of this outlier?

ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
Reopen of pytorch#122472

## Improvements
This upgrade fixes the following issues:
- pytorch#120982

This upgrade brings the following new features:
- Introduced memory descriptor serialization API. This API is needed to support freezing on CPU in AOTInductor (pytorch#114450)

## Validation results on CPU
Original results with oneDNN v3.4.1 are here: pytorch#122472 (comment)

Need to rerun validation and update results.

Co-authored-by: Sunita Nadampalli <nadampal@amazon.com>
Pull Request resolved: pytorch#126137
Approved by: https://github.com/jgong5, https://github.com/snadampal, https://github.com/atalman
Labels: ciflow/linux-aarch64 (linux aarch64 CI workflow), ciflow/trunk (Trigger trunk jobs on your pull request), intel (This tag is for PR from Intel), Merged, module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), open source, topic: not user facing (topic category)