
Update oneDNN submodule to v3.3.2 #112700

Closed
wants to merge 8 commits from the update_onednn_to_v3.3 branch

Conversation

Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Nov 2, 2023

Update oneDNN submodule to v3.3.2.
Add a macro to check the version of third_party/ideep.
Since we have versioning now, the changes won't break any pipeline even if third_party/ideep is not updated at the same time.

The commit bf74e92 fixes a versioning issue with older oneDNN. It separates the versioning of oneDNN and ideep, so that even if oneDNN is downgraded, the version checks still guard the conditions we expect.
Test plan for this commit:
Without this fix, the qconv tests fail if third_party/ideep (and oneDNN) is downgraded. So the test plan is to (a shell sketch of these steps follows the list):

  • Go to third_party/ideep and downgrade it to pytorch-rls-v3.1.1
  • run git submodule update inside third_party/ideep
  • Return to the PyTorch source root directory and rebuild PyTorch
  • Run python test/test_quantization.py
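
A rough shell sketch of these steps (the build command shown is one common option and may differ in your environment):

```bash
# Downgrade ideep (and the oneDNN it pins), rebuild, and run the quantization tests.
cd third_party/ideep
git checkout pytorch-rls-v3.1.1     # older ideep release tag named above
git submodule update --init         # fetch the matching oneDNN under ideep
cd ../..                            # back to the PyTorch source root
python setup.py develop             # rebuild PyTorch (adjust to your build setup)
python test/test_quantization.py    # run the quantization tests
```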

Performance benchmark

  • Inference throughput after vs. before oneDNN upgrade
    (image attached showing the inference throughput comparison)

  • FP32 training after vs. before oneDNN upgrade
    eager mode throughput geomean ratio: 0.988

Test scope: torchbench
Test platform: c7i.16xlarge instance

Notes about what this oneDNN update addresses for PyTorch:

  1. Fixed a crash bug in SDPA backward (Memory corruption in test_transformers #115253).
  2. Fixed a performance regression (yielding a ~40% speedup) for the Inductor workload doctr_det_predictor on x86 (see [inductor][cpu] Perf regression #108324 (comment)).
  3. A few improvements for ARM CPUs: ACL fixed-format kernels (allowing in-place weight updates), weight pre-packing optimizations for torch.compile(), and support for depthwise convolutions and dilated tensors.

Known issues with this update (will be addressed in the next patch release):

  1. A performance regression (5.8% perf drop) for pytorch_stargan-train (see V2 Performance Signal Detected by TorchBench CI on '2.2.0.dev20231205+cu118', pytorch/benchmark#2076 (comment))
  2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see [inductor][cpu]rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))
  3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see [inductor][cpu]rexnet100 dynamic FP32 multiple threads performance regression #115346 (comment))

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen

@pytorch-bot pytorch-bot bot added the release notes: quantization release notes category label Nov 2, 2023

pytorch-bot bot commented Nov 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112700

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bf74e92 with merge base 624f202:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 2, 2023
@@ -1473,7 +1473,24 @@ static at::Tensor _quantized_convolution_onednn(
// Scales of ONEDNN and PyTorch are reciprocal
const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0 / act_scale);

-#if defined(IDEEP_VERSION_MAJOR) && IDEEP_VERSION_MAJOR>=3 && defined(IDEEP_VERSION_REVISION) && IDEEP_VERSION_REVISION == 0
+#if defined(IDEEP_VERSION_MAJOR) && IDEEP_PREREQ(3, 1, 0, 1)
Collaborator

Since IDEEP_PREREQ uses more than just IDEEP_VERSION_MAJOR, I suggest checking all the needed macros. Maybe you can just check these macros inside IDEEP_PREREQ?

Collaborator Author

Thanks for the suggestion. The checks have been moved there.
Could you please also review the related PR in ideep: intel/ideep#253? Currently, this PR does not include the oneDNN update. After the ideep PR is merged, this PR will include the ideep & oneDNN updates and be marked as ready for review. Thanks.

@@ -5,6 +5,11 @@

#if AT_MKLDNN_ENABLED()
#include <ideep.hpp>
#ifndef IDEEP_PREREQ
#define IDEEP_PREREQ(major, minor, patch, revision) \
(((IDEEP_VERSION_MAJOR << 32) + (IDEEP_VERSION_MINOR << 16) + (IDEEP_VERSION_PATCH << 8) + (DNNL_VERSION_PATCH)) \
Collaborator

Can we add a comment here explaining the semantics of IDEEP_VERSION_MAJOR, IDEEP_VERSION_MINOR, IDEEP_VERSION_PATCH, and DNNL_VERSION_PATCH? Should IDEEP_VERSION_MAJOR and IDEEP_VERSION_MINOR be the same as the oneDNN major and minor versions?

Collaborator Author

Thanks for the comment. The definitions of these macros are here: https://github.com/intel/ideep/blob/ideep_pytorch/include/ideep.hpp

Collaborator

From the above link, it looks like IDEEP_VERSION_PATCH is the same as DNNL_VERSION_PATCH?

Collaborator Author

Yes. We detect ideep API changes via IDEEP_VERSION_REVISION. A comment has been added, and a typo has also been fixed there.
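
For readers following along, here is a minimal, self-contained sketch of the packing pattern such a prerequisite macro relies on. The EXAMPLE_* names and values are placeholders for illustration only; the real IDEEP_PREREQ definition lives in ideep.hpp (linked above).

```cpp
// Illustrative only: a version-prerequisite macro in the spirit of IDEEP_PREREQ.
// It packs (major, minor, patch, revision) into one integer so a single >=
// comparison orders versions lexicographically.
#include <cstdio>

#define EXAMPLE_VERSION_MAJOR 3
#define EXAMPLE_VERSION_MINOR 3
#define EXAMPLE_VERSION_PATCH 2
#define EXAMPLE_VERSION_REVISION 1

#define EXAMPLE_PACK(major, minor, patch, revision) \
  (((major) * 1000000L) + ((minor) * 10000L) + ((patch) * 100L) + (revision))

#define EXAMPLE_PREREQ(major, minor, patch, revision)               \
  (EXAMPLE_PACK(EXAMPLE_VERSION_MAJOR, EXAMPLE_VERSION_MINOR,       \
                EXAMPLE_VERSION_PATCH, EXAMPLE_VERSION_REVISION) >= \
   EXAMPLE_PACK(major, minor, patch, revision))

int main() {
#if EXAMPLE_PREREQ(3, 1, 0, 1)
  std::printf("version >= 3.1.0.1: take the new code path\n");
#else
  std::printf("version < 3.1.0.1: take the legacy code path\n");
#endif
  return 0;
}
```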

@Xia-Weiwen Xia-Weiwen force-pushed the update_onednn_to_v3.3 branch 2 times, most recently from 305eb78 to 4879d80 Compare November 6, 2023 05:32
@github-actions github-actions bot added the module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration label Nov 7, 2023
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review November 7, 2023 09:47
@ezyang
Contributor

ezyang commented Nov 9, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 9, 2023
@pytorchmergebot
Collaborator

This PR updates submodules third_party/ideep

If those updates are intentional, please add "submodule" keyword to PR title/description.

@soulitzer soulitzer added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 9, 2023
@ezyang ezyang changed the title Update oneDNN to v3.3 Update oneDNN submodule to v3.3 Nov 9, 2023
@ezyang
Contributor

ezyang commented Nov 9, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following:
yulin0077, satgera, pvtuan10, 842974287, digantdesai, ...

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@Xia-Weiwen Xia-Weiwen marked this pull request as draft November 9, 2023 22:37
@atalman
Contributor

atalman commented Dec 4, 2023

Hi @Xia-Weiwen, it looks like multiple quantization tests are failing in test_quantization:

@malfet
Contributor

malfet commented Dec 4, 2023

@Xia-Weiwen have you tested that PyTorch works with both the previous and current versions of oneDNN?
From the failures, it looks like the following check was not executed as planned with the older oneDNN:

#if IDEEP_PREREQ(3, 1, 0, 1)
wgt_scales = ideep::scale_t(1, weight.q_scale());
#elif IDEEP_PREREQ(3, 1, 0, 0)
wgt_scales = ideep::scale_t(1, 1.0/weight.q_scale()); // Scales of ONEDNN and PyTorch are reciprocal
#else
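
For context, a minimal standalone illustration (not PyTorch code; the scale value is made up) of what the two branches quoted above hand to ideep:

```cpp
// Illustrative only: the newer branch (IDEEP_PREREQ(3, 1, 0, 1)) passes
// PyTorch's q_scale to ideep directly, while the older branch
// (IDEEP_PREREQ(3, 1, 0, 0)) passes its reciprocal, because that ideep/oneDNN
// revision expected scales reciprocal to PyTorch's.
#include <cstdio>
#include <vector>

int main() {
  const double q_scale = 0.05;  // hypothetical per-tensor weight scale

  // Mirrors ideep::scale_t(1, weight.q_scale()) -- newer branch.
  std::vector<float> wgt_scales_direct(1, static_cast<float>(q_scale));

  // Mirrors ideep::scale_t(1, 1.0 / weight.q_scale()) -- older branch.
  std::vector<float> wgt_scales_reciprocal(1, static_cast<float>(1.0 / q_scale));

  std::printf("direct: %f, reciprocal: %f\n",
              wgt_scales_direct[0], wgt_scales_reciprocal[0]);
  return 0;
}
```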

@Xia-Weiwen Xia-Weiwen restored the update_onednn_to_v3.3 branch December 5, 2023 01:50
@Xia-Weiwen
Collaborator Author

Hi @atalman @malfet, thanks for the feedback. May I know which version of oneDNN you were using? For the checks, we assume oneDNN >= 3.1.0.

@Xia-Weiwen Xia-Weiwen reopened this Dec 5, 2023
@Xia-Weiwen
Collaborator Author

Hi @atalman @malfet I have updated the PR to fix the issue with older versions of oneDNN. Could you please try again? Thanks!

@facebook-github-bot
Contributor

@atalman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@atalman
Contributor

atalman commented Dec 5, 2023

@pytorchbot merge -f "like internal test_quantization are now passing"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@atalman
Contributor

atalman commented Dec 5, 2023

@Xia-Weiwen could you please include the details of the last commit ("Fix versioning issue with older versions of oneDNN") in the PR description?
Also, please include a test plan describing how exactly the versioning change was validated on your end.

@Xia-Weiwen
Collaborator Author

> @Xia-Weiwen could you please include the details of the last commit in PR description: Fix versioning issue with older versions of oneDNN Also please include test plan, how exactly was the versioning change validated on your end.

Hi @atalman I have updated the description at the top #112700 (comment). Could you please review? Thanks!

dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
Update oneDNN submodule to v3.3.2.
Add a macro to check the version of `third_party/ideep`.
Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time.

Pull Request resolved: pytorch#112700
Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
This reverts commit afbaa0c.

Reverted pytorch#112700 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](pytorch#112700 (comment)))
pytorchmergebot pushed a commit that referenced this pull request Mar 11, 2024
This upgrade contains fixes for the known issues introduced by oneDNN v3.3.2, including issues #115346, #120211 and #120406 and those listed in PR #112700.

Issue #115346 (perf regression) was fixed by oneDNN v3.3.4. No new regression was found with v3.3.5. The detailed results of v3.3.4 are given below and compared with v3.1.1 (the oneDNN version in PyTorch before it was updated to v3.3.2).
1. A performance regression with 5.8% perf drop from `pytorch_stargan-train` (see pytorch/benchmark#2076 (comment))
Validation results with this patch: Latency increased by 0.60%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
metrics-1484287.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 418.851717
    }
}
oneDNN v3.3.4
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0"
    },
    "metrics": {
        "latency": 421.381313
    }
}
```

2. Performance regression of FP32 rexnet_100 with Inductor, dynamic shape, multi-threads (see #115346 (comment))
Validation results with this patch: Latency reduced by 3.23%
```
Tested on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz instance (IceLake)
oneDNN v3.1.1
(inductor speedup over eager mode) 2.876x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,2.875904,113.314765,18.455283,0.990437,1302.636134,1315.212902,351,1,0,0

oneDNN v3.3.4
(inductor speedup over eager mode) 3.003x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,rexnet_100,128,3.003012,109.653012,91.547260,0.990048,1302.532506,1315.625370,351,1,0,0
```

3. Performance regression of AMP hf_T5_generate and tinynet_a with Inductor, static shape, multi-threads (see #115346 (comment))
Validation results with this patch: Latency reduced by 0.85%
```
Tested on an AWS spr metal instance
oneDNN v3.1.1
(inductor speedup over eager mode) 1.120x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.120018,1197.807729,205.905466,0.442803,125.179904,282.698957,10550,48,8,4

oneDNN v3.3.4
(inductor speedup over eager mode) 1.134x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,hf_T5_generate,1,1.133594,1187.701514,205.855527,0.422012,128.405094,304.268493,10550,48,8,4
```

The following functionality issues are fixed by this upgrade. Test cases have also been added for them.
- #120211
- #120406
- #120547

-----

Below are detailed data from the torchbench CPU userbenchmark test and the Inductor FP32/AMP inference tests. No regression in performance or functionality was found.
I.  *torchbench CPU userbenchmark test*
Suite | Speedup
-- | --
eager_throughtput_bf16_infer | 1.001848
eager_throughtput_fp32_infer | 1.000257
eager_throughtput_fx_int8 | 1.003069
jit_llga_throughtput_amp_bf16 | 1.000682
jit_llga_throughtput_fp32 | 1.000313
eager_throughtput_bf16_train | 0.998222
eager_throughtput_fp32_train | 1.003384

II. *Inductor FP32/AMP inference tests*
i.  FP32 static default
suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.09
timm_models | tinynet_a | multiple | 128 | 1.14

ii.  FP32 dynamic default

suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | alexnet | multiple | 128 | 1.08
torchbench | basic_gnn_edgecnn | multiple | 1 | 0.98
torchbench | timm_efficientnet | multiple | 64 | 1.08

iii. AMP static default

suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | hf_distil_whisper | multiple | 1 | 1.18
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | BartForConditionalGeneration | multiple | 2 | 1.19
timm_models | eca_halonext26ts | multiple | 128 | 1.13
timm_models | nfnet_l0 | multiple | 128 | 1.13
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | spnasnet_100 | multiple | 128 | 1.15
timm_models | tf_efficientnet_b0 | multiple | 128 | 1.22
timm_models | tinynet_a | multiple | 128 | 1.49
torchbench | hf_Bert_large | single | 1 | 1.16
huggingface | XLNetLMHeadModel | single | 1 | 1.07

iv.  AMP dynamic default

suite | name | thread | batch size | speedup ratio (new/old)
-- | -- | -- | -- | --
torchbench | timm_efficientnet | multiple | 64 | 1.32
huggingface | PLBartForConditionalGeneration | multiple | 4 | 1.14
timm_models | nfnet_l0 | multiple | 128 | 1.15
timm_models | rexnet_100 | multiple | 128 | 1.45
timm_models | tinynet_a | multiple | 128 | 1.34
huggingface | XLNetLMHeadModel | single | 1 | 1.09

-----

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #120767
Approved by: https://github.com/chuanqi129, https://github.com/jgong5, https://github.com/atalman