add mixed data type support for LayerNorm backward on CPU #88064

Closed · wants to merge 2 commits

Conversation

@jiayisunx (Collaborator) commented Oct 31, 2022

Motivation

Amp provides convenience methods for mixed precision. If users run bfloat16 models with amp, torch.autocast keeps module parameters in the accumulation dtype, which leaves gamma and beta in float while the input/output are in bfloat16. The same goes for backward: the parameters are in float, and X, dX, and dY are in bfloat16.
Mixed data type support for the LayerNorm backward is therefore also needed for training models that use LayerNorm.
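
To make this concrete, here is a minimal sketch of the scenario (module choice and shapes are arbitrary; the expected dtypes follow the description above): under CPU autocast with bfloat16, the LayerNorm affine parameters stay in float32 while the activations, and hence the gradients seen by the backward kernel, are bfloat16.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16))
x = torch.randn(8, 8, 16)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)            # LayerNorm sees a bfloat16 input coming from the Linear layer
    loss = y.float().sum()

loss.backward()

print(y.dtype)                     # expected: torch.bfloat16
print(model[1].weight.dtype)       # expected: torch.float32 (gamma/beta stay in the acc dtype)
print(model[1].weight.grad.dtype)  # expected: torch.float32
```

The backward pass therefore has to combine bfloat16 X/dY/dX with float gamma and beta, which is the mixed data type case this PR adds to the CPU kernel.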

Testing

Single socket (ICX, 32 cores):

| shape | fp32 forward (ms) | bf16 forward (ms) | mix forward (ms) | fp32 backward (ms) | bf16 backward (ms) | mix backward (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| (1, 8, 16) | 0.012 | 0.012 | 0.012 | 0.071 | 0.065 | 0.062 |
| (8, 8, 16) | 0.015 | 0.014 | 0.015 | 0.074 | 0.070 | 0.063 |
| (32, 8, 16) | 0.062 | 0.016 | 0.016 | 0.073 | 0.073 | 0.072 |
| (64, 128, 56, 56) | 2.467 | 0.907 | 0.0897 | 12.993 | 7.603 | 7.777 |
| (64, 128, 256, 256) | 48.904 | 25.589 | 25.472 | 343.992 | 183.133 | 188.222 |

Single core (ICX):

| shape | fp32 forward (ms) | bf16 forward (ms) | mix forward (ms) | fp32 backward (ms) | bf16 backward (ms) | mix backward (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| (1, 8, 16) | 0.012 | 0.012 | 0.012 | 0.050 | 0.050 | 0.050 |
| (8, 8, 16) | 0.014 | 0.014 | 0.014 | 0.052 | 0.054 | 0.053 |
| (32, 8, 16) | 0.034 | 0.019 | 0.018 | 0.059 | 0.067 | 0.066 |
| (64, 128, 56, 56) | 66.791 | 17.725 | 19.799 | 119.431 | 106.123 | 107.446 |
| (64, 128, 256, 256) | 1542.477 | 402.132 | 527.044 | 3019.437 | 2336.318 | 2448.320 |
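
The benchmark script itself is not part of the PR description; the sketch below shows one way such backward timings could be reproduced, assuming the kernel is reached through torch.nn.functional.layer_norm, that normalization is over the last dimension, and that the mixed case means bfloat16 activations with float32 affine parameters (shape and iteration count are illustrative).

```python
import time

import torch
import torch.nn.functional as F

# torch.set_num_threads(1)  # uncomment to mimic the single-core run


def bench_layer_norm_backward(shape, x_dtype, param_dtype, iters=200):
    """Average LayerNorm backward time in ms for the given activation/parameter dtypes."""
    N = shape[-1]  # normalized over the last dimension (assumption)
    x = torch.randn(shape, dtype=x_dtype, requires_grad=True)
    weight = torch.randn(N, dtype=param_dtype, requires_grad=True)
    bias = torch.randn(N, dtype=param_dtype, requires_grad=True)
    grad = torch.randn(shape, dtype=x_dtype)  # dY in the activation dtype

    y = F.layer_norm(x, (N,), weight, bias)
    for _ in range(10):  # warmup
        y.backward(grad, retain_graph=True)
        x.grad = weight.grad = bias.grad = None

    start = time.perf_counter()
    for _ in range(iters):
        y.backward(grad, retain_graph=True)
        x.grad = weight.grad = bias.grad = None
    return (time.perf_counter() - start) / iters * 1e3


for name, x_dtype, p_dtype in [
    ("fp32", torch.float32, torch.float32),
    ("bf16", torch.bfloat16, torch.bfloat16),
    ("mix", torch.bfloat16, torch.float32),  # exercises the mixed-dtype backward path
]:
    print(f"{name} backward: {bench_layer_norm_backward((32, 8, 16), x_dtype, p_dtype):.3f} ms")
```

Timings collected this way include Python dispatch overhead, so the absolute numbers will not match the tables above; thread count also matters for the single-socket vs. single-core comparison.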

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot added the release notes: nn label Oct 31, 2022
@pytorch-bot (bot) commented Oct 31, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88064

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 27f3948:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jiayisunx marked this pull request as draft October 31, 2022 03:03
@linux-foundation-easycla (bot) commented Oct 31, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@jgong5 (Collaborator) commented Oct 31, 2022

@jiayisunx Do you mind elaborating on the motivation of this PR in the description?

@jiayisunx (Collaborator, Author)

@pytorchbot label intel

pytorch-bot added the intel label Nov 17, 2022
github-actions added the module: cpu label Dec 5, 2022
jiayisunx marked this pull request as ready for review December 6, 2022 08:42
@jiayisunx (Collaborator, Author)

@jgong5, @mingfeima, could you please help review this PR? Thanks.

bdhirsh added the triaged label Dec 7, 2022
@jiayisunx (Collaborator, Author)

@pytorchbot label intel priority

@pytorch-bot (bot) commented Dec 14, 2022

Didn't find following labels among repository labels: priority

zhuhaozhe added the intel priority label Dec 14, 2022
@malfet (Contributor) left a comment

Looks like layer_norm_backward_kernel_mixed_type is just a copy of LayerNormBackwardKernelImplInternal with slightly different argument types.

Perhaps a better approach would be to just make ACC_T a template argument of LayerNormBackwardKernelImplInternal and call it slightly differently in the mixed precision scenario:

```cpp
fVec x_fvec0, x_fvec1, dy_fvec0, dy_fvec1, gamma_fvec0, gamma_fvec1;
std::tie(x_fvec0, x_fvec1) = convert_bfloat16_float(x_bvec);
std::tie(dy_fvec0, dy_fvec1) = convert_bfloat16_float(dy_bvec);
std::tie(gamma_fvec0, gamma_fvec1) = load2f(gamma_data, N);
```
@jiayisunx (Collaborator, Author) commented Dec 15, 2022

> Looks like layer_norm_backward_kernel_mixed_type is just a copy of LayerNormBackwardKernelImplInternal with slightly different argument types.
>
> Perhaps a better approach would be to just make ACC_T a template argument of LayerNormBackwardKernelImplInternal and call it slightly differently in the mixed precision scenario

@malfet thanks for your comments. Actually, this backward implementation is consistent with the forward one, and there are many places that need to be modified to support the mixed data type, not just slightly different argument types. For example, here we cannot simply call vec::map3_reduce_all to do this part.

jiayisunx closed this Dec 15, 2022
jiayisunx reopened this Dec 15, 2022
@jiayisunx (Collaborator, Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator)

Successfully rebased layer_norm onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout layer_norm && git pull --rebase)

@jiayisunx (Collaborator, Author)

@malfet, I have refactored the LayerNorm backward kernel. Could you please review this PR again?

atalman added this to the 2.0.0 milestone Jan 11, 2023
jiayisunx force-pushed the layer_norm branch 2 times, most recently from 3770921 to 1a4595e, January 18, 2023 07:46
@jiayisunx (Collaborator, Author)

@malfet

@malfet (Contributor) left a comment

Looks good, though the template specialization is essentially copy-and-paste from the generic template. Would it be possible to reuse more code by introducing more templates, i.e. why can't bf16 utilize the same vec::map primitives?

Comment on lines +75 to +86
```cpp
inline std::tuple<Vectorized<float>, Vectorized<float>> load2f(const BFloat16* ptr, int64_t count) {
  return convert_bfloat16_float(Vectorized<BFloat16>::loadu(ptr, count));
}

inline std::tuple<Vectorized<float>, Vectorized<float>> load2f(const float* ptr, int64_t count) {
  using Vec = Vectorized<float>;
  if (count > Vec::size()) {
    return std::make_tuple(Vec::loadu(ptr), Vec::loadu(ptr + Vec::size(), count - Vec::size()));
  } else {
    return std::make_tuple(Vec::loadu(ptr, count), Vec(0));
  }
}
```
Contributor

Hmm, what's the difference between this implementation and the previous one? Should it be just the same template with a default argument, something like `load2f(const BFloat16* ptr, int64_t count = 1);`?

Also, why int64 rather than uint64?

@jiayisunx (Collaborator, Author)

The previous one is the helper for the mixed data type parameter Vec::load(ptr); this implementation is the helper for Vec::load(ptr, count). It uses int64 because loadu(const void* ptr, int64_t count) uses int64.

@jiayisunx (Collaborator, Author)

> Looks good, though the template specialization is essentially copy-and-paste from the generic template. Would it be possible to reuse more code by introducing more templates, i.e. why can't bf16 utilize the same vec::map primitives?

Actually, bf16 can utilize the same vec::map primitives, but the mixed data type cannot. The template specialization is for the mixed data type cases.

@jiayisunx (Collaborator, Author)

@pytorchbot merge

pytorch-bot added the ciflow/trunk label Feb 9, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR:
@pytorchbot rebase

Details for Dev Infra team: raised by workflow job.

@jiayisunx (Collaborator, Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator)

Successfully rebased layer_norm onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout layer_norm && git pull --rebase)

@jiayisunx (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/trunk, intel priority, intel, Merged, module: cpu, open source, release notes: nn, triaged