Optimize GELU BFloat16 Impl in CPU path #79378

yanbing-j · 2022-06-12T10:59:39Z

Description

For the slow path (non-contiguous inputs) with approximate set to none or tanh, the current BFloat16 implementation in ATen performs poorly. This PR uses float32 as an intermediate type, in order to reduce the heavy cost of converting bf16 to fp32.

Test

IceLake 2S 32C (Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz)

single socket (32 cores):
approximate is none:

input shapes	forward (base) (ms)	backward (base) (ms)	forward (optimized) (ms)	backward (optimized) (ms)
[16, 32, 32]	0.361	1.055	0.348	0.672
[32, 32, 64]	0.084	2.003	0.076	1.426
[32, 64, 128]	0.237	2.007	0.22	1.454
[64, 128, 128]	2.23	6.348	1.943	4.103

approximate is tanh:

input shapes	forward (base) (ms)	backward (base) (ms)	forward (optimized) (ms)	backward (optimized) (ms)
[16, 32, 32]	0.203	1.209	0.138	0.474
[32, 32, 64]	0.063	2.497	0.043	0.985
[32, 64, 128]	0.201	2.707	0.141	1.205
[64, 128, 128]	1.549	8.749	1.065	3.635

single core:
approximate is none:

input shapes	forward (base) (ms)	backward (base) (ms)	forward (optimized) (ms)	backward (optimized) (ms)
[16, 32, 32]	0.359	1.055	0.267	0.592
[32, 32, 64]	1.11	3.483	1.063	2.373
[32, 64, 128]	4.478	13.866	4.27	9.426
[64, 128, 128]	17.675	55.231	16.805	37.509

approximate is tanh:

input shapes	forward (base) (ms)	backward (base) (ms)	forward (optimized) (ms)	backward (optimized) (ms)
[16, 32, 32]	0.202	1.212	0.138	0.473
[32, 32, 64]	0.776	4.843	0.531	1.872
[32, 64, 128]	3.203	19.267	2.16	7.243
[64, 128, 128]	12.33	76.834	8.286	29.553

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

facebook-github-bot · 2022-06-12T10:59:45Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/79378
📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓Need help or want to give feedback on the CI? Visit our office hours

✅ No Failures (0 Pending)

As of commit a917806 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

github-actions · 2022-09-19T16:44:39Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

facebook-github-bot · 2022-10-04T01:02:07Z

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

linux-foundation-easycla · 2022-10-04T01:02:21Z

The committers listed above are authorized under a signed CLA.

✅ login: yanbing-j / name: YanbingJiang (e1e40a0, 6df4c71, d1dd927, edb2202, 19fc33a, d28905e, a917806, 84e1a5d)

yanbing-j · 2022-10-19T04:34:56Z

@pytorchbot label ciflow/trunk

pytorch-bot · 2022-10-19T04:35:01Z

Can't add following labels to PR: ciflow/trunk. Please ping one of the reviewers for help.

pytorch-bot · 2022-12-12T05:29:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/79378

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ddda96a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mingfeima · 2023-01-05T01:41:34Z

@pytorchbot merge

pytorchmergebot · 2023-01-05T01:43:13Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Optimize GELU BFloat16

e1e40a0

facebook-github-bot added the cla signed label Jun 12, 2022

pytorchbot added the open source label Jun 12, 2022

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

6df4c71

yanbing-j force-pushed the yanbing/gelu_bf16_vec_optimize branch from e78dd68 to d1dd927 Compare June 15, 2022 09:23

Optimize GELU BF16 backward

d1dd927

yanbing-j added the intel This tag is for PR from Intel label Jun 16, 2022

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

edb2202

yanbing-j force-pushed the yanbing/gelu_bf16_vec_optimize branch from 86a2b7b to edb2202 Compare July 6, 2022 06:11

yanbing-j added 3 commits July 7, 2022 08:57

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

19fc33a

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

d28905e

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

a917806

github-actions bot added the Stale label Sep 19, 2022

github-actions bot closed this Nov 18, 2022

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

84e1a5d

chunyuan-w reopened this Dec 12, 2022

github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Dec 12, 2022

chunyuan-w added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 13, 2022

yanbing-j added 3 commits December 14, 2022 13:22

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

db083d4

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

c7f89e4

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

26bc454

yanbing-j marked this pull request as ready for review December 19, 2022 13:17

yanbing-j requested a review from mingfeima December 19, 2022 15:16

drisspg added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 22, 2022

yanbing-j added 3 commits December 26, 2022 09:31

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

066c8f1

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

bd0ce28

Merge branch 'master' into yanbing/gelu_bf16_vec_optimize

ddda96a

mingfeima approved these changes Jan 5, 2023

pytorchmergebot added the Merged label Jan 5, 2023

pytorchmergebot closed this in f630294 Jan 5, 2023
