
add mixed data type support for LayerNorm #81851

Closed
wants to merge 9 commits

Conversation

@mingfeima (Collaborator) commented Jul 21, 2022

Stack from ghstack:

1. If the user runs a bfloat16 model with AMP, `torch.autocast` will keep module parameters in the accumulation dtype, which leaves `gamma` and `beta` in float while the input/output will be in bfloat16 (see the autocast sketch after this list).

2. If the user explicitly casts the model to bfloat16, such as:

```
import torch
import torch.nn as nn
n, t, c = 2, 5, 16  # example shapes: batch, time, channels
x = torch.randn(n, t, c).bfloat16()
ln = nn.LayerNorm(c).bfloat16()
y = ln(x)
```

then the input/output and gamma/beta will all be in bfloat16.
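For case 1, here is a minimal sketch of the mixed-dtype situation under CPU autocast (the shapes and the `device_type="cpu"` choice are illustrative assumptions; the dtypes shown follow the description above):

```
import torch
import torch.nn as nn

n, t, c = 2, 5, 16                   # illustrative shapes
ln = nn.LayerNorm(c)                 # gamma/beta stay in float32
x = torch.randn(n, t, c).bfloat16()  # bfloat16 input
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = ln(x)                        # bf16 input meets fp32 parameters
print(ln.weight.dtype, y.dtype)      # expected per the description: torch.float32 torch.bfloat16
```

This parameter/input dtype mismatch is the case-1 situation that this PR adds support for.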

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10

@facebook-github-bot (Contributor) commented Jul 21, 2022

🔗 Helpful links

❌ 1 New Failure

As of commit 1f7ca03 (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details)

2022-09-01T11:14:23.2401792Z =================================== FAILURES ===================================
2022-09-01T11:14:23.2403012Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-09-01T11:14:23.2405142Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-09-01T11:14:23.2407410Z Traceback (most recent call last):
2022-09-01T11:14:23.2407749Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 325, in test_python_ref
2022-09-01T11:14:23.2409903Z     self._ref_test_helper(lambda: TorchRefsMode(strict=True), device, dtype, op)
2022-09-01T11:14:23.2410484Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 308, in _ref_test_helper
2022-09-01T11:14:23.2410907Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-09-01T11:14:23.2411360Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-09-01T11:14:23.2411690Z     raise self.failureException(msg)
2022-09-01T11:14:23.2412274Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
2022-09-01T11:14:23.2656537Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops.xml -
2022-09-01T11:14:23.2659196Z =========================== short test summary info ============================
2022-09-01T11:14:23.2659603Z FAILED test_ops.py::TestCommonCPU::test_python_ref__refs_native_layer_norm_cpu_float32
2022-09-01T11:14:23.2659975Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-09-01T11:14:23.2660882Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-09-01T11:14:23.2693015Z = 1 failed, 7083 passed, 4803 skipped, 105 xfailed, 121 warnings, 2 rerun in 373.44s (0:06:13) =
2022-09-01T11:14:23.4247651Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-09-01T11:14:24.7251990Z Traceback (most recent call last):
2022-09-01T11:14:24.7252264Z   File "test/run_test.py", line 1065, in <module>
2022-09-01T11:14:24.7253687Z     main()

This comment was automatically generated by Dr. CI (expand for details).


@mingfeima mingfeima added the intel This tag is for PR from Intel label Jul 22, 2022
@yanbing-j yanbing-j added the intel priority matters to intel architecture from performance wise label Jul 27, 2022
@ezyang (Contributor) commented Jul 31, 2022

test failure looks real

2022-07-22T03:41:59.6289214Z =================================== FAILURES ===================================
2022-07-22T03:41:59.6290167Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-07-22T03:41:59.6290866Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-07-22T03:41:59.6293201Z Traceback (most recent call last):
2022-07-22T03:41:59.6293527Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 496, in test_python_ref
2022-07-22T03:41:59.6293933Z     self._ref_test_helper(lambda: TorchRefsMode.push(strict=True), device, dtype, op)
2022-07-22T03:41:59.6294270Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 479, in _ref_test_helper
2022-07-22T03:41:59.6294568Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-07-22T03:41:59.6294920Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-07-22T03:41:59.6295169Z     raise self.failureException(msg)
2022-07-22T03:41:59.6295797Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
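For context on what this check asserts: `test_python_ref` compares both the Python reference and the torch op against a higher-precision computation, and fails if the reference lands farther from that baseline than the torch result. A minimal self-contained sketch of that comparison (the `ref_layer_norm` helper is a hypothetical stand-in for the `torch._refs` implementation, not the actual test harness):

```
import torch
import torch.nn.functional as F

def ref_layer_norm(x, eps=1e-5):
    # hypothetical plain-Python reference, standing in for torch._refs
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(4, 8)
precise = ref_layer_norm(x.double())              # higher-precision baseline
torch_result = F.layer_norm(x, (8,))              # the op under test
ref_result = ref_layer_norm(x)                    # the reference under test
torch_distance = (torch_result.double() - precise).abs().max()
ref_distance = (ref_result.double() - precise).abs().max()
print(ref_distance <= torch_distance)             # what the test asserts
```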

@malfet (Contributor) left a comment

As Ed commented 10 days ago, test_python_ref__refs_native_layer_norm_cpu_float32 seems real and related to the PR in question, isn't it?

@yanbing-j yanbing-j removed the intel priority matters to intel architecture from performance wise label Aug 24, 2022
@mingfeima (Collaborator, Author) commented

> As Ed commented 10 days ago, test_python_ref__refs_native_layer_norm_cpu_float32 seems real and related to the PR in question, isn't it?

Yes, this failure is solid. This PR depends on the performance regression tracked in pytorch/benchmark#1099; the CI will be fixed when that regression is handled.

Sorry for the late response; I had to deal with some difficult optimizations on PyG over the last few days. Will fix the issues on this stack ASAP.

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Sep 1, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Sep 7, 2022
@pytorch-bot (bot) commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81851

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 82dcf55:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mingfeima (Collaborator, Author) commented Sep 8, 2022

Fixed the CI failure on test_python_ref__refs_native_layer_norm_cpu_float32; the old failure was due to improper data type conversion.
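As a sketch of the kind of conversion at stake, here is a minimal illustration of mixed-dtype LayerNorm semantics, assuming statistics are accumulated in float32 (the function name and shapes are hypothetical; this is not the actual ATen kernel):

```
import torch

def layer_norm_mixed(x, gamma, beta, eps=1e-5):
    # x may be bfloat16 while gamma/beta are float32; compute the
    # statistics in float32, then cast the result back to x's dtype.
    xf = x.float()
    mean = xf.mean(-1, keepdim=True)
    var = xf.var(-1, unbiased=False, keepdim=True)
    y = (xf - mean) / torch.sqrt(var + eps) * gamma.float() + beta.float()
    return y.to(x.dtype)

x = torch.randn(2, 5, 16).bfloat16()
gamma, beta = torch.ones(16), torch.zeros(16)   # float32 parameters
print(layer_norm_mixed(x, gamma, beta).dtype)   # torch.bfloat16
```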

@ezyang (Contributor) commented Sep 8, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot (Collaborator) commented

Merge failed

Reason: PR #84404 has not been reviewed yet (Rule CPU ATen backend)

Raised by workflow job.

@ezyang (Contributor) commented Sep 27, 2022

@pytorchbot merge -f "distributed failure looks unrelated, test was previously disabled"

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the force (-f) flag. This means your change will be merged immediately, bypassing any CI checks (ETA: 1-5 minutes). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot (Collaborator) commented

Merge failed

Reason: PR #84404 has not been reviewed yet (Rule CPU ATen backend)

Raised by workflow job.

@facebook-github-bot (Contributor) commented

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla (bot) commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@zhuhaozhe zhuhaozhe closed this Oct 20, 2022
@zhuhaozhe zhuhaozhe reopened this Oct 20, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Oct 25, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 3, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 3, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 3, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 4, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 8, 2022
@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 28, 2022
@mingfeima mingfeima added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 30, 2022
@mingfeima (Collaborator, Author) commented

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#81851
Approved by: https://github.com/ezyang
@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/82/head branch June 8, 2023 18:01
Labels: ciflow/trunk · cla signed · intel · Merged · module: cpu · open source · release notes: nn