
add mixed data type support for LayerNorm #81851

Closed
wants to merge 9 commits

Conversation

@mingfeima (Collaborator) commented Jul 21, 2022

Stack from ghstack:

1. If the user runs a bfloat16 model with AMP, `torch.autocast` will keep module parameters in the accumulation dtype, which leaves `gamma` and `beta` in float while the input/output will be in bfloat16 (see the autocast sketch after this list).

2. If the user explicitly casts the model to bfloat16, such as:

```
import torch
import torch.nn as nn
n, t, c = 2, 5, 16  # example shapes: batch, time, channels
x = torch.randn(n, t, c).bfloat16()
ln = nn.LayerNorm(c).bfloat16()
y = ln(x)
```

then the input/output and gamma/beta will all be in bfloat16.
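For case 1, here is a minimal sketch of the mixed-dtype situation under CPU autocast (the shapes and the `device_type="cpu"` choice are illustrative assumptions; the dtypes shown follow the description above):

```
import torch
import torch.nn as nn

n, t, c = 2, 5, 16                   # illustrative shapes
ln = nn.LayerNorm(c)                 # gamma/beta stay in float32
x = torch.randn(n, t, c).bfloat16()  # bfloat16 input
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = ln(x)                        # bf16 input meets fp32 parameters
print(ln.weight.dtype, y.dtype)      # expected per the description: torch.float32 torch.bfloat16
```

This parameter/input dtype mismatch is the case-1 situation that this PR adds support for.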

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10

@facebook-github-bot (Contributor) commented Jul 21, 2022

🔗 Helpful links

❌ 1 New Failure

As of commit 1f7ca03 (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details)

2022-09-01T11:14:23.2401792Z =================================== FAILURES ===================================
2022-09-01T11:14:23.2403012Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-09-01T11:14:23.2405142Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-09-01T11:14:23.2407410Z Traceback (most recent call last):
2022-09-01T11:14:23.2407749Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 325, in test_python_ref
2022-09-01T11:14:23.2409903Z     self._ref_test_helper(lambda: TorchRefsMode(strict=True), device, dtype, op)
2022-09-01T11:14:23.2410484Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 308, in _ref_test_helper
2022-09-01T11:14:23.2410907Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-09-01T11:14:23.2411360Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-09-01T11:14:23.2411690Z     raise self.failureException(msg)
2022-09-01T11:14:23.2412274Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
2022-09-01T11:14:23.2656537Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops.xml -
2022-09-01T11:14:23.2659196Z =========================== short test summary info ============================
2022-09-01T11:14:23.2659603Z FAILED test_ops.py::TestCommonCPU::test_python_ref__refs_native_layer_norm_cpu_float32
2022-09-01T11:14:23.2659975Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-09-01T11:14:23.2660882Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-09-01T11:14:23.2693015Z = 1 failed, 7083 passed, 4803 skipped, 105 xfailed, 121 warnings, 2 rerun in 373.44s (0:06:13) =
2022-09-01T11:14:23.4247651Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-09-01T11:14:24.7251990Z Traceback (most recent call last):
2022-09-01T11:14:24.7252264Z   File "test/run_test.py", line 1065, in <module>
2022-09-01T11:14:24.7253687Z     main()

This comment was automatically generated by Dr. CI (expand for details).


@mingfeima mingfeima added the intel This tag is for PR from Intel label Jul 22, 2022
@yanbing-j yanbing-j added the intel priority matters to intel architecture from performance wise label Jul 27, 2022
@ezyang (Contributor) commented Jul 31, 2022

test failure looks real

2022-07-22T03:41:59.6289214Z =================================== FAILURES ===================================
2022-07-22T03:41:59.6290167Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-07-22T03:41:59.6290866Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-07-22T03:41:59.6293201Z Traceback (most recent call last):
2022-07-22T03:41:59.6293527Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 496, in test_python_ref
2022-07-22T03:41:59.6293933Z     self._ref_test_helper(lambda: TorchRefsMode.push(strict=True), device, dtype, op)
2022-07-22T03:41:59.6294270Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 479, in _ref_test_helper
2022-07-22T03:41:59.6294568Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-07-22T03:41:59.6294920Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-07-22T03:41:59.6295169Z     raise self.failureException(msg)
2022-07-22T03:41:59.6295797Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
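For context on what this check asserts: `test_python_ref` compares both the Python reference and the torch op against a higher-precision computation, and fails if the reference lands farther from that baseline than the torch result. A minimal self-contained sketch of that comparison (the `ref_layer_norm` helper is a hypothetical stand-in for the `torch._refs` implementation, not the actual test harness):

```
import torch
import torch.nn.functional as F

def ref_layer_norm(x, eps=1e-5):
    # hypothetical plain-Python reference, standing in for torch._refs
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(4, 8)
precise = ref_layer_norm(x.double())              # higher-precision baseline
torch_result = F.layer_norm(x, (8,))              # the op under test
ref_result = ref_layer_norm(x)                    # the reference under test
torch_distance = (torch_result.double() - precise).abs().max()
ref_distance = (ref_result.double() - precise).abs().max()
print(ref_distance <= torch_distance)             # what the test asserts
```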

@malfet (Contributor) left a comment

As Ed commented 10 days ago, test_python_ref__refs_native_layer_norm_cpu_float32 seems real and related to the PR in question, isn't it?

@yanbing-j yanbing-j removed the intel priority matters to intel architecture from performance wise label Aug 24, 2022
@mingfeima (Collaborator, Author) commented

> As Ed commented 10 days ago, test_python_ref__refs_native_layer_norm_cpu_float32 seems real and related to the PR in question, isn't it?

Yes, this failure is solid. This PR depends on the performance regression tracked in pytorch/benchmark#1099; the CI will be fixed when that regression is handled.

Sorry for the late response; I had to deal with some difficult optimizations on PyG over the last few days. Will fix the issues on this stack ASAP.

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Sep 1, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Sep 7, 2022
@pytorch-bot (bot) commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81851

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 82dcf55:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mingfeima (Collaborator, Author) commented Sep 8, 2022

Fixed the CI failure on test_python_ref__refs_native_layer_norm_cpu_float32; the old failure was due to improper data type conversion.
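As a sketch of the kind of conversion at stake, here is a minimal illustration of mixed-dtype LayerNorm semantics, assuming statistics are accumulated in float32 (the function name and shapes are hypothetical; this is not the actual ATen kernel):

```
import torch

def layer_norm_mixed(x, gamma, beta, eps=1e-5):
    # x may be bfloat16 while gamma/beta are float32; compute the
    # statistics in float32, then cast the result back to x's dtype.
    xf = x.float()
    mean = xf.mean(-1, keepdim=True)
    var = xf.var(-1, unbiased=False, keepdim=True)
    y = (xf - mean) / torch.sqrt(var + eps) * gamma.float() + beta.float()
    return y.to(x.dtype)

x = torch.randn(2, 5, 16).bfloat16()
gamma, beta = torch.ones(16), torch.zeros(16)   # float32 parameters
print(layer_norm_mixed(x, gamma, beta).dtype)   # torch.bfloat16
```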

@ezyang (Contributor) commented Sep 8, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot (Collaborator) commented

Merge failed

Reason: PR #84404 has not been reviewed yet (Rule CPU ATen backend)

Raised by workflow job.

@ezyang (Contributor) commented Sep 27, 2022

@pytorchbot merge -f "distributed failure looks unrelated, test was previously disabled"

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the force (-f) flag. This means your change will be merged immediately, bypassing any CI checks (ETA: 1-5 minutes). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot (Collaborator) commented

Merge failed

Reason: PR #84404 has not been reviewed yet (Rule CPU ATen backend)

Raised by workflow job.

@facebook-github-bot (Contributor) commented

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla (bot) commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@zhuhaozhe zhuhaozhe closed this Oct 20, 2022
@zhuhaozhe zhuhaozhe reopened this Oct 20, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Oct 25, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 3, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 3, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 3, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 4, 2022
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Nov 8, 2022
@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 28, 2022
@mingfeima mingfeima added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 30, 2022
@mingfeima (Collaborator, Author) commented

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#81851
Approved by: https://github.com/ezyang
@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/82/head branch June 8, 2023 18:01
Labels: ciflow/trunk · cla signed · intel · Merged · module: cpu · open source · release notes: nn