
add mixed data type support for GroupNorm #81852

Closed
wants to merge 12 commits

Conversation


@mingfeima mingfeima commented Jul 21, 2022

Stack from ghstack:

  1. If the user runs bfloat16 models with amp, torch.autocast will
     keep the module parameters in the accumulation dtype, which leaves gamma and beta
     in float while the input/output will be in bfloat16.

  2. If the user explicitly casts the model to bfloat16,
     the input/output and gamma/beta will all be in bfloat16.

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10

1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
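A minimal sketch of the two scenarios above (shapes, group count, and device are illustrative assumptions, not taken from this PR): under autocast the GroupNorm affine parameters stay in float32 while activations produced by earlier autocast ops are bfloat16, so the kernel sees mixed dtypes; after an explicit cast everything is bfloat16.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Scenario 1: amp/autocast -- parameters stay in float32 (the accumulation dtype),
# while activations coming out of earlier autocast ops are bfloat16.
m = nn.GroupNorm(num_groups=4, num_channels=32)          # weight/bias created in float32
x_bf16 = torch.randn(8, 32, 16, 16).to(torch.bfloat16)   # bfloat16 activations
print(x_bf16.dtype, m.weight.dtype)   # torch.bfloat16 torch.float32 -> mixed dtypes
# With this PR, the CPU kernel accepts this combination directly:
y = F.group_norm(x_bf16, 4, m.weight, m.bias, m.eps)

# Scenario 2: the model is explicitly cast, so input/output and gamma/beta
# are all bfloat16.
m_bf16 = nn.GroupNorm(4, 32).bfloat16()
y2 = m_bf16(x_bf16)
print(y2.dtype, m_bf16.weight.dtype)  # torch.bfloat16 torch.bfloat16
```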

facebook-github-bot commented Jul 21, 2022

🔗 Helpful links

❌ 1 New Failure

As of commit 624f804 (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details)

2022-09-01T11:12:13.8946754Z AssertionError: te...than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.8932744Z =================================== FAILURES ===================================
2022-09-01T11:12:13.8936464Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-09-01T11:12:13.8941186Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-09-01T11:12:13.8942504Z Traceback (most recent call last):
2022-09-01T11:12:13.8942919Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 325, in test_python_ref
2022-09-01T11:12:13.8944395Z     self._ref_test_helper(lambda: TorchRefsMode(strict=True), device, dtype, op)
2022-09-01T11:12:13.8944806Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 308, in _ref_test_helper
2022-09-01T11:12:13.8945336Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-09-01T11:12:13.8945935Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-09-01T11:12:13.8946206Z     raise self.failureException(msg)
2022-09-01T11:12:13.8946754Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.9171705Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops.xml -
2022-09-01T11:12:13.9174156Z =========================== short test summary info ============================
2022-09-01T11:12:13.9174684Z FAILED test_ops.py::TestCommonCPU::test_python_ref__refs_native_layer_norm_cpu_float32
2022-09-01T11:12:13.9178815Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-09-01T11:12:13.9179407Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-09-01T11:12:13.9210278Z = 1 failed, 6894 passed, 4798 skipped, 105 xfailed, 121 warnings, 2 rerun in 349.85s (0:05:49) =
2022-09-01T11:12:14.0678482Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-09-01T11:12:15.3915917Z Traceback (most recent call last):
2022-09-01T11:12:15.3916583Z   File "test/run_test.py", line 1065, in <module>
2022-09-01T11:12:15.3918129Z     main()

This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Jul 21, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 31fb4afe21fc0afb240eb9038501dfdd2933a57b
Pull Request resolved: #81852
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Jul 22, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 9a20d8f52a8579c8bef69abf367be35eba34cc93
Pull Request resolved: #81852
@mingfeima mingfeima added the intel This tag is for PR from Intel label Jul 22, 2022
@yanbing-j yanbing-j added the intel priority matters to intel architecture from performance wise label Jul 27, 2022
@yanbing-j yanbing-j removed the intel priority matters to intel architecture from performance wise label Aug 24, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Sep 1, 2022
mingfeima added a commit that referenced this pull request Sep 1, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: d6c20b15e0d7da0fee1cd765031f43a209921bdc
Pull Request resolved: #81852
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Sep 7, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: d6c20b15e0d7da0fee1cd765031f43a209921bdc
Pull Request resolved: pytorch#81852
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]

pytorch-bot bot commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81852

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d7f5ccd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot bot commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81852

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Sep 8, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: e9198d5ade2a328bbfddfbdd3394e202c259ae7c
Pull Request resolved: #81852
brycedrennan added a commit to brycedrennan/imaginAIry that referenced this pull request Sep 22, 2022
Seems to be caused by incompatible types in group_norm when we use autocast.

Patch group_norm to cast the weights to the same type as the inputs.

From what I can understand, all the other repos just switch to full precision instead
of addressing this. I think that would make things slower, but I'm not sure, so maybe
the patching solution I'm doing is better?

pytorch/pytorch#81852
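A rough sketch of that kind of workaround (hypothetical, not the actual code from the repository above): wrap torch.nn.functional.group_norm so the affine parameters are cast to the input's dtype before the call; this also covers nn.GroupNorm modules, since they route through the same function.

```python
import torch.nn.functional as F

_orig_group_norm = F.group_norm

def _patched_group_norm(input, num_groups, weight=None, bias=None, eps=1e-5):
    # Cast gamma/beta to the activation dtype so autocast-produced bfloat16/float16
    # inputs no longer hit a dtype mismatch inside the kernel.
    if weight is not None and weight.dtype != input.dtype:
        weight = weight.to(input.dtype)
    if bias is not None and bias.dtype != input.dtype:
        bias = bias.to(input.dtype)
    return _orig_group_norm(input, num_groups, weight, bias, eps)

F.group_norm = _patched_group_norm
```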
mingfeima added a commit that referenced this pull request Nov 29, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: e402edfac1d5252a979217a2ccb6412ea58839cc
Pull Request resolved: #81852
@mingfeima mingfeima added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 30, 2022
CaoE added a commit that referenced this pull request Dec 5, 2022
…and GroupNorm"




This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 5, 2022


This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 9, 2022
…and GroupNorm"




This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 9, 2022


This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 13, 2022
…and GroupNorm"




This PR is cherry-picked from #84404 ~ #81852.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 13, 2022


This PR is cherry-picked from #84404 ~ #81852.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@mingfeima mingfeima added the intel priority matters to intel architecture from performance wise label Dec 13, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

cc @VitalyFedyunin jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Dec 13, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 02e74b34efad7e03e616d314ef928b695d266c2f
Pull Request resolved: #81852
@mingfeima
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
shreyanb98, 842974287, Xirider, husthyc, aruntonic, ...

Details for Dev Infra team: raised by workflow job.


@malfet malfet left a comment


Small nit on the perf: we introduce an unnecessary `if (mixed_type)` check for all dtypes but BFloat16; perhaps one could refactor this code as:

template <typename scalar_t>
void CallGroupNorm(...)

and specialize it for BFloat16 to call mixed_type.

1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

cc @VitalyFedyunin jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Dec 19, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 31172bb8d22486bdaf0681693f04e5052da71cc9
Pull Request resolved: #81852
@mingfeima
Collaborator Author

Small nit on the perf: we introduce an unnecessary `if (mixed_type)` check for all dtypes but BFloat16; perhaps one could refactor this code as:

template <typename scalar_t>
void CallGroupNorm(...)

and specialize it for BFloat16 to call mixed_type.

We will add mixed_type support for float16 in the near future for the upcoming Xeon platform (Granite Rapids).

Also, I noticed that vec_scalar_t is now a duplicate; I will have it replaced with opmath_t.

@mingfeima
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/83/head branch June 8, 2023 18:01
mattstern31 added a commit to mattstern31/imagin-AIry-Python that referenced this pull request Nov 11, 2023
Seems to be caused by incompatible types in group_norm when we use autocast.

Patch group_norm to cast the weights to the same type as the inputs.

From what I can understand, all the other repos just switch to full precision instead
of addressing this. I think that would make things slower, but I'm not sure, so maybe
the patching solution I'm doing is better?

pytorch/pytorch#81852
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, intel priority (matters to intel architecture from performance wise), intel (This tag is for PR from Intel), Merged, module: cpu (CPU specific problem (e.g., perf, algorithm)), open source, release notes: nn (release notes category)
Projects
Status: Done

9 participants