
add mixed data type support for GroupNorm #81852

Closed
wants to merge 12 commits

Conversation


@mingfeima mingfeima commented Jul 21, 2022

Stack from ghstack:

  1. If the user runs bfloat16 models with amp, torch.autocast will
     keep the module parameters in the accumulation dtype, which leaves gamma and beta
     in float while the input/output will be in bfloat16.

  2. If the user explicitly casts the model to bfloat16,
     the input/output and gamma/beta will all be in bfloat16.

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10

1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
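A minimal sketch of the two scenarios above (shapes, group count, and device are illustrative assumptions, not taken from this PR): under autocast the GroupNorm affine parameters stay in float32 while activations produced by earlier autocast ops are bfloat16, so the kernel sees mixed dtypes; after an explicit cast everything is bfloat16.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Scenario 1: amp/autocast -- parameters stay in float32 (the accumulation dtype),
# while activations coming out of earlier autocast ops are bfloat16.
m = nn.GroupNorm(num_groups=4, num_channels=32)          # weight/bias created in float32
x_bf16 = torch.randn(8, 32, 16, 16).to(torch.bfloat16)   # bfloat16 activations
print(x_bf16.dtype, m.weight.dtype)   # torch.bfloat16 torch.float32 -> mixed dtypes
# With this PR, the CPU kernel accepts this combination directly:
y = F.group_norm(x_bf16, 4, m.weight, m.bias, m.eps)

# Scenario 2: the model is explicitly cast, so input/output and gamma/beta
# are all bfloat16.
m_bf16 = nn.GroupNorm(4, 32).bfloat16()
y2 = m_bf16(x_bf16)
print(y2.dtype, m_bf16.weight.dtype)  # torch.bfloat16 torch.bfloat16
```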

facebook-github-bot commented Jul 21, 2022

🔗 Helpful links

❌ 1 New Failure

As of commit 624f804 (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details)

2022-09-01T11:12:13.8946754Z AssertionError: te...than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.8932744Z =================================== FAILURES ===================================
2022-09-01T11:12:13.8936464Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-09-01T11:12:13.8941186Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-09-01T11:12:13.8942504Z Traceback (most recent call last):
2022-09-01T11:12:13.8942919Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 325, in test_python_ref
2022-09-01T11:12:13.8944395Z     self._ref_test_helper(lambda: TorchRefsMode(strict=True), device, dtype, op)
2022-09-01T11:12:13.8944806Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 308, in _ref_test_helper
2022-09-01T11:12:13.8945336Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-09-01T11:12:13.8945935Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-09-01T11:12:13.8946206Z     raise self.failureException(msg)
2022-09-01T11:12:13.8946754Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.9171705Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops.xml -
2022-09-01T11:12:13.9174156Z =========================== short test summary info ============================
2022-09-01T11:12:13.9174684Z FAILED test_ops.py::TestCommonCPU::test_python_ref__refs_native_layer_norm_cpu_float32
2022-09-01T11:12:13.9178815Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-09-01T11:12:13.9179407Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-09-01T11:12:13.9210278Z = 1 failed, 6894 passed, 4798 skipped, 105 xfailed, 121 warnings, 2 rerun in 349.85s (0:05:49) =
2022-09-01T11:12:14.0678482Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-09-01T11:12:15.3915917Z Traceback (most recent call last):
2022-09-01T11:12:15.3916583Z   File "test/run_test.py", line 1065, in <module>
2022-09-01T11:12:15.3918129Z     main()

This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Jul 21, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 31fb4afe21fc0afb240eb9038501dfdd2933a57b
Pull Request resolved: #81852
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Jul 22, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 9a20d8f52a8579c8bef69abf367be35eba34cc93
Pull Request resolved: #81852
@mingfeima mingfeima added the intel This tag is for PR from Intel label Jul 22, 2022
@yanbing-j yanbing-j added the intel priority matters to intel architecture from performance wise label Jul 27, 2022
@yanbing-j yanbing-j removed the intel priority matters to intel architecture from performance wise label Aug 24, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Sep 1, 2022
mingfeima added a commit that referenced this pull request Sep 1, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: d6c20b15e0d7da0fee1cd765031f43a209921bdc
Pull Request resolved: #81852
CaoE pushed a commit to CaoE/pytorch that referenced this pull request Sep 7, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: d6c20b15e0d7da0fee1cd765031f43a209921bdc
Pull Request resolved: pytorch#81852
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]

pytorch-bot bot commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81852

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d7f5ccd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot bot commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81852

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Sep 8, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: e9198d5ade2a328bbfddfbdd3394e202c259ae7c
Pull Request resolved: #81852
brycedrennan added a commit to brycedrennan/imaginAIry that referenced this pull request Sep 22, 2022
Seems to be caused by incompatible types in group_norm when we use autocast.

Patch group_norm to cast the weights to the same type as the inputs.

From what I can understand, all the other repos just switch to full precision instead
of addressing this. I think that would make things slower, but I'm not sure, so maybe
the patching solution I'm doing is better?

pytorch/pytorch#81852
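A rough sketch of that kind of workaround (hypothetical, not the actual code from the repository above): wrap torch.nn.functional.group_norm so the affine parameters are cast to the input's dtype before the call; this also covers nn.GroupNorm modules, since they route through the same function.

```python
import torch.nn.functional as F

_orig_group_norm = F.group_norm

def _patched_group_norm(input, num_groups, weight=None, bias=None, eps=1e-5):
    # Cast gamma/beta to the activation dtype so autocast-produced bfloat16/float16
    # inputs no longer hit a dtype mismatch inside the kernel.
    if weight is not None and weight.dtype != input.dtype:
        weight = weight.to(input.dtype)
    if bias is not None and bias.dtype != input.dtype:
        bias = bias.to(input.dtype)
    return _orig_group_norm(input, num_groups, weight, bias, eps)

F.group_norm = _patched_group_norm
```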
mingfeima added a commit that referenced this pull request Nov 29, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: e402edfac1d5252a979217a2ccb6412ea58839cc
Pull Request resolved: #81852
@mingfeima mingfeima added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 30, 2022
CaoE added a commit that referenced this pull request Dec 5, 2022
…and GroupNorm"




This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 5, 2022


This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 9, 2022
…and GroupNorm"




This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 9, 2022


This PR is cherry-picked from #84404 ~ #81852.

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 13, 2022
…and GroupNorm"




This PR is cherry-picked from #84404 ~ #81852.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
CaoE added a commit that referenced this pull request Dec 13, 2022


This PR is cherry-picked from #84404 ~ #81852.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@mingfeima mingfeima added the intel priority matters to intel architecture from performance wise label Dec 13, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

cc @VitalyFedyunin jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Dec 13, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 02e74b34efad7e03e616d314ef928b695d266c2f
Pull Request resolved: #81852
@mingfeima
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
shreyanb98, 842974287, Xirider, husthyc, aruntonic, ...

Details for Dev Infra team: raised by workflow job.


@malfet malfet left a comment


Small nit on the perf: we introduce an unnecessary `if (mixed_type)` check for all dtypes but BFloat16; perhaps one could refactor this code as:

template <typename scalar_t>
void CallGroupNorm(...)

and specialize it for BFloat16 to call mixed_type.

1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

cc @VitalyFedyunin jgong5 @XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
mingfeima added a commit that referenced this pull request Dec 19, 2022
1. If the user runs bfloat16 models with amp, `torch.autocast` will
keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
in float while the input/output will be in bfloat16.

2. If the user explicitly casts the model to bfloat16,
the input/output and gamma/beta will all be in bfloat16.

ghstack-source-id: 31172bb8d22486bdaf0681693f04e5052da71cc9
Pull Request resolved: #81852
@mingfeima
Collaborator Author

Small nit on the perf: we introduce an unnecessary `if (mixed_type)` check for all dtypes but BFloat16; perhaps one could refactor this code as:

template <typename scalar_t>
void CallGroupNorm(...)

and specialize it for BFloat16 to call mixed_type.

We will add mixed_type support for float16 in the near future for the upcoming Xeon platform (Granite Rapids).

Also, I noticed that vec_scalar_t is now a duplicate; I will have it replaced with opmath_t.

@mingfeima
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/83/head branch June 8, 2023 18:01
mattstern31 added a commit to mattstern31/imagin-AIry-Python that referenced this pull request Nov 11, 2023
Seems to be caused by incompatible types in group_norm when we use autocast.

Patch group_norm to cast the weights to the same type as the inputs.

From what I can understand, all the other repos just switch to full precision instead
of addressing this. I think that would make things slower, but I'm not sure, so maybe
the patching solution I'm doing is better?

pytorch/pytorch#81852
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, intel priority (matters to intel architecture from performance wise), intel (This tag is for PR from Intel), Merged, module: cpu (CPU specific problem (e.g., perf, algorithm)), open source, release notes: nn (release notes category)
Projects
Status: Done

9 participants