add channels last with mixed data type support for GroupNorm backward #89485

Closed
wants to merge 14 commits

Conversation

CaoE (Collaborator) commented Nov 22, 2022

Stack from ghstack (oldest at bottom):

Motivation

  1. Add channels last support for GroupNorm backward so that GroupNorm fully supports channels last.
  2. As in #88663 (add mixed data type support for GroupNorm backward on CPU), mixed data type support is also needed for the channels last implementation of GroupNorm backward. A usage sketch follows below.
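
As a quick illustration (not code from this PR), the combination the stack enables — a channels last, bf16 input flowing through an fp32 GroupNorm module — can be exercised roughly as below on a CPU build that includes this change and #88663. The shapes match the benchmark; `num_groups=32` is our own choice, not stated in the PR:

```python
# Sketch only: assumes a PyTorch build with channels-last + mixed-dtype
# GroupNorm support on CPU (this PR and #88663).
import torch

x = torch.randn(10, 128, 20, 20)
x = x.to(memory_format=torch.channels_last).bfloat16().requires_grad_()
m = torch.nn.GroupNorm(num_groups=32, num_channels=128)  # fp32 weight/bias

y = m(x)            # channels-last forward with mixed fp32/bf16 dtypes
y.sum().backward()  # exercises the backward path this PR adds
print(x.grad.is_contiguous(memory_format=torch.channels_last))
```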

Testing

Single socket (28 cores):

* Contiguous:

shape | forward / s (fp32) | forward / s (mixed fp32 bf16) | backward / s (fp32) | backward / s (mixed fp32 bf16)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 3.20E-05 | 3.60E-05 | 8.31E-05 | 8.13E-05
[10, 128, 50, 50] | 0.000126 | 0.000115 | 0.000356 | 0.000257

* Channels Last:

shape | forward / s (fp32) | forward / s (mixed fp32 bf16) | backward / s (fp32) | backward / s (mixed fp32 bf16)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 4.11E-05 | 4.12E-05 | 9.74E-05 | 9.66E-05
[10, 128, 50, 50] | 0.000179 | 0.000178 | 0.000393 | 0.000317

Single core:

* Contiguous:

shape | forward / s (fp32) | forward / s (mixed fp32 bf16) | backward / s (fp32) | backward / s (mixed fp32 bf16)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 2.47E-04 | 2.53E-04 | 5.92E-04 | 4.50E-04
[10, 128, 50, 50] | 0.001559 | 0.001384 | 0.004343 | 0.002436

* Channels Last:

shape | forward / s (fp32) | forward / s (mixed fp32 bf16) | backward / s (fp32) | backward / s (mixed fp32 bf16)
-- | -- | -- | -- | --
[10, 128, 20, 20] | 2.27E-04 | 3.24E-04 | 0.0006224 | 0.000459
[10, 128, 50, 50] | 0.00167 | 0.001278 | 0.0041858 | 0.003027
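
The PR description does not include the measurement harness behind these tables; a minimal sketch of how such per-iteration timings could be collected (warm-up and iteration counts are our own assumptions):

```python
# Hypothetical benchmark harness, not the one used for the tables above.
import time
import torch

def bench(shape=(10, 128, 20, 20), channels_last=True, bf16=True, iters=100):
    x = torch.randn(*shape)
    if channels_last:
        x = x.to(memory_format=torch.channels_last)
    if bf16:
        x = x.bfloat16()  # fp32 module params + bf16 input = mixed data type
    x.requires_grad_()
    m = torch.nn.GroupNorm(32, shape[1])
    for _ in range(10):  # warm-up
        m(x).sum().backward()
        x.grad = None
    fwd = bwd = 0.0
    for _ in range(iters):
        t0 = time.perf_counter()
        y = m(x)
        fwd += time.perf_counter() - t0
        g = torch.ones_like(y)
        t0 = time.perf_counter()
        y.backward(g)
        bwd += time.perf_counter() - t0
        x.grad = None
    print(f"forward {fwd / iters:.3e} s, backward {bwd / iters:.3e} s")

bench()
```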

cc @ezyang @gchanan @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @VitalyFedyunin

pytorch-bot bot commented Nov 22, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89485

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a8eb674:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Nov 22, 2022
@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 22, 2022
CaoE added a commit that referenced this pull request Nov 22, 2022
ghstack-source-id: 1776f9ff792812955e00244bc25f03556e11dc85
Pull Request resolved: #89485
@CaoE CaoE marked this pull request as draft November 22, 2022 05:45
CaoE added a commit that referenced this pull request Nov 22, 2022
ghstack-source-id: dab8486ffbb576871851923259e660390b68c2c4
Pull Request resolved: #89485
CaoE added a commit that referenced this pull request Nov 22, 2022
ghstack-source-id: 8749cdc844892ba74b744a1d6250f553f934dcd8
Pull Request resolved: #89485
CaoE added a commit that referenced this pull request Nov 23, 2022
ghstack-source-id: 7fcba9e0c6a1bb4808e20fc802f899857112b080
Pull Request resolved: #89485
CaoE added a commit that referenced this pull request Nov 23, 2022
ghstack-source-id: 4549af4231508abefd9e443b2d0a13be2f9817df
Pull Request resolved: #89485
@CaoE CaoE added the intel This tag is for PR from Intel label Nov 30, 2022
CaoE added a commit that referenced this pull request Dec 5, 2022
ghstack-source-id: 065abc108b5be2e623ecb480a668ec27ac0d8d81
Pull Request resolved: #89485
CaoE added a commit that referenced this pull request Dec 9, 2022
ghstack-source-id: bda3595d20f04f7f4905b9d526fcd85c94521f84
Pull Request resolved: #89485
@CaoE CaoE added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 13, 2022
CaoE added a commit that referenced this pull request Dec 13, 2022
ghstack-source-id: 9bb35db126bb6e252de852646bf1b2435f6c80e6
Pull Request resolved: #89485
@CaoE CaoE added the intel priority matters to intel architecture from performance wise label Dec 13, 2022
CaoE (Collaborator, Author) commented Dec 23, 2022

@pytorchbot merge

pytorchmergebot (Collaborator) commented
Merge failed

Reason: Not merging any PRs at the moment because there is a merge-blocking https://github.com/pytorch/pytorch/labels/ci:%20sev issue open at #91332.


CaoE added a commit that referenced this pull request Dec 26, 2022
ghstack-source-id: e117d1e993249764aafda981bfaa152346ab1398
Pull Request resolved: #89485
CaoE added a commit that referenced this pull request Dec 28, 2022
ghstack-source-id: fee09288d4da1ae17b0b18fc1ce0d4b8efdb6682
Pull Request resolved: #89485
CaoE added a commit that referenced this pull request Dec 28, 2022
ghstack-source-id: 95514958f41600c08ac65ff65338abf0039f3e42
Pull Request resolved: #89485
CaoE (Collaborator, Author) commented Dec 29, 2022

@pytorchbot merge

pytorchmergebot (Collaborator) commented
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


CaoE (Collaborator, Author) commented Dec 29, 2022

@malfet Are there any adjustments that can be made in later PRs? Thanks for any suggestions.

jgong5 (Collaborator) commented Dec 29, 2022

> @malfet Are there any adjustments that can be made in later PRs? Thanks for any suggestions.

@malfet In particular, may I know what code duplication you were concerned about in this PR? We can address it in a follow-up PR. Thanks.

@malfet malfet added the module: bc-breaking Related to a BC-breaking change label Jan 19, 2023
@pytorch-bot pytorch-bot bot added the topic: bc breaking topic category label Jan 19, 2023
malfet (Contributor) commented Jan 19, 2023

This PR regressed the training step for https://github.com/lucidrains/DALLE2-pytorch; see #92166.

CaoE (Collaborator, Author) commented Jan 20, 2023

@malfet Sorry for the inconvenience. Can we move the .grads[0].contiguous and input.contiguous calls from https://github.com/pytorch/pytorch/pull/89485/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8cL1171 into the native_group_norm_backward function in aten/src/ATen/native/group_norm.cpp and add a memory format check for X and dY? That way we would neither break channels last on CPU nor affect CUDA.
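
A rough sketch of that idea, written in Python for readability even though the real change would land in the C++ native_group_norm_backward (the helper name and structure here are our own):

```python
# Illustration only: pick one memory format for X and dY before dispatching,
# so the channels-last CPU kernel is used only when both tensors agree.
import torch

def normalize_group_norm_grads(dY: torch.Tensor, X: torch.Tensor):
    cl = torch.channels_last
    if X.is_contiguous(memory_format=cl) and dY.is_contiguous(memory_format=cl):
        fmt = cl                       # keep the fast channels-last path
    else:
        fmt = torch.contiguous_format  # fall back to the contiguous kernel
    return dY.contiguous(memory_format=fmt), X.contiguous(memory_format=fmt)
```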

malfet (Contributor) commented Jan 20, 2023

> @malfet Sorry for the inconvenience. Can we move the .grads[0].contiguous and input.contiguous calls from https://github.com/pytorch/pytorch/pull/89485/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8cL1171 into the native_group_norm_backward function in aten/src/ATen/native/group_norm.cpp and add a memory format check for X and dY? That way we would neither break channels last on CPU nor affect CUDA.

That's what I'm doing: forcing contiguous on the CUDA side, but also adding a unit test so it would not regress again.

CaoE (Collaborator, Author) commented Jan 20, 2023

@malfet Thank you very much! I will submit a PR for the issue.

CaoE (Collaborator, Author) commented Jan 20, 2023

@malfet I submitted PR #92668 to fix it, but I have no CUDA machine to verify that the issue is fixed. Could you please help check it?

malfet added a commit that referenced this pull request Jan 20, 2023
Fixes regression introduced by #89485

Adds a test to prevent those regressions from happening in the future. In the process, discovered that GroupNormBackwards on CPU does not produce the same results if the input and gradient memory formats are different.
malfet (Contributor) commented Jan 20, 2023

@CaoE This sounds very similar to the approach I've taken in #92671, and it fixes the problem for CUDA, but surprisingly I had to skip part of the test on CPU, as I think the current implementation silently produces wrong results and probably requires some further refactoring...
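
For reference, a memory-format regression test in this spirit might look like the sketch below (our own approximation, not the actual test added in #92671):

```python
# Sketch of a regression test: GroupNorm backward must give the same
# gradients for contiguous and channels-last inputs.
import torch

def test_group_norm_backward_memory_format():
    x_ref = torch.randn(10, 128, 20, 20, requires_grad=True)
    x_cl = x_ref.detach().to(memory_format=torch.channels_last).requires_grad_()
    m = torch.nn.GroupNorm(32, 128)

    m(x_ref).sum().backward()
    m(x_cl).sum().backward()
    # Values must agree regardless of the input's memory format.
    torch.testing.assert_close(x_ref.grad, x_cl.grad)

test_group_norm_backward_memory_format()
```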

pytorchmergebot pushed a commit that referenced this pull request Jan 20, 2023
Fixes regression introduced by #89485

Adds a test to prevent those regressions from happening in the future. In the process, discovered that GroupNormBackwards on CPU does not produce the same results if the input and gradient memory formats are different.

Fixes #92166

Pull Request resolved: #92671
Approved by: https://github.com/ngimel, https://github.com/xuzhao9
@facebook-github-bot facebook-github-bot deleted the gh/CaoE/7/head branch June 8, 2023 14:30