[pytorch] correct input size check for GroupNorm #33008

vkhalidov · 2020-02-05T14:23:31Z

Summary:
Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed.

Details:
The current implementation disallows GroupNorm to be applied to tensors of shape e.g. (1, C, 1, 1) to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros.
However, in GroupNorm the statistics are calculated across channels. So in case where one has an input tensor of shape (1, 256, 1, 1) for GroupNorm(32, 256), the statistics will be computed over 8 elements and thus be meaningful.

One use case is Atrous Spatial Pyramid Pooling (ASPPPooling), where GroupNorm could be used in place of BatchNorm here. However, now this is prohibited and results in failures.

Proposed solution consists in correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size.

Test Plan: check that existing tests pass

Differential Revision: D19723407

facebook-github-bot · 2020-02-05T14:23:58Z

This pull request was exported from Phabricator. Differential Revision: D19723407

ezyang · 2020-02-06T22:08:36Z

I'm not really the right reviewer for this, but this needs a test at least.

ezyang · 2020-02-06T22:10:58Z

@v0dro, you added the checks, could you please take a look at this? Thanks!

Summary: Pull Request resolved: pytorch#33008 Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed. **Details**: The current implementation disallows GroupNorm to be applied to tensors of shape e.g. `(1, C, 1, 1)` to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros. However, in GroupNorm the statistics are calculated across channels. So in case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics will be computed over 8 elements and thus be meaningful. One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L50), where GroupNorm could be used in place of BatchNorm [here](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L55). However, now this is prohibited and results in failures. Proposed solution consists in correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size. Test Plan: check that existing tests pass Differential Revision: D19723407 fbshipit-source-id: 8d241d5fd8ceb32b6c985e1a501e0c806f2626bf

facebook-github-bot · 2020-02-13T23:01:11Z

This pull request was exported from Phabricator. Differential Revision: D19723407

dr-ci · 2020-02-14T00:20:39Z

💊 CircleCI build failures summary and remediations

As of commit 9983b73:

None of the build failures appear to be your fault.

1/1 recognized as flaky ❄️
- Re-run these jobs?

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

❄️ 1 failure recognized as flaky

The following build failures have been detected as flaky and may not be your fault:

pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | pattern match details) ❄️

AssertionError: 3072 not less than or equal to 1e-05 :

---------------------------------------------------------------------- 
Traceback (most recent call last): 
  File "test_cuda.py", line 278, in test_memory_stats 
    for _ in self._test_memory_stats_generator(self): 
  File "test_cuda.py", line 207, in _test_memory_stats_generator 
    assert_change(0, empty_cache=True) 
  File "test_cuda.py", line 172, in assert_change 
    self.assertEqual(new_m, last_m_arr[0]) 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 893, in assertEqual 
    super(TestCase, self).assertLessEqual(abs(x - y), prec, message) 
AssertionError: 3072 not less than or equal to 1e-05 :  
 
---------------------------------------------------------------------- 
Ran 108 tests in 45.382s 
 
FAILED (failures=1, skipped=51) 
Traceback (most recent call last): 
  File "run_test.py", line 486, in <module> 
    main() 
  File "run_test.py", line 479, in main 
    raise RuntimeError(message)

This comment was automatically generated by Dr. CI (expand for details).

Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

v0dro

Looks good to me. The reason given for the changes seems legit.

facebook-github-bot · 2020-02-18T15:13:16Z

This pull request has been merged in cfb4862.

Summary: Pull Request resolved: pytorch#33008 Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed. **Details**: The current implementation disallows GroupNorm to be applied to tensors of shape e.g. `(1, C, 1, 1)` to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros. However, in GroupNorm the statistics are calculated across channels. So in case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics will be computed over 8 elements and thus be meaningful. One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L50), where GroupNorm could be used in place of BatchNorm [here](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L55). However, now this is prohibited and results in failures. Proposed solution consists in correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size. Test Plan: check that existing tests pass Reviewed By: fmassa Differential Revision: D19723407 fbshipit-source-id: c85c244c832e6592e9aedb279d0acc867eef8f0c

vkhalidov requested a review from apaszke as a code owner February 5, 2020 14:23

facebook-github-bot added the fb-exported label Feb 5, 2020

vkhalidov requested a review from ezyang February 6, 2020 08:00

ezyang requested a review from v0dro February 6, 2020 22:10

ezyang removed their request for review February 6, 2020 22:11

vkhalidov force-pushed the export-D19723407 branch from 82b9d60 to 9983b73 Compare February 13, 2020 23:01

v0dro reviewed Feb 14, 2020

View reviewed changes

ezyang approved these changes Feb 14, 2020

View reviewed changes

facebook-github-bot closed this in cfb4862 Feb 18, 2020

facebook-github-bot added the merged label Feb 18, 2020

vkhalidov deleted the export-D19723407 branch February 18, 2020 15:16

mruberry added the Merged label Oct 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[pytorch] correct input size check for GroupNorm #33008

[pytorch] correct input size check for GroupNorm #33008

Uh oh!

vkhalidov commented Feb 5, 2020

Uh oh!

facebook-github-bot commented Feb 5, 2020

Uh oh!

ezyang commented Feb 6, 2020

Uh oh!

ezyang commented Feb 6, 2020

Uh oh!

facebook-github-bot commented Feb 13, 2020

Uh oh!

dr-ci bot commented Feb 14, 2020

Uh oh!

v0dro left a comment

Uh oh!

facebook-github-bot commented Feb 18, 2020

Uh oh!

Uh oh!

[pytorch] correct input size check for GroupNorm #33008

[pytorch] correct input size check for GroupNorm #33008

Uh oh!

Conversation

vkhalidov commented Feb 5, 2020

Uh oh!

facebook-github-bot commented Feb 5, 2020

Uh oh!

ezyang commented Feb 6, 2020

Uh oh!

ezyang commented Feb 6, 2020

Uh oh!

facebook-github-bot commented Feb 13, 2020

Uh oh!

dr-ci bot commented Feb 14, 2020

💊 CircleCI build failures summary and remediations

Detailed failure analysis

❄️ 1 failure recognized as flaky

pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Uh oh!

v0dro left a comment

Choose a reason for hiding this comment

Uh oh!

facebook-github-bot commented Feb 18, 2020

Uh oh!

Uh oh!