Skip to content

[pytorch] correct input size check for GroupNorm #33008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

vkhalidov
Copy link
Contributor

Summary:
Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed.

Details:
The current implementation disallows GroupNorm to be applied to tensors of shape e.g. (1, C, 1, 1) to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros.
However, in GroupNorm the statistics are calculated across channels. So in case where one has an input tensor of shape (1, 256, 1, 1) for GroupNorm(32, 256), the statistics will be computed over 8 elements and thus be meaningful.

One use case is Atrous Spatial Pyramid Pooling (ASPPPooling), where GroupNorm could be used in place of BatchNorm here. However, now this is prohibited and results in failures.

Proposed solution consists in correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size.

Test Plan: check that existing tests pass

Differential Revision: D19723407

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19723407

@vkhalidov vkhalidov requested a review from ezyang February 6, 2020 08:00
@ezyang
Copy link
Contributor

ezyang commented Feb 6, 2020

I'm not really the right reviewer for this, but this needs a test at least.

@ezyang ezyang requested a review from v0dro February 6, 2020 22:10
@ezyang
Copy link
Contributor

ezyang commented Feb 6, 2020

@v0dro, you added the checks, could you please take a look at this? Thanks!

@ezyang ezyang removed their request for review February 6, 2020 22:11
Summary:
Pull Request resolved: pytorch#33008

Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed.

**Details**:
The current implementation disallows GroupNorm to be applied to tensors of shape e.g. `(1, C, 1, 1)` to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros.
However, in GroupNorm the statistics are calculated across channels. So in case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics will be computed over 8 elements and thus be meaningful.

One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L50), where GroupNorm could be used in place of BatchNorm [here](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L55). However, now this is prohibited and results in failures.

Proposed solution consists in correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size.

Test Plan: check that existing tests pass

Differential Revision: D19723407

fbshipit-source-id: 8d241d5fd8ceb32b6c985e1a501e0c806f2626bf
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19723407

@dr-ci
Copy link

dr-ci bot commented Feb 14, 2020

💊 CircleCI build failures summary and remediations

As of commit 9983b73:

None of the build failures appear to be your fault.

  • 1/1 recognized as flaky ❄️
    • Re-run these jobs?

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

❄️ 1 failure recognized as flaky

The following build failures have been detected as flaky and may not be your fault:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test" (full log | pattern match details) ❄️

AssertionError: 3072 not less than or equal to 1e-05 :
---------------------------------------------------------------------- 
Traceback (most recent call last): 
  File "test_cuda.py", line 278, in test_memory_stats 
    for _ in self._test_memory_stats_generator(self): 
  File "test_cuda.py", line 207, in _test_memory_stats_generator 
    assert_change(0, empty_cache=True) 
  File "test_cuda.py", line 172, in assert_change 
    self.assertEqual(new_m, last_m_arr[0]) 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 893, in assertEqual 
    super(TestCase, self).assertLessEqual(abs(x - y), prec, message) 
AssertionError: 3072 not less than or equal to 1e-05 :  
 
---------------------------------------------------------------------- 
Ran 108 tests in 45.382s 
 
FAILED (failures=1, skipped=51) 
Traceback (most recent call last): 
  File "run_test.py", line 486, in <module> 
    main() 
  File "run_test.py", line 479, in main 
    raise RuntimeError(message) 

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

Copy link
Contributor

@v0dro v0dro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. The reason given for the changes seems legit.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in cfb4862.

@vkhalidov vkhalidov deleted the export-D19723407 branch February 18, 2020 15:16
ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary:
Pull Request resolved: pytorch#33008

Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed.

**Details**:
The current implementation disallows GroupNorm to be applied to tensors of shape e.g. `(1, C, 1, 1)` to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros.
However, in GroupNorm the statistics are calculated across channels. So in case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics will be computed over 8 elements and thus be meaningful.

One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L50), where GroupNorm could be used in place of BatchNorm [here](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L55). However, now this is prohibited and results in failures.

Proposed solution consists in correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size.

Test Plan: check that existing tests pass

Reviewed By: fmassa

Differential Revision: D19723407

fbshipit-source-id: c85c244c832e6592e9aedb279d0acc867eef8f0c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants