Dirac init compatibility with group convolutions #32825
Conversation
💊 CircleCI build failures summary and remediations
As of commit 94b539d:
Detailed failure analysis: one may explore the probable reasons each build failed interactively on the Dr. CI website.
🕵️ 3 new failures recognized by patterns. The following build failures do not appear to be due to upstream breakage:
Force-pushed from a94f554 to b2539e8.
@vincentqb do you think you would be able to review this? (No is a fine answer)
Couldn't you do the following instead?
```
x = torch.randn([1, 3, 3, 3])
print('input:\n', x)
conv_layer = torch.nn.Conv2d(3, 3, 3, padding=1, groups=1, bias=False)
torch.nn.init.dirac_(conv_layer.weight)
print('\noutput (before this PR):\n', conv_layer(x))

input:
tensor([[[[-0.4437, -0.2448,  1.4811],
          [ 0.2103,  0.2234,  0.2437],
          [-2.6258,  0.2899, -0.1680]],

         [[-0.3531, -0.3139, -3.1256],
          [ 0.5868,  0.3861, -0.0573],
          [ 1.8914,  0.2475,  0.2421]],

         [[-0.8827,  0.4264, -0.4944],
          [ 1.8532,  0.0400,  1.4679],
          [ 1.8510,  2.3384,  0.8307]]]])

output (before this PR):
tensor([[[[-0.4437, -0.2448,  1.4811],
          [ 0.2103,  0.2234,  0.2437],
          [-2.6258,  0.2899, -0.1680]],

         [[-0.3531, -0.3139, -3.1256],
          [ 0.5868,  0.3861, -0.0573],
          [ 1.8914,  0.2475,  0.2421]],

         [[-0.8827,  0.4264, -0.4944],
          [ 1.8532,  0.0400,  1.4679],
          [ 1.8510,  2.3384,  0.8307]]]], grad_fn=<MkldnnConvolutionBackward>)
```
I'm assuming the win is to do the identity by groups when out_channels is different from input_channels? But then this can be explicitly achieved by assigning with advanced indexing on a zero matrix of the final shape, no? I.e., assign the sub-tensor of
```
tensor([[[[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]]], grad_fn=<MkldnnConvolutionBackward>)
```
the values of
```
tensor([[[[ 1.2205, -0.6608,  0.8640],
          [-0.5464,  1.1288,  1.4726],
          [-0.6693,  0.4000, -1.7613]],

         [[-0.8760, -0.8814, -0.4705],
          [ 0.6283, -0.5943,  0.6873],
          [-0.6852,  1.4723,  0.3325]]]])
```
to get
```
tensor([[[[ 1.2205, -0.6608,  0.8640],
          [-0.5464,  1.1288,  1.4726],
          [-0.6693,  0.4000, -1.7613]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[-0.8760, -0.8814, -0.4705],
          [ 0.6283, -0.5943,  0.6873],
          [-0.6852,  1.4723,  0.3325]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]]], grad_fn=<MkldnnConvolutionBackward>)
```
Moreover, if torch.nn.init.dirac_ supports groups, should torch.nn.init.* also?
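To make the suggestion above concrete, here is a rough sketch of the manual alternative being described. The helper name `manual_group_dirac_` is hypothetical (not part of torch.nn.init), and it assumes an odd-sized kernel and out_channels divisible by groups:

```
import torch

# Hypothetical helper sketching the reviewer's idea: write a per-group
# identity into a zeroed weight tensor by direct indexing.
def manual_group_dirac_(weight, groups=1):
    out_channels, in_channels_per_group, kh, kw = weight.shape
    out_per_group = out_channels // groups
    with torch.no_grad():
        weight.zero_()
        for g in range(groups):
            # Within each group, put a 1 at the spatial center for each
            # matched (output channel, input channel) pair; surplus
            # output channels in the group stay zero.
            for c in range(min(out_per_group, in_channels_per_group)):
                weight[g * out_per_group + c, c, kh // 2, kw // 2] = 1
    return weight

conv_layer = torch.nn.Conv2d(2, 4, 3, padding=1, groups=2, bias=False)
manual_group_dirac_(conv_layer.weight, groups=2)
```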
@vincentqb
Your suggestions are valid if you want to do a one-time operation, but the goal here is to initialize a group conv layer, so changing back to groups=1 misses the point, and so does manual indexing. There is no problem copying a tensor; the point is to initialize the weights of a group conv, and to do it consistently.
Moreover, what you currently get for a group conv initialized with Dirac doesn't make sense.
Regarding init.*: the other initializations don't suffer from this inconsistency, since they don't need to create a channel-aware structure like a Dirac matrix; they simply fill all the channels with numbers, without cross-channel dependencies. So I think Dirac is the only one that requires group support.
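To illustrate the channel-aware structure in question, here is a small check of what the group-aware init produces; the expected values are inferred from the examples in the summary below, not taken verbatim from the PR:

```
import torch

# With groups, dirac_ builds a per-group delta kernel; surplus output
# channels within a group stay zero. The weight shape for
# Conv2d(2, 4, 3, groups=2) is [4, 1, 3, 3].
w = torch.empty(4, 1, 3, 3)
torch.nn.init.dirac_(w, groups=2)
print(w[:, 0, 1, 1])  # expected: tensor([1., 0., 1., 0.])
```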
Sounds good, thanks!
@vincentqb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@vincentqb
None of the failing tests seem related. Landing.
@vincentqb merged this pull request in 2c99ea8.
Summary: Initializing the weights of a group conv with init.dirac_ and then applying the layer previously resulted in an output that makes no sense:
```
x = torch.randn([1, 3, 3, 3])
print('input:\n', x)
conv_layer = torch.nn.Conv2d(3, 3, 3, padding=1, groups=3, bias=False)
torch.nn.init.dirac_(conv_layer.weight.data)
print('\noutput (before this PR):\n', conv_layer(x))

input:
tensor([[[[ 0.5369, -1.1428,  0.1031],
          [ 0.4638, -0.0854, -0.6553],
          [ 0.8321, -2.5926, -0.3214]],

         [[-0.2289, -0.0895,  0.4407],
          [ 1.2309, -1.2096, -1.5216],
          [-0.1798,  1.1694,  0.3469]],

         [[ 0.1905,  0.8095,  0.5490],
          [-0.4525, -0.4284, -0.1141],
          [ 1.1857, -0.9246, -0.5119]]]])

output (before this PR):
tensor([[[[ 0.5369, -1.1428,  0.1031],
          [ 0.4638, -0.0854, -0.6553],
          [ 0.8321, -2.5926, -0.3214]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]]], grad_fn=<MkldnnConvolutionBackward>)
```
This PR allows introducing groups to the initialization:
```
torch.nn.init.dirac_(conv_layer.weight.data, groups=3)
print('output (after this PR):\n', conv_layer(x))

output (after this PR):
tensor([[[[ 0.5369, -1.1428,  0.1031],
          [ 0.4638, -0.0854, -0.6553],
          [ 0.8321, -2.5926, -0.3214]],

         [[-0.2289, -0.0895,  0.4407],
          [ 1.2309, -1.2096, -1.5216],
          [-0.1798,  1.1694,  0.3469]],

         [[ 0.1905,  0.8095,  0.5490],
          [-0.4525, -0.4284, -0.1141],
          [ 1.1857, -0.9246, -0.5119]]]], grad_fn=<MkldnnConvolutionBackward>)
```
When out_channels is different from input_channels, it does the natural thing, which is applying the identity in each group separately:
```
x = torch.randn([1, 2, 3, 3])
print('input:\n', x)
conv_layer = torch.nn.Conv2d(2, 4, 3, padding=1, groups=2, bias=False)
torch.nn.init.dirac_(conv_layer.weight.data, groups=2)
print('\noutput:\n', conv_layer(x))

input:
tensor([[[[ 1.2205, -0.6608,  0.8640],
          [-0.5464,  1.1288,  1.4726],
          [-0.6693,  0.4000, -1.7613]],

         [[-0.8760, -0.8814, -0.4705],
          [ 0.6283, -0.5943,  0.6873],
          [-0.6852,  1.4723,  0.3325]]]])

output:
tensor([[[[ 1.2205, -0.6608,  0.8640],
          [-0.5464,  1.1288,  1.4726],
          [-0.6693,  0.4000, -1.7613]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]],

         [[-0.8760, -0.8814, -0.4705],
          [ 0.6283, -0.5943,  0.6873],
          [-0.6852,  1.4723,  0.3325]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]]], grad_fn=<MkldnnConvolutionBackward>)
```
The argument 'groups' defaults to 1, so the change is backward compatible. Tests are modified to include cases with groups>1 while still covering groups=1.
Pull Request resolved: pytorch#32825
Differential Revision: D19859926
Pulled By: vincentqb
fbshipit-source-id: 9dfdd24471ff14d79c442dfd28c1891aff812fdf
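As a quick sanity check of the backward-compatibility claim, a minimal sketch (not taken from the PR's test suite): omitting the new argument should behave exactly like groups=1.

```
import torch

# Omitting groups should be identical to passing groups=1 explicitly.
w_default = torch.nn.init.dirac_(torch.empty(3, 3, 3, 3))
w_groups1 = torch.nn.init.dirac_(torch.empty(3, 3, 3, 3), groups=1)
assert torch.equal(w_default, w_groups1)
```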