Summary:
The implementation is similar to BMM and ADDMM. The bias tensor uses the packed-weight layout, as in MM, but the index is incremented along the z-dim to reach the additional matrices in the batch.
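For reference, here is a minimal pure-Python sketch of the operation semantics given in the `torch.baddbmm` documentation linked in this description: `out[b] = beta * input + alpha * (batch1[b] @ batch2[b])`. It assumes nested lists and a 2D bias whose dimensions are each either full-size or 1 (broadcast). The name `baddbmm_reference` is hypothetical, and this is not the Vulkan implementation.

```python
def baddbmm_reference(bias, batch1, batch2, beta=1.0, alpha=1.0):
    """Reference semantics only: out[b] = beta * bias + alpha * (batch1[b] @ batch2[b]).

    Assumption: `bias` is a 2D nested list of shape (n, p), (1, p), (n, 1) or (1, 1),
    broadcast across the batch. `batch1` is (B, n, k) and `batch2` is (B, k, p).
    """
    out = []
    for m1, m2 in zip(batch1, batch2):
        n, k, p = len(m1), len(m2), len(m2[0])
        rows = []
        for i in range(n):
            # A size-1 bias dimension is broadcast by always reading index 0.
            bi = i if len(bias) > 1 else 0
            row = []
            for j in range(p):
                bj = j if len(bias[bi]) > 1 else 0
                acc = sum(m1[i][t] * m2[t][j] for t in range(k))
                row.append(beta * bias[bi][bj] + alpha * acc)
            rows.append(row)
        out.append(rows)
    return out


batch1 = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
batch2 = [[[1, 0], [0, 1]], [[5, 6], [7, 8]]]
scalar_bias_result = baddbmm_reference([[10]], batch1, batch2)
row_bias_result = baddbmm_reference([[1, 2]], batch1, batch2)
```

A single-element bias such as `[[10]]` is the broadcasting case that the packing discussion in this description is concerned with.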
Packed bias (input of MM):
Packed bias (input of BMM):
To support broadcasting, the bias packing for `mm` differs slightly from the weight packing: it repeats the single element along the height-dim twice to fill the 4 planes (see code for details). The element is not repeated along the width-dim, but the code still works, because stacking 3 planes with the last one left empty yields the same 3D image.
However, this doesn't work for `bmm`. Its input is a series of `{4 planes} {4 planes} … {4 planes}`, where each `{4 planes}` represents one matrix, so filling only 3 planes per matrix breaks the indexing. Thus, I also repeat the single element along the width-dim so that all 4 planes are filled and the indexing stays correct.
https://pytorch.org/docs/stable/generated/torch.baddbmm.html
Test Plan:
Reviewed By: yipjustin
Differential Revision: D49402181