deprecate no-spatial mode of BN #1637
Will this bring us any benefit besides having one fewer attribute? I would like to keep the potential for enabling non-spatial BN. Since it's available in cuDNN, this hint is useful for enabling special handling of non-spatial cases. We have an issue to track this optimization in Caffe2: pytorch/pytorch#14185
@houseroad, the purpose of this PR is to simplify the BN spec by removing the no-spatial mode. In its most generic form, BN can be viewed as an op that computes statistics across a subset of the input tensor's dimensions. The dimensions not involved in the statistics computation are grouped into a single channel dimension. Therefore, it is unnecessary to call out spatial and non-spatial modes.
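To illustrate the point about grouping dimensions, here is a minimal NumPy sketch (not the ONNX reference implementation). It assumes an N x C x ... layout with the channel on axis 1 and training-style batch statistics; the helper names `batch_norm` and `batch_norm_non_spatial` are made up for this example. Non-spatial behaviour is recovered by folding every non-batch dimension into the channel axis before calling the spatial op.

```python
import numpy as np

def batch_norm(x, scale, bias, eps=1e-5):
    """Spatial BN: statistics over every axis except axis 1 (the channel)."""
    reduce_axes = (0,) + tuple(range(2, x.ndim))
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    # Broadcast per-channel scale and bias along axis 1.
    shape = (1, -1) + (1,) * (x.ndim - 2)
    return scale.reshape(shape) * (x - mean) / np.sqrt(var + eps) + bias.reshape(shape)

def batch_norm_non_spatial(x, scale, bias, eps=1e-5):
    """Non-spatial BN expressed with the spatial op.

    Assumption: scale and bias have one entry per non-batch element,
    i.e. shape x.shape[1:]. All non-batch axes are folded into a single
    channel axis, so statistics are taken over the batch axis only.
    """
    n = x.shape[0]
    flat = x.reshape(n, -1)
    out = batch_norm(flat, scale.reshape(-1), bias.reshape(-1), eps)
    return out.reshape(x.shape)
```

With this view, the only thing the op needs to know is which axis is the channel; the reshape carries the rest.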
@houseroad, any feedback, please? Thanks.
LGTM!
ONNX opset 9 removed the `spatial` attribute of BatchNormalization (onnx/onnx#1637). Since the default value of `spatial` has been True the whole time, simply dropping the attribute is the simplest and most robust way to handle the update.
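A rough sketch of what "dropping the attribute" could look like in a converter. The node here is a plain dict with hypothetical `op_type` and `attributes` keys, not the real ONNX protobuf API, and rejecting `spatial=0` is an assumption of this sketch rather than something the commit above states.

```python
def drop_spatial_attribute(node):
    """Return a copy of a BatchNormalization node without `spatial`.

    `node` is a hypothetical dict: {"op_type": str, "attributes": dict}.
    """
    if node["op_type"] != "BatchNormalization":
        return node
    # A missing attribute means the default (spatial mode).
    spatial = node["attributes"].get("spatial", 1)
    if spatial != 1:
        # Assumption: non-spatial nodes need an explicit rewrite instead.
        raise ValueError("non-spatial BatchNormalization cannot be converted by dropping `spatial`")
    attrs = {k: v for k, v in node["attributes"].items() if k != "spatial"}
    return {**node, "attributes": attrs}
```

Because a missing `spatial` and `spatial=1` meant the same thing, both cases produce an identical attribute set after the conversion.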
* deprecate spatial mode of BN
* merge with master
* deprecate no-spatial mode of BN
  - update TestCoverage.md
Most AI frameworks do not have a non-spatial mode for BN. For example, all three ops in PyTorch (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d) are spatial mode. In TF, BN is treated in a flexible way in which statistics are computed over dimensions whose size is 1. In all cases, what is needed is to specify the dimensions over which statistics are computed. With this update, all dimensions except the channel dimension are the dimensions over which statistics are computed. We therefore deprecate the non-spatial mode.