Make Dropout and BatchNormalization Training-friendly #1887
Conversation
LGTM 👍
This is a breaking change; please bump up the opset version.
Please register the opset 10 batchnorm here: https://github.com/onnx/onnx/blob/master/onnx/defs/operator_sets.h#L533,L554
cc: @linkerzhang We need to enable the debug build in CI to detect this problem, and furthermore we should fix our op registry.
Do you see any issue with the 'is_train' input being defined separately on each operation, which could potentially result in some operations in training mode and some in testing mode in the same model?
@hobei Imagine you have a model with two dropout nodes, one located in the first few layers and one in the last few layers. If you wish to fine-tune only the last few layers, you can set the second dropout node to training mode while keeping the first dropout in test mode.
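A minimal sketch of that per-node control, built with onnx.helper. The third Dropout input (`is_train`) and the input ordering follow this PR's proposal rather than a released opset, and all tensor names are illustrative:

```python
import onnx
from onnx import helper, TensorProto

X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 64])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 64])

# Per-node mode flags and a shared dropout ratio, stored as initializers.
ratio = helper.make_tensor("ratio", TensorProto.FLOAT, [], [0.5])
front_mode = helper.make_tensor("is_train_front", TensorProto.BOOL, [], [False])  # frozen layers: test mode
tail_mode = helper.make_tensor("is_train_tail", TensorProto.BOOL, [], [True])     # fine-tuned layers: training mode

# The first Dropout stays in test mode, the second runs in training mode.
drop1 = helper.make_node("Dropout", ["X", "ratio", "is_train_front"], ["H"])
drop2 = helper.make_node("Dropout", ["H", "ratio", "is_train_tail"], ["Y"])

graph = helper.make_graph(
    [drop1, drop2], "mixed_mode_dropout", [X], [Y],
    initializer=[ratio, front_mode, tail_mode],
)
model = helper.make_model(graph)  # checking this would require a schema that defines the proposed inputs
```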
@SherlockNoMad
@SherlockNoMad I think the proposed solution is sound, handling the different behaviors of certain operators between training and inference modes. The question is, how do we know we have covered all operators that need this optional input? Another question is, what do we do with a Dropout/BatchNorm with is_train=true while creating/optimizing the inference graph?
@hobei While awaiting Sherlock's response, I just want to provide my view... The feature of "defining the fine-tune layers" seems like a separate, interesting topic. The is_train input determines the expected behavior at the individual operator level. To selectively fine-tune certain layers, or some sub-graphs, during training, we might need to introduce something new.
@chinhuang007
@hobei @chinhuang007
.Input(
    5,
    "is_train",
    "If set to nonzero, run spatial batch normalization in training mode, default is 0.",
I think is_train is of type tensor(bool). Maybe we should consider using {true, false} instead.
@SherlockNoMad
Can you please elaborate on how the value of the new 'is_train' input relates to the number of outputs of BatchNormalization? As I understand it, the number of outputs in a model is fixed, as they may be used as inputs to other operations.
@hobei, I have updated the doc to elaborate further.
Output case #1 (training mode): Y, mean, var, saved_mean, saved_var
Output case #2 (test mode): Y
The number of outputs is indeed fixed. During testing, the latter 4 outputs are not populated and should not be consumed by other nodes.
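A small sketch of the two output arities described above, using onnx.helper; the `is_train` input (input 5) follows this PR's proposed BatchNormalization signature rather than a released opset, and the tensor names are illustrative:

```python
from onnx import helper

# Training mode: the node produces all five outputs.
bn_train = helper.make_node(
    "BatchNormalization",
    inputs=["X", "scale", "B", "mean", "var", "is_train"],  # is_train fed as true
    outputs=["Y", "out_mean", "out_var", "saved_mean", "saved_var"],
)

# Test mode: only Y is populated, so the trailing outputs are simply omitted.
bn_test = helper.make_node(
    "BatchNormalization",
    inputs=["X", "scale", "B", "mean", "var"],
    outputs=["Y"],
)
```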
.Input(
    1,
    "ratio",
    "The ratio of random dropout, with value in [0, 1]. If this input was not set, "
ratio's range should be [0,1)
Since #2568 has been merged, we can close this PR.
As we have discussed in the ONNX training working group, we wish to make ONNX ops training-friendly.
To start with, we are adding "is_train" as an input to Dropout and BatchNormalization.
Why an input, why not an attribute?
During training, we also need to perform evaluation periodically to check the model's performance. This requires flipping the operation mode on the fly. Since an attribute is usually a constant value in a model, it would be tricky to override it during training. Exposing the flag as an input allows the user to change the value by feeding data through a model input.
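As a rough illustration, assuming a runtime such as ONNX Runtime and a model that exposes the proposed flag as a graph input named `is_train` (the file and tensor names here are illustrative), the mode can be flipped per call simply by changing what is fed:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model_with_is_train.onnx")
x = np.random.rand(32, 64).astype(np.float32)

# Training step: Dropout/BatchNormalization behave in training mode.
train_out = sess.run(None, {"X": x, "is_train": np.array(True)})

# Periodic evaluation: same session and graph, only the fed flag changes.
eval_out = sess.run(None, {"X": x, "is_train": np.array(False)})
```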
Why 'is_train', why not 'is_test'?
In the previous versions of Dropout and BatchNormalization, the mode flag is named 'is_test'. Since ONNX is still mainly used for inference, it is better to make inference the default mode, so that the change won't affect existing models. Setting is_train = true enables training mode, which is more intuitive than setting is_test = false.
This PR should also address issue #1042.
@houseroad @pranavsharma @ebarsoum @linkerzhang @prasanthpul @yuanbyu