
Make Dropout and BatchNormalization Training-friendly #1887

Closed
wants to merge 9 commits

Conversation

SherlockNoMad
Contributor

As we discussed in the ONNX training working group, we wish to make ONNX ops training-friendly.
To start with, we are adding "is_train" as an input to Dropout and BatchNormalization.

  • Why an input, and not an attribute?
    During training, we also need to perform evaluation periodically to check the model's performance, which requires flipping the operation mode on the fly. Since an attribute is usually a constant value for a model, it would be tricky to override it during training. Exposing it as an input allows the user to change the value by feeding data through a model input (see the sketch after this list).

  • Why 'is_train', and not 'is_test'?
    In previous versions of Dropout and BatchNormalization, the mode was named 'is_test'. Since ONNX is still used mainly for inference, it is better to make inference the default mode, so that the change does not affect existing models. Setting is_train = true enables training mode, which is more intuitive than setting is_test = false.
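A minimal sketch of the intended usage pattern, assuming the is_train input proposed in this PR (the input slot and wiring below are illustrative and not part of any released opset, so the ONNX checker is intentionally not run):

```python
from onnx import helper, TensorProto

# Hypothetical graph wiring the proposed "is_train" flag as a graph input.
# Input slots follow this PR's proposed Dropout schema: data, ratio, is_train.
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])
is_train = helper.make_tensor_value_info("is_train", TensorProto.BOOL, [])

dropout = helper.make_node(
    "Dropout",
    inputs=["X", "", "is_train"],  # "" leaves the optional ratio input unset
    outputs=["Y"],
)
graph = helper.make_graph([dropout], "dropout_mode_demo", [X, is_train], [Y])
model = helper.make_model(graph)

# One model serves both phases by feeding the flag at run time:
#   {"X": batch, "is_train": True}   -> training step (units are dropped)
#   {"X": batch, "is_train": False}  -> periodic evaluation (identity)
```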

This PR should also address issue #1042.

@houseroad @pranavsharma @ebarsoum @linkerzhang @prasanthpul @yuanbyu

@pranavsharma
Contributor

LGTM 👍

@houseroad
Member

This is a breaking change; please bump up the opset version.

@houseroad
Member

Please register the opset 10 BatchNormalization here: https://github.com/onnx/onnx/blob/master/onnx/defs/operator_sets.h#L533,L554

cc @linkerzhang: we need to enable the debug build in CI to detect the problem here, and furthermore we should fix our op registry.

@hobei commented Apr 3, 2019

Do you see any issue with the 'is_train' input being defined separately on each operation? It could potentially result in some operations running in training mode and some in testing mode within the same model.

@SherlockNoMad
Contributor Author

@hobei
Hi Simon, I actually see separate control over each op as an advantage.

Imagine a model with two Dropout nodes, one in the first few layers and one in the last few layers. If you wish to fine-tune only the last few layers, you can set the second Dropout node to training mode while keeping the first in test mode.
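A sketch of that scenario under the same proposed schema, with illustrative node and input names: each Dropout gets its own boolean graph input, so the backend can feed the two flags independently.

```python
from onnx import helper

# Two Dropout nodes, each wired to its own (proposed) is_train input.
early_drop = helper.make_node(
    "Dropout", ["hidden_1", "", "is_train_early"], ["hidden_1_d"])
late_drop = helper.make_node(
    "Dropout", ["hidden_9", "", "is_train_late"], ["hidden_9_d"])

# Fine-tuning only the last layers: feed
#   {"is_train_early": False, "is_train_late": True}
# so the early Dropout acts as identity while the late one keeps dropping units.
```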

@hobei commented Apr 4, 2019

@SherlockNoMad
Hi Sherlock. I agree that this is a potential feature. My concern is how to keep this consistent if you want to apply the same control to operations that do not have the 'is_train' input. How do you mark layers for fine-tuning when they contain neither Dropout nor BatchNorm?

@chinhuang007
Contributor

@SherlockNoMad I think the proposed solution is sound: it handles the differing behaviors of certain operators between training and inference modes. One question is how we know we have covered all operators that need this optional input. Another is what we do with a Dropout/BatchNorm that has is_train=true when creating/optimizing the inference graph.

@chinhuang007
Contributor

@hobei While awaiting Sherlock's response, I just want to offer my view... The feature of "defining the fine-tune layers" seems like a separate, interesting topic. The is_train input determines the expected behavior at the level of an individual operator. To selectively fine-tune certain layers, or some subgraphs, during training, we might need to introduce something new.

@SherlockNoMad
Contributor Author

@chinhuang007
I have gone through the operator list and identified only these two operators that behave differently during inference and training. As we encounter more such operators, we will need to add this optional input to them as well; hopefully there are not many.
An inference graph should have is_train set to false or left unset, which should be handled by the model constructor/converter/optimizer (a sketch follows below).
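A sketch of what such handling could look like in a converter/optimizer pass, assuming the input slots proposed in this PR; this is a hypothetical helper, not an existing ONNX API.

```python
import onnx
from onnx import helper, TensorProto

# Proposed is_train input slots, following this PR's schema changes (assumption).
IS_TRAIN_SLOT = {"Dropout": 2, "BatchNormalization": 5}

def freeze_is_train(model: onnx.ModelProto) -> onnx.ModelProto:
    """Pin every (proposed) is_train input to a constant False for inference."""
    graph = model.graph
    const_name = "is_train_false"
    const_added = False
    for node in graph.node:
        slot = IS_TRAIN_SLOT.get(node.op_type)
        if slot is None or len(node.input) <= slot or not node.input[slot]:
            continue  # op has no is_train, or the optional input is already unset
        if not const_added:
            # Scalar boolean initializer holding False.
            graph.initializer.append(
                helper.make_tensor(const_name, TensorProto.BOOL, [], [0]))
            const_added = True
        node.input[slot] = const_name
    return model
```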

@SherlockNoMad
Contributor Author

@hobei @chinhuang007
Identifying the layers to fine-tune is indeed a separate topic. In my opinion, it should be handled by the training runtime/backend.

.Input(
5,
"is_train",
"If set to nonzero, run spatial batch normalization in training mode, default is 0.",
Contributor

I think is_train has type tensor(bool). Maybe we should consider using {true, false} instead.

@hobei
Copy link

hobei commented Apr 10, 2019

@SherlockNoMad
I have another comment, specifically for BatchNorm. When the is_test attribute was removed in opset 7, it was replaced (as I understand it) by the number of outputs indicating the mode (training/testing).

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode)

Can you please elaborate on how the value of the new 'is_train' input relates to the number of outputs of BatchNorm? As I understand it, the number of outputs in a model is fixed, since they may be used as inputs to other operations.

@SherlockNoMad
Contributor Author

@hobei, I have updated the doc to elaborate on this further.

Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
Output case #2: Y (test mode, where other outputs will not be populated)

The number of outputs is indeed fixed. During testing, the last four outputs are not populated and should not be consumed by other nodes.
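For illustration, a sketch of the two cases when building the node, assuming this PR's proposed sixth input; the output names here are placeholders.

```python
from onnx import helper

# Training mode (is_train fed as True): all five outputs may be consumed.
bn_train = helper.make_node(
    "BatchNormalization",
    inputs=["X", "scale", "B", "mean", "var", "is_train"],  # proposed 6th input
    outputs=["Y", "out_mean", "out_var", "saved_mean", "saved_var"],
)

# Test mode (is_train fed as False): the node still declares five outputs,
# but the last four are never populated, so no downstream node should read them.
bn_test = helper.make_node(
    "BatchNormalization",
    inputs=["X", "scale", "B", "mean", "var", "is_train"],
    outputs=["Y", "unused_mean", "unused_var", "unused_sm", "unused_sv"],
)
```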


@prasanthpul added this to the 1.7 milestone Aug 20, 2019
.Input(
1,
"ratio",
"The ratio of random dropout, with value in [0, 1]. If this input was not set, "
Contributor

ratio's range should be [0, 1).

@wschin
Contributor

wschin commented Feb 18, 2020

Since #2568 has been merged, we can close this PR.

@wschin closed this Feb 18, 2020