
Fix evaluation issues #3538

Merged
Tixxx merged 9 commits into master from tix/fix_eval_step on Apr 29, 2020

Conversation


@Tixxx (Contributor) commented Apr 15, 2020

Description:
Fix evaluation issues

Motivation and Context
Override the dropout ratio with 0 during evaluation so that Dropout behaves as an Identity node and evaluation results are correct.
Added unit tests for the Python frontend.
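To illustrate why a ratio of 0 makes Dropout equivalent to Identity, here is a minimal numpy sketch of inverted dropout. This is an illustrative model, not the actual ONNX Runtime kernel; the function name and signature are invented for the example.

```python
import numpy as np

def dropout(x, ratio, rng=None):
    """Inverted dropout: zero each element with probability `ratio` and
    scale survivors by 1/(1 - ratio). At ratio == 0 this is the identity."""
    if ratio == 0.0:
        return x  # no masking, no scaling: behaves like an Identity node
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= ratio  # keep-mask
    return x * mask / (1.0 - ratio)

x = np.arange(6, dtype=np.float32).reshape(2, 3)
assert np.array_equal(dropout(x, 0.0), x)  # eval mode: output equals input
```

Overriding the ratio to 0 at evaluation time therefore leaves activations untouched, which is what this PR relies on.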

@Tixxx Tixxx requested review from a team, SherlockNoMad and liqunfu on April 15, 2020 19:19
@Tixxx Tixxx added the "training" label (issues related to ONNX Runtime training; typically submitted using template) on Apr 15, 2020
@Tixxx Tixxx force-pushed the tix/fix_eval_step branch from cbf43ef to a1c801c on April 16, 2020 00:44
@SherlockNoMad (Contributor) commented:

Overall, I find this approach a bit too intrusive to the framework. It injects kernel-level control from the RunOptions, which I think breaks the original design principle that an op is self-descriptive and stateless: inputs and attributes alone should determine the behavior of an op.

If this flag is introduced, the op's behavior would depend on configuration set at run time.

For Dropout, we have a "ratio" input to determine whether it should invoke training mode.
Similarly for BatchNorm-12, we introduced another input "is_training_mode".

To make these nodes operate in eval mode, we can override their inputs. A node's initializer can still be overridden with graph feeds.

@jessebenson (Member) commented:

I agree with Sherlock, and he summarized the design quite well.

There's a relatively small number of operators that need different behaviors in training vs inferencing/evaluation mode. We should favor controlling those behaviors with operator inputs, and if we need to control it per session.run(), then we can add that as a graph input. As Sherlock said, we can have an initializer (for the default value) and still add it as a graph input so we can override it with the input feeds.
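The "initializer as default, graph feed as override" pattern described above can be sketched in plain Python. This is a hypothetical analogue for illustration only, not the ONNX Runtime API; the names `DEFAULT_INITIALIZERS` and `run` are invented.

```python
import numpy as np

# Hypothetical sketch: the graph holds an initializer with the default value
# (e.g. dropout ratio 0.5 for training); callers may override it per run by
# passing a feed with the same name, just like overriding a graph input.
DEFAULT_INITIALIZERS = {"dropout_ratio": np.float32(0.5)}

def run(feeds=None):
    """Resolve each graph input: an explicit feed wins over the initializer."""
    feeds = feeds or {}
    resolved = {**DEFAULT_INITIALIZERS, **feeds}
    return resolved["dropout_ratio"]

assert run() == np.float32(0.5)                        # training default
assert run({"dropout_ratio": np.float32(0.0)}) == 0.0  # eval-time override
```

The design choice here is that per-run behavior flows through ordinary graph inputs rather than a session-level flag, keeping each op self-descriptive.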

@Tixxx Tixxx changed the title from "allow switching between eval and training modes dynamically" to "Fix evaluation issues" on Apr 23, 2020
@liqunfu (Contributor) commented Apr 29, 2020

Do not forget to rebase and target the master branch.

@liqunfu liqunfu self-requested a review April 29, 2020 02:09
@liqunfu previously approved these changes on Apr 29, 2020
@Tixxx Tixxx changed the base branch from ort_training to master April 29, 2020 02:11
@Tixxx Tixxx dismissed liqunfu’s stale review April 29, 2020 02:11

The base branch was changed.

@Tixxx Tixxx merged commit 0638565 into master Apr 29, 2020
@Tixxx Tixxx deleted the tix/fix_eval_step branch April 29, 2020 04:03
4 participants