
Fix evaluation issues #3538

Merged
Tixxx merged 9 commits into master from tix/fix_eval_step on Apr 29, 2020

Conversation


@Tixxx (Contributor) commented Apr 15, 2020

Description:
Fix evaluation issues

Motivation and Context
Override the dropout ratio with 0 during evaluation so that Dropout behaves as an Identity node and evaluation results are correct.
Added unit tests for the Python frontend.
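To illustrate why a ratio of 0 makes Dropout equivalent to Identity, here is a minimal numpy sketch of inverted dropout. This is an illustrative model, not the actual ONNX Runtime kernel; the function name and signature are invented for the example.

```python
import numpy as np

def dropout(x, ratio, rng=None):
    """Inverted dropout: zero each element with probability `ratio` and
    scale survivors by 1/(1 - ratio). At ratio == 0 this is the identity."""
    if ratio == 0.0:
        return x  # no masking, no scaling: behaves like an Identity node
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= ratio  # keep-mask
    return x * mask / (1.0 - ratio)

x = np.arange(6, dtype=np.float32).reshape(2, 3)
assert np.array_equal(dropout(x, 0.0), x)  # eval mode: output equals input
```

Overriding the ratio to 0 at evaluation time therefore leaves activations untouched, which is what this PR relies on.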

@Tixxx Tixxx requested review from a team, SherlockNoMad and liqunfu on April 15, 2020 19:19
@Tixxx Tixxx added the "training" label (issues related to ONNX Runtime training; typically submitted using template) on Apr 15, 2020
@Tixxx Tixxx force-pushed the tix/fix_eval_step branch from cbf43ef to a1c801c on April 16, 2020 00:44
@SherlockNoMad (Contributor) commented:

Overall, I find this approach a bit too intrusive to the framework. It injects kernel-level control from the RunOptions, which I think breaks the original design principle that an op is self-descriptive and stateless: inputs and attributes alone should determine the behavior of an op.

If this flag is introduced, the op's behavior would depend on configuration set at run time.

For Dropout, we have a "ratio" input to determine whether it should invoke training mode.
Similarly for BatchNorm-12, we introduced another input "is_training_mode".

To make these nodes operate in eval mode, we can override their inputs. A node's initializer can still be overridden with graph feeds.

@jessebenson (Member) commented:

I agree with Sherlock, and he summarized the design quite well.

There's a relatively small number of operators that need different behaviors in training vs inferencing/evaluation mode. We should favor controlling those behaviors with operator inputs, and if we need to control it per session.run(), then we can add that as a graph input. As Sherlock said, we can have an initializer (for the default value) and still add it as a graph input so we can override it with the input feeds.
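The "initializer as default, graph feed as override" pattern described above can be sketched in plain Python. This is a hypothetical analogue for illustration only, not the ONNX Runtime API; the names `DEFAULT_INITIALIZERS` and `run` are invented.

```python
import numpy as np

# Hypothetical sketch: the graph holds an initializer with the default value
# (e.g. dropout ratio 0.5 for training); callers may override it per run by
# passing a feed with the same name, just like overriding a graph input.
DEFAULT_INITIALIZERS = {"dropout_ratio": np.float32(0.5)}

def run(feeds=None):
    """Resolve each graph input: an explicit feed wins over the initializer."""
    feeds = feeds or {}
    resolved = {**DEFAULT_INITIALIZERS, **feeds}
    return resolved["dropout_ratio"]

assert run() == np.float32(0.5)                        # training default
assert run({"dropout_ratio": np.float32(0.0)}) == 0.0  # eval-time override
```

The design choice here is that per-run behavior flows through ordinary graph inputs rather than a session-level flag, keeping each op self-descriptive.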

@Tixxx Tixxx changed the title from "allow switching between eval and training modes dynamically" to "Fix evaluation issues" on Apr 23, 2020
@liqunfu (Contributor) commented Apr 29, 2020

Do not forget to rebase and target the master branch.

@liqunfu liqunfu self-requested a review April 29, 2020 02:09
@liqunfu previously approved these changes on Apr 29, 2020
@Tixxx Tixxx changed the base branch from ort_training to master April 29, 2020 02:11
@Tixxx Tixxx dismissed liqunfu’s stale review April 29, 2020 02:11

The base branch was changed.

@Tixxx Tixxx merged commit 0638565 into master Apr 29, 2020
@Tixxx Tixxx deleted the tix/fix_eval_step branch April 29, 2020 04:03
4 participants