Autograd Function Fallback bug fix - moe support #8105

pengwa · 2021-06-20T12:20:56Z

Description: Autograd Function Fallback bug fix - moe support

Support forward input orders like "Non_tensor/Tensor/Non_tensor". Correspondingly, support "None/Tensor_Grad/None" for backward outputs.
Report a RuntimeError when a PythonOp is detected but _enable_custom_autograd_function is NOT enabled.
Simplify the attributes used by PythonOpGrad. Rename some attributes to reflect whether they are used for tensor inputs only or for all (tensor + non-tensor) inputs.
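
To illustrate the ordering rule in the first point, here is a minimal sketch in plain Python. It is not the code from this PR: `FakeTensor`, `tensor_positions`, and `scatter_grads` are hypothetical names used only for illustration, and `FakeTensor` stands in for `torch.Tensor` so the snippet runs without PyTorch. It shows how gradients for the tensor inputs can be placed back at their original positions, with None left in every non-tensor slot.

```python
from typing import Any, List, Optional, Sequence


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, name: str) -> None:
        self.name = name


def tensor_positions(forward_inputs: Sequence[Any]) -> List[int]:
    """Return the positions of the tensor arguments among all forward inputs."""
    return [i for i, arg in enumerate(forward_inputs) if isinstance(arg, FakeTensor)]


def scatter_grads(
    forward_inputs: Sequence[Any], tensor_grads: Sequence[Any]
) -> List[Optional[Any]]:
    """Place each tensor gradient at its input's position; non-tensor slots stay None."""
    positions = tensor_positions(forward_inputs)
    if len(positions) != len(tensor_grads):
        raise ValueError("expected one gradient per tensor input")
    backward_outputs: List[Optional[Any]] = [None] * len(forward_inputs)
    for position, grad in zip(positions, tensor_grads):
        backward_outputs[position] = grad
    return backward_outputs


# Forward inputs in "Non_tensor/Tensor/Non_tensor" order.
inputs = [0.5, FakeTensor("x"), "mode"]
outputs = scatter_grads(inputs, ["grad_x"])
print(outputs)  # [None, 'grad_x', None]
```

The same shape applies to a real `torch.autograd.Function`: `backward` returns one value per `forward` input, with None for inputs that are not tensors.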

Attached are the PythonOp/PythonOpGrad schemas (and how those attributes are used) after the change.

tlh20

A few minor comments on places we could extend assertion checks. Adjacent to some of the changes in this PR there is a use of "static" that looks suspicious -- could you check if that is correct?

onnxruntime/core/language_interop_ops/torch/torch_proxy.cc

orttraining/orttraining/core/graph/gradient_builder.cc

orttraining/orttraining/core/graph/training_op_defs.cc

orttraining/orttraining/training_ops/cpu/torch/torch_custom_function_kernel_base.cc

pengwa · 2021-07-03T05:31:55Z

A few minor comments on places we could extend assertion checks. Adjacent to some of the changes in this PR there is a use of "static" that looks suspicious -- could you check if that is correct?

Thanks @tlh20 for taking the time to review. I have addressed all of the comments. :)

pengwa added 3 commits June 20, 2021 12:16

Support forward inputs orders like "Non_tensor/Tensor/Non_tensor". Correspondingly, support "None/Tensor_Grad/None" for backward outputs.

cfeaad9

Report RuntimeError when PythonOp detected but _enable_custom_autograd_function is enabled.

446d66b

Fix "PoliCheck ] - Defect : Term "hang", Component : orttraining\orttraining\python\training\ortmodule\__init__.py (1 issue)"

3ed9e6f

pengwa requested review from wschin and SherlockNoMad June 20, 2021 12:20

pengwa requested review from baijumeswani, BowenBao, liqunfu, mrry, spandantiwari, thiagocrepaldi and a team as code owners June 20, 2021 12:20

pengwa requested a review from tlh20 June 20, 2021 12:22

pengwa added the component:ortmodule label Jun 20, 2021

pengwa requested a review from nbcsm June 20, 2021 12:22

pengwa added 2 commits June 21, 2021 01:30

rename call_convention->input_convention, input_tensor_requires_grads->input_requires_grads

b66839b

fix minor comment

b761ca0

thiagocrepaldi reviewed Jun 21, 2021

orttraining/orttraining/python/training/ortmodule/__init__.py (outdated, resolved)

revert polycheck fix in case of conflict

48b5742

wschin reviewed Jul 1, 2021

orttraining/orttraining/core/graph/training_op_defs.cc (outdated, resolved)

wschin reviewed Jul 1, 2021

onnxruntime/core/language_interop_ops/torch/torch_proxy.cc (resolved)

tlh20 requested changes Jul 2, 2021

Update orttraining/orttraining/core/graph/training_op_defs.cc

7ed72bc

Co-authored-by: Tim Harris <tiharr@microsoft.com>

SherlockNoMad added the training label (issues related to ONNX Runtime training; typically submitted using template) Jul 2, 2021

SherlockNoMad reviewed Jul 2, 2021

orttraining/orttraining/core/graph/gradient_builder.cc (resolved)

SherlockNoMad previously approved these changes Jul 2, 2021

pengwa dismissed SherlockNoMad’s stale review via 50dba46 July 3, 2021 02:45

pengwa and others added 3 commits July 3, 2021 10:45

Apply suggestions from code review

50dba46

Refine the schema description Co-authored-by: Tim Harris <tiharr@microsoft.com>

Resolve review comments

8070c04

Merge branch 'pengwa/autograd_moe' of https://github.com/microsoft/onnxruntime into pengwa/autograd_moe

dc94b4b

Merge branch 'master' of https://github.com/microsoft/onnxruntime into pengwa/autograd_moe

c24cbe9

tlh20 approved these changes Jul 6, 2021

baijumeswani approved these changes Jul 7, 2021

pengwa merged commit 2347a0a into master Jul 7, 2021

pengwa deleted the pengwa/autograd_moe branch July 7, 2021 00:58

garymm mentioned this pull request Jun 16, 2022

Checker should validate the node's inputs/outputs have names when its formal parameter is Variadic onnx/onnx#3979

Merged
