GatherElementsGrad CPU Kernel and TopKGrad CPU/CUDA Kernel by Lafi7e · Pull Request #5511 · microsoft/onnxruntime

Lafi7e · 2020-10-16T07:59:23Z

Add a GatherElementsGrad CPU kernel and TopKGrad CPU/CUDA kernels using Scatter. Also fix a bug in the Scatter CUDA implementation.

Motivation and Context

This is required by a customer's model, which is small and runs only on CPU.

SherlockNoMad · 2020-10-16T16:33:41Z

+      .SinceVersion(1)
+      .SetSupportLevel(OpSchema::SupportType::EXPERIMENTAL)
+      .SetDoc("TopKGrad")
+      .AllowUncheckedAttributes()


Do we need the axis attribute?

The Indices output holds indices along a specific axis.

Do you know if it would be possible to use GatherGrad (or maybe GatherElementsGrad... I'm not sure) to implement this gradient, instead of adding a new kernel?

I did some tests with PyTorch and found that the following equivalence seemed to hold:

import torch

x = torch.rand([2, 3, 4, 5])
topk_values, topk_indices = x.topk(3, dim=2)
gather_values = x.gather(dim=2, index=topk_indices)
print((topk_values == gather_values).all())  # prints "tensor(True)"

I think this implies that we could use the gradient of gather to implement the gradient of top-k, but there could be a corner case that I haven't seen. It would be nice to reuse GatherGrad though, because there's been a lot of performance engineering on it recently.
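To make the gradient side of this equivalence concrete, here is a minimal pure-Python sketch for a 2-D input with top-k taken along the last axis. The helper names (`topk_last_axis`, `gather_last_axis`, `gather_grad_last_axis`) are hypothetical and are not ONNX Runtime or PyTorch APIs. It assumes the gradient of a gather is a scatter-add of the upstream gradient into a zero tensor at the gathered positions.

```python
def topk_last_axis(rows, k):
    """Return (values, indices) of the k largest entries in each row."""
    values, indices = [], []
    for row in rows:
        # Sort positions by descending value; ties keep the lower index first.
        order = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        indices.append(order)
        values.append([row[j] for j in order])
    return values, indices


def gather_last_axis(rows, indices):
    """Pick row[j] for every j listed in the matching index row."""
    return [[row[j] for j in idx] for row, idx in zip(rows, indices)]


def gather_grad_last_axis(grad_out, indices, width):
    """Scatter-add the upstream gradient into zeros at the gathered positions."""
    grad_in = [[0.0] * width for _ in grad_out]
    for r, (g_row, idx) in enumerate(zip(grad_out, indices)):
        for g, j in zip(g_row, idx):
            grad_in[r][j] += g
    return grad_in


x = [[0.1, 0.9, 0.4, 0.7], [0.8, 0.2, 0.6, 0.3]]
values, indices = topk_last_axis(x, k=2)

# Forward equivalence: top-k values are a gather at the top-k indices.
assert values == gather_last_axis(x, indices)

# Backward: the gradient w.r.t. x is the gather gradient at those indices.
grad_values = [[1.0, 2.0], [3.0, 4.0]]
grad_x = gather_grad_last_axis(grad_values, indices, width=4)
```

Positions that were not selected by top-k receive zero gradient, which is what one would expect from a selection op.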

Thanks @mrry! You are right that GatherElementsGrad can be reused here, according to the op definition. However, GatherElementsGrad currently has no CPU kernel, so I can turn my CPU code into a GatherElementsGrad kernel. :-) Then both GatherElementsGrad and TopKGrad will have CPU and CUDA kernels. The bad news is that when I used GatherElementsGrad for TopKGrad, one of my unit tests failed on CUDA (my CPU implementation passes). I then added a new unit test for GatherElementsGrad with the same data size and attributes, and it also failed. This means the current CUDA implementation has a bug somewhere when the axis attribute is not the default. I will investigate and fix it.

Just pushed a new version that uses the Scatter CPU/CUDA implementation for both GatherElementsGrad and TopKGrad.
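For readers following along, the scatter-based gradient with a non-default axis can be sketched roughly as below. This is an illustration on flat row-major buffers, not the actual ONNX Runtime kernel; `strides_for` and `scatter_add_along_axis` are hypothetical names introduced here.

```python
from itertools import product
from math import prod


def strides_for(shape):
    """Row-major (C-order) strides, measured in elements."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def scatter_add_along_axis(input_shape, axis, indices, updates, update_shape):
    """Accumulate `updates` into a zero buffer of `input_shape`.

    `indices` and `updates` are flat row-major buffers of `update_shape`.
    Each update keeps its own coordinate on every dimension except `axis`,
    where the coordinate is replaced by the matching entry of `indices`.
    """
    out = [0.0] * prod(input_shape)
    in_strides = strides_for(input_shape)
    # product() yields coordinates in row-major order, so `flat` is the
    # row-major offset into the update-shaped buffers.
    for flat, coord in enumerate(product(*(range(n) for n in update_shape))):
        target = list(coord)
        target[axis] = indices[flat]
        offset = sum(c * s for c, s in zip(target, in_strides))
        out[offset] += updates[flat]
    return out


# Input of shape (2, 3, 2); top-2 taken along axis 1 gives shape (2, 2, 2).
indices = [2, 0, 0, 1, 1, 2, 0, 0]
updates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
grad = scatter_add_along_axis((2, 3, 2), 1, indices, updates, (2, 2, 2))
```

The `+=` matters for GatherElementsGrad, where indices may repeat along the axis. For TopKGrad the indices along the axis are distinct, so each target element is written at most once.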

SherlockNoMad · 2020-10-16T16:36:44Z

      outputs.push_back(GI(i));
    } else {
-      outputs.push_back(ArgDef("", nullptr));
+      outputs.push_back(IA("ConvInput_" + I(i).name));


Derek had a fix for this. The gradients for inputs were made optional, so this change is probably not needed anymore.

Maybe we can keep my change here, as it reads better than an empty string.

Just for my education: will it still be treated as an optional output if there's a non-empty string there? (If so, I'm happy with putting a more descriptive string in the graph.)

@mrry, I ran a quick test, and indeed an empty string lets the output computation be skipped, while a non-empty string does not. I've rolled back the change. Thanks!

Thanks for testing this! I wonder if this means there are other places in the code base where we give names to unused outputs and end up doing wasted computation :).
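The behavior discussed in this thread can be illustrated with a toy model. This is a hypothetical sketch, not ONNX Runtime code: it assumes a node executor that treats an empty output name as "output omitted" and skips the corresponding computation, which is the effect observed in the test above. `run_node` and the two gradient callables are made up for illustration.

```python
def run_node(computations, output_names):
    """Run only the computations whose output name is non-empty."""
    results = {}
    for compute, name in zip(computations, output_names):
        if name == "":
            # An empty name marks the optional output as omitted.
            continue
        results[name] = compute()
    return results


calls = []


def grad_x():
    calls.append("dX")
    return 1.0


def grad_w():
    calls.append("dW")
    return 2.0


# The second output is omitted, so grad_w is never invoked.
results = run_node([grad_x, grad_w], ["dX", ""])
```

Under this assumption, giving the unused output a descriptive name would make `grad_w` run even though nothing consumes its result.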

SherlockNoMad · 2020-10-16T16:37:24Z

+      NodeDef(OpDef{"TopKGrad", kMSDomain, 1},
+              {GO(0), O(1), I(0)},
+              {GI(0)},
+              SrcNodeAttributes())};


Shall we make the copy of the axis attribute explicit, since we don't need the other two attributes?

TopKGrad CPU kernel

20748b9

Lafi7e added the training label (issues related to ONNX Runtime training; typically submitted using template) Oct 16, 2020

Lafi7e requested review from SherlockNoMad, mrry and nbcsm October 16, 2020 07:59

Lafi7e requested a review from a team as a code owner October 16, 2020 07:59

SherlockNoMad reviewed Oct 16, 2020

use Scatter for GatherElementsGrad and TopKGrad.

c239274

Lafi7e changed the title ~~TopKGrad CPU kernel~~ GatherElementsGrad CPU Kernel and TopKGrad CPU/CUDA Kernel Oct 19, 2020

rollback convgrad change.

29c2df0

mrry approved these changes Oct 20, 2020

Lafi7e merged commit b48f596 into master Oct 21, 2020

Lafi7e deleted the weicwang/topkgradcpu branch October 21, 2020 01:29

Lafi7e mentioned this pull request Jun 13, 2022

[CUDA] GatherElements[Grad]/ScatterElements Bugfix and Perf Improve #11374

Merged

Lafi7e merged 3 commits into master from weicwang/topkgradcpu

3 participants