Add the counter check for dynamo tests by JackCaoG · Pull Request #4603 · pytorch/xla

JackCaoG · 2023-02-10T03:11:38Z

Clean up the test and add the counter check back so that it does not regress. There are some open issues, but I want to get this one merged first.

1. I have to remove the mark_step at the end of the optim_mod to get 3 graphs per step. I think this somehow has to do with grad.
2. Accessing data.grad.cpu() after backward after step 1 actually triggers an additional graph execution. I think this is due to 1 above. It seems to me that input.grad is not fully materialized after the backward computation. It has pending IR like

print(torch_xla._XLAC._get_xla_tensors_text([data.grad]))
IR {
  %0 = f32[] prim::Constant(), xla_shape=f32[], value=1
  %1 = f32[4,3,224,224]{3,2,1,0} aten::expand(%0), xla_shape=f32[4,3,224,224]{3,2,1,0}, size=(4, 3, 224, 224)
  %2 = f32[4,3,224,224]{3,2,1,0} xla::device_data(), xla_shape=f32[4,3,224,224]{3,2,1,0}, device=CPU:0
  %3 = f32[4,3,224,224]{3,2,1,0} aten::mul(%2, %1), xla_shape=f32[4,3,224,224]{3,2,1,0}
  %4 = f32[4,3,224,224]{3,2,1,0} xla::device_data(), xla_shape=f32[4,3,224,224]{3,2,1,0}, device=CPU:0
  %5 = f32[4,3,224,224]{3,2,1,0} aten::add(%4, %3), xla_shape=f32[4,3,224,224]{3,2,1,0}, ROOT=0
}

FYI @shunting314 @wconstab

I think we didn't hit this issue in TorchBench because we only run for 1 step and we have a mark_step between steps to clear things up. I will continue looking into this issue. I think the right thing to do is to add an optimizer step to the test I have, which should simulate the real use case and can also capture the input.grad in the same graph.


alanwaketan

LGTM.

shunting314 · 2023-02-10T19:11:27Z

Here is what I understand from the pending graph:

the mul is a no-op since we multiply a tensor by 1
then the graph essentially only contains an add op

This looks very much like gradient accumulation done by the autograd engine.
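To make the "add op" reading concrete, here is a minimal sketch in plain CPU PyTorch (no XLA involved, so it only illustrates the autograd semantics, not the pending IR itself). Running backward twice without zeroing the gradient leaves `x.grad` equal to the old gradient plus the new one, which is the same shape of computation as the `aten::add` at the root of the dump.

```python
import torch

x = torch.arange(4.0, requires_grad=True)


def step():
    # Rebuild the graph each time; d/dx sum(x * x) = 2 * x.
    (x * x).sum().backward()


step()
first = x.grad.clone()   # gradient after one backward pass

step()
second = x.grad.clone()  # old gradient + freshly computed gradient

# Accumulation means the second value is the sum of two identical gradients.
assert torch.equal(second, first + first)
```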

JackCaoG · 2023-02-10T19:16:02Z

@shunting314 yeah, the part that's confusing to me is that I would expect the backward graph to contain the gradient accumulation, but that might be in the optimizer graph?

shunting314 · 2023-02-10T20:29:12Z

Gradient accumulation should happen in the backward graph.

wconstab · 2023-02-10T20:59:29Z

I think gradient accumulation for one parameter happens when autograd determines that all the contributors to that parameter's grad are ready. In a simple case, there may be only one producer of a grad for a parameter, so you would expect autograd to fire its accumulate grad for that parameter soon after the backward op that produced the grad finished.

I believe that if the autograd accumulategrad is ready to fire while we are tracing the backward graph, we are likely capturing it in the backward graph. Note that I suspect a race condition here: if we 'exit' the dynamo-AOT backward trace phase right before autograd decides to fire its hook, then instead of being included in our backward graph it would end up in its own graph.

A separate reason that could cause delayed accumulategrad is if there are other forward ops using the parameter in question, meaning those corresponding separate backwards ops all have to finish before accumulategrad can fire. So it would be good to confirm which case it is.

JackCaoG · 2023-02-10T21:21:57Z

Thanks @wconstab for the input. I am wondering what's the best way to figure out which one is the case? AFAICT, dynamo only passes us two graphs, one for forward and one for backward, and that part of the logic is not controlled by the existing dynamo bridge. Should I dig into AOTAutograd's code, figure out how it decides to pass xla the backward graph, and force it to sleep for a while to make sure everything is captured, or is there another way to achieve a similar goal?

wconstab · 2023-02-10T21:53:08Z

One tool you can play with is dumping the autograd graph. You'll be better off doing this on a toy model if you can repro there.

this PR has some illustration of how an autograd graph dump looks, and some context about how I used it to solve an issue in FSDP. You can ignore most of that and just get an idea of what it looks like: "Special-case fsdp wrapped modules to be Unspecialized" (pytorch#89330)
this code shows a simple way to produce the graph from a model: https://github.com/pytorch/pytorch/blob/master/benchmarks/dynamo/distributed.py#LL23C47-L23C47

The graph dump idea is mainly to visualize which nodes are contributing grads to a given parameter. If you have more nodes contributing than you expected, maybe you are in the second case.
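A text-only variant of the same idea can be sketched without any plotting dependency. The hypothetical helper `count_grad_contributors` below walks `grad_fn.next_functions` from an output and counts the edges that feed a given parameter's AccumulateGrad node. It relies on the AccumulateGrad node exposing the leaf tensor as `.variable`, and it is a toy on CPU tensors, not the tooling used in the linked PR.

```python
import torch


def count_grad_contributors(output, param):
    """Count autograd edges that feed `param`'s AccumulateGrad node."""
    seen, stack, count = set(), [output.grad_fn], 0
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        for nxt, _ in node.next_functions:
            if nxt is None:
                continue  # input that does not require grad
            if getattr(nxt, "variable", None) is param:
                count += 1  # an edge into this parameter's AccumulateGrad
            else:
                stack.append(nxt)
    return count


w = torch.ones(2, requires_grad=True)
x = torch.ones(2)

single_use = (w * x).sum()
double_use = (w * x).sum() + (w * 2).sum()

n_single = count_grad_contributors(single_use, w)
n_double = count_grad_contributors(double_use, w)
print(n_single, n_double)
```

If a parameter shows more contributors than expected, that would point toward the second case described above, where accumulation has to wait for several backward ops.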

> Should I dig into AOTAutograd's code and figure out how it decides to pass xla the backward graph and force it to sleep for a while to make sure everything is captured

I don't know what a clean solution would look like. Indeed, the thing that fires AccumulateGrad hooks during a regular .backward() is a separate C++ thread in the autograd engine, so it can come at any time with respect to Python's tracing. There is probably some hook that autograd fires after finishing all grad accumulation; you could register a callback for it in Python and wait for that before you exit the dynamo backward trace. But make sure there is no valid case where grads are not expected to all be ready before you wait.
cc @albanD - is there such a hook? any other ideas? Also, is aot-autograd the same as .backward, with a separate thread firing these accumulategrad operations?

shunting314 · 2023-02-10T22:07:19Z

> Also, is aot-autograd the same as .backward, with a separate thread firing these accumulategrad operations?

Commenting with what I see: aot-autograd calls .grad. Both .grad and .backward eventually call into Variable._execution_engine.run_backward, which should run into the C++ autograd engine.
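Although both entry points reach the same engine, they differ in one respect that matters for this discussion. The sketch below, in plain CPU PyTorch, contrasts them: `torch.autograd.grad` returns the gradient and leaves `.grad` untouched, while `.backward()` accumulates into `.grad`. Whether AOTAutograd's tracing behaves exactly like this toy is taken from the comment above, not verified here.

```python
import torch

w = torch.ones(3, requires_grad=True)

# torch.autograd.grad hands the gradient back and does not write w.grad.
(g,) = torch.autograd.grad((w * 2).sum(), w)
grad_after_autograd_grad = w.grad  # still None

# .backward() runs the accumulation step that populates w.grad.
(w * 2).sum().backward()
grad_after_backward = w.grad.clone()

print(grad_after_autograd_grad, grad_after_backward)
```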

albanD · 2023-02-10T22:07:49Z

> is there such a hook?

Yes, and funnily enough they're called engine callbacks haha. They will trigger at the very end of the backward pass once everything else has been executed.

> Also, is aot-autograd the same as .backward, with a separate thread firing these accumulategrad operations?

There is no side thread after aot-autograd.
But we also enable single threaded autograd in there so there is never a side thread to begin with even when calling .backward. (https://pytorch.org/docs/master/generated/torch.autograd.set_multithreading_enabled.html?highlight=set_multithreading_enabled#torch.autograd.set_multithreading_enabled)
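A minimal sketch of an engine callback, assuming a recent PyTorch where `torch.autograd.set_multithreading_enabled` can be used as a context manager. The callback is queued through `Variable._execution_engine.queue_callback`, a private API that can only be called while a backward pass is running, so it is installed from inside a tensor hook. The names `on_first_grad` and `on_backward_done` are hypothetical, and this is not how the dynamo bridge is wired up.

```python
import torch
from torch.autograd import Variable

events = []
w = torch.ones(3, requires_grad=True)
out = (w * 2).sum()


def on_backward_done():
    # Runs once the whole backward pass, including accumulation, is finished.
    events.append(("done", w.grad is not None))


def on_first_grad(grad):
    # Runs early in backward, before w.grad has been written.
    events.append(("hook", w.grad is not None))
    Variable._execution_engine.queue_callback(on_backward_done)


out.register_hook(on_first_grad)

# Single-threaded autograd, matching the setting mentioned above.
with torch.autograd.set_multithreading_enabled(False):
    out.backward()

print(events)
```

The point of the ordering is that the queued callback observes a populated `w.grad`, whereas the hook that installed it does not.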

wconstab · 2023-02-10T22:12:18Z

> There is no side thread after aot-autograd.

Interesting, so does that mean that

if all grads for a given parameter are ready during aot-autograd, then we are guaranteed to also capture accumulategrad during the aot-autograd trace?
or, even if grads are ready during aot-autograd, we do not expect accumulategrad to fire until someone actually calls .backward() outside?

albanD · 2023-02-13T22:59:13Z

> if grads are ready during aot-autograd, we do not expect accumulategrad to fire until someone actually calls .backward() outside?

I'm not sure what you mean by that?

Add the counter check for dynamo tests

5b2d8c4

JackCaoG added the dynamo label Feb 10, 2023

JackCaoG requested review from alanwaketan, shunting314 and wonjoo-wj February 10, 2023 03:11

alanwaketan approved these changes Feb 10, 2023


wonjoo-wj approved these changes Feb 10, 2023


JackCaoG merged commit de2160a into master Feb 10, 2023

chandrasekhard2 pushed a commit that referenced this pull request Feb 22, 2023

Add the counter check for dynamo tests (#4603)

1eea452

chandrasekhard2 pushed a commit that referenced this pull request Feb 22, 2023

Add the counter check for dynamo tests (#4603)

d1608c3

mateuszlewko pushed a commit that referenced this pull request Mar 15, 2023

Add the counter check for dynamo tests (#4603)

cc1a733

JackCaoG merged 1 commit into master from JackCaoG/add_dynamo_test_counter_check


6 participants
