Conversation

xmfan (Member) commented Aug 9, 2024

Stack from ghstack (oldest at bottom):

FIXES #132939

Compiled autograd's trace of the AOT backward may introduce some additional ops, e.g. a clone to make a tensor contiguous or trace_wrapped HOPs, so the two graphs may be slightly offset from each other.
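
For instance, the `clone` at the top of `CompiledFunctionBackward1` in the graph below appears because `SumBackward0` broadcasts the incoming scalar grad with `expand`, which yields a stride-0 (non-contiguous) tensor, while the AOT backward expects a contiguous tangent. A standalone illustration of that effect (not code from this PR):

```python
import torch

g = torch.ones(())    # incoming grad of a scalar loss
t = g.expand(4)       # what SumBackward0 produces
print(t.stride(), t.is_contiguous())  # (0,) False
t = t.clone(memory_format=torch.contiguous_format)
print(t.stride(), t.is_contiguous())  # (1,) True
```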

hf_Whisper example: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpNv89Pu/index.html
fsdp2 example: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpPdKssS/rank_0/index.html
Unit test example: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpvoQsnl/index.html
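
For reference, the `File:` comments in the graphs below point at a test of roughly this shape. This is a sketch reconstructed from those stack traces; the exact test body and compiler wiring may differ:

```python
import torch

def f(x):
    tmp1 = x.sin()   # test_compiled_autograd.py:2230
    tmp2 = x.cos()   # test_compiled_autograd.py:2231
    torch._dynamo.graph_break()  # splits the backward into aot1/aot0
    return tmp1.sin() + tmp2.cos()  # test_compiled_autograd.py:2233

x = torch.randn(4, requires_grad=True)
compiler_fn = lambda gm: torch.compile(gm, backend="inductor")
with torch._dynamo.compiled_autograd.enable(compiler_fn):
    torch.compile(f)(x).sum().backward()
```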

```python
 ===== Compiled autograd graph =====
 <eval_with_key>.14 class CompiledAutograd(torch.nn.Module):
    def forward(self, inputs, sizes, scalars, hooks):
        # No stacktrace found for following nodes
        getitem: "f32[]cpu" = inputs[0]
        aot1_primals_1: "f32[4]cpu" = inputs[1]
        aot1_primals_2: "f32[4]cpu" = inputs[2]
        aot0_sin: "f32[4]cpu" = inputs[3]
        aot0_cos: "f32[4]cpu" = inputs[4]
        getitem_5: "f32[4]cpu" = inputs[5];  inputs = None

         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: SumBackward0 (NodeCall 1)
        expand: "f32[4]cpu" = torch.ops.aten.expand.default(getitem, [4]);  getitem = None

         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: CompiledFunctionBackward1 (NodeCall 2)
        aot1_tangents_1: "f32[4]cpu" = torch.ops.aten.clone.default(expand, memory_format = torch.contiguous_format);  expand = None
        aot1_sin_1: "f32[4]cpu" = torch.ops.aten.sin.default(aot1_primals_2);  aot1_primals_2 = None
        aot1_neg: "f32[4]cpu" = torch.ops.aten.neg.default(aot1_sin_1);  aot1_sin_1 = None
        aot0_tangents_2: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot1_tangents_1, aot1_neg);  aot1_neg = None
        aot1_cos_1: "f32[4]cpu" = torch.ops.aten.cos.default(aot1_primals_1);  aot1_primals_1 = None
        aot0_tangents_1: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot1_tangents_1, aot1_cos_1);  aot1_tangents_1 = aot1_cos_1 = None

         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: CompiledFunctionBackward0 (NodeCall 3)
        aot0_neg: "f32[4]cpu" = torch.ops.aten.neg.default(aot0_sin);  aot0_sin = None
        aot0_mul: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot0_tangents_2, aot0_neg);  aot0_tangents_2 = aot0_neg = None
        aot0_mul_1: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot0_tangents_1, aot0_cos);  aot0_tangents_1 = aot0_cos = None
        aot0_add: "f32[4]cpu" = torch.ops.aten.add.Tensor(aot0_mul, aot0_mul_1);  aot0_mul = aot0_mul_1 = None

         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: torch::autograd::AccumulateGrad (NodeCall 4)
        accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_5, aot0_add);  getitem_5 = aot0_add = accumulate_grad_ = None
        _exec_final_callbacks_stub = torch__dynamo_external_utils__exec_final_callbacks_stub();  _exec_final_callbacks_stub = None
        return []
```

where aot1 is

```python
class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "f32[4][1]cpu", primals_2: "f32[4][1]cpu", tangents_1: "f32[4][1]cpu"):
         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2233 in torch_dynamo_resume_in_f_at_2232, code: return tmp1.sin() + tmp2.cos()
        sin_1: "f32[4][1]cpu" = torch.ops.aten.sin.default(primals_2);  primals_2 = None
        neg: "f32[4][1]cpu" = torch.ops.aten.neg.default(sin_1);  sin_1 = None
        mul: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_1, neg);  neg = None
        cos_1: "f32[4][1]cpu" = torch.ops.aten.cos.default(primals_1);  primals_1 = None
        mul_1: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_1, cos_1);  tangents_1 = cos_1 = None
        return (mul_1, mul)
```

and aot0 is

```python
class GraphModule(torch.nn.Module):
    def forward(self, sin: "f32[4][1]cpu", cos: "f32[4][1]cpu", tangents_1: "f32[4][1]cpu", tangents_2: "f32[4][1]cpu"):
         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2231 in f, code: tmp2 = x.cos()
        neg: "f32[4][1]cpu" = torch.ops.aten.neg.default(sin);  sin = None
        mul: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_2, neg);  tangents_2 = neg = None

         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2230 in f, code: tmp1 = x.sin()
        mul_1: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_1, cos);  tangents_1 = cos = None

         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2230 in f, code: tmp1 = x.sin()
        add: "f32[4][1]cpu" = torch.ops.aten.add.Tensor(mul, mul_1);  mul = mul_1 = None
        return (add,)
```
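
Mechanically, the change amounts to reusing each AOT backward node's name inside the compiled autograd graph, namespaced by an `aot{id}_` prefix. A minimal sketch of that renaming idea on a standalone FX graph (illustrative only; `prefix_node_names` is a hypothetical helper, not the hook this PR adds inside compiled autograd's tracing):

```python
import torch
import torch.fx as fx

def prefix_node_names(gm: fx.GraphModule, aot_id: int) -> fx.GraphModule:
    # Rename every node (except the output) so it lines up with the
    # corresponding AOTDispatcher graph dump, e.g. mul_1 -> aot1_mul_1.
    for node in gm.graph.nodes:
        if node.op != "output":
            node.name = f"aot{aot_id}_{node.name}"
    gm.recompile()  # regenerate Python code with the new names
    return gm

gm = fx.symbolic_trace(lambda x: x.sin().cos())
print([n.name for n in prefix_node_names(gm, 1).graph.nodes])
# ['aot1_x', 'aot1_sin', 'aot1_cos', 'output']
```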

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang


pytorch-bot bot commented Aug 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133148

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 09bf218 with merge base e7b870c:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

xmfan marked this pull request as ready for review August 9, 2024 23:49
xmfan requested review from bdhirsh and jansel August 9, 2024 23:49
xmfan added a commit that referenced this pull request Aug 9, 2024
xmfan marked this pull request as draft August 10, 2024 00:03
xmfan changed the title from "[compiled autograd] rename AOTAutograd primals graph nodes" to "[compiled autograd] rename AOTDispatcher graph nodes" Aug 13, 2024
xmfan changed the title from "[compiled autograd] rename AOTDispatcher graph nodes" to "[compiled autograd] use same graph node names as AOTDispatcher" Aug 13, 2024
xmfan marked this pull request as ready for review August 13, 2024 23:14
xmfan marked this pull request as draft August 13, 2024 23:56
xmfan added a commit that referenced this pull request Aug 14, 2024
ghstack-source-id: 07f5b41
Pull Request resolved: #133148
pytorch-bot bot added the oncall: distributed and release notes: distributed (fsdp) labels Aug 14, 2024
xmfan marked this pull request as ready for review August 15, 2024 00:40
…cher"


FIXES #132939


Compiled autograd's trace of the AOT backward may result in some additional ops e.g. clone to make contiguous, trace_wrapped HOPs, so the graphs may be slightly offset from each other


hf_Whisper example: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpNv89Pu/index.html
fsdp2 example: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpPdKssS/rank_0/index.html
Unit test example: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmpvoQsnl/index.html
```python
 ===== Compiled autograd graph =====                                                                                                                                                          
 <eval_with_key>.14 class CompiledAutograd(torch.nn.Module):                                                                                                                                  
    def forward(self, inputs, sizes, scalars, hooks):                                                                                                                                         
        # No stacktrace found for following nodes                                                                                                                                             
        getitem: "f32[]cpu" = inputs[0]                                                                                                                                                       
        aot1_primals_1: "f32[4]cpu" = inputs[1]                                                                                                                                               
        aot1_primals_2: "f32[4]cpu" = inputs[2]                                                                                                                                               
        aot0_sin: "f32[4]cpu" = inputs[3]                                                                                                                                                     
        aot0_cos: "f32[4]cpu" = inputs[4]                                                                                                                                                     
        getitem_5: "f32[4]cpu" = inputs[5];  inputs = None                                                                                                                                    
                                                                                                                                                                                              
         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: SumBackward0 (NodeCall 1)                                                       
        expand: "f32[4]cpu" = torch.ops.aten.expand.default(getitem, [4]);  getitem = None                                                                                                    
                                                                                                                                                                                              
         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: CompiledFunctionBackward1 (NodeCall 2)                                          
        aot1_tangents_1: "f32[4]cpu" = torch.ops.aten.clone.default(expand, memory_format = torch.contiguous_format);  expand = None                                                          
        aot1_sin_1: "f32[4]cpu" = torch.ops.aten.sin.default(aot1_primals_2);  aot1_primals_2 = None                                                                                          
        aot1_neg: "f32[4]cpu" = torch.ops.aten.neg.default(aot1_sin_1);  aot1_sin_1 = None                                                                                                    
        aot0_tangents_2: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot1_tangents_1, aot1_neg);  aot1_neg = None                                                                                 
        aot1_cos_1: "f32[4]cpu" = torch.ops.aten.cos.default(aot1_primals_1);  aot1_primals_1 = None                                                                                          
        aot0_tangents_1: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot1_tangents_1, aot1_cos_1);  aot1_tangents_1 = aot1_cos_1 = None                                                           
                                                                                                                                                                                              
         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: CompiledFunctionBackward0 (NodeCall 3)                                          
        aot0_neg: "f32[4]cpu" = torch.ops.aten.neg.default(aot0_sin);  aot0_sin = None                                                                                                        
        aot0_mul: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot0_tangents_2, aot0_neg);  aot0_tangents_2 = aot0_neg = None                                                                      
        aot0_mul_1: "f32[4]cpu" = torch.ops.aten.mul.Tensor(aot0_tangents_1, aot0_cos);  aot0_tangents_1 = aot0_cos = None                                                                    
        aot0_add: "f32[4]cpu" = torch.ops.aten.add.Tensor(aot0_mul, aot0_mul_1);  aot0_mul = aot0_mul_1 = None                                                                                
                                                                                                                                                                                              
         # File: /data/users/xmfan/a/pytorch/torch/_dynamo/compiled_autograd.py:444 in set_node_origin, code: torch::autograd::AccumulateGrad (NodeCall 4)                                    
        accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_5, aot0_add);  getitem_5 = aot0_add = accumulate_grad_ = None                                                  
        _exec_final_callbacks_stub = torch__dynamo_external_utils__exec_final_callbacks_stub();  _exec_final_callbacks_stub = None                                                            
        return []   
```

where aot1 is
```python
class GraphModule(torch.nn.Module):
    def forward(self, primals_1: "f32[4][1]cpu", primals_2: "f32[4][1]cpu", tangents_1: "f32[4][1]cpu"):
         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2233 in torch_dynamo_resume_in_f_at_2232, code: return tmp1.sin() + tmp2.cos()
        sin_1: "f32[4][1]cpu" = torch.ops.aten.sin.default(primals_2);  primals_2 = None
        neg: "f32[4][1]cpu" = torch.ops.aten.neg.default(sin_1);  sin_1 = None
        mul: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_1, neg);  neg = None
        cos_1: "f32[4][1]cpu" = torch.ops.aten.cos.default(primals_1);  primals_1 = None
        mul_1: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_1, cos_1);  tangents_1 = cos_1 = None
        return (mul_1, mul)
```

and aot0 is
```python
class GraphModule(torch.nn.Module):
    def forward(self, sin: "f32[4][1]cpu", cos: "f32[4][1]cpu", tangents_1: "f32[4][1]cpu", tangents_2: "f32[4][1]cpu"):
         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2231 in f, code: tmp2 = x.cos()
        neg: "f32[4][1]cpu" = torch.ops.aten.neg.default(sin);  sin = None
        mul: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_2, neg);  tangents_2 = neg = None
        
         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2230 in f, code: tmp1 = x.sin()
        mul_1: "f32[4][1]cpu" = torch.ops.aten.mul.Tensor(tangents_1, cos);  tangents_1 = cos = None
        
         # File: /data/users/xmfan/a/pytorch/test/inductor/test_compiled_autograd.py:2230 in f, code: tmp1 = x.sin()
        add: "f32[4][1]cpu" = torch.ops.aten.add.Tensor(mul, mul_1);  mul = mul_1 = None
        return (add,)
```
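
For context, here is a minimal sketch of the test function that produces these two graphs, reconstructed from the source-line comments above (the explicit `graph_break()` and the `aot_eager` backend are assumptions, used to force `f` to split into the two compiled regions `aot0` and `aot1`):
```python
import torch

@torch.compile(backend="aot_eager")
def f(x):
    tmp1 = x.sin()  # aot0 backward: mul_1 = tangents_1 * cos
    tmp2 = x.cos()  # aot0 backward: mul = tangents_2 * -sin
    torch._dynamo.graph_break()  # assumed: splits f into two AOT graphs
    return tmp1.sin() + tmp2.cos()  # aot1 backward (resume function)

x = torch.randn(4, requires_grad=True)
# Running backward under compiled autograd (torch._dynamo.compiled_autograd.enable)
# produces the single stitched graph shown above, with aot0_/aot1_ prefixes
# mapping each node back to its originating AOT backward graph.
f(x).sum().backward()
```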

cc XilunWu H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
xmfan added a commit that referenced this pull request Aug 15, 2024
ghstack-source-id: ea6af35
Pull Request resolved: #133148

```python
def is_similar(a: torch.fx.node.Node, b: torch.fx.node.Node):
    if callable(a.target) and callable(b.target):
        target_match = a.target.__qualname__ == b.target.__qualname__
```
Contributor:
Are all callables in Python guaranteed to have a `__qualname__`? (Including C extension functions?)

Member Author:
Any function, yes; but for instances of classes that define `__call__`, we have to set it manually. This actually breaks for `HigherOrderOperator`, which doesn't set `__qualname__` — I'll move the fix into this PR.

Contributor:
Maybe `getattr(a.target, "__qualname__", "<unset>")`?

Member Author:
Gonna use `__name__` instead, since we still want to rename HOPs.
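
A minimal sketch of what the resulting comparison could look like, assuming a `getattr` fallback for targets that define neither attribute (names here are illustrative, not the exact code landed in the PR):
```python
import torch.fx

def is_similar(a: torch.fx.node.Node, b: torch.fx.node.Node) -> bool:
    if callable(a.target) and callable(b.target):
        # __name__ is set on renamed HOPs even when __qualname__ is not;
        # fall back to a sentinel for callables that define neither.
        target_match = (
            getattr(a.target, "__name__", "<unset>")
            == getattr(b.target, "__name__", "<unset>")
        )
    else:
        target_match = a.target == b.target
    return target_match and a.op == b.op and len(a.args) == len(b.args)
```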

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 26, 2024
@github-actions github-actions bot deleted the gh/xmfan/76/head branch September 28, 2024 02:08
pytorchmergebot pushed a commit that referenced this pull request Jan 7, 2025
…44202)

This error started popping up in HUD CA benchmarks:
```python
 File "/data/users/xmfan/core/b/pytorch/torch/_dynamo/compiled_autograd.py", line 371, in dce
    self.fx_tracer.graph.eliminate_dead_code(is_impure)
  File "/data/users/xmfan/core/b/pytorch/torch/fx/graph.py", line 1862, in eliminate_dead_code
    self.lint()
  File "/data/users/xmfan/core/b/pytorch/torch/fx/graph.py", line 1753, in lint
    raise RuntimeError(f"Node redefined name {node.name}!")
RuntimeError: Node redefined name aot0_expand!
```

We added CA initial capture's renaming (#133148) to help debug issues with the AOT backward, but it errors out when we have multiple instances of the same AOT backward. This likely only showed up now because of increased hierarchical graph reuse. I fix it by adding a postfix counter to the node name.
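
A minimal sketch of the postfix-counter idea (illustrative only; the class and counter bookkeeping here are assumptions, not the exact code in the fix):
```python
from collections import defaultdict

class NodeRenamer:
    """Prefix node names with their AOT graph id, deduplicating repeats."""

    def __init__(self) -> None:
        self._counts: defaultdict[str, int] = defaultdict(int)

    def rename(self, aot_id: int, node_name: str) -> str:
        base = f"aot{aot_id}_{node_name}"
        count = self._counts[base]
        self._counts[base] += 1
        # The first occurrence keeps the plain name; later occurrences get
        # a postfix counter so FX lint never sees the same name redefined.
        return base if count == 0 else f"{base}_{count}"

renamer = NodeRenamer()
assert renamer.rename(0, "expand") == "aot0_expand"
assert renamer.rename(0, "expand") == "aot0_expand_1"  # second instance of the same AOT backward
```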

Pull Request resolved: #144202
Approved by: https://github.com/bdhirsh, https://github.com/jansel