
Fix Bad Interaction Between Aligning Inputs and CUDAGraphs in Backward (LayoutLMForMaskedLM) #103126

@eellison

Description


πŸ› Describe the bug

```
python benchmarks/dynamo/huggingface.py --device cuda --performance --backend inductor --amp --training --only LayoutLMForMaskedLM
```

Gives an assertion error:

```
  888                 continue
  889             if data_ptr is not None:
  890                 # static input, e.g., parameter
❱ 891                 assert data_ptr == new_inputs[idx].data_ptr()
  892             else:
  893                 # non-static input, need to copy it into CUDA graph
  894                 dst = self.reconstructed_inputs[idx]
```

Our cudagraph implementation assumes that certain tensors have a fixed memory location: parameters, and tensors saved by the forward of a graph that was cudagraph'd.

We also annotate all tensor inputs to Triton kernels as 16-byte aligned because it improves perf, and we copy inputs over to an aligned address if they are not already aligned.
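To make the conflict concrete, here is a minimal sketch of the two mechanisms (not the actual inductor code; the names `copy_misaligned_inputs`, `CudagraphifiedCallable`, and `static_input_idxs` are illustrative): cudagraphs records the address of each static input at capture time, while the alignment workaround clones misaligned inputs, which moves their address.

```python
import torch

ALIGNMENT = 16  # bytes; the alignment assumed for Triton kernel inputs

def copy_misaligned_inputs(inputs):
    # Alignment workaround: clone any input whose storage is not 16-byte
    # aligned. Cloning allocates fresh storage, so data_ptr() changes.
    for i, t in enumerate(inputs):
        if isinstance(t, torch.Tensor) and t.data_ptr() % ALIGNMENT != 0:
            inputs[i] = t.clone()
    return inputs

class CudagraphifiedCallable:
    def __init__(self, static_input_idxs):
        # Indices of inputs (parameters, tensors saved by the forward) whose
        # addresses are assumed to stay fixed across replays.
        self.static_input_idxs = set(static_input_idxs)
        self.recorded_ptrs = {}

    def record(self, inputs):
        for i in self.static_input_idxs:
            self.recorded_ptrs[i] = inputs[i].data_ptr()

    def replay(self, inputs):
        for i in self.static_input_idxs:
            # This is the assertion that fires in the backward above: the
            # alignment workaround cloned a "static" input, so its address moved.
            assert self.recorded_ptrs[i] == inputs[i].data_ptr()
```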

This led to the bad interaction in the backward shown above. If you add the following prints, you get:

```
runtime misaligned inputs: [] of # 212
cudagraphs removing unaligned input idxs set() of 212
runtime misaligned inputs: [31, 33] of # 326
cudagraphs removing unaligned input idxs set() of 326
runtime misaligned inputs: [] of # 212
runtime misaligned inputs: [31, 33] of # 326
```

We need to make sure we are not removing the misaligned input indices before cudagraphs checks for misalignment, so that cudagraphs knows not to expect a static address for the misaligned tensors (see the sketch below).
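Roughly, in terms of the illustrative sketch above (again, a sketch rather than a patch against the real code), the ordering should look like this:

```python
def build_cudagraph_callable(inputs, candidate_static_idxs):
    # Detect misalignment *before* deciding which inputs cudagraphs may treat
    # as static, so anything that might get copied to an aligned address is
    # excluded from the static set.
    misaligned = {
        i for i, t in enumerate(inputs)
        if isinstance(t, torch.Tensor) and t.data_ptr() % ALIGNMENT != 0
    }
    # Misaligned inputs will be cloned by copy_misaligned_inputs, so their
    # addresses cannot be assumed fixed across replays.
    static_idxs = set(candidate_static_idxs) - misaligned
    return CudagraphifiedCallable(static_idxs)
```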

I set cudagraph_trees to False here because it leads to a simpler repro (no extra warmup), but the same issue is present regardless.
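For reference, cudagraph trees can be toggled through the inductor config (assuming a recent master where these flags live under torch._inductor.config.triton):

```python
import torch._inductor.config as inductor_config

inductor_config.triton.cudagraphs = True        # keep plain cudagraphs on
inductor_config.triton.cudagraph_trees = False  # skip the tree / extra-warmup machinery
```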

Versions

master

cc @mcarilli @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225

Labels

module: cuda graphs, module: inductor, oncall: pt2, triaged
