Conversation

coconutruben
Contributor

@coconutruben coconutruben commented Aug 26, 2025

Stack from ghstack (oldest at bottom):

why

  • running ATen addmm with an expanded version of the bias vs the
    regular bias sometimes causes numerical differences
  • to avoid this for now, we make ATen addmm use inp vs inp_expanded
    depending on whether we're in max-autotune, matching the previous
    logic

what

  • pass the unexpanded bias (inp)
  • let the templates whose heuristics need it expanded (ATen when not
    in max-autotune, Triton always) expand it themselves
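
For intuition only, here is a NumPy analogy (not the Inductor code path) of an unexpanded bias vs one explicitly expanded to the output shape. In NumPy the two match exactly; the point of the fix is that the real ATen addmm kernel may take a different path for an already-expanded 2-D bias, which is where the numerics drift can come from:

```python
import numpy as np

# NumPy analogy of addmm's bias handling (illustration only, not the
# Inductor code path): out = a @ b + bias, with the bias either left
# 1-D (implicit broadcast) or explicitly expanded to the output shape.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3))
b = rng.standard_normal((3, 5))
bias = rng.standard_normal(5)

out_unexpanded = a @ b + bias                         # analogous to inp
out_expanded = a @ b + np.broadcast_to(bias, (4, 5))  # analogous to inp_expanded

# In NumPy these are bit-identical; the ATen kernel may dispatch
# differently for a 2-D bias, which can shift numerics.
assert np.array_equal(out_unexpanded, out_expanded)
```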

testing

python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D81520581

# why

- running ATen addmm with an expanded version of the bias vs the
  regular bias sometimes causes numerical differences
- to avoid this for now, we make ATen addmm use inp vs inp_expanded
  depending on whether we're in max-autotune, matching the previous
  logic

# what

- expand KernelInputs to also store views of specific nodes, by name
- use that view (inp, the unexpanded version) in the heuristics,
  adjusting it depending on whether we're in max-autotune or not
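
A minimal sketch of what storing named views could look like (class and method names here are assumptions for illustration, not the actual `torch._inductor` KernelInputs API):

```python
# Sketch of a kernel-inputs container that stores alternate views of
# nodes by name (hypothetical names; not the real Inductor class).
class KernelInputsSketch:
    def __init__(self, nodes):
        self._nodes = dict(nodes)   # name -> canonical node
        self._views = {}            # name -> alternate view of that node

    def add_view(self, name, view):
        # e.g. register the unexpanded bias ("inp") alongside the
        # expanded node the templates normally see.
        self._views[name] = view

    def get(self, name, prefer_view=False):
        if prefer_view and name in self._views:
            return self._views[name]
        return self._nodes[name]


inputs = KernelInputsSketch({"bias": "inp_expanded_node"})
inputs.add_view("bias", "inp_node")
assert inputs.get("bias") == "inp_expanded_node"
assert inputs.get("bias", prefer_view=True) == "inp_node"
```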

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Aug 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161534

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 66e7441 with merge base d25c35d:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

This was referenced Aug 26, 2025
@coconutruben coconutruben added the topic: not user facing topic category label Aug 26, 2025
…erly"


# why

- running ATen addmm with an expanded version of the bias vs the
  regular bias sometimes causes numerical differences
- to avoid this for now, we make ATen addmm use inp vs inp_expanded
  depending on whether we're in max-autotune, matching the previous
  logic

# what

- remove the view from inp_expanded when running not in max-autotune

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D81520581](https://our.internmc.facebook.com/intern/diff/D81520581)

[ghstack-poisoned]
@coconutruben
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


Contributor

@jansel jansel left a comment

Failing tests?

coconutruben added a commit that referenced this pull request Sep 13, 2025
# why

- running ATen addmm with an expanded version of the bias vs the
  regular bias sometimes causes numerical differences
- to avoid this for now, we make ATen addmm use inp vs inp_expanded
  depending on whether we're in max-autotune, matching the previous
  logic

# what

- remove the view from inp when not in max-autotune for addmm aten
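
The max-autotune check above can be sketched as a small dispatch helper. All names here are hypothetical (not the actual `torch._inductor` heuristics API); the direction of the check follows the bullet above, with Triton templates always taking the expanded bias:

```python
# Hypothetical sketch of the bias-selection logic (names assumed; not
# the actual torch._inductor heuristics API).
def select_addmm_bias(inp, inp_expanded, *, max_autotune: bool, backend: str):
    """Pick which bias node the addmm template should consume."""
    if backend == "triton":
        # Triton templates always work on the expanded bias.
        return inp_expanded
    # ATen keeps the expanded view only under max-autotune; outside it,
    # the plain (unexpanded) bias is passed through.
    return inp_expanded if max_autotune else inp


assert select_addmm_bias("inp", "inp_expanded", max_autotune=False, backend="aten") == "inp"
assert select_addmm_bias("inp", "inp_expanded", max_autotune=True, backend="aten") == "inp_expanded"
```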

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

ghstack-source-id: 4399549
Pull Request resolved: #161534
coconutruben added a commit that referenced this pull request Sep 13, 2025
# why

- running ATen addmm with an expanded version of the bias vs the
  regular bias sometimes causes numerical differences
- to avoid this for now, we make ATen addmm use inp vs inp_expanded
  depending on whether we're in max-autotune, matching the previous
  logic

# what

- remove the view from inp when not in max-autotune for addmm aten

# testing

```
python3 -bb -m pytest test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_codegen_cpu
```

ghstack-source-id: 208c906
Pull Request resolved: #161534
@coconutruben coconutruben marked this pull request as draft September 18, 2025 17:13

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request module: inductor topic: not user facing topic category
