FlexAttention isn't using decompositions #124643

Chillee · 2024-04-22T19:17:41Z

🚀 The feature, motivation and pitch

We don't trace with any decompositions today (https://github.com/pytorch/pytorch/blob/main/torch/_higher_order_ops/templated_attention.py#L125).

This can cause errors when we use a pointwise op that is actually decomposed (e.g. q // kv).
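To make the failure mode concrete, here is a minimal sketch of how a decomposition table changes what make_fx records for `q // kv`. It assumes a recent PyTorch with `torch.fx.experimental.proxy_tensor.make_fx`. `floor_div_decomp` is a hypothetical stand-in written for illustration, not the decomposition PyTorch actually registers, and the sketch traces a plain function rather than FlexAttention's own subgraph.

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx


def score_fn(q, kv):
    # Pointwise op from the issue: Tensor.__floordiv__ dispatches to aten.floor_divide
    return q // kv


def floor_div_decomp(a, b):
    # Hypothetical decomposition for illustration only (not PyTorch's registered one):
    # rewrite floor_divide as floor(a / b) using more primitive ops.
    return torch.floor(torch.div(a, b))


def call_targets(gm):
    # Collect the ATen ops recorded as call_function nodes in the traced graph.
    return [n.target for n in gm.graph.nodes if n.op == "call_function"]


q = torch.tensor([7.0, -7.0, 9.0])
kv = torch.tensor([2.0, 2.0, 4.0])

# Traced without a decomposition table: floor_divide survives as a single node.
plain = make_fx(score_fn)(q, kv)

# Traced with a table: the decomposition runs under the tracer, and the ops it
# calls (div, floor) are recorded in place of floor_divide.
table = {torch.ops.aten.floor_divide.default: floor_div_decomp}
decomposed = make_fx(score_fn, decomposition_table=table)(q, kv)

print(call_targets(plain))
print(call_targets(decomposed))
```

If a subgraph is traced the first way while the rest of the program is traced the second way, downstream code that only expects decomposed ops can hit an op it does not handle.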

cc: @ydwu4 who was looking at how this should be done properly.

Alternatives

N/A

Additional context

No response

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @SherlockNoMad


bdhirsh · 2024-04-22T19:20:44Z

Since we stash the current decomposition table in a global while tracing here, would it be reasonable for all HOPs that use make_fx on their inner subgraphs to poke at the global if it's available?

I guess a minor downside is that this is easy to get (silently) wrong without some auditing.

ydwu4 · 2024-04-22T19:24:53Z

A related issue: #122972. I'll work on it asap.

Adds trace_subgraph to _MakefxTracer; the motivation is in #122972. Also migrates all existing usages of reenter_make_fx to the new sub-tracer. Previously, the torch function mode that creates torch_fn metadata was not re-entered when we were in ProxyTensorMode (since it is inside of __torch_function__). This PR reconstructs the torch function mode based on the parent tracer's config and re-enters it, so the metadata is shown in the graph.

**Test Plan:** Existing tests. We have a bunch of make_fx tests for cond, map and while_loop. Also removes the expected failure for torch_fn, since reenter_make_fx is now able to reconstruct torch function modes.

Also fixes #124643

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

Pull Request resolved: pytorch#125363
Approved by: https://github.com/Chillee
ghstack dependencies: pytorch#125267
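The approach described in the PR, where the sub-tracer rebuilds its state from the parent tracer's config instead of starting from defaults, can be sketched with a toy Python class. `ToyMakefxTracer`, `emit`, `record_torch_fn` and the list-of-tuples "graph" are hypothetical; only the idea of a trace_subgraph method that inherits the parent's configuration (decompositions plus torch_fn-style metadata) comes from the PR text. This is not the real _MakefxTracer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A recorded node: (op name, whether torch_fn-style metadata was attached).
Node = Tuple[str, bool]


@dataclass
class ToyMakefxTracer:
    """Hypothetical toy tracer; not PyTorch's actual _MakefxTracer."""

    decomposition_table: Dict[str, List[str]] = field(default_factory=dict)
    record_torch_fn: bool = False
    graph: List[Node] = field(default_factory=list)

    def emit(self, op: str) -> None:
        # Expand ops that have a decomposition; record everything else,
        # tagging it with metadata if this tracer's config asks for it.
        for sub_op in self.decomposition_table.get(op, [op]):
            self.graph.append((sub_op, self.record_torch_fn))

    def trace(self, ops: List[str]) -> List[Node]:
        for op in ops:
            self.emit(op)
        return self.graph

    def trace_subgraph(self, ops: List[str]) -> List[Node]:
        # Build a child tracer from the parent's config instead of defaults,
        # so decompositions and metadata recording carry over to the subgraph.
        child = ToyMakefxTracer(
            decomposition_table=self.decomposition_table,
            record_torch_fn=self.record_torch_fn,
        )
        return child.trace(ops)
```

With a parent configured as `ToyMakefxTracer({"floor_divide": ["div", "floor"]}, record_torch_fn=True)`, a subgraph containing floor_divide comes back decomposed and tagged, and the child writes to its own graph rather than the parent's. Passing the config object explicitly avoids relying on every HOP remembering to read a global.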

ydwu4 self-assigned this Apr 22, 2024

soulitzer added the oncall: pt2 label Apr 24, 2024

jbschlosser added the triaged and module: decompositions labels Apr 24, 2024

ydwu4 mentioned this issue May 10, 2024

Support trace_subgraph in _MakefxTracer #125363 (Closed)

pytorchmergebot closed this as completed in 314ba13 May 15, 2024

