[Graph Partition] improve custom op output alias #163380
Merged
For a custom op with multiple outputs, we will see the following generated code:
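The original code block did not survive extraction. Below is a minimal Python sketch of the pattern being described (the buffer names `buf1`, `buf3`, `buf4` follow the text; the stand-in op `op1` is hypothetical, and real Inductor wrapper code looks different):

```python
import torch

# Hypothetical stand-in for a custom op with two outputs; the real op
# would be a registered torch custom op (e.g. vLLM's MoE kernel).
def op1(x):
    return x + 1, x * 2

def generated_wrapper(arg0):
    # buf1 holds the tuple of outputs from the fallback kernel.
    buf1 = op1(arg0)
    # MultiOutput nodes index into the tuple.
    buf3 = buf1[0]
    buf4 = buf1[1]
    # buf1 itself is never accessed again, so it can be deleted early;
    # buf3 and buf4 keep the underlying tensors alive.
    del buf1
    return buf3, buf4
```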
If `buf1` is not accessed later, it is good to deallocate it early, so we do not delay `del buf1` until both `buf3` and `buf4` are no longer used. Note that `buf3` and `buf4` hold references to the underlying data, so `del buf1` does not prevent their use.

However, when the op has mutating args, we do not see `del buf1` immediately. Why? Because `buf3` is a `MultiOutput` with `buf1` as input, and it believes that `buf1` (an output of the `FallbackKernel` op1) has inputs that alias its output (see `pytorch/torch/_inductor/ir.py`, lines 7976 to 7982 at commit 72fedf0).
According to [NOTE: FallbackKernel supported operators], a mutating op that is auto-functionalizable should NOT have outputs that alias any of its inputs. This PR improves `get_inputs_that_alias_output` of `FallbackKernel` accordingly.

Use case: the MoE custom op in vLLM.
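A minimal sketch of the shape of the fix, under stated assumptions: the class below is a stripped-down stand-in for `FallbackKernel`, and the predicate `is_auto_functionalizable` is hypothetical (the actual condition lives in `torch/_inductor/ir.py` and differs):

```python
# Sketch only: models why reporting no aliasing for auto-functionalizable
# mutating ops lets the scheduler emit `del buf1` early.
class FallbackKernelSketch:
    def __init__(self, mutated_args, auto_functionalizable):
        self.mutated_args = mutated_args
        self.auto_functionalizable = auto_functionalizable

    def is_auto_functionalizable(self):
        # Hypothetical predicate: mutating ops that went through
        # auto-functionalization produce fresh, non-aliasing outputs.
        return self.auto_functionalizable

    def get_inputs_that_alias_output(self):
        # Before the PR: mutated args were conservatively treated as
        # aliasing the output, which kept buf1 alive until buf3/buf4 died.
        # After: per [NOTE: FallbackKernel supported operators], an
        # auto-functionalizable mutating op aliases nothing.
        if self.is_auto_functionalizable():
            return []
        return list(self.mutated_args)
```

With an empty alias list, a `MultiOutput` consumer no longer pins the fallback kernel's tuple buffer, so the early `del` can be generated.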
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben