AOTAutograd: avoid intermediate_base logic when all aliased outputs came from a multi_output_view #111411
---
Sorry, is this saying that at most one additional alias can escape, ON TOP of the original multi-output view op's outputs o_1, ...?
---
Yes - if I have K multi-output view aliases of some tensor (say they all came from an unbind call), then this comment is saying that at most K+1 aliases are allowed to escape the graph.
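For concreteness, a small counting sketch (the shapes and names here are just illustrative):

```python
import torch

# K = 3 multi-output view aliases, all produced by one unbind call.
base = torch.randn(3, 4)
a, b, c = base.unbind(0)

# The rule above: at most K + 1 = 4 aliases may escape the graph,
# i.e. the K unbind outputs plus at most one more alias (e.g. the base).
escaping = (base, a, b, c)
```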
---
I don't like this. I think condition (2) is too lax and it is making it hard for me to verify that this optimization is correct.
Here is a graph for which I think it is obviously correct to attach grad_fns to the outputs:
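A minimal sketch of such a graph, with illustrative names (assuming a simple unbind of an intermediate, per the description below):

```python
import torch

def f(x):
    y = x.sin()            # intermediate base
    o1, o2 = y.unbind(0)   # multi-output view: disjoint slices of y
    return o1, o2          # only the unbind outputs escape
```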
Here is a graph where I am not sure:
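And a sketch of this second case, again with illustrative names (here the intermediate base itself escapes alongside its unbind outputs):

```python
import torch

def f(x):
    y = x.sin()            # intermediate base
    o1, o2 = y.unbind(0)
    return y, o1, o2       # y escapes too, and y really does alias o1/o2
```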
In the first graph, no outputs actually alias each other (unbind says they alias, but only because autograd isn't smart enough to know that these views are guaranteed to reference disjoint spots of memory). In the second graph, the outputs DO alias, and so we have to make sure that if you mutate y, the grad_fns for y.unbind() get updated.
Maybe there is a way we can get the second case to work, but if the first case is good enough to solve the problem we're facing in prod, I would prefer to do it first, as it is much more obviously correct.
---
Yeah, after I made this PR I realized that the second case sadly also shows up in prod (returning both an intermediate and its unbind() outputs).
So given that, I think our options are:
(1) convince ourselves that this is safe, even in the case where we return one extra alias (like the intermediate here).
(2) decide that this either violates safety or feels too dangerous to allow, and instead just add some one-off logic for the prod case that specifically handles regenerating unbind() at runtime.
Let me know what you think
---
Can we add a check on line 1184, `elif curr_storage in inp_storage_refs:`? Sometimes we have multi_output_views that are alias_of_input; they would fall into this branch and speed things up quite a bit.
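Roughly the shape of the check being suggested; the helper names and return values below are hypothetical stand-ins, not the actual AOTAutograd internals:

```python
def classify_output(o, curr_storage, inp_storage_refs, multi_output_view_ids):
    # Hypothetical sketch: if an output aliases an input's storage AND it
    # came from a multi-output view, take the cheap alias-of-input path
    # instead of the intermediate-base logic.
    if curr_storage in inp_storage_refs and id(o) in multi_output_view_ids:
        return "alias_of_input"
    return "needs_further_analysis"
```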
---
I agree that if we're going to apply this "hide multi-output view aliasing from autograd" logic to aliases of intermediates, we should also apply the same logic to aliases of inputs.
This is especially true because autograd's view-replay will generate equally inefficient backward code for the alias-of-input case, if our aliases are multi-output views (view-replay will always generate an as_strided).
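To illustrate the view-replay point, here is roughly what the as_strided regeneration looks like for one unbind output (the exact sizes/strides below are worked out for a contiguous (2, 4) tensor, not captured from view-replay itself):

```python
import torch

# For a contiguous (2, 4) tensor, unbind(0)[0] is a view with shape (4,),
# stride (1,), and storage offset 0; replaying it via as_strided looks
# roughly like this.
x = torch.randn(2, 4)
o = x.unbind(0)[0]
o_replayed = x.as_strided((4,), (1,), 0)
assert o_replayed.data_ptr() == o.data_ptr()
```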
Let me spend some time trying to add it, along with better testing for it.
---
nyeh, I think you have to be more selective about this. If, hypothetically, you have a multi-output view where the outputs alias each other, e.g. some sort of sliding window + unbind, then it wouldn't be right to do the logic you've introduced here, because if you did a mutation it would be necessary to error to prevent incorrect gradients for the other views.
---
I agree that this is more complicated in the overlapping case (Alban pointed out that even the unbind case can be overlapping, e.g. `torch.ones(4).expand(10, 4).unbind(0)`). I would agree with this, but my thought was that autograd should never let this happen, because it doesn't allow you to change the grad_fn of multi-output views (it raises an error, or replaces the multi-output-view grad_fn with an error grad_fn).
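Spelling that example out as a small runnable check:

```python
import torch

# expand makes dim 0 stride-0, so all ten unbind outputs are backed by the
# same 4 elements of storage: a multi-output view whose outputs overlap.
t = torch.ones(4).expand(10, 4)
views = t.unbind(0)
assert all(v.data_ptr() == views[0].data_ptr() for v in views)
```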
I agree this does feel hairy though, so lmk if you think this isn't water-tight / we're risking some correctness issues slipping through.
---
You're basically saying that a user will never pass torch.compile a program that doesn't run in eager mode. If this is true, I agree that the user cannot have written a naughty program. But we definitely have users passing malformed programs to torch.compile without having checked the eager version first...