[JIT] Make create autodiff subgraphs do in place updates to aliasDb #42141
Conversation
💊 CI failures summary: as of commit 6ba44e5, there are no failures so far.
return a.user == b.user && a.offset == b.offset;
}

void copyAliasing(Node* merged_node, AliasDb& db) {
nitpick: should we just store `AliasDb` as a member of `ValueMapper`, since we are already passing it in the constructor? That would simplify `copyAliasing` a little bit.
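A rough sketch of that suggestion (hypothetical; only the shape of `ValueMapper` visible in this diff is assumed):

```cpp
// Hypothetical variant: keep a reference to the AliasDb passed to the
// constructor so copyAliasing does not need to take it again.
struct ValueMapper {
  ValueMapper(Node* to_merge, AliasDb& db, size_t subgraph_num_outputs)
      : db_(db) {
    // ... record the aliasing info of to_merge's outputs, as the real class does ...
  }
  void copyAliasing(Node* merged_node) {
    // ... use db_ here instead of a passed-in AliasDb& ...
  }
 private:
  AliasDb& db_;
};
```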
if (shouldConsiderForMerge(consumer)) {
if (consumer->kind() != prim::DifferentiableGraph) {
// ValueMapper preserves the aliasing information of the node's outputs
ValueMapper vm(consumer, aliasDb_, 0);
nitpick: we could probably just remember the number of outputs of the producer (which we can get off the producer, rather than passing both the producer and the offset), since we will need to update aliasing for all of them. Then, in the consumer, we could just update the aliasing information for the last `consumer.outputs().size() - producer.outputs().size()` outputs of the merged consumer?
That's what I'm doing, see: https://github.com/pytorch/pytorch/pull/42141/files#diff-d35d4ac08adc5a54ab5b75564e9a1290R53
auto g = n->owningGraph();
// temporary node to hold the aliasing properties of the node before it's
// merged and destroyed
placeholder_node_ = g->insertNode(g->create(prim::Uninitialized, 0));
nitpick: do we really need a node here? could we simply store all the values in a vector or will that confuse AA?
Values are owned by the graph; if we store them in a vector but delete the node, they will be dead references.
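In sketch form, the placeholder trick looks roughly like this (only calls that appear elsewhere in this diff are used; the exact transfer call is an assumption):

```cpp
// Values are owned by the graph, so we cannot keep the producer's output
// Values alive in a side vector once the merge destroys the node. Instead,
// park their aliasing info on the outputs of a throwaway node.
auto g = n->owningGraph();
placeholder_node_ = g->insertNode(g->create(prim::Uninitialized, 0));
for (Value* v : n->outputs()) {
  Value* placeholder_out = placeholder_node_->addOutput();
  // assumption: move v's aliasing onto the placeholder output; the actual
  // diff may use a different AliasDb helper than replaceWithNewValue here
  db.replaceWithNewValue(v, placeholder_out);
}
```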
for (Value* v : new_outputs) {
auto maybe_last_use = firstOrLastUse(v, /*find_first*/ false);
// if it doesn't have a use, it shouldn't have been added as an output
TORCH_INTERNAL_ASSERT(maybe_last_use);
is this really true? could we have a node that produces two values but one of them isn't used?
These are the outputs of the autodiff subgraph, which are added individually. Even if there's a node with two outputs and one isn't used, the unused one will not be added as an output of the autodiff subgraph.
// Inlining nodes may cause some subexpression to come back in the
// subgraphs (for example, copying constants in repeatedly will generate
// redundant prim::Constants). Run CSE to clean them up.
EliminateCommonSubexpression(curNode->g(attr::Subgraph));
nitpick: should we move CSE into the next check? If we decide to inline it, the final CSE pass will clean up all unmerged graphs. In other words, we only need to clean a graph up if we aren't inlining it?
This is existing logic in the file; I didn't change it. But yeah, good point.
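If it were to change, the reviewer's idea would presumably look something like this (hypothetical restructuring; `willInline` stands in for whatever the surrounding inlining check actually is):

```cpp
// Only bother cleaning up the subgraph if we are going to keep it; if it is
// inlined, the final CSE pass over the outer graph cleans it up anyway.
if (!willInline(curNode)) {  // hypothetical predicate
  EliminateCommonSubexpression(curNode->g(attr::Subgraph));
}
```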
ValueMapper vm(producer, aliasDb_, consumer->outputs().size());
SubgraphUtils::mergeNodeIntoSubgraph(producer, consumer);
vm.copyAliasing(consumer, aliasDb_);
nitpick: I wonder if it makes sense to do `copyAliasing` when `vm` goes out of scope?
What do you mean?
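Presumably the suggestion is an RAII-style mapper that copies the aliasing in its destructor, roughly like the hypothetical sketch below (not the API in this diff):

```cpp
// Hypothetical RAII variant: aliasing is transferred when the mapper goes out
// of scope, instead of via an explicit copyAliasing call at the merge site.
struct ScopedValueMapper {
  ScopedValueMapper(Node* producer, AliasDb& db, size_t offset) : db_(db) {
    // record the producer's output aliasing, as ValueMapper does
  }
  void setMergedNode(Node* merged) { merged_ = merged; }
  ~ScopedValueMapper() {
    if (merged_ != nullptr) {
      // transfer the saved aliasing onto merged_'s new outputs here
    }
  }
 private:
  AliasDb& db_;
  Node* merged_ = nullptr;
};
```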
TORCH_INTERNAL_ASSERT(i != last_uses_.size());
db.replaceWithNewValue(placeholder_node_->outputs().at(i), v);
}
placeholder_node_->destroy();
If `copyAliasing` is accidentally called twice, `destroy` should throw an exception and not crash, right?
If it's accidentally called twice, we have other problems; this isn't an exposed API, and it's only used within this file.
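For reference, a cheap guard that would make an accidental second call fail with an assert rather than a crash might look like this (hypothetical hardening, not in the diff):

```cpp
void copyAliasing(Node* merged_node, AliasDb& db) {
  TORCH_INTERNAL_ASSERT(
      placeholder_node_ != nullptr, "copyAliasing called more than once");
  // ... existing aliasing transfer logic ...
  placeholder_node_->destroy();
  placeholder_node_ = nullptr;  // a second call now trips the assert above
}
```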
Stack from ghstack:
Update the alias db in place instead of constructing it from scratch on each change, which causes O(n^2) behavior.
Description from #37106 holds pretty well:
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.
The graph fuser pass operates by pushing nodes into a fusion group. So
we start with
`x, y = f(a, b, c)`
and end with:
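```
x_out, y_out = prim::fusionGroup(a, b, c)
x_in, y_in = f(a_in, b_in, c_in)
-> x_in, y_in
```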
We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""
The one difficulty here is that mapping x, y to x_out, y_out is not trivial when merging nodes into the autodiff subgraph node.
There are a few options:
- attempt to make all subgraph utils & IR cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence

I went with the third option.
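Concretely, the merge path then looks roughly like the hunk quoted above (sketch; `aliasDb_` is the pass's in-place-maintained AliasDb):

```cpp
// Save the aliasing info of producer's outputs before the merge destroys them;
// the offset records where its replacement outputs will land on the consumer.
ValueMapper vm(producer, aliasDb_, consumer->outputs().size());
SubgraphUtils::mergeNodeIntoSubgraph(producer, consumer);
// Move the saved aliasing info onto the consumer's newly added outputs.
vm.copyAliasing(consumer, aliasDb_);
```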
This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test; I was thinking about maybe exposing an option to run create autodiff subgraphs without the post-processor and checking that the alias db was correctly updated.
Differential Revision: D22798377