Skip to content

Conversation

eellison
Copy link
Contributor

@eellison eellison commented Jul 27, 2020

Stack from ghstack:

Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior.

Description from #37106 holds pretty well:
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

x, y = f(a, b, c)

and end with:

x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node.
There are a few options:

  • attempt to make all subgraph utils & ir cloning logic update a map
  • mirror the subgraph utils implementation in create_autodiff_subgraph
  • uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option.

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: D22798377

@eellison eellison requested a review from apaszke as a code owner July 27, 2020 23:15
eellison pushed a commit that referenced this pull request Jul 27, 2020
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Jul 27, 2020
@eellison eellison changed the title Make create autodiff subgraphs do in place updates to aliasDb [JIT][WIP] Make create autodiff subgraphs do in place updates to aliasDb Jul 27, 2020
@dr-ci
Copy link

dr-ci bot commented Jul 27, 2020

💊 CI failures summary and remediations

As of commit 6ba44e5 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 35 times.

…tes to aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.



[ghstack-poisoned]
eellison pushed a commit that referenced this pull request Jul 28, 2020
@eellison eellison changed the title [JIT][WIP] Make create autodiff subgraphs do in place updates to aliasDb [JIT] Make create autodiff subgraphs do in place updates to aliasDb Jul 28, 2020
…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.



[ghstack-poisoned]
…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: [D22798377](https://our.internmc.facebook.com/intern/diff/D22798377)

[ghstack-poisoned]
…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: [D22798377](https://our.internmc.facebook.com/intern/diff/D22798377)

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request Jul 28, 2020
Copy link
Contributor

@Krovatkin Krovatkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

return a.user == b.user && a.offset == b.offset;
}

void copyAliasing(Node* merged_node, AliasDb& db) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: should we just store AliasDb as a member of ValueMapper since we are already passing it in a c-tor which simplifies copyAliasing a little bit?

if (shouldConsiderForMerge(consumer)) {
if (consumer->kind() != prim::DifferentiableGraph) {
// ValueMapper preserves the aliasing information of the node's outputs
ValueMapper vm(consumer, aliasDb_, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: we could probably just remember the number of the outputs of a producer (which we can get off the producer rather than pass both producer and the offset) since we will need to update aliasing for all of them and then in a consumer we could just update the aliasing information for the last consumer.outputs().size() - producer.outputs().size() of the merged consumer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto g = n->owningGraph();
// temporary node to put the aliasing properties of the node before its
// merged and destroyed
placeholder_node_ = g->insertNode(g->create(prim::Uninitialized, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: do we really need a node here? could we simply store all the values in a vector or will that confuse AA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values are owned by the graph, if we store them in a vector but delete the node they will be dead references

for (Value* v : new_outputs) {
auto maybe_last_use = firstOrLastUse(v, /*find_first*/ false);
// if it doesnt have a use it shouldnt have been added as output
TORCH_INTERNAL_ASSERT(maybe_last_use);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really true? could we have a node that produces two values but one of them isn't used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outputs of the autodiff subgraph, which are added individually. Even if there's a node with two outputs and one isn't used, the unused one will not be added an output to the autodiff subgraph

// Inlining nodes may cause some subexpression to come back in the
// subgraphs (for example, copying constants in repeatedly will generate
// redundant prim::Constants). Run CSE to clean them up.
EliminateCommonSubexpression(curNode->g(attr::Subgraph));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: should we move CSE into the next check? if we decide to inline it, the final CSE will clean all unmerged graphs up. In other words, we only need to clean a graph up if we aren't inlining it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is existing logic in the file, I didn't change it. But yea, good point


ValueMapper vm(producer, aliasDb_, consumer->outputs().size());
SubgraphUtils::mergeNodeIntoSubgraph(producer, consumer);
vm.copyAliasing(consumer, aliasDb_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I wonder if it makes sense to do copyAliasing when vm goes out of scope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean ?

TORCH_INTERNAL_ASSERT(i != last_uses_.size());
db.replaceWithNewValue(placeholder_node_->outputs().at(i), v);
}
placeholder_node_->destroy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if copyAliasing is accidentally called twice destroy should throw an exception and not crash, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's accidentally called twice we have other problems - this isn't an exposed API, this is only used within this file.

…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: [D22798377](https://our.internmc.facebook.com/intern/diff/D22798377)

[ghstack-poisoned]
…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: [D22798377](https://our.internmc.facebook.com/intern/diff/D22798377)

[ghstack-poisoned]
…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: [D22798377](https://our.internmc.facebook.com/intern/diff/D22798377)

[ghstack-poisoned]
…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: [D22798377](https://our.internmc.facebook.com/intern/diff/D22798377)

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request Jul 31, 2020
…o aliasDb"


Update alias db in-place instead of having to construct alias db from scratch on each change, causing O(n^2) behavior. 

Description from #37106 holds pretty well: 
"""
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.

The graph fuser pass operates by pushing nodes into a fusion group. So
we start with

`x, y = f(a, b, c)`

and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
   x_in, y_in = f(a_in, b_in, c_in)
   -> x_in, y_in
```

We destroy the x and y Value*s in the process. This operation is
easy to express as an update to the aliasDb--x_out just takes on all
the aliasing information x used to have. In particular, since we know
f and prim::fusionGroup are purely functional, we don't have to mess
with any write information.
"""

The one difficulty here is mapping x, y to x_out, y_out is not trivial in merging nodes into the autodiff subgraph node. 
There are a few options:
- attempt to make all subgraph utils & ir cloning logic update a map
- mirror the subgraph utils implementation in create_autodiff_subgraph
- uniquely map x, y and x_in, y_in so you can back out the correspondence.

I went with the third option. 

This shouldn't affect the results of the pass at all. LMK if you think there's anything else I should be doing to test, I was thinking about maybe exposing an option to run create autodiff subgraphs without the post processor and check that the alias db was correctly updated.

Differential Revision: [D22798377](https://our.internmc.facebook.com/intern/diff/D22798377)

[ghstack-poisoned]
eellison pushed a commit that referenced this pull request Jul 31, 2020
@facebook-github-bot
Copy link
Contributor

@eellison merged this pull request in f502290.

@facebook-github-bot facebook-github-bot deleted the gh/eellison/92/head branch August 4, 2020 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants