-
Notifications
You must be signed in to change notification settings - Fork 25.4k
ns for fx: add support for subgraph matching #52130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 862b004 Pull Request resolved: #52130
💊 CI failures summary and remediationsAs of commit e705ae6 (more details on the Dr. CI page):
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D26403092](https://our.internmc.facebook.com/intern/diff/D26403092) [ghstack-poisoned]
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fb8df70 Pull Request resolved: #52130
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D26403092](https://our.internmc.facebook.com/intern/diff/D26403092) [ghstack-poisoned]
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: a66f4ce Pull Request resolved: #52130
(F.relu, F.linear), | ||
]) | ||
|
||
# TODO(future PR): we should see if we can reuse quantization's fusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a good idea. In general, would the entire graph_matcher logic be shared with fx quantization passes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be a good long term state. I'd imagine most of the benefits to come from sharing the fusion patterns, the matching logic itself may not generalize.
# if we match linear-relu backwards, we will always skip the | ||
# relu node and attempt to match the linear node. This can | ||
# be made configurable later if needed. | ||
for _reverse_fusion_ops in get_reversed_fusions(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we always pick the largest fusion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the ordering in the default would be followed, with the ability for user to customize. It doesn't handle that yet. We'd have to add that logic as soon as we have more than one fusion.
# for now, use node name. | ||
# TODO(future PR): find a better solution | ||
return node_b.name | ||
return end_node_b.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be start_node_b.name+end_node_b.name so that it is clear that it is a sub-graph and not a single node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. In general naming of nodes is something we should talk through in detail for all of these APIs, saving that for a future PR. Thankfully changing that later should be low eng cost, so it doesn't have to be perfect now.
and continuing backwards. | ||
1. Returns matchable nodes, in order | ||
2. Skips over non-matchable nodes | ||
1. Returns matchable subgraphs, in order. A subgraph is defined by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, dont follow this logic: Where is the sub-graph specified in the init? I dont see the end_node being defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is in the __next__
function. We pop a potential end node off the stack. If the end node is matchable, we try to peek backwards to check for a potential fusion pattern. We return (end_node, end_node) if there is no fusion pattern, or (start_node, end_node) if there is a fusion pattern ending at end_node.
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D26403092](https://our.internmc.facebook.com/intern/diff/D26403092) [ghstack-poisoned]
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 79add57 Pull Request resolved: #52130
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D26403092](https://our.internmc.facebook.com/intern/diff/D26403092) [ghstack-poisoned]
]) | ||
|
||
class _NSGraphMatchableNodesIterator: | ||
def get_reversed_fusions() -> Set[Tuple[Callable, Callable]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this only going to support matching 2 nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main restriction is that the subgraph needs one start point and one end point. Support for a length of 3 nodes can be added here with low eng cost. This type would become Union[Tuple[Callable, Callable], Tuple[Callable, Callable, Callable]]
.
note: currently this is not a list because lists are not hashable. In general, I'm flexible on the exact data structure, happy to revise it if there is a better solution.
continue | ||
self.seen_nodes.add(cur_node) | ||
|
||
# for subgraphs which are single nodes, start_node == end_node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A n00b question, how do we distinguish the subgraph of single nodes and the subgraph with two identical nodes? For example if start_node and end_node are both linear, the subgraph could be single node or it could be a subgraph with two linear nodes. Or do we assume that subgraph nodes are different, which is true for the subgraphs we want to match such as (linear, relu) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, the node here is the actual node object (not node type or name, etc). So, two distinct F.linear nodes can be used as a start point and an endpoint. For example,
linear1 -> linear2 -> linear3
# a subgraph of just linear1
(linear1, linear1)
# a subgraph of 1..3
(linear1, linear3)
if node_b.op == 'call_module': | ||
assert isinstance(node_b.target, str) | ||
return node_b.target | ||
if end_node_b.op == 'call_module': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, just wonder why need special handling for call_module and can't get node name directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds reasonable to me, happy to update that in a separate PR. In general this part is pretty self-contained, so we can make updates to this easily.
# note: relatedness is checked on the start node, i.e. | ||
# if a linear-relu pattern is checked, we would check for relatedness | ||
# of the linear | ||
if not _node_a_related_to_b(cur_start_node_a, cur_start_node_b, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since subgraph matching is introduced to graph matcher, would it be natural to extend the relatedness checking to support subgraph also, which might be more general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that could make sense. Is there a use case in mind?
Since this is a private API, the eng cost of changing this in the future would be low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can think about it in later PR. I don't have specific use case in mind, just thinking if we can relate for example the subgraph (linear, relu) to linear_relu directly, then it would be self-explanatory and we may not need to note "relatedness is checked on the start node".
nodes_to_instrument_b_to_a = {} | ||
for match_name, (node_a, node_b) in matched_node_pairs.items(): | ||
nodes_to_instrument_b_to_a[node_b] = node_a | ||
node_b_to_matched_subgraph_a = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is quite useful to insert shadow at multiple levels. Assuming in a later PR we will support shadowing a sub-graph instead of a single node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds reasonable
# subgraph so far: | ||
# | ||
# dtype_cast_node --> node_a_copy | ||
# dtype_cast_node --> subgraph_a_copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the cast be at the input of node_c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. The code does this, but the comment is wrong, thanks for catching. Will update before landing.
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D26403092](https://our.internmc.facebook.com/intern/diff/D26403092) [ghstack-poisoned]
Summary: We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D26403092](https://our.internmc.facebook.com/intern/diff/D26403092) [ghstack-poisoned]
Codecov Report
@@ Coverage Diff @@
## gh/vkuzo/225/base #52130 +/- ##
==================================================
Coverage 80.33% 80.34%
==================================================
Files 1967 1967
Lines 215670 215722 +52
==================================================
+ Hits 173262 173313 +51
- Misses 42408 42409 +1 |
This pull request has been merged in d903106. |
Summary: Pull Request resolved: pytorch#52130 We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26403092 fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a
Stack from ghstack:
Summary:
We have patterns like (F.linear, F.relu) which need to match
to (toq.linear_relu). So, we need to match subgraphs.
This PR does the following:
is that subgraphs are simple, there is always a path from start_node to
end_node, and we can ignore any non-input args/kwargs of these nodes
for the purposes of matching and copying things. An example one node
subgraph is (F.linear, F.linear). An example two node subgraph
is (F.linear, F.relu).
TODO(before review) write up better, not ready for review yet
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D26403092