
Partition modules #98628

Closed
wants to merge 9 commits into from

Conversation

@angelayi (Contributor) commented Apr 7, 2023

Added helper functions to match nodes in the graph that were decomposed from their source (leaf modules or functional ops) as a result of dynamo tracing.

get_source_partitions(graph: torch.fx.Graph, wanted_sources: List[Any]) -> Dict[Any, List[SourcePartition]]

Args:

  • graph: The graph we want to partition
  • wanted_sources: List of sources that nodes were decomposed from. A source can be a function (e.g. torch.nn.functional.linear) or a leaf module type (e.g. torch.nn.Linear)

Returns:

  • Dictionary mapping sources (e.g. torch.nn.modules.linear.Linear) to a list of SourcePartitions, each corresponding to the nodes that were flattened from a module of that type.
from dataclasses import dataclass, field
from typing import List, Type

from torch.fx import Node

@dataclass
class SourcePartition():
    # Nodes in a particular partition
    nodes: List[Node]
    # Module type
    module_type: Type
    # Nodes in the graph that are needed as inputs to the partition
    input_nodes: List[Node] = field(default_factory=list)
    # Nodes in the partition that are being used by nodes outside of the partition
    output_nodes: List[Node] = field(default_factory=list)
    # Parameters that are being used
    params: List[str] = field(default_factory=list)

Example:

Original:

x -> linear -> linear -> relu -> linear

Traced graph:

.graph():
    %arg0 : [#users=1] = placeholder[target=arg0]
    %_param_constant0 : [#users=1] = get_attr[target=_param_constant0]
    %t_default : [#users=1] = call_function[target=torch.ops.aten.t.default](args = (%_param_constant0,), kwargs = {})
    %_param_constant1 : [#users=1] = get_attr[target=_param_constant1]
    %addmm_default : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant1, %arg0, %t_default), kwargs = {})
    %_param_constant0_1 : [#users=1] = get_attr[target=_param_constant0]
    %t_default_1 : [#users=1] = call_function[target=torch.ops.aten.t.default](args = (%_param_constant0_1,), kwargs = {})
    %_param_constant1_1 : [#users=1] = get_attr[target=_param_constant1]
    %addmm_default_1 : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant1_1, %addmm_default, %t_default_1), kwargs = {})
    %relu_default : [#users=1] = call_function[target=torch.ops.aten.relu.default](args = (%addmm_default_1,), kwargs = {})
    %_param_constant2 : [#users=1] = get_attr[target=_param_constant2]
    %t_default_2 : [#users=1] = call_function[target=torch.ops.aten.t.default](args = (%_param_constant2,), kwargs = {})
    %_param_constant3 : [#users=1] = get_attr[target=_param_constant3]
    %addmm_default_2 : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant3, %relu_default, %t_default_2), kwargs = {})
    return [addmm_default_2]

Result of get_source_partitions:

{<class 'torch.nn.modules.linear.Linear'>: [
    SourcePartition(nodes=[_param_constant0, t_default, _param_constant1, addmm_default], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[arg0], output_nodes=[addmm_default], params=["_param_constant0", "_param_constant1"]),
    SourcePartition(nodes=[_param_constant0_1, t_default_1, _param_constant1_1, addmm_default_1], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[addmm_default], output_nodes=[addmm_default_1], params=["_param_constant0_1", "_param_constant1_1"]),
    SourcePartition(nodes=[_param_constant2, t_default_2, _param_constant3, addmm_default_2], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[relu_default], output_nodes=[addmm_default_2], params=["_param_constant2", "_param_constant3"])],

 <class 'torch.nn.modules.activation.ReLU'>: [
    SourcePartition(nodes=[relu_default], module_type=<class 'torch.nn.modules.activation.ReLU'>, input_nodes=[addmm_default_1], output_nodes=[relu_default], params=[])]}
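For reference, a minimal sketch of the call that could produce a result like the one above. This assumes the helpers live under torch.fx.passes.utils.source_matcher_utils and that the module below matches the traced graph; both are illustrative rather than taken from the PR.

import torch
import torch._dynamo as dynamo
from torch.fx.passes.utils.source_matcher_utils import get_source_partitions

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 3)   # called twice, so its params are shared
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(3, 3)

    def forward(self, x):
        return self.linear2(self.relu(self.linear(self.linear(x))))

# Trace down to aten ops so the linear/relu modules get decomposed.
gm, _ = dynamo.export(M(), torch.randn(3, 3), aten_graph=True)

# Group the decomposed nodes back by the source they came from.
partitions = get_source_partitions(gm.graph, [torch.nn.Linear, torch.nn.ReLU])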

Also added a helper function to check whether two source partitions are connected:
check_subgraphs_connected(subgraph1: SourcePartition, subgraph2: SourcePartition) -> bool
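Continuing the sketch above, a hedged example of checking connectivity between two of the returned partitions (same assumed import location):

from torch.fx.passes.utils.source_matcher_utils import check_subgraphs_connected

linear_partitions = partitions[torch.nn.Linear]
relu_partitions = partitions[torch.nn.ReLU]

# True when the two partitions are directly connected in the traced graph.
is_connected = check_subgraphs_connected(linear_partitions[1], relu_partitions[0])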

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

pytorch-bot bot commented Apr 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98628

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 7434821:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -870,13 +870,16 @@ def create_proxy(
        nn_module_stack = tx.nn_module_stack
        if nn_module_stack:
            rv.node.meta["nn_module_stack"] = nn_module_stack.copy()
            nn_module_stack = nn_module_stack.copy()
            _, last_value = nn_module_stack.popitem()
Contributor:

Does this mean that we lose the last entry in the stack?

Contributor:

I see that it is not "lost", but it's not clear to me what is happening here.

Contributor Author:

I'm just renaming the last value in the stack 😅

Originally, if we have a module like x -> self.linear -> self.linear, with the two calls pointing to the same module attribute, the nn_module_stack does not tell the two linear calls apart, but the name of rv.node does.

The torch IR will look something like:

placeholder arg_0 has nn_module_stack of "None"
call_module self_linear_0 has nn_module_stack of {'self_linear': ('self.linear', <class 'torch.nn.modules.linear.Linear'>)}
call_module self_linear_1 has nn_module_stack of {'self_linear': ('self.linear', <class 'torch.nn.modules.linear.Linear'>)}

So I'm renaming that last key to match the node name:

placeholder arg_0 has nn_module_stack of "None"
call_module self_linear_0 has nn_module_stack of {'self_linear_0': ('self.linear', <class 'torch.nn.modules.linear.Linear'>)}
call_module self_linear_1 has nn_module_stack of {'self_linear_1': ('self.linear', <class 'torch.nn.modules.linear.Linear'>)}
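In other words, a conceptual sketch of the re-keying (illustrative, not the exact diff):

# `nn_module_stack` is an ordered dict of {key: (qualified_name, module_class)}.
# Re-key its last entry with the node's unique name so two calls to the same
# module attribute no longer collide.
nn_module_stack = nn_module_stack.copy()
_, last_value = nn_module_stack.popitem()
nn_module_stack[rv.node.name] = last_value
rv.node.meta["nn_module_stack"] = nn_module_stack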

    elif kind == "call_module":
        # For modules we store the class
        rv.node.meta["source_fn"] = rv.node.meta["nn_module_stack"][target][1]
        rv.node.meta["source_fn"] = (
            rv.node.name,
Contributor:

Is rv.node.name the qualified name?

Contributor Author:

Yeah, it's the unique name of the node in the FX graph, so it will help us handle the case where there are 2 linear module calls side by side in the graph.
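For illustration, a hedged sketch of how a pass could use the new (unique_name, source) form of "source_fn" to keep back-to-back calls to the same module class separate (gm stands for any dynamo-traced GraphModule and is assumed to exist; the metadata shape is as described in this thread):

from collections import defaultdict

calls_per_source = defaultdict(list)
for node in gm.graph.nodes:
    source_fn = node.meta.get("source_fn")
    if source_fn is None:
        continue
    unique_name, source = source_fn  # e.g. ("self_linear_1", torch.nn.Linear)
    calls_per_source[source].append(unique_name)
# Two adjacent nn.Linear calls now land under distinct names instead of colliding.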

@dataclass
class SourcePartition():
    # Nodes in a particular partition
    nodes: List[Node]
Contributor:

Any reasoning for having the partition be a list of nodes rather than the partitioned graph itself? We can derive the nodes from the graph, and having the graph can help preserve the partition's structure.

Contributor:

You can generate a graph from the list. Do you want Graph instead of List[Node]? If so, why?

Contributor:

Is there a simple API to convert a List[Node] --> Graph? If not, in the case where I might want to use something like subgraph_rewriter afterwards to replace these partitioned modules, using a graph as the replacement pattern rather than a list of nodes would be easier.

Contributor Author:

Yeah, there exists fuse_as_graphmodule.
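For context, a hedged sketch of lifting a partition's node list into its own GraphModule with fuse_as_graphmodule, reusing gm and partitions from the sketch in the PR description above; the import path and return values reflect my understanding of that utility and may differ by version:

from torch.fx.passes.utils.fuser_utils import fuse_as_graphmodule

linear_partition = partitions[torch.nn.Linear][0]

# Returns the fused sub-GraphModule plus the original input/output nodes,
# so the subgraph can be used e.g. as a subgraph_rewriter pattern.
sub_gm, orig_inputs, orig_outputs = fuse_as_graphmodule(
    gm, linear_partition.nodes, "fused_linear_0"
)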

@facebook-github-bot (Contributor):
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai (Contributor) left a comment:

Looks neat. Thanks Angela!

@angelayi (Contributor Author) commented May 3, 2023:

@pytorchbot rebase

@pytorchmergebot (Collaborator):

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator):

Successfully rebased partition_modules onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout partition_modules && git pull --rebase)

@angelayi (Contributor Author) commented May 3, 2023:

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on May 3, 2023.
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 jobs have failed, first few of them are: Meta Internal-Only Changes Check

Details for Dev Infra team. Raised by workflow job.

@facebook-github-bot (Contributor):

@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@angelayi (Contributor Author) commented May 3, 2023:

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@@ -295,10 +295,10 @@ def get_fused_kernel_name(node_schedule):
    sources = []
    for origin in all_origins:
        if origin.op == "call_function" and "source_fn" in origin.meta:
            if isinstance(origin.meta["source_fn"], str):
                sources.append(origin.meta["source_fn"])
            if isinstance(origin.meta["source_fn"][1], str):
Contributor:

What was this change for?

Contributor Author:

I modified the "source_fn" metadata to additionally return a unique qualifying name for each function that is called, so that if 2 modules are called one after the other we can distinguish between the two. This change is just to keep Inductor compatible.
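For code outside Inductor, a hedged sketch of handling both the old and the new form of "source_fn" (the helper name is made up):

def source_fn_target(node):
    # Before this PR "source_fn" held just the source (or a string); after it,
    # it holds a (unique_node_name, source) tuple.
    source_fn = node.meta.get("source_fn")
    if source_fn is None:
        return None
    return source_fn[1] if isinstance(source_fn, tuple) else source_fn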

    output_nodes: List[Node] = field(default_factory=list)

    # Parameters that are being used
    params: List[str] = field(default_factory=list)
Contributor:

Why is this a list of strings instead of List[Node]?

Contributor Author:

I wasn't sure how you wanted the parameters formatted, so I just returned a list of the parameters' attribute names. But I can fix this.

Contributor:

Ideally we could have NamedParameters = Tuple[str, Tensor] so that "weight" would correspond to the weight tensor. But from talking to Sherlock, I remember this was harder. For now this is fine.



@compatibility(is_backward_compatible=False)
def check_subgraphs_connected(subgraph1: SourcePartition, subgraph2: SourcePartition) -> bool:
Contributor:

This is somewhat loose in that two graphs may be overlapping, right? I would expect you to check that the output nodes of the first partition are the input nodes to the second partition. That might be stricter.
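A hedged sketch of the stricter check being suggested (the producer/consumer naming and the helper itself are hypothetical):

from torch.fx.passes.utils.source_matcher_utils import SourcePartition  # assumed location

def strictly_connected(producer: SourcePartition, consumer: SourcePartition) -> bool:
    # True only when an output of `producer` is consumed as an input of
    # `consumer`, which rules out partitions that merely overlap.
    return any(out in consumer.input_nodes for out in producer.output_nodes)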

if source_fn[1] not in wanted_sources:
    continue

diff_modules = modules.setdefault(source_fn[1], {})
Contributor:

Angela, why are we using source_fn and not nn_module_stack? The main difference I see is that source_fn tracks the leaf module/function, whereas nn_module_stack tracks the entire module hierarchy. I think it is useful to use nn_module_stack so that a node can belong to multiple partitions, but each partition it belongs to must have a strict parent->child relation.

The reason this might be useful is that when modules like LSTM or attention get decomposed, you still get nodes that belong to a higher-level module like Attention. And if I wanted to quantize the entire attention module, I could.
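For illustration, a hedged sketch of the difference (all names and classes below are made up): source_fn records only the leaf that produced a node, while nn_module_stack keeps the whole ownership chain, which is what would let a node be grouped under a higher-level module such as an attention block.

import torch

# Hypothetical metadata attached to one aten node produced while tracing the
# output projection inside an attention block.
nn_module_stack = {
    "self_attn": ("self.attn", torch.nn.MultiheadAttention),
    "self_attn_out_proj": ("self.attn.out_proj", torch.nn.Linear),
}
source_fn = ("addmm_default", torch.nn.Linear)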

input_nodes.add(arg)

if node.op == "get_attr":
    params.add(node.target)
Contributor:

For the reasons I mentioned above, like for the module here, https://fburl.com/owodcrrr, if we have named parameters then it is easier to access them. Although I don't know what happens to constants; if they are "burnt in" then it will be harder to figure out what their "names" are.
