[Quantization] Add metadata porting for nodes added by quantization #107107

kimishpatel · 2023-08-13T23:22:18Z

Stack from ghstack (oldest at bottom):

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D48488416

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]

pytorch-bot · 2023-08-13T23:22:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107107

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 73f9855 with merge base 0f1a225 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 5f380f81cc4c0eb461fecc75b3ba6ed815a57761 Pull Request resolved: #107107

jerryzh168 · 2023-08-14T17:57:49Z

torch/ao/quantization/pt2e/port_metadata_pass.py

+__all__ = ["PortNodeMetaForQDQ"]
+
+_METADATA_TO_PORT = [
+    "nn_module_stack",


what about source_fn and val?

source_fn is not valid because it also contains nn.Module and I dont want someone to inadvertently grab nodes belonging to say nn.Linear, and now those nodes also include q/dq nodes.

Val is fair one but then we would have to do that as part of convert workflow which should actually quantize the value and dequantize it. wdyth

OK, yeah I think it makes sense to not add source_fn here, and should the same reasoning should apply for nn_module_stack as well? e.g. people might be looking for things in nn.Linear module scope, then it will get q/dq ops, will this be surprising? not too familiar with "stack_trace", maybe that one is fine

I thought "val" is storing shape and dtypes, is that true? yeah we do need to update the dtype I think

jerryzh168 · 2023-08-15T22:06:59Z

torch/ao/quantization/pt2e/port_metadata_pass.py

+          - [Q-> [DQ -> Conv -> Q] -> [DQ -> AvgPool -> Q] -> [DQ -> Linear -> Q] -> DQ]
+          - Note first Q and last DQ do not inherit metadata from any nodes
+      - Dynamically quantized patterns:
+        - Input that are dynamically quantized have chose_qparams, quantize and dequantize nodes


nit chose_qparams --> choose_qparams

jerryzh168 · 2023-08-15T22:07:30Z

torch/ao/quantization/pt2e/port_metadata_pass.py

+          - Quantized [Q-> DQ -> Conv -> Q -> DQ -> AvgPool -> Q -> DQ -> chose_params -> Q -> DQ -> Linear]
+          - Quantized [Q-> [DQ -> Conv -> Q] -> [DQ -> AvgPool -> Q] -> DQ -> [chose_params -> Q -> DQ -> Linear]]


same here for choose_qparams

torch/ao/quantization/pt2e/port_metadata_pass.py

jerryzh168

thanks, looks good in the high level, I feel we should be able to remove the dependency on annotation and qspec, also I think that could potentially simplify the implementation a bit, let me know what do you think

jerryzh168 · 2023-08-17T21:21:29Z

test/quantization/pt2e/test_metadata_porting.py

+                pass
+
+        example_inputs = (torch.randn(1, 3, 5, 5),)
+        quantizer_per_tensor_tags = set(


nit: quantize_per_tensor_tags? same for dequantizer_per_tensor_tags and dequantizer_per_channel_tags

jerryzh168 · 2023-08-17T21:24:06Z

test/quantization/pt2e/test_metadata_porting.py

+        """
+        Model under test
+        conv2d -> avgpool -> hardtanh -> linear
+        Quantize all except linear.


nit: Quantize all with static quant except linear?

jerryzh168

Record some of the things we were discussing here so that we don't forget:

right now port meta pass is relying on "quantization_annotation" right now, although I think the ideally we can do: https://github.com/pytorch/pytorch/pull/107107/files#r1295252189, that will make the information flow cleaner and easier for debugging as well, it will also prevent future uses quantization_annotation after prepare step (we could also just delete quantization_annotation in the future to prevent this), to preserve the information needed by https://github.com/pytorch/pytorch/pull/107107/files#r1295244618, we could record something like add_node.meta["precision"] = torch.float32 so that no metadata porting happens for add node. It's not clear from Edge side if this is strictly needed, also this is not blocking anything right now, so we can revisit this when this actually matters for some use cases.
right now the traversal to find the "choose_qparams -> q -> dq" and "q -> dq" patterns is from producer to consumer, but I feel the reversed order might be more robust, we can first start with just looking at the graph (i.e. identify choose_qparams -> q -> dq and q -> dq patterns) and then look around the operators that appears before and after the pattern + rely on some of the quantization_annotation information to decide whether to port metadata or not. I think this will be aligned better with the node.meta["precision"] solution, and the use of quantization annotation will be relatively minimized as well

We have agreed that user should not assume "quantization_annotation" being available in convert step. also given that we might need to align more on (1)., I'll just stamp for now and we can revisit the implementation details at a later time when this comes up. For now @kimishpatel / Edge team will maintain this code util we have reached an agreement on (1).

torch/ao/quantization/pt2e/port_metadata_pass.py

kimishpatel · 2023-08-18T15:07:24Z

2. right now the traversal to find the "choose_qparams -> q -> dq" and "q -> dq" patterns is from producer to consumer, but I feel the reversed order might be more robust, we can first start with just looking at the graph (i.e. identify choose_qparams -> q -> dq and q -> dq patterns) and then look around the operators that appears before and after the pattern + rely on some of the quantization_annotation information to decide whether to port metadata or not. I think this will be aligned better with the node.meta["precision"] solution, and the use of quantization annotation will be relatively minimized as well

I am not sure why this is more robust

kimishpatel · 2023-08-18T15:12:26Z

user should not assume "quantization_annotation" being available in convert step

while this is fair, note that the metadata porting is part of the convert step, so this is not strictly user facing.

So if "quantization_annotation" is something that has to be become user visible, in order to convey whether a set of nodes were meant to be quantized or not, then I agree that it may not be the right choice and maybe something like what you suggested, node.meta["precision"] or equivalent, might be a better way to do it.

But if it is mostly internal to quantization workflow, then I am not certain of the value added by node.meta["precision"], because that will be derived from quantization_annotation anyway.

jerryzh168 · 2023-08-18T17:14:13Z

user should not assume "quantization_annotation" being available in convert step

while this is fair, note that the metadata porting is part of the convert step, so this is not strictly user facing.

So if "quantization_annotation" is something that has to be become user visible, in order to convey whether a set of nodes were meant to be quantized or not, then I agree that it may not be the right choice and maybe something like what you suggested, node.meta["precision"] or equivalent, might be a better way to do it.

But if it is mostly internal to quantization workflow, then I am not certain of the value added by node.meta["precision"], because that will be derived from quantization_annotation anyway.

yeah I'm not sure if the node.meta["precision"] info will be needed by users (mostly in backend lowering) or not, so it can be left for future discussions. Apart from this, I think the benefit is mostly BE, easier to debug and reduce the chances that people rely on quantization_annotation, as specified in the first point. but nothing is blocking right now.

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 82cd0f45ef7b265b4207e1f5b82da0f86d196152 Pull Request resolved: #107107

kimishpatel · 2023-08-18T23:25:13Z

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jerryzh168 · 2023-08-19T00:36:58Z

torch/ao/quantization/pt2e/port_metadata_pass.py

+
+    def call(self, graph_module: torch.fx.GraphModule) -> PassResult:
+        for node in graph_module.graph.nodes:
+            annotation = node.meta.get("quantization_annotation", None)


@kimishpatel could you also add comment here saying that even we are relying on quantization annotation here, but backend developers should not assume "quantization_annotation" being available after convert?

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: e0aaa2243d3d8deb285626f6a6491b8bde85ad0a Pull Request resolved: #107107

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: e1fa2a661e99d5b7339fc0c85e626ba03c64b570 Pull Request resolved: #107107

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: aa21d35a530d170d579a7f8410085ceed08986f7 Pull Request resolved: #107107

kimishpatel · 2023-08-31T16:03:16Z

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 5501fe41256e8a1a223dd795505727fe6ac61d7b Pull Request resolved: #107107

kimishpatel · 2023-08-31T17:28:41Z

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 89c64f22e184e3bf87ceb81790e94a96ec745eaa Pull Request resolved: #107107

kimishpatel · 2023-08-31T23:30:54Z

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 1f1b1f34769d2914f4c510053d504cdcb516b6ec Pull Request resolved: #107107

kimishpatel · 2023-09-01T00:49:42Z

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 2c96e8fdbd7dd3c82b801e5d393b541eb6ce9c44 Pull Request resolved: #107107

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 22ec011ac5bd332793daba8abc3ab78896155faa Pull Request resolved: #107107

kimishpatel · 2023-09-01T15:41:03Z

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization" Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416) [ghstack-poisoned]

Summary: This diff adds adding metadata to q-dq nodes by inferring the quatization intent from node annotations. Annotations on the node are way for user to specify how a node or subgraph is supposed to be quantized. We continue to use that information to copy metadata on Q/DQ node from appropriate nodes. Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: ca2e2afb051bbbed4a69b4e4b3f5fadae2a8fd5f Pull Request resolved: #107107

kimishpatel · 2023-09-01T20:20:45Z

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-09-02T06:36:09Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2023-09-02T06:38:06Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

kimishpatel requested a review from jerryzh168 as a code owner August 13, 2023 23:22

This was referenced Aug 13, 2023

[Quant] Use input_qspec_map for weight quantization of linear #107105

Closed

[Quantization] Make annotation util functions return annotated nodes #107106

Closed

pytorch-bot bot added release notes: quantization release notes category labels Aug 13, 2023

jerryzh168 reviewed Aug 14, 2023

View reviewed changes

jerryzh168 reviewed Aug 15, 2023

View reviewed changes

torch/ao/quantization/pt2e/port_metadata_pass.py Show resolved Hide resolved

jerryzh168 reviewed Aug 15, 2023

View reviewed changes

torch/ao/quantization/pt2e/port_metadata_pass.py Show resolved Hide resolved

jerryzh168 reviewed Aug 15, 2023

View reviewed changes

torch/ao/quantization/pt2e/port_metadata_pass.py Show resolved Hide resolved

jerryzh168 requested changes Aug 15, 2023

View reviewed changes

jerryzh168 reviewed Aug 17, 2023

View reviewed changes

jerryzh168 approved these changes Aug 17, 2023

View reviewed changes

jerryzh168 reviewed Aug 17, 2023

View reviewed changes

torch/ao/quantization/pt2e/port_metadata_pass.py Show resolved Hide resolved

jerryzh168 reviewed Aug 19, 2023

View reviewed changes

This was referenced Aug 24, 2023

[Quant] Bug fix #107899

Closed

[Quant] Add DQ duplication pass #107900

Closed

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 2, 2023

pytorchmergebot added the merging label Sep 2, 2023

pytorchmergebot added Merged and removed merging labels Sep 2, 2023

pytorchmergebot closed this in ffc0c46 Sep 2, 2023

facebook-github-bot deleted the gh/kimishpatel/175/head branch September 5, 2023 14:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Quantization] Add metadata porting for nodes added by quantization #107107

[Quantization] Add metadata porting for nodes added by quantization #107107

kimishpatel commented Aug 13, 2023 •

edited

pytorch-bot bot commented Aug 13, 2023 •

edited

jerryzh168 Aug 14, 2023

kimishpatel Aug 15, 2023

jerryzh168 Aug 15, 2023

jerryzh168 Aug 15, 2023

jerryzh168 Aug 15, 2023

jerryzh168 left a comment

jerryzh168 Aug 17, 2023

jerryzh168 Aug 17, 2023

jerryzh168 left a comment

kimishpatel commented Aug 18, 2023

kimishpatel commented Aug 18, 2023

jerryzh168 commented Aug 18, 2023 •

edited

kimishpatel commented Aug 18, 2023

jerryzh168 Aug 19, 2023

kimishpatel commented Aug 31, 2023

kimishpatel commented Aug 31, 2023

kimishpatel commented Aug 31, 2023

kimishpatel commented Sep 1, 2023

kimishpatel commented Sep 1, 2023

kimishpatel commented Sep 1, 2023

facebook-github-bot commented Sep 2, 2023

pytorchmergebot commented Sep 2, 2023

		- Quantized [Q-> DQ -> Conv -> Q -> DQ -> AvgPool -> Q -> DQ -> chose_params -> Q -> DQ -> Linear]
		- Quantized [Q-> [DQ -> Conv -> Q] -> [DQ -> AvgPool -> Q] -> DQ -> [chose_params -> Q -> DQ -> Linear]]

[Quantization] Add metadata porting for nodes added by quantization #107107

[Quantization] Add metadata porting for nodes added by quantization #107107

Conversation

kimishpatel commented Aug 13, 2023 • edited

pytorch-bot bot commented Aug 13, 2023 • edited

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107107

✅ No Failures

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerryzh168 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerryzh168 left a comment

Choose a reason for hiding this comment

kimishpatel commented Aug 18, 2023

kimishpatel commented Aug 18, 2023

jerryzh168 commented Aug 18, 2023 • edited

kimishpatel commented Aug 18, 2023

Choose a reason for hiding this comment

kimishpatel commented Aug 31, 2023

kimishpatel commented Aug 31, 2023

kimishpatel commented Aug 31, 2023

kimishpatel commented Sep 1, 2023

kimishpatel commented Sep 1, 2023

kimishpatel commented Sep 1, 2023

facebook-github-bot commented Sep 2, 2023

pytorchmergebot commented Sep 2, 2023

Merge started

kimishpatel commented Aug 13, 2023 •

edited

pytorch-bot bot commented Aug 13, 2023 •

edited

jerryzh168 commented Aug 18, 2023 •

edited