Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Quantization] Add metadata porting for nodes added by quantization #107107

Closed
wants to merge 14 commits into from

Conversation

kimishpatel
Copy link
Contributor

@kimishpatel kimishpatel commented Aug 13, 2023

Stack from ghstack (oldest at bottom):

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D48488416

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107107

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 73f9855 with merge base 0f1a225 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kimishpatel added a commit that referenced this pull request Aug 13, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 5f380f81cc4c0eb461fecc75b3ba6ed815a57761
Pull Request resolved: #107107
__all__ = ["PortNodeMetaForQDQ"]

_METADATA_TO_PORT = [
"nn_module_stack",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about source_fn and val?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_fn is not valid because it also contains nn.Module and I dont want someone to inadvertently grab nodes belonging to say nn.Linear, and now those nodes also include q/dq nodes.

Val is fair one but then we would have to do that as part of convert workflow which should actually quantize the value and dequantize it. wdyth

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yeah I think it makes sense to not add source_fn here, and should the same reasoning should apply for nn_module_stack as well? e.g. people might be looking for things in nn.Linear module scope, then it will get q/dq ops, will this be surprising? not too familiar with "stack_trace", maybe that one is fine

I thought "val" is storing shape and dtypes, is that true? yeah we do need to update the dtype I think

- [Q-> [DQ -> Conv -> Q] -> [DQ -> AvgPool -> Q] -> [DQ -> Linear -> Q] -> DQ]
- Note first Q and last DQ do not inherit metadata from any nodes
- Dynamically quantized patterns:
- Input that are dynamically quantized have chose_qparams, quantize and dequantize nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit chose_qparams --> choose_qparams

Comment on lines 201 to 202
- Quantized [Q-> DQ -> Conv -> Q -> DQ -> AvgPool -> Q -> DQ -> chose_params -> Q -> DQ -> Linear]
- Quantized [Q-> [DQ -> Conv -> Q] -> [DQ -> AvgPool -> Q] -> DQ -> [chose_params -> Q -> DQ -> Linear]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here for choose_qparams

Copy link
Contributor

@jerryzh168 jerryzh168 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, looks good in the high level, I feel we should be able to remove the dependency on annotation and qspec, also I think that could potentially simplify the implementation a bit, let me know what do you think

pass

example_inputs = (torch.randn(1, 3, 5, 5),)
quantizer_per_tensor_tags = set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: quantize_per_tensor_tags? same for dequantizer_per_tensor_tags and dequantizer_per_channel_tags

"""
Model under test
conv2d -> avgpool -> hardtanh -> linear
Quantize all except linear.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Quantize all with static quant except linear?

Copy link
Contributor

@jerryzh168 jerryzh168 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record some of the things we were discussing here so that we don't forget:

  1. right now port meta pass is relying on "quantization_annotation" right now, although I think the ideally we can do: https://github.com/pytorch/pytorch/pull/107107/files#r1295252189, that will make the information flow cleaner and easier for debugging as well, it will also prevent future uses quantization_annotation after prepare step (we could also just delete quantization_annotation in the future to prevent this), to preserve the information needed by https://github.com/pytorch/pytorch/pull/107107/files#r1295244618, we could record something like add_node.meta["precision"] = torch.float32 so that no metadata porting happens for add node. It's not clear from Edge side if this is strictly needed, also this is not blocking anything right now, so we can revisit this when this actually matters for some use cases.

  2. right now the traversal to find the "choose_qparams -> q -> dq" and "q -> dq" patterns is from producer to consumer, but I feel the reversed order might be more robust, we can first start with just looking at the graph (i.e. identify choose_qparams -> q -> dq and q -> dq patterns) and then look around the operators that appears before and after the pattern + rely on some of the quantization_annotation information to decide whether to port metadata or not. I think this will be aligned better with the node.meta["precision"] solution, and the use of quantization annotation will be relatively minimized as well

We have agreed that user should not assume "quantization_annotation" being available in convert step. also given that we might need to align more on (1)., I'll just stamp for now and we can revisit the implementation details at a later time when this comes up. For now @kimishpatel / Edge team will maintain this code util we have reached an agreement on (1).

@kimishpatel
Copy link
Contributor Author

2. right now the traversal to find the "choose_qparams -> q -> dq" and "q -> dq" patterns is from producer to consumer, but I feel the reversed order might be more robust, we can first start with just looking at the graph (i.e. identify choose_qparams -> q -> dq and q -> dq patterns) and then look around the operators that appears before and after the pattern + rely on some of the quantization_annotation information to decide whether to port metadata or not. I think this will be aligned better with the node.meta["precision"] solution, and the use of quantization annotation will be relatively minimized as well

I am not sure why this is more robust

@kimishpatel
Copy link
Contributor Author

user should not assume "quantization_annotation" being available in convert step

while this is fair, note that the metadata porting is part of the convert step, so this is not strictly user facing.

So if "quantization_annotation" is something that has to be become user visible, in order to convey whether a set of nodes were meant to be quantized or not, then I agree that it may not be the right choice and maybe something like what you suggested, node.meta["precision"] or equivalent, might be a better way to do it.

But if it is mostly internal to quantization workflow, then I am not certain of the value added by node.meta["precision"], because that will be derived from quantization_annotation anyway.

@jerryzh168
Copy link
Contributor

jerryzh168 commented Aug 18, 2023

user should not assume "quantization_annotation" being available in convert step

while this is fair, note that the metadata porting is part of the convert step, so this is not strictly user facing.

So if "quantization_annotation" is something that has to be become user visible, in order to convey whether a set of nodes were meant to be quantized or not, then I agree that it may not be the right choice and maybe something like what you suggested, node.meta["precision"] or equivalent, might be a better way to do it.

But if it is mostly internal to quantization workflow, then I am not certain of the value added by node.meta["precision"], because that will be derived from quantization_annotation anyway.

yeah I'm not sure if the node.meta["precision"] info will be needed by users (mostly in backend lowering) or not, so it can be left for future discussions. Apart from this, I think the benefit is mostly BE, easier to debug and reduce the chances that people rely on quantization_annotation, as specified in the first point. but nothing is blocking right now.

…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Aug 18, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 82cd0f45ef7b265b4207e1f5b82da0f86d196152
Pull Request resolved: #107107
@kimishpatel
Copy link
Contributor Author

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


def call(self, graph_module: torch.fx.GraphModule) -> PassResult:
for node in graph_module.graph.nodes:
annotation = node.meta.get("quantization_annotation", None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel could you also add comment here saying that even we are relying on quantization annotation here, but backend developers should not assume "quantization_annotation" being available after convert?

…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
This was referenced Aug 24, 2023
kimishpatel added a commit that referenced this pull request Aug 24, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: e0aaa2243d3d8deb285626f6a6491b8bde85ad0a
Pull Request resolved: #107107
kimishpatel added a commit that referenced this pull request Aug 31, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: e1fa2a661e99d5b7339fc0c85e626ba03c64b570
Pull Request resolved: #107107
…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Aug 31, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: aa21d35a530d170d579a7f8410085ceed08986f7
Pull Request resolved: #107107
@kimishpatel
Copy link
Contributor Author

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Aug 31, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 5501fe41256e8a1a223dd795505727fe6ac61d7b
Pull Request resolved: #107107
@kimishpatel
Copy link
Contributor Author

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Aug 31, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 89c64f22e184e3bf87ceb81790e94a96ec745eaa
Pull Request resolved: #107107
@kimishpatel
Copy link
Contributor Author

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Sep 1, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 1f1b1f34769d2914f4c510053d504cdcb516b6ec
Pull Request resolved: #107107
@kimishpatel
Copy link
Contributor Author

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Sep 1, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 2c96e8fdbd7dd3c82b801e5d393b541eb6ce9c44
Pull Request resolved: #107107
…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Sep 1, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 22ec011ac5bd332793daba8abc3ab78896155faa
Pull Request resolved: #107107
@kimishpatel
Copy link
Contributor Author

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…ntization"

Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)

[ghstack-poisoned]
kimishpatel added a commit that referenced this pull request Sep 1, 2023
Summary:
This diff adds adding metadata to q-dq nodes by inferring the
quatization intent from node annotations. Annotations on the node are
way for user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata on Q/DQ
node from appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: ca2e2afb051bbbed4a69b4e4b3f5fadae2a8fd5f
Pull Request resolved: #107107
@kimishpatel
Copy link
Contributor Author

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 2, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-bot facebook-github-bot deleted the gh/kimishpatel/175/head branch September 5, 2023 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: quantization release notes category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants