[quant][graphmode] Invoke more passes in insertObservers #30473
Summary:
Invoked the `ConstantPooling` and `FuseLinear` passes before `insertObservers`.
`ConstantPooling` cleans up the traced graph: when two constant nodes have the same value, this pass merges them, which leaves fewer quantization patterns to match.
`FuseLinear` merges the exploded linear function back into `aten::linear` so that we can quantize it properly. We need the fusion because right now we recognize weight and bias by matching argument positions in certain function calls (e.g. the weight of `aten::conv2d` sits at a fixed argument position), so we have to preserve the boundary of the linear call to recognize its weight; in the exploded linear code, the input of `addmm` is the transposed weight rather than the original weight of linear.
Test Plan: This is needed for the traced-model quantization tests to pass.
Reviewers: mvz
Subscribers:
Tasks:
Tags:
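To make the `FuseLinear` motivation concrete, here is a minimal sketch of the transformation on a toy graph. Treat it as illustrative only: the header paths and the `script::parseIR` namespace are assumptions that have moved between PyTorch versions, and `main` is just a hypothetical driver.

```cpp
// Sketch only: include paths and the parseIR namespace are assumptions,
// they differ across PyTorch versions.
#include <torch/csrc/jit/ir.h>
#include <torch/csrc/jit/irparser.h>
#include <torch/csrc/jit/passes/fuse_linear.h>

using namespace torch::jit;

int main() {
  // Roughly what a traced nn.Linear looks like after being "exploded":
  // the original weight only reaches addmm through aten::t, so a matcher
  // that identifies the weight by argument position cannot see it.
  const std::string exploded = R"IR(
graph(%input : Tensor, %weight : Tensor, %bias : Tensor):
  %one : int = prim::Constant[value=1]()
  %wt : Tensor = aten::t(%weight)
  %out : Tensor = aten::addmm(%bias, %input, %wt, %one, %one)
  return (%out))IR";

  auto graph = std::make_shared<Graph>();
  script::parseIR(exploded, graph.get());

  // FuseLinear rewrites the t() + addmm() pattern into a single
  // aten::linear(%input, %weight, %bias), so the real weight is again
  // visible at a fixed argument position for the quantization passes.
  FuseLinear(graph);
  graph->dump();
}
```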
auto graph = method.graph();
ConstantPropagation(graph);
// To cleanup traced graph
ConstantPooling(graph);
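For context on the `ConstantPooling` call above, the sketch below (same caveats as before: header paths and the `parseIR` namespace are assumptions) shows the kind of duplication tracing leaves behind and what pooling does with it.

```cpp
// Sketch only: include paths and the parseIR namespace are assumptions.
#include <torch/csrc/jit/ir.h>
#include <torch/csrc/jit/irparser.h>
#include <torch/csrc/jit/passes/constant_pooling.h>

using namespace torch::jit;

int main() {
  // Tracing tends to emit one prim::Constant per use site, so the same
  // value can appear several times in the graph.
  const std::string duplicated = R"IR(
graph(%x : Tensor):
  %a : int = prim::Constant[value=1]()
  %b : int = prim::Constant[value=1]()
  %y : Tensor = aten::add(%x, %x, %a)
  %z : Tensor = aten::add(%y, %y, %b)
  return (%z))IR";

  auto graph = std::make_shared<Graph>();
  script::parseIR(duplicated, graph.get());

  // Merges %a and %b into one constant, so the quantization pattern
  // matcher only has to deal with a single canonical node.
  ConstantPooling(graph);
  graph->dump();
}
```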
Just curious, is there any order we can follow for the optimization passes?
Not sure; it looks like in graph_executor constant pooling is done before constant propagation, so I'll change the order.
If we want to cleanup traced graphs, shouldn't we do it in tracing?
Not sure why we don't do this in the traced graph, cc @jamesr66a
Was speaking with @suo yesterday. It's a pretty common request to add more "clean-up" passes to Module emission (i.e. tracing or scripting) as opposed to just in the GraphExecutor. I think this PR is good, but it would also be valid to just add these things to tracing
// must do constant propagation first before replacement
replaceConvolutionWithConv2d(graph);
// fuse decomposed linear into aten::linear
FuseLinear(graph);
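Putting the hunks together, the preparation this PR runs before inserting observers amounts to roughly the following sequence. It is a sketch assembled from the diff above, not a verbatim copy of the pass: the header paths and the helper name `prepareForInsertObservers` are assumptions, and `replaceConvolutionWithConv2d` is internal to the quantization pass, so it is only indicated in a comment.

```cpp
// Sketch of the preprocessing order suggested by the hunks above.
#include <torch/csrc/jit/ir.h>
#include <torch/csrc/jit/passes/constant_pooling.h>
#include <torch/csrc/jit/passes/constant_propagation.h>
#include <torch/csrc/jit/passes/fuse_linear.h>

void prepareForInsertObservers(std::shared_ptr<torch::jit::Graph>& graph) {
  using namespace torch::jit;
  ConstantPropagation(graph);  // must run before the replacements below
  ConstantPooling(graph);      // merge duplicated constants left by tracing
  // replaceConvolutionWithConv2d(graph);  // internal helper in the quantization pass
  FuseLinear(graph);           // recollapse exploded linear into aten::linear
}
```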
Though for this pass I remember there was some back-and-forth earlier about how it's not profitable in some cases, and might mess up some downstream systems (e.g. CUDA fuser). We could add it generally and see if anything breaks
yeah we can't do this by default since it doesn't work for autodiff, but @wanchaol is working on this.
Stack from ghstack:
- #30475 Refactor test_quantization.py and enable `test_nested`
- #30473 [quant][graphmode] Invoke more passes in `insertObservers`
Summary:
Invoked the `ConstantPooling` and `FuseLinear` passes before `insertObservers`.
`ConstantPooling` cleans up the traced graph: when two constant nodes have the same value, this pass merges them, which leaves fewer quantization patterns to match.
`FuseLinear` merges the exploded linear function back into `aten::linear` so that we can quantize it properly. We need the fusion because right now we recognize weight and bias by matching argument positions in certain function calls (e.g. the weight of `aten::conv2d` sits at a fixed argument position), so we have to preserve the boundary of the linear call to recognize its weight; in the exploded linear code, the input of `addmm` is the transposed weight rather than the original weight of linear.
Test Plan:
This is needed for the traced-model quantization tests to pass.
Reviewers:
mvz
Subscribers:
Tasks:
Tags:
Differential Revision: D18795722