[quant][graphmode] Invoke more passes in insertObservers #30473
Summary:
Invoked the `ConstantPooling` and `FuseLinear` passes before `insertObservers`.
`ConstantPooling` cleans up the traced graph: when two constant nodes have the same value, this pass merges them, which leaves fewer quantization patterns to match.
`FuseLinear` merges the exploded linear function back into `aten::linear` so that we can quantize it properly. We need the fusion because right now we recognize weight and bias by matching argument positions in certain function calls (e.g. the weight of `aten::conv2d` sits at a fixed argument position), so we have to preserve the boundary of the linear call to recognize its weight; in the exploded linear code, the input of `addmm` is the transposed weight rather than the original weight of linear.
Test Plan: This is needed for the traced-model quantization tests to pass.
Reviewers: mvz
Subscribers:
Tasks:
Tags:
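To make the `FuseLinear` motivation concrete, here is a minimal sketch of the transformation on a toy graph. Treat it as illustrative only: the header paths and the `script::parseIR` namespace are assumptions that have moved between PyTorch versions, and `main` is just a hypothetical driver.

```cpp
// Sketch only: include paths and the parseIR namespace are assumptions,
// they differ across PyTorch versions.
#include <torch/csrc/jit/ir.h>
#include <torch/csrc/jit/irparser.h>
#include <torch/csrc/jit/passes/fuse_linear.h>

using namespace torch::jit;

int main() {
  // Roughly what a traced nn.Linear looks like after being "exploded":
  // the original weight only reaches addmm through aten::t, so a matcher
  // that identifies the weight by argument position cannot see it.
  const std::string exploded = R"IR(
graph(%input : Tensor, %weight : Tensor, %bias : Tensor):
  %one : int = prim::Constant[value=1]()
  %wt : Tensor = aten::t(%weight)
  %out : Tensor = aten::addmm(%bias, %input, %wt, %one, %one)
  return (%out))IR";

  auto graph = std::make_shared<Graph>();
  script::parseIR(exploded, graph.get());

  // FuseLinear rewrites the t() + addmm() pattern into a single
  // aten::linear(%input, %weight, %bias), so the real weight is again
  // visible at a fixed argument position for the quantization passes.
  FuseLinear(graph);
  graph->dump();
}
```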
auto graph = method.graph();
ConstantPropagation(graph);
// To cleanup traced graph
ConstantPooling(graph);
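For context on the `ConstantPooling` call above, the sketch below (same caveats as before: header paths and the `parseIR` namespace are assumptions) shows the kind of duplication tracing leaves behind and what pooling does with it.

```cpp
// Sketch only: include paths and the parseIR namespace are assumptions.
#include <torch/csrc/jit/ir.h>
#include <torch/csrc/jit/irparser.h>
#include <torch/csrc/jit/passes/constant_pooling.h>

using namespace torch::jit;

int main() {
  // Tracing tends to emit one prim::Constant per use site, so the same
  // value can appear several times in the graph.
  const std::string duplicated = R"IR(
graph(%x : Tensor):
  %a : int = prim::Constant[value=1]()
  %b : int = prim::Constant[value=1]()
  %y : Tensor = aten::add(%x, %x, %a)
  %z : Tensor = aten::add(%y, %y, %b)
  return (%z))IR";

  auto graph = std::make_shared<Graph>();
  script::parseIR(duplicated, graph.get());

  // Merges %a and %b into one constant, so the quantization pattern
  // matcher only has to deal with a single canonical node.
  ConstantPooling(graph);
  graph->dump();
}
```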
Just curious, is there any order we can follow for the optimization passes?
Not sure; it looks like in graph_executor constant pooling is done before constant propagation, so I'll change the order.
If we want to cleanup traced graphs, shouldn't we do it in tracing?
Not sure why we don't do this in the traced graph, cc @jamesr66a
Was speaking with @suo yesterday. It's a pretty common request to add more "clean-up" passes to Module emission (i.e. tracing or scripting) as opposed to just in the GraphExecutor. I think this PR is good, but it would also be valid to just add these things to tracing
// must do constant propagation first before replacement
replaceConvolutionWithConv2d(graph);
// fuse decomposed linear into aten::linear
FuseLinear(graph);
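Putting the hunks together, the preparation this PR runs before inserting observers amounts to roughly the following sequence. It is a sketch assembled from the diff above, not a verbatim copy of the pass: the header paths and the helper name `prepareForInsertObservers` are assumptions, and `replaceConvolutionWithConv2d` is internal to the quantization pass, so it is only indicated in a comment.

```cpp
// Sketch of the preprocessing order suggested by the hunks above.
#include <torch/csrc/jit/ir.h>
#include <torch/csrc/jit/passes/constant_pooling.h>
#include <torch/csrc/jit/passes/constant_propagation.h>
#include <torch/csrc/jit/passes/fuse_linear.h>

void prepareForInsertObservers(std::shared_ptr<torch::jit::Graph>& graph) {
  using namespace torch::jit;
  ConstantPropagation(graph);  // must run before the replacements below
  ConstantPooling(graph);      // merge duplicated constants left by tracing
  // replaceConvolutionWithConv2d(graph);  // internal helper in the quantization pass
  FuseLinear(graph);           // recollapse exploded linear into aten::linear
}
```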
Though for this pass I remember there was some back-and-forth earlier about how it's not profitable in some cases, and might mess up some downstream systems (e.g. CUDA fuser). We could add it generally and see if anything breaks
yeah we can't do this by default since it doesn't work for autodiff, but @wanchaol is working on this.
Stack from ghstack:
- #30475 Refactor test_quantization.py and enable `test_nested`
- #30473 [quant][graphmode] Invoke more passes in `insertObservers`
Summary:
Invoked the `ConstantPooling` and `FuseLinear` passes before `insertObservers`.
`ConstantPooling` cleans up the traced graph: when two constant nodes have the same value, this pass merges them, which leaves fewer quantization patterns to match.
`FuseLinear` merges the exploded linear function back into `aten::linear` so that we can quantize it properly. We need the fusion because right now we recognize weight and bias by matching argument positions in certain function calls (e.g. the weight of `aten::conv2d` sits at a fixed argument position), so we have to preserve the boundary of the linear call to recognize its weight; in the exploded linear code, the input of `addmm` is the transposed weight rather than the original weight of linear.
Test Plan:
This is needed for the traced-model quantization tests to pass.
Reviewers:
mvz
Subscribers:
Tasks:
Tags:
Differential Revision: D18795722