
Conversation

Contributor

@jerryzh168 jerryzh168 commented Nov 26, 2019

Stack from ghstack:

Summary:
Invoked the `ConstantPooling` and `FuseLinear` passes before
`insertObservers`.
`ConstantPooling` cleans up the traced graph: when we
have two constant nodes with the same value, this pass merges them,
which leaves us with fewer quantization patterns to match.
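
For illustration, a minimal sketch (not part of this PR; `torch._C._jit_pass_constant_pooling` is an internal, unstable binding assumed here) of the duplicate constants that tracing can leave behind:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # The literal 2.0 appears twice; the tracer can record it as two
        # separate prim::Constant nodes with the same value.
        return x * 2.0 + 2.0

traced = torch.jit.trace(M(), torch.randn(2))
print(traced.graph)  # duplicate constants before pooling

# Internal, unstable binding (assumed for illustration): merges the
# duplicate constants so later pattern matching sees a single node.
torch._C._jit_pass_constant_pooling(traced.graph)
print(traced.graph)  # a single shared constant after pooling
```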
`FuseLinear` merges the exploded linear function back into `aten::linear` so
that we can quantize this function properly. We need the fusion because right now
we recognize weight and bias by matching argument positions in certain function
calls, e.g. the weight of `aten::conv2d` sits at a fixed argument position. Therefore we have to preserve
the boundary of the linear function to recognize its weight: in the exploded
linear code, the input of `addmm` is the transposed weight rather than the original weight of the linear.
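
Again as a minimal sketch (not part of this PR; whether tracing records the decomposed form depends on the PyTorch version, and `torch._C._jit_pass_fuse_linear` is an internal, unstable binding assumed here):

```python
import torch

# Depending on the PyTorch version, tracing nn.Linear records either a
# single aten::linear node or the decomposed form: aten::t on the weight
# followed by aten::addmm(bias, input, weight_t), where addmm only ever
# sees the *transposed* weight.
lin = torch.jit.trace(torch.nn.Linear(4, 3), torch.randn(2, 4))
print(lin.graph)

# Internal, unstable binding (assumed for illustration): folds the
# t/addmm (or t/matmul/add) pattern back into a single aten::linear call
# whose argument positions expose the original weight and bias.
torch._C._jit_pass_fuse_linear(lin.graph)
print(lin.graph)
```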

Test Plan:
This is needed for the traced-model quantization tests to pass.
Reviewers:
mvz

Subscribers:

Tasks:

Tags:

Differential Revision: D18795722

auto graph = method.graph();
ConstantPropagation(graph);
// To cleanup traced graph
ConstantPooling(graph);
Contributor

Just curious, is there any order we can follow for the optimization passes?

Contributor Author

Not sure; it looks like in graph_executor constant pooling is done before constant propagation. I'll change the order.
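
(For reference, a minimal sketch of exercising the two orderings from Python; the `torch._C._jit_pass_*` bindings are internal and unstable and are assumed here purely for illustration:)

```python
import torch

def cleanup(graph, pool_first=False):
    # Internal, unstable bindings (assumed for illustration); the flag
    # just switches between the two orderings discussed above.
    if pool_first:
        torch._C._jit_pass_constant_pooling(graph)
        torch._C._jit_pass_constant_propagation(graph)
    else:
        torch._C._jit_pass_constant_propagation(graph)
        torch._C._jit_pass_constant_pooling(graph)

traced = torch.jit.trace(torch.nn.Linear(4, 3), torch.randn(2, 4))
cleanup(traced.graph)
print(traced.graph)
```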


If we want to clean up traced graphs, shouldn't we do it in tracing?

Contributor Author

Not sure why we don't do this during tracing, cc @jamesr66a

Collaborator

Was speaking with @suo yesterday. It's a pretty common request to add more "clean-up" passes to Module emission (i.e. tracing or scripting) as opposed to just in the GraphExecutor. I think this PR is good, but it would also be valid to just add these things to tracing


jerryzh168 added a commit that referenced this pull request Dec 4, 2019

Pull Request resolved: #30473

Differential Revision: [D18795722](https://our.internmc.facebook.com/intern/diff/D18795722/)
ghstack-source-id: 94887831

// must do constant propagation first before replacement
replaceConvolutionWithConv2d(graph);
// fuse decomposed linear into aten::linear
FuseLinear(graph);
Collaborator

Though for this pass I remember there was some back-and-forth earlier about how it's not profitable in some cases, and might mess up some downstream systems (e.g. CUDA fuser). We could add it generally and see if anything breaks

Contributor Author

yeah we can't do this by default since it doesn't work for autodiff, but @wanchaol is working on this.

@facebook-github-bot facebook-github-bot deleted the gh/jerryzh168/148/head branch December 10, 2019 15:19
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020

Summary:
Pull Request resolved: pytorch#30473
ghstack-source-id: 94887831

Imported from OSS

Differential Revision: D18795722

fbshipit-source-id: 192d9d1e56307e2e1d90e30dce0502e31cb4f829

Labels

oncall: jit (Add this issue/PR to JIT oncall triage queue)
