[JIT] Separate GPU implementation of frozen_conv_add_relu_fusion.cpp #68149
Conversation
JIT optimization passes are part of the CPU-only build (i.e. the necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime, if it is available.

Differential Revision: [D32329330](https://our.internmc.facebook.com/intern/diff/D32329330/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D32329330/)!
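For readers skimming the thread, the pattern the description alludes to can be sketched as a function-pointer slot that the CPU-only pass checks at runtime, with a static registrar in the GPU-only translation unit filling the slot when that library is loaded. The sketch below is a minimal illustration under assumed names (`FusionPass`, `getFuseFrozenConvAddReluImpl`, `RegisterCudaFusion`, and the trivial `Graph` stand-in are all hypothetical), not the exact code in this PR.

```cpp
#include <functional>
#include <memory>

namespace torch {
namespace jit {

struct Graph {}; // stand-in for the real JIT IR graph

using FusionPass = std::function<void(std::shared_ptr<Graph>&)>;

// Construct-on-first-use accessor for the implementation slot.
// In a CPU-only build the slot simply stays empty.
FusionPass& getFuseFrozenConvAddReluImpl() {
  static FusionPass impl;
  return impl;
}

// CPU-side entry point: run the GPU rewrite only if one was registered.
void FuseFrozenConvAddRelu(std::shared_ptr<Graph>& graph) {
  if (getFuseFrozenConvAddReluImpl()) {
    getFuseFrozenConvAddReluImpl()(graph);
  }
}

} // namespace jit
} // namespace torch

// In a separate translation unit, compiled only when GPU flags are set:
namespace {
struct RegisterCudaFusion {
  RegisterCudaFusion() {
    torch::jit::getFuseFrozenConvAddReluImpl() =
        [](std::shared_ptr<torch::jit::Graph>& graph) {
          // the cudnn-backed conv-add-relu rewrite would go here
          (void)graph;
        };
  }
};
RegisterCudaFusion reg; // runs when the GPU library is loaded
} // anonymous namespace
```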
Deferring to @malfet and @desertfire.
To clarify on requesting changes: the global initializer, as it is currently implemented, is wrong, isn't it?
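Background for this question: the hazard with a plain namespace-scope global is C++'s unspecified dynamic-initialization order across translation units; the registrar in the GPU file could run before the global slot it writes to has been constructed. Below is a minimal sketch of the usual construct-on-first-use mitigation (hypothetical names; not necessarily what the PR ended up doing).

```cpp
#include <functional>

// Risky version (two different translation units):
//   TU A:  std::function<void()> gImpl;                  // global slot
//   TU B:  struct Reg { Reg() { gImpl = []{}; } } reg;   // registrar
// C++ leaves the relative order of these initializers unspecified,
// so 'reg' may write into 'gImpl' before 'gImpl' is constructed.

// Safer version: a function-local static is initialized on first call,
// so the registrar can never observe an unconstructed slot.
std::function<void()>& getImpl() {
  static std::function<void()> impl;
  return impl;
}

namespace {
struct Reg {
  Reg() { getImpl() = [] { /* GPU-only pass body */ }; }
};
Reg reg;
} // anonymous namespace
```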
seems like comments are addressed
LGTM in general. One question is how extensible this solution is: say we want to add a similar fusion pass for MKLDNN (#64639), would the MKLDNN and CUDA initializations conflict with each other?
This pull request has been merged in cfc75c2.
@desertfire AFAIK this could be extended with other passes (maybe use a `vector<function<>>` instead) without issues.
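A hypothetical sketch of that extension: the single slot becomes a vector of passes, so CUDA, MKLDNN, or any future backend appends its own entry instead of overwriting someone else's (all names here are assumptions, not the PR's actual code).

```cpp
#include <functional>
#include <memory>
#include <vector>

namespace torch {
namespace jit {

struct Graph {}; // stand-in for the real JIT IR graph

using FusionPass = std::function<void(std::shared_ptr<Graph>&)>;

// One shared registry; each backend's registrar appends its pass here.
std::vector<FusionPass>& getFusionPassRegistry() {
  static std::vector<FusionPass> passes; // construct-on-first-use
  return passes;
}

// Runs every registered backend pass in registration order.
void FuseFrozenConvAddRelu(std::shared_ptr<Graph>& graph) {
  for (const auto& pass : getFusionPassRegistry()) {
    pass(graph);
  }
}

} // namespace jit
} // namespace torch
```

Each backend's translation unit would then `push_back` its pass from a static registrar object, mirroring the single-slot pattern sketched earlier, so the two registrations cannot clobber each other.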
This pull request has been reverted by cd043c3. To re-land this change, follow these steps.
…68149)

JIT optimization passes are part of the CPU-only build (i.e. the necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime, if it is available.

ghstack-source-id: 143676384

Test Plan: In the following script, conv_add_relu fusion is not observed without this change, but is observed when this change is added.

```python
from typing import List, Optional

import torch


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.rand((3, 3, 7, 7), device="cuda"))
        self.add_tensor = torch.nn.Parameter(torch.rand((3, 3, 7, 7), device="cuda"))

    def forward(
        self,
        inp: torch.Tensor,
        bias: Optional[torch.Tensor],
        stride: List[int],
        padding: List[int],
        dilation: List[int],
        groups: int,
    ):
        # weight = torch.zeros((3, 3, 7, 7), device="cuda")
        inp = inp.to("cuda")
        conv_result = torch.conv2d(
            inp, self.weight, bias, stride, padding, dilation, groups
        )
        add_result = conv_result.add_(self.add_tensor)
        return add_result.relu_()

    @torch.jit.export
    def make_prediction(self, inp: torch.Tensor):
        bias = None
        groups = 1
        stride = (1, 1)
        padding = (0, 0)
        dilation = (1, 1)
        return self.forward(inp, bias, stride, padding, dilation, groups)


if __name__ == "__main__":
    # generate some sample input
    groups = 1
    channels_in = 3
    channels_out = 3
    kernel_size = (7, 7)
    stride = (1, 1)
    padding = (0, 0)
    dilation = (1, 1)
    inp = torch.rand((64, 3, 432, 432))
    weight = torch.rand(
        (channels_out, channels_in, kernel_size[0], kernel_size[1]), device="cuda"
    )
    bias = None

    model = Model()
    model.eval()
    script = torch.jit.script(model)
    script = torch.jit.freeze(script)
    script = torch.jit.optimize_for_inference(script)
    print("~~~~ FORWARD ~~~~")
    print(script.graph)
    print("with preserved_attrs")
    print(torch.sum(script.forward(inp, bias, stride, padding, dilation, groups)))
```

fbshipit-source-id: c0f10da4b9540c588819efe3ec540baa0fae4b35
[ghstack-poisoned]
ghstack-source-id: e921804
Pull Request resolved: #69253
@davidberard98 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…68149)

Summary:
Pull Request resolved: #68149

JIT optimization passes are part of the CPU-only build (i.e. the necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime, if it is available.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32773666

Pulled By: davidberard98

fbshipit-source-id: c83dbb88804bdef23dc60a6299acbfa76d5c1495
In a CPU-only build of Torch, I get these warnings on every run of a new model. Is this normal? If it is normal to hit this path, should it not be silent instead?
Removing the warning in #72441.
Stack from ghstack:
JIT optimization passes are part of the CPU-only build (i.e. the necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime, if it is available.
Differential Revision: D32773666