Add fused add_relu op. #39342
Conversation
Summary: Many networks such as resnet have adds followed by relu. This op is the first step toward enabling a fused implementation. Once we have the fused add_relu op, a JIT pass will be written to replace add + relu patterns with add_relu.

Test Plan: python test/test_nn.py TestAddRelu
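The fusion idea can be sketched in plain Python (an illustrative scalar model only, not the actual vectorized ATen kernel): computing relu(a + alpha*b) in a single pass avoids materializing the intermediate sum and reading it back.

```python
def add_relu_fused(a, b, alpha=1.0):
    # One pass over the data: each sum is clamped at zero immediately,
    # so no intermediate buffer is written out and re-read.
    return [max(x + alpha * y, 0.0) for x, y in zip(a, b)]

def add_relu_unfused(a, b, alpha=1.0):
    # Two passes with a materialized intermediate, as a separate
    # add followed by relu would do.
    tmp = [x + alpha * y for x, y in zip(a, b)]
    return [max(t, 0.0) for t in tmp]
```

Both produce identical results; the fused form only changes how many times memory is traversed, which is what matters for a memory-bound op like this.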
💊 CI failures summary (as of commit a5c46a5): 💚 Looks good so far! There are no failures yet. (This comment was automatically generated by Dr. CI.)
void add_clamp_kernel(TensorIterator& iter, Scalar alpha_scalar, Scalar min_val, Scalar max_val) {
  AT_DISPATCH_ALL_TYPES(iter.dtype(), "add_relu_cpu", [&]() {
"add_clamp_cpu"?
Yes. Will fix it.
Do you have any numbers on how much this improves resnet performance compared to the unfused (but vec256-ized) version? (Or any other model, like unet.)

I haven't tried it on resnet. For some reason it did not make any difference on unet; I have not yet dug further to find out why. I will try it out on resnet.

@dreiss, I checked perf of the unet model on an S9 phone.
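As a sanity check on whether this kind of fusion can matter at all, here is a hypothetical pure-Python micro-benchmark (illustrative only; a real measurement would compare the vectorized ATen kernels on-device, as done above). The unfused version materializes the sum and re-reads it; the fused version touches each element once.

```python
import timeit

N = 100_000
a = [0.5] * N
b = [-0.25] * N

def unfused():
    tmp = [x + y for x, y in zip(a, b)]  # add: writes an intermediate list
    return [max(t, 0.0) for t in tmp]    # relu: re-reads the intermediate

def fused():
    # add and relu in a single traversal, no intermediate list
    return [max(x + y, 0.0) for x, y in zip(a, b)]

print("unfused:", timeit.timeit(unfused, number=10))
print("fused:  ", timeit.timeit(fused, number=10))
```

Results from a toy loop like this do not transfer directly to SIMD kernels, where the saving is mostly memory bandwidth for the intermediate tensor.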
This pull request has been merged in 82c9f79.
Stack from ghstack:
Summary:
Many networks such as resnet have adds followed by relu. This op is
the first step toward enabling that fused implementation.
Once we have the fused add_relu op, a JIT pass will be written to
replace add + relu patterns with add_relu.
Test Plan:
python test/test_nn.py TestAddRelu
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D21822397
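The follow-up JIT pass described above would pattern-match an add followed by a relu and rewrite it to the fused op. A toy peephole rewrite over a flat list of op names (hypothetical structure, not the actual torch.jit graph API) might look like:

```python
def fuse_add_relu(ops):
    """Replace each adjacent (add, relu) pair with a single add_relu.

    `ops` is a flat list of op-name strings. A real JIT pass would also
    verify that the add's output has no other users before fusing.
    """
    out = []
    i = 0
    while i < len(ops):
        if ops[i] == "add" and i + 1 < len(ops) and ops[i + 1] == "relu":
            out.append("add_relu")  # fuse the matched pair
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out
```

The single-user check matters because if the intermediate sum is consumed elsewhere, fusing would silently drop a value other ops depend on.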