Conversation

@kimishpatel (Contributor) commented Jun 1, 2020

Stack from ghstack:

Summary:
Many networks, such as ResNet, have an add followed by a relu. This op is
the first step toward enabling a fused implementation. Once we have the
fused add_relu op, a JIT pass will be written to replace add + relu
patterns with add_relu.
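
For reference, the pattern being fused is semantically equivalent to the minimal Python sketch below (illustrative only; it does not call the new fused op, and the helper name here is made up):

```python
import torch

def add_relu_reference(a: torch.Tensor, b: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Unfused reference: elementwise add followed by relu, i.e.
    # clamp_min(a + alpha * b, 0). A fused add_relu kernel computes the
    # same result in a single pass over the data instead of two.
    return torch.relu(a + alpha * b)

a = torch.randn(8, 16)
b = torch.randn(8, 16)
out = add_relu_reference(a, b)
```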

Test Plan:
python test/test_nn.py TestAddRelu

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D21822397](https://our.internmc.facebook.com/intern/diff/D21822397)

@dr-ci bot commented Jun 1, 2020

💊 CI failures summary and remediations

As of commit a5c46a5 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



Review thread on `add_clamp_kernel`:

```cpp
void add_clamp_kernel(TensorIterator& iter, Scalar alpha_scalar, Scalar min_val, Scalar max_val) {
  AT_DISPATCH_ALL_TYPES(iter.dtype(), "add_relu_cpu", [&]() {
```
Contributor:

"add_clamp_cpu"?

@kimishpatel (Contributor Author):

Yes. Will fix it.

@dreiss (Contributor) commented Jun 17, 2020

Do you have any numbers on how much this improves resnet performance compared to unfused (but vec256-ized) add then relu?

@dreiss (Contributor) commented Jun 17, 2020

(or any other model like unet)

@kimishpatel (Contributor Author):

> Do you have any numbers on how much this improves resnet performance compared to unfused (but vec256-ized) add then relu?

I haven't tried it on resnet. For some reason it did not make any difference on unet; I haven't dug further yet to find out why. I will try it out on resnet.

@kimishpatel (Contributor Author) commented Jun 18, 2020

@dreiss, I checked perf on the unet model on an S9 phone.
Before this stack: 19 ms.
After this stack, with the add_relu fusion pass applied (I have yet to add it to optimizeForMobile; I will stack another diff on top for that): 15 ms.
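
For illustration, applying such a fusion pass to a scripted module could look like the sketch below. The pass binding name `torch._C._jit_pass_fuse_add_relu` is an assumption here, since the pass itself (and its hookup into optimizeForMobile) lands in follow-up diffs:

```python
import torch

class Block(torch.nn.Module):
    def forward(self, x, y):
        # The add-followed-by-relu pattern that a fusion pass would
        # rewrite into a single fused add_relu call.
        return torch.relu(x + y)

scripted = torch.jit.script(Block())
# Hypothetical invocation of the fusion pass on the TorchScript graph;
# the binding name and its integration are assumptions, not part of this PR.
torch._C._jit_pass_fuse_add_relu(scripted.graph)
print(scripted.graph)
```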

@facebook-github-bot (Contributor):

This pull request has been merged in 82c9f79.

@facebook-github-bot deleted the gh/kimishpatel/28/head branch on July 13, 2020 at 14:19.