Implement hardswish/hardsigmoid on MKLDNN tensors #55218
Conversation
Looks good!
    std::function<Tensor(Tensor)> fallback) {
  return [aten_op, fallback](Stack* stack) {
    auto a = pop(stack).toTensor();
    if (a.numel() == 0) {
If this is just to put `numel() == 0` on the hot path rather than for correctness reasons, is it worth it?
@@ -633,6 +692,10 @@ class MKLDNNSubgraphSlicer {
      return true;
    }

    if (n->kind() == Symbol::aten("hardswish")) {
nit: register `aten::hardswish` as an interned operator?
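For illustration, a minimal sketch of what this nit suggests, assuming a `hardswish` entry is added to the JIT's interned-strings list (the exact location of that list is an assumption):

```cpp
// Sketch: once aten::hardswish is interned, the runtime string lookup
//   n->kind() == Symbol::aten("hardswish")
// becomes a comparison against a compile-time symbol constant:
if (n->kind() == aten::hardswish) {
  // ... convert the node as above ...
}
```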
@@ -456,6 +510,11 @@ void ComputeSubgraphInMKLDNN(Node* subgraph_node) {
      body_node->replaceInput(1, node->outputs().at(1));
    }

    if (body_node->kind() == Symbol::aten("hardswish")) {
      body_node->replaceWithNewSymbol(Symbol::aten("MKLDNNHardSwish"));
      body_node->destroy();
We should be adding `hardswish_` here:

static std::unordered_set<Symbol> mkldnn_ops = {
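A sketch of the suggested fix, assuming the in-place variant follows the usual trailing-underscore naming:

```cpp
// Sketch: include both the functional and the in-place symbol so the
// subgraph slicer also treats aten::hardswish_ as MKLDNN-convertible.
static std::unordered_set<Symbol> mkldnn_ops = {
    Symbol::aten("hardswish"),
    Symbol::aten("hardswish_"),
    // ... existing entries ...
};
```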
@@ -238,6 +281,17 @@ Operation BroadOp(const Node* node) {
  };
}

const RegisterOperators MKLDNNHardSwishOpReg({
    torch::jit::Operator(
        "aten::MKLDNNHardSwish(Tensor a) -> Tensor",
We should have a way of in-placing this, like we already do here:

if (k == aten::relu || k == aten::sigmoid || k == aten::dropout) {

Maybe just by adding an `inplace=true` attribute to the node.
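One minimal way to realize the attribute idea (a sketch only, not the PR's actual mechanism; the `inplace` attribute name is an assumption):

```cpp
// Sketch: rather than hard-coding each op kind, tag the node and let a
// later pass swap in the in-place variant when the input value is not
// used again. Node::i_ stores an integer attribute on the node.
body_node->i_(Symbol::attr("inplace"), 1);
```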
@@ -238,6 +281,17 @@ Operation BroadOp(const Node* node) {
  };
}

const RegisterOperators MKLDNNHardSwishOpReg({
    torch::jit::Operator(
nit: we have `prim::mkldnn_convolution`; any reason not to put this in as `prim::hardswish` for consistency?
Force-pushed from 649fcd1 to 32765f7.
Nice, LGTM!
test/jit/test_freezing.py (outdated)
g = parse_ir(graph_str)
m = self.createFunctionFromGraph(g)
x = torch.rand(size)
x_copy = x.detach().clone()
nit: why are you detaching and cloning `x_copy`?
For the in-place tests, the reference implementations modify the input in `aten_op(x, inplace=inplace)`. I guess I could just always use `F.hardswish(inplace=False)`; that might look a bit cleaner.
test/jit/test_freezing.py (outdated)
mod = self.freezeAndConvert(mod_eager)
FileCheck().check("mkldnn_convolution").check_next("aten::relu_").check_next("aten::relu_").run(mod.graph)
print(mod.graph)
stray print
a great catch!
mod = self.freezeAndConvert(mod_eager)
FileCheck().check("mkldnn_convolution").check_next("aten::relu_").check_next("aten::relu_").run(mod.graph)
print(mod.graph)
FileCheck().check("mkldnn_convolution").check_next("prim::MKLDNNHardSwish_").check_next("aten::relu_").run(mod.graph)
Is the hardswish still necessary?
Force-pushed from e389016 to d9b7c11.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This looks good and I think it is right (great job, this was tricky), but can you add a little more testing and more of a comment? Reading through the code right now, I wouldn't know what invariants we have to maintain or why the code is written the way it is.
g = parse_ir(graph_str)
m = self.createFunctionFromGraph(g)
x = torch.rand(size)
# `inplace=False` is intentional, otherwise we modify the input
Can we test the in-place version somehow? Ideally we'd also test this after running it with a Conv, where we know it's going to output a packed tensor.
Added a conv test, and we already have in-place tests.
We have in-place tests, but they aren't testing these new ops, which have a different way of running them. Maybe as a follow-up, at least in the next PR, add more tests there.
c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset);
// we cast `a` to an `ideep::tensor`, so we can get at its descriptor
// which we then use to set up `out` tensor w/ the same props as a
auto a_it = at::native::itensor_from_mkldnn(a);
I'm going to suggest a bunch of things here for readability, just because this is IMO very tricky code that is easy to get wrong (we almost did), and which requires an understanding of both the MKLDNN packed format and aten tensors. Feel free to not accept the comments, obviously.

nit: `a_it` -> `a_ideep_tensor`
// we cast `a` to an `ideep::tensor`, so we can get at its descriptor
// which we then use to set up `out` tensor w/ the same props as a
auto a_it = at::native::itensor_from_mkldnn(a);
auto raw_data = a_it.get_data_handle();
nit: maybe `mkldnn_raw_data_handle`?
// which we then use to set up `out` tensor w/ the same props as a
auto a_it = at::native::itensor_from_mkldnn(a);
auto raw_data = a_it.get_data_handle();
auto topt = a.options().layout(c10::kStrided);
Can you rename `topt`? I don't know where `t` or `opt` is coming from. Maybe `a_options_with_strided`.
auto raw_data = a_it.get_data_handle();
auto topt = a.options().layout(c10::kStrided);
// we also wrap `a` storage into an aten tensor
auto t = at::from_blob(raw_data, {a.numel()}, topt);
Can you give `t` a more verbose name?
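Putting the naming nits above together, a sketch of how the snippet might read with the suggested renames (behavior unchanged from the quoted code):

```cpp
c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset);
// view the MKLDNN tensor as an ideep::tensor so we can reach its
// descriptor, which describes the packed (blocked) physical layout
auto a_ideep_tensor = at::native::itensor_from_mkldnn(a);
auto mkldnn_raw_data_handle = a_ideep_tensor.get_data_handle();
auto a_options_with_strided = a.options().layout(c10::kStrided);
// wrap the raw MKLDNN buffer in a flat strided aten tensor so that a
// regular aten out-variant kernel can read from / write into it
auto input_as_strided_aten_tensor = at::from_blob(
    mkldnn_raw_data_handle, {a.numel()}, a_options_with_strided);
```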
if (!inplace) {
  // `a_it.get_desc()` will allocate a tensor
  // of the right physical size.
  auto it_empty = ideep::tensor(a_it.get_desc());
Maybe add an assertion that `it_empty`'s physical size equals the input's physical size?
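A sketch of the suggested assertion; `get_size()` on a oneDNN memory descriptor returns the buffer size in bytes:

```cpp
// `a_it.get_desc()` allocates a tensor of the right physical size;
// assert it really matches the input's packed buffer, since both
// buffers are viewed through at::from_blob with the same element count.
auto it_empty = ideep::tensor(a_it.get_desc());
TORCH_INTERNAL_ASSERT(
    it_empty.get_desc().get_size() == a_it.get_desc().get_size());
```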
// this node doesnt handle string padding yet...
if (!body_node->namedInput("padding")->type()->cast<StringType>()) {
  body_node->replaceWithNewSymbol(Symbol::prim("mkldnn_convolution"));
  auto true_pred = [](Node*) { return true; };
Nit: can you keep the conv handling how it was originally? I think it was more readable originally, and we've lost the `this node doesnt handle string padding yet` comment. If we start getting a lot more predicates, then maybe we should refactor.
"prim::MKLDNNRelu6_(Tensor(a!) self) -> Tensor(a!)", | ||
createUnaryOp( | ||
[](at::Tensor output, at::Tensor input) { | ||
at::hardtanh_out(output, input, 0.f, 6.f); |
Same comment here: just use `relu6`, since it exists.
"prim::MKLDNNRelu6(Tensor(a!) self) -> Tensor(a!)", | ||
createUnaryOp( | ||
[](at::Tensor output, at::Tensor input) { | ||
at::hardtanh_out(output, input, 0.f, 6.f); |
There is an `at::relu6`; can you use it instead of `hardtanh`?
"prim::MKLDNNHardSwish_(Tensor(a!) self) -> Tensor(a!)", | ||
createUnaryOp( | ||
[](at::Tensor output, at::Tensor input) { | ||
at::hardswish_out(output, input); |
For these operators, we could just call `at::cpu::hardswish_out`. Since we are already creating a custom operator, we might as well avoid the dispatch overhead.
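A sketch of the suggested change, assuming the codegen'd `at::cpu::hardswish_out` takes the same out-first arguments as `at::hardswish_out`:

```cpp
// Sketch: same registration shape as above, but calling the CPU kernel
// directly to skip per-call dispatcher overhead; safe here because this
// custom operator only ever runs on CPU/MKLDNN tensors.
"prim::MKLDNNHardSwish_(Tensor(a!) self) -> Tensor(a!)",
createUnaryOp(
    [](at::Tensor output, at::Tensor input) {
      at::cpu::hardswish_out(output, input);
    },
```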
Force-pushed from a1be3ba to 2dc9bce.
@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Codecov Report

@@           Coverage Diff            @@
##           master   #55218    +/-   ##
========================================
  Coverage   77.13%   77.13%
========================================
  Files        1912     1912
  Lines      189312   189356      +44
========================================
+ Hits       146024   146058      +34
- Misses      43288    43298      +10
@Krovatkin merged this pull request in 9d3d169.
Summary: Adding hardswish and hardsigmoid improves mobilenetv3 by ~13%.

|       | hardswish | base     |
| ----- | --------- | -------- |
| run 1 | 1305.032  | 1486.013 |
| run 2 | 1290.142  | 1491.001 |
| run 3 | 1305.51   | 1491.66  |
| run 4 | 1308.788  | 1495.577 |
| avg   | 1302.368  | 1491.063 |

hardswish / base ratio: 0.873449

Pull Request resolved: pytorch#55218

Reviewed By: albanD

Differential Revision: D27701276

Pulled By: Krovatkin

fbshipit-source-id: cde78da71d327e65461e80fbb6c3bb3429505410