Implement hardswish/hardsigmoid on MKLDNN tensors #55218

Closed
Krovatkin wants to merge 11 commits

Conversation


@Krovatkin Krovatkin commented Apr 2, 2021

Adding hardswish and hardsigmoid improves mobilenetv3 by ~13%

|       | hardswish | base     | ratio    |
| ----- | --------- | -------- | -------- |
| run 1 | 1305.032  | 1486.013 |          |
| run 2 | 1290.142  | 1491.001 |          |
| run 3 | 1305.51   | 1491.66  |          |
| run 4 | 1308.788  | 1495.577 |          |
| avg   | 1302.368  | 1491.063 | 0.873449 |

@facebook-github-bot added the oncall: jit (Add this issue/PR to JIT oncall triage queue) and cla signed labels on Apr 2, 2021

facebook-github-bot commented Apr 2, 2021

💊 CI failures summary and remediations

As of commit 52b86f5 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@eellison eellison left a comment


Looks good!

std::function<Tensor(Tensor)> fallback) {
return [aten_op, fallback](Stack* stack) {
auto a = pop(stack).toTensor();
if (a.numel() == 0) {
Contributor

If this is just to put the numel() == 0 check on the hot path, rather than for correctness reasons, is it worth it?
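For context, here is a rough, self-contained reconstruction of the wrapper pattern the quoted fragment comes from: the op pops its input, and empty tensors are routed to a plain ATen fallback instead of the MKLDNN path. Everything beyond the quoted lines (the helper name, the return type, the headers) is an assumption for illustration, not the PR's exact code.

#include <ATen/ATen.h>
#include <ATen/core/stack.h>
#include <functional>

using torch::jit::Stack;
using torch::jit::pop;
using torch::jit::push;

// Hypothetical helper: `aten_op` is the MKLDNN-path implementation,
// `fallback` is a plain ATen implementation (the real helper returns
// a torch::jit::Operation).
std::function<void(Stack*)> makeUnaryOpWithFallback(
    std::function<at::Tensor(at::Tensor)> aten_op,
    std::function<at::Tensor(at::Tensor)> fallback) {
  return [aten_op, fallback](Stack* stack) {
    auto a = pop(stack).toTensor();
    if (a.numel() == 0) {
      // Route empty tensors to the reference ATen implementation
      // (this is the check the reviewer is asking about).
      push(stack, fallback(a));
      return;
    }
    push(stack, aten_op(a));
  };
}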

@@ -633,6 +692,10 @@ class MKLDNNSubgraphSlicer {
return true;
}

if (n->kind() == Symbol::aten("hardswish")) {
Contributor

nit: register aten::hardswish as an interned operator?
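A one-line illustration of the nit: instead of constructing the symbol from a string on every call, the name can be interned (PyTorch keeps these in interned_strings.h) and compared as a named constant. The second form below assumes aten::hardswish has been added to that list.

// Current form: builds a Symbol from a string each time the check runs.
if (n->kind() == Symbol::aten("hardswish")) {
  return true;
}

// With the name interned, the comparison uses the named constant directly.
if (n->kind() == aten::hardswish) {
  return true;
}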

@@ -456,6 +510,11 @@ void ComputeSubgraphInMKLDNN(Node* subgraph_node) {
body_node->replaceInput(1, node->outputs().at(1));
}

if (body_node->kind() == Symbol::aten("hardswish")) {
body_node->replaceWithNewSymbol(Symbol::aten("MKLDNNHardSwish"));
body_node->destroy();
Contributor

We should be adding hardswish_ here
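A sketch of what the reviewer is asking for, inside the loop over the subgraph's body nodes: rewrite the in-place variant alongside the out-of-place one. The symbol namespaces were still in flux during this review (the later tests check prim::MKLDNNHardSwish_), so treat the names here as illustrative rather than the merged code.

if (body_node->kind() == Symbol::aten("hardswish")) {
  body_node->replaceWithNewSymbol(Symbol::aten("MKLDNNHardSwish"));
  body_node->destroy();
  continue;
}

// Reviewer's point: handle the in-place variant too, so that
// aten::hardswish_ nodes inside the MKLDNN subgraph don't keep
// running the dense kernel on packed tensors.
if (body_node->kind() == Symbol::aten("hardswish_")) {
  body_node->replaceWithNewSymbol(Symbol::aten("MKLDNNHardSwish_"));
  body_node->destroy();
  continue;
}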

static std::unordered_set<Symbol> mkldnn_ops = {

@@ -238,6 +281,17 @@ Operation BroadOp(const Node* node) {
};
}

const RegisterOperators MKLDNNHardSwishOpReg({
torch::jit::Operator(
"aten::MKLDNNHardSwish(Tensor a) -> Tensor",
Contributor

We should have a way of inplacing this:

if (k == aten::relu || k == aten::sigmoid || k == aten::dropout) {

Maybe just by adding an inplace=true attribute to the node.
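A minimal sketch of the suggestion, extending the check quoted above so the new ops are also eligible for the in-place rewrite. The hardsigmoid entry and the exact placement are assumptions; the merged PR may have taken the attribute route instead.

// Candidate ops for the in-place rewrite inside the MKLDNN subgraph.
if (k == aten::relu || k == aten::sigmoid || k == aten::dropout ||
    k == Symbol::aten("hardswish") || k == Symbol::aten("hardsigmoid")) {
  // ... rewrite the node to its in-place counterpart ...
}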

@@ -238,6 +281,17 @@ Operation BroadOp(const Node* node) {
};
}

const RegisterOperators MKLDNNHardSwishOpReg({
torch::jit::Operator(
Contributor

nit: we have prim::mkldnn_convolution; any reason not to put this in as prim::hardswish for consistency?

@Krovatkin Krovatkin changed the title [WIP] Krovatkin/recompute grad Implement hardswish/hardsigmoid on MKLDNN tensors Apr 6, 2021

@eellison eellison left a comment


Nice, LGTM!

g = parse_ir(graph_str)
m = self.createFunctionFromGraph(g)
x = torch.rand(size)
x_copy = x.detach().clone()
Contributor

nit: why are you detaching and cloning x_copy?

Contributor Author

For the in-place tests, the reference implementations modify the input in aten_op(x, inplace=inplace). I guess I could just always use F.hardswish(inplace=False); that might look a bit cleaner.

mod = self.freezeAndConvert(mod_eager)
FileCheck().check("mkldnn_convolution").check_next("aten::relu_").check_next("aten::relu_").run(mod.graph)
print(mod.graph)
Contributor

stray print

Contributor Author

a great catch!

mod = self.freezeAndConvert(mod_eager)
FileCheck().check("mkldnn_convolution").check_next("aten::relu_").check_next("aten::relu_").run(mod.graph)
print(mod.graph)
FileCheck().check("mkldnn_convolution").check_next("prim::MKLDNNHardSwish_").check_next("aten::relu_").run(mod.graph)
Contributor

is the hardswish still necessary?

@facebook-github-bot

@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@Krovatkin Krovatkin requested a review from eellison April 12, 2021 20:02

@eellison eellison left a comment


This looks good and I think it is right (great job, this was tricky), but can you add a little more testing and more of a comment? Reading through the code right now, I wouldn't know what invariants we have to maintain or why the code is written the way it is.

g = parse_ir(graph_str)
m = self.createFunctionFromGraph(g)
x = torch.rand(size)
# `inplace=False` is intentional, otherwise we modify the input
Contributor

Can we test the inplace version somehow? Ideally we'd also test this after running it with a Conv, where we know it's going to output a packed tensor.

Contributor Author

Added a conv test, and we already have inplace tests.

@eellison eellison Apr 14, 2021

We have inplace tests, but they aren't exercising these new ops, which have a different way of being run. Maybe as a follow-up, at least in the next PR, add more tests there.

c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset);
// we cast `a` to an `ideep::tensor`, so we can get at its descriptor
// which we then use to set up `out` tensor w/ the same props as a
auto a_it = at::native::itensor_from_mkldnn(a);
Contributor

I'm going to suggest a bunch of things here for readability, just because this is (IMO) very tricky code that is easy to get wrong (we almost did), and it requires an understanding of both the MKLDNN packed format and aten tensors. Feel free not to accept the comments, obviously.

nit: a_it -> a_ideep_tensor

// we cast `a` to an `ideep::tensor`, so we can get at its descriptor
// which we then use to set up `out` tensor w/ the same props as a
auto a_it = at::native::itensor_from_mkldnn(a);
auto raw_data = a_it.get_data_handle();
Contributor

nit: maybe mkldnn_raw_data_handle?

// which we then use to set up `out` tensor w/ the same props as a
auto a_it = at::native::itensor_from_mkldnn(a);
auto raw_data = a_it.get_data_handle();
auto topt = a.options().layout(c10::kStrided);
Contributor

Can you rename topt? I don't know where t or opt comes from. Maybe a_options_with_strided.

auto raw_data = a_it.get_data_handle();
auto topt = a.options().layout(c10::kStrided);
// we also wrap `a` storage into an aten tensor
auto t = at::from_blob(raw_data, {a.numel()}, topt);
Contributor

Can you give t a more verbose name?

if (!inplace) {
// `a_it.get_desc()` will allocate a tensor
// of the right physical size.
auto it_empty = ideep::tensor(a_it.get_desc());
Contributor

Maybe add an assertion that it_empty's physical size equals the input's physical size?
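Pulling the fragments above together, here is a sketch of the whole helper as it would sit in the pass under review, using the more descriptive names suggested in this thread and with the physical-size assertion folded in. It follows the code quoted in this review, but the surrounding details (the new_with_itensor_mkldnn helper call, the exact assertion, the inplace parameter) should be read as an illustration rather than the merged implementation.

// Sketch: run a dense ATen out-variant kernel directly on the raw
// storage of an MKLDNN (ideep) tensor, optionally in place.
Operation createUnaryOp(
    std::function<void(at::Tensor output, at::Tensor input)> aten_op,
    bool inplace = false) {
  return [aten_op, inplace](Stack* stack) {
    auto a = pop(stack).toTensor();
    c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset);

    // Cast `a` to an ideep::tensor so we can get at its descriptor,
    // which describes the physical (possibly blocked) layout.
    auto a_ideep_tensor = at::native::itensor_from_mkldnn(a);
    auto mkldnn_raw_data_handle = a_ideep_tensor.get_data_handle();
    auto a_options_with_strided = a.options().layout(c10::kStrided);

    // Wrap the MKLDNN storage in a flat, strided aten tensor so the
    // dense kernel can operate on it element-wise.
    auto input_flat_aten_view = at::from_blob(
        mkldnn_raw_data_handle, {a.numel()}, a_options_with_strided);

    auto out = a;
    auto out_raw_data_handle = mkldnn_raw_data_handle;
    if (!inplace) {
      // Constructing an ideep::tensor from `a`'s descriptor allocates a
      // buffer with the same physical size and layout as the input.
      auto out_ideep_tensor = ideep::tensor(a_ideep_tensor.get_desc());
      TORCH_INTERNAL_ASSERT(
          out_ideep_tensor.get_desc().get_size() ==
          a_ideep_tensor.get_desc().get_size());
      out = at::native::new_with_itensor_mkldnn(
          std::move(out_ideep_tensor),
          c10::optTypeMetaToScalarType(a.options().dtype_opt()),
          a.options().device_opt());
      out_raw_data_handle =
          at::native::itensor_from_mkldnn(out).get_data_handle();
    }

    // Same wrapping trick for the output buffer, then run the kernel.
    auto output_flat_aten_view = at::from_blob(
        out_raw_data_handle, {a.numel()}, a_options_with_strided);
    aten_op(output_flat_aten_view, input_flat_aten_view);
    push(stack, out);
  };
}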

// this node doesnt handle string padding yet...
if (!body_node->namedInput("padding")->type()->cast<StringType>()) {
body_node->replaceWithNewSymbol(Symbol::prim("mkldnn_convolution"));
auto true_pred = [](Node*) { return true; };
Contributor

Nit: can you keep the conv handling the way it was originally? I think it was more readable originally, and we've lost the "this node doesnt handle string padding yet" comment. If we start getting a lot more predicates, then maybe we should refactor.

"prim::MKLDNNRelu6_(Tensor(a!) self) -> Tensor(a!)",
createUnaryOp(
[](at::Tensor output, at::Tensor input) {
at::hardtanh_out(output, input, 0.f, 6.f);
Contributor

Same comment here: just use relu6, since it exists.

"prim::MKLDNNRelu6(Tensor(a!) self) -> Tensor(a!)",
createUnaryOp(
[](at::Tensor output, at::Tensor input) {
at::hardtanh_out(output, input, 0.f, 6.f);
Contributor

There is an at::relu6; can you use it instead of hardtanh?

"prim::MKLDNNHardSwish_(Tensor(a!) self) -> Tensor(a!)",
createUnaryOp(
[](at::Tensor output, at::Tensor input) {
at::hardswish_out(output, input);
Contributor

For these operators, we could just call at::cpu::hardswish_out. Since we are already creating a custom operator, we might as well avoid the overhead.
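Concretely, the suggestion is a one-line change inside the lambda passed to createUnaryOp: the at::cpu:: functions (generated into ATen/CPUFunctions.h) call the CPU kernel directly, skipping a second trip through the dispatcher. The argument order below simply mirrors the at::hardswish_out call quoted above and is assumed to carry over.

[](at::Tensor output, at::Tensor input) {
  // Dispatcher-free variant: we already know we're operating on dense
  // CPU data here, so call the CPU kernel directly.
  at::cpu::hardswish_out(output, input);
}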

remove dead code

clang-format

remove recompute files

more diag

add tests, fix warnings

fix clang-format warnings

fix aliasing

fix test failures after fixing the schema
@facebook-github-bot

@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@Krovatkin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


codecov bot commented Apr 15, 2021

Codecov Report

Merging #55218 (52b86f5) into master (09c0bb4) will increase coverage by 0.00%.
The diff coverage is 93.33%.

@@           Coverage Diff           @@
##           master   #55218   +/-   ##
=======================================
  Coverage   77.13%   77.13%           
=======================================
  Files        1912     1912           
  Lines      189312   189356   +44     
=======================================
+ Hits       146024   146058   +34     
- Misses      43288    43298   +10     

@facebook-github-bot

@Krovatkin merged this pull request in 9d3d169.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Adding hardswish and hardsigmoid improves mobilenetv3 by ~13%

|       | hardswish | base     | ratio    |
| ----- | --------- | -------- | -------- |
| run 1 | 1305.032  | 1486.013 |          |
| run 2 | 1290.142  | 1491.001 |          |
| run 3 | 1305.51   | 1491.66  |          |
| run 4 | 1308.788  | 1495.577 |          |
| avg   | 1302.368  | 1491.063 | 0.873449 |

Pull Request resolved: pytorch#55218

Reviewed By: albanD

Differential Revision: D27701276

Pulled By: Krovatkin

fbshipit-source-id: cde78da71d327e65461e80fbb6c3bb3429505410