[TensorExpr] Add aten::matmuls to TE fuser. #54605
Conversation
For small sizes we generate a naive 3-layer loopnest, for bigger sizes we generate an external call.
CI status as of commit d0f31f7: 1 failure on ci.pytorch.org (reported by Dr. CI).
You should be aware that matmul has broadcasting behavior, so either make sure that's handled or opt out of it in the fuser.
That's a good point, I totally forgot about that! I've added checks into the fuser and tests to verify that we don't fuse what we can't correctly lower yet. Could you please take a look? Are these tests sufficient?
LGTM, just had one question about b[0]
@@ -187,6 +187,26 @@ bool conv2dIsSupported(const torch::jit::Node* node) {
   return true;
 }

+// The fuser currently only supports matmul of 2D x 2D matrices
+bool matmulIsSupported(const torch::jit::Node* node) {
Is there a reason for putting this in kernel instead of tensorexpr_fuser?
Only because we already have conv2dIsSupported there. I wouldn't mind moving it to tensorexpr_fuser.cpp, but in this PR for consistency I'd prefer to keep them in kernel.cpp.
Yeah, I put conv2dIsSupported here b/c I wanted to refer to it from both kernel and fuser, and we already have other dependencies going from kernel->fuser. But it's really fine either way.
const IntImm* total_size = dynamic_cast<const IntImm*>(
    IRSimplifier::simplify((size_a[0] * size_a[1] * size_b[1])).node());

if (total_size && total_size->value() < 1000) {
Can you add a comment here about why we're using external calls for n > 1000 ?
👍
lol clang-tidy "1000 is a magic number". I want to troll the linter by doing constexpr int kMagicNumber = 1000;
const Node* n = v->node();
auto const& shape = sizesForValue(v);
Dtype dtype = kFloat;
auto maybe_stype = findDtypeForValue(v);
What is maybe_stype?
"maybe" is for the optionality of the returned value, "stype" is for scalar type. I think we've been using this name in surrounding code too.
auto size_a = ExprVectorToExprHandleVector(a->dims());
auto size_b = ExprVectorToExprHandleVector(b->dims());
const IntImm* total_size = dynamic_cast<const IntImm*>(
    IRSimplifier::simplify((size_a[0] * size_a[1] * size_b[1])).node());
Should we be multiplying by size_b[0] as well?
It's a rough estimate of the amount of work matmul needs to do, which is N*M*K (size_a[1] == size_b[0] == M). Would I not pass an interview with such analysis? 😜
      BufHandle bh(b);
      return Load::make(ah, {m, k}) * Load::make(bh, {k, n});
    },
    {{size_a[1], "K"}});
In the spirit of https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/: there are a very small number of possible inputs to test. You might try writing a script to see how large you can make the total size while NNC is still as fast as the ATen matmul (with a little bit of leeway for possible fusion, say 5%).
This is a very basic, naive heuristic, and we would certainly like to tune it (and hopefully add better schedules for matmul as well). But I think that deserves a separate PR; this one is mostly to lay the foundation for this work.
Looks good to me. So if we fuse matmul and use an external call, are we still as fast as eager mode?
I would expect so, but honestly I haven't done any rigorous measurements yet.
@ZolotukhinM merged this pull request in 5f19385.
Summary: Pull Request resolved: pytorch#54605
For small sizes we generate a naive 3-layer loopnest, for bigger sizes we generate an external call.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27298364
Pulled By: ZolotukhinM
fbshipit-source-id: 2ddf275ff68d6fca16a3befca5ce5c26aef462b5