Enabling concat fast path for channels last inputs #39448
Conversation
Note that I think this can be combined with the existing kernel in CatKernel.cpp. However, it would be good to parallelize that one. The original motivation for it was to fix a perf regression, but I believe it did not have to be serial.
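For context on what the fast path under discussion is meant to preserve, here is a minimal sketch (shapes chosen arbitrarily): concatenating two channels_last tensors should yield a channels_last result rather than silently falling back to a plain contiguous one.

```python
import torch

# Two NCHW tensors stored in channels_last (NHWC) layout.
x = torch.randn(2, 3, 4, 4).contiguous(memory_format=torch.channels_last)
y = torch.randn(2, 5, 4, 4).contiguous(memory_format=torch.channels_last)

# With the channels-last fast path, cat along the channel dim
# keeps the input memory format.
z = torch.cat((x, y), dim=1)
print(z.is_contiguous(memory_format=torch.channels_last))
```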
cc: @mingfeima
I don't know much about memory format, so I am removing myself from the reviewers list. @VitalyFedyunin could you take a look at this?
Ya, no worries. I added you because gh showed me you reviewed those files recently. I will let other reviewers chime in.
@VitalyFedyunin, can you please take a look at this? Thanks.
I tried parallelizing the serial kernel. The perf results were inconclusive, and the variance was very high in some cases. For now I am leaving that as is.
@VitalyFedyunin, can you please take a look at this?
@colesbury, can you help review this PR? Thanks.
I'm concerned about changing the math for `inner` and `outer`; please add tests covering various `dim=` values.
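A test along the requested lines might sweep every `dim` and compare the channels_last result against a plain contiguous baseline. This is a sketch of the idea, not the PR's actual test; the helper name and shape are made up for illustration.

```python
import torch

def check_cat_all_dims(shape=(2, 3, 4, 4)):
    """Concat channels_last inputs along every dim and compare
    against the contiguous baseline result."""
    for dim in range(len(shape)):
        a = torch.randn(*shape)
        b = torch.randn(*shape)
        # channels_last copies of the same data
        a_cl = a.contiguous(memory_format=torch.channels_last)
        b_cl = b.contiguous(memory_format=torch.channels_last)
        ref = torch.cat((a, b), dim=dim)        # contiguous baseline
        out = torch.cat((a_cl, b_cl), dim=dim)  # exercises the fast path
        assert torch.equal(ref, out), f"mismatch at dim={dim}"

check_cat_all_dims()
```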
The approach looks reasonable, but I'm not super familiar with the `cat` code. I noticed a few issues, mentioned inline.
Additionally, in the PR description you wrote:
> Enables that kernel by default for contig tensors even for large inputs and when more than one thread is available.
I don't see that change in the PR. The call to `cat_serial_stub` looks like it's still guarded by `use_serial_kernel`.
Yes, my bad. Initially I did try what the old summary said, but it did not seem to be provably better, so I kept the changes minimal. I have updated the summary.
@VitalyFedyunin @colesbury, I have addressed your comments.
@@ -7045,6 +7045,56 @@ def test_cat_out_channels_last(self, device):
res2 = torch.cat((x, y), out=z)
self.assertEqual(res1, res2)

def test_cat_in_channels_last(self, device):
Please move the test to another class where `device` is unnecessary, or add a CPU-only decorator, or use `device` appropriately.
Tests still require some fixes.
test/test_torch.py
self.assertTrue(res2.is_contiguous(memory_format=torch.channels_last))
self.assertEqual(res1, res2)

# Concat across dim 2
This could be a simple for loop over dimensions.
@VitalyFedyunin, thanks a lot for the comments. I have incorporated them. Please let me know if there is anything else.
Windows test failures look real
@VitalyFedyunin, looks like this PR fixes the Windows failures: #40369.
@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
}
}
}

void cat_serial_kernel(Tensor& result, TensorList tensors, int64_t dim) {
AT_DISPATCH_FLOATING_TYPES(result.scalar_type(), "cat_serial_kernel", [&]() {
AT_DISPATCH_FLOATING_TYPES(result.scalar_type(), "cat_contig_kernel", [&]() {
s/cat_contig_kernel/cat_serial_kernel
Summary: The existing cat implementation produces the output tensor in contiguous format, disregarding the input memory format. This PR fixes the kernel as well as the op implementation to account for that. Test Plan: CI
@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@kimishpatel merged this pull request in 6a421d5.
Summary:
Updates concat kernel for contiguous input to support channels_last contig tensors.
This was tried on a SqueezeNet model on a Pixel 2 device. It improves model perf by about 25%.
Test Plan:
test_cat_in_channels_last
Reviewers:
Subscribers:
Tasks:
Tags:
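The ~25% figure above is from the author's SqueezeNet measurement; absolute numbers will vary by machine. A rough way to time the channels_last cat path yourself, using `torch.utils.benchmark` (available in recent PyTorch; the tensor sizes here are arbitrary):

```python
import torch
import torch.utils.benchmark as benchmark

# Channels_last inputs exercise the fast path added by this PR.
x = torch.randn(8, 64, 28, 28).contiguous(memory_format=torch.channels_last)
y = torch.randn(8, 64, 28, 28).contiguous(memory_format=torch.channels_last)

timer = benchmark.Timer(
    stmt="torch.cat((x, y), dim=1)",
    globals={"torch": torch, "x": x, "y": y},
)
# Prints a timing summary; compare against plain contiguous inputs
# to estimate the speedup on your hardware.
print(timer.timeit(100))
```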