
Conversation

@ejguan (Contributor) commented Oct 22, 2020

Stack from ghstack:

Fixes #43192

Differential Revision: D24739840

@ejguan ejguan requested review from albanD and apaszke as code owners October 22, 2020 17:58
ejguan added a commit that referenced this pull request Oct 22, 2020
ghstack-source-id: db0cc44
Pull Request resolved: #46726
@ejguan ejguan linked an issue Oct 22, 2020 that may be closed by this pull request
@albanD (Collaborator) left a comment

Change looks good. Just small comments.

@ejguan (Contributor, Author) commented Oct 22, 2020

Tests (using torch.utils.benchmark):

```python
x = torch.randn(3, 4, requires_grad=True)

def test_repeat(x):
    y = x.repeat(20, 10, 15, 25)
    out = y.sum()
    out.backward()
```

Average time for 10 * 10000 runs:

|     | Before     | After      | Improvement |
|-----|------------|------------|-------------|
| CPU | 4.236 ms   | 4.124 ms   | 2.64%       |
| GPU | 760.262 us | 259.652 us | 65.85%      |
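
The exact timing harness isn't shown above; a minimal sketch of how a comparable measurement could be set up with torch.utils.benchmark (the Timer arguments and run count here are assumptions, and the CUDA case would additionally move x to the GPU) might look like this:

```python
import torch
from torch.utils import benchmark

x = torch.randn(3, 4, requires_grad=True)

def test_repeat(x):
    y = x.repeat(20, 10, 15, 25)
    out = y.sum()
    out.backward()

# Time forward + backward; gradients simply accumulate into x.grad across
# runs, which is fine for a timing-only comparison.
timer = benchmark.Timer(
    stmt="test_repeat(x)",
    globals={"test_repeat": test_repeat, "x": x},
)
print(timer.timeit(10000))
```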

ejguan added a commit that referenced this pull request Oct 23, 2020
ghstack-source-id: 0026638
Pull Request resolved: #46726
@ejguan ejguan changed the title from "[WIP] Optimize backward for torch.repeat" to "Optimize backward for torch.repeat" Oct 26, 2020
@ejguan ejguan requested a review from zou3519 October 26, 2020 15:20
@zou3519 (Contributor) commented Oct 27, 2020

> Tests (using torch.utils.benchmark):

I'm curious to see the equivalent torch.expand / torch.repeat_interleave calls for the examples in #43192 (comment).

Also, it would be nice to see some benchmark figures on larger tensors (a tensor of size [3, 4] is pretty small; in practice users have larger tensors).

zou3519 previously approved these changes Oct 27, 2020
@zou3519 (Contributor) left a comment

This LGTM (with some minor comments). I'm curious to see some more performance numbers.

dr-ci bot commented Oct 27, 2020

💊 CI failures summary and remediations

As of commit 565d4b2 (more details on the Dr. CI page):


  • 1/1 failures possibly introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



ejguan added a commit that referenced this pull request Oct 27, 2020
ghstack-source-id: 52b835e
Pull Request resolved: #46726
@ejguan (Contributor, Author) commented Oct 28, 2020

Benchmark for two different implementations.
Test 1:

```python
x = torch.randn(320, 480, requires_grad=True)
x_t = torch.randn(320, 480, requires_grad=True).t()  # non-contiguous
...
y = x.repeat(16, 16)
...
```

Test 2:

```python
x = torch.randn(160, 240, requires_grad=True)
x_t = torch.randn(160, 240, requires_grad=True).t()  # non-contiguous
...
y = x.repeat(32, 32)
...
```

Test 3:

```python
x = torch.randn(80, 120, requires_grad=True)
x_t = torch.randn(80, 120, requires_grad=True).t()  # non-contiguous
...
y = x.repeat(64, 64)
...
```

Test 4:

```python
x = torch.randn(3, 4, requires_grad=True)
x_t = torch.randn(3, 4, requires_grad=True).t()  # non-contiguous
...
y = x.repeat(16, 32)
...
```

Average time for 1,000 runs:

|                | CPU       | GPU       | CPU (NC)  | GPU (NC)  |
|----------------|-----------|-----------|-----------|-----------|
| Before (1)     | 125.31 ms | 4.46 ms   | 155.39 ms | 4.72 ms   |
| Multi-time (1) | 127.16 ms | 4.02 ms   | 165.37 ms | 4.14 ms   |
| One-time (1)   | 117.31 ms | 3.97 ms   | 155.49 ms | 4.22 ms   |
| Before (2)     | 115.17 ms | 4.57 ms   | 138.76 ms | 4.98 ms   |
| Multi-time (2) | 122.30 ms | 4.02 ms   | 146.28 ms | 4.25 ms   |
| One-time (2)   | 114.66 ms | 3.98 ms   | 139.86 ms | 4.42 ms   |
| Before (3)     | 115.41 ms | 4.86 ms   | 141.28 ms | 6.57 ms   |
| Multi-time (3) | 122.91 ms | 3.97 ms   | 146.72 ms | 5.71 ms   |
| One-time (3)   | 118.73 ms | 3.99 ms   | 144.87 ms | 5.66 ms   |
| Before (4)     | 324.34 us | 822.94 us | 399.30 us | 779.03 us |
| Multi-time (4) | 129.49 us | 189.01 us | 120.85 us | 203.14 us |
| One-time (4)   | 105.63 us | 178.14 us | 124.81 us | 182.41 us |

NC refers to non-contiguous input.
"Multi-time" refers to the implementation that reshapes and sums once per repeated dimension; "one-time" refers to the implementation that does a single reshape followed by a single sum over all repeat dimensions (a sketch contrasting the two follows the conclusion below).

Conclusion:

  • Both of the new implementations clearly perform better than the existing backward.
  • The one-time strategy performs slightly better, especially on CPU.
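
To make the two strategies concrete, here is a hedged Python sketch of the idea on a 2-D example. The function names and pure-Python structure are illustrative assumptions (the PR itself implements the backward in C++), and it assumes len(repeats) equals the input rank, but the reshape-and-sum logic matches what is described above.

```python
import torch

def repeat_backward_multi_time(grad, input_shape, repeats):
    # "Multi-time": fold one repeated dimension at a time, summing after
    # each reshape.
    for dim, (r, s) in enumerate(zip(repeats, input_shape)):
        if r == 1:
            continue
        shape = list(grad.shape)
        shape[dim:dim + 1] = [r, s]          # split dim into (repeat, size)
        grad = grad.reshape(shape).sum(dim)  # sum out the repeat axis
    return grad

def repeat_backward_one_time(grad, input_shape, repeats):
    # "One-time": build one interleaved shape [r0, s0, r1, s1, ...] and sum
    # over all repeat axes with a single reshape + sum.
    grad_size, sum_dims = [], []
    for i, (r, s) in enumerate(zip(repeats, input_shape)):
        sum_dims.append(2 * i)
        grad_size += [r, s]
    return grad.reshape(grad_size).sum(sum_dims)

x = torch.randn(3, 4)
grad = torch.ones(3 * 16, 4 * 32)   # gradient flowing into x.repeat(16, 32)
g_multi = repeat_backward_multi_time(grad, x.shape, (16, 32))
g_one = repeat_backward_one_time(grad, x.shape, (16, 32))
assert g_multi.shape == g_one.shape == x.shape
assert torch.allclose(g_multi, g_one)
```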

ejguan added a commit that referenced this pull request Oct 28, 2020
@ejguan (Contributor, Author) commented Oct 28, 2020

Benchmark for repeat/repeat_interleave/expand:

```python
x = torch.rand((1, 1920)).requires_grad_()

repeated = x.repeat(1280, 1)
repeated = x.repeat_interleave(1280, dim=0)
repeated = x.expand(1280, 1920)
```

Average time for 10,000 runs:

|     | previous repeat | repeat    | repeat_interleave | expand    |
|-----|-----------------|-----------|-------------------|-----------|
| CPU | 6.71 ms         | 2.13 ms   | 1.37 ms           | 1.34 ms   |
| GPU | 15.72 ms        | 256.01 us | 566.06 us         | 159.51 us |
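
The snippet above only shows the forward calls; a possible way to time the full forward + backward of the three variants with torch.utils.benchmark is sketched below (the .sum() reduction and run count are assumptions, since the exact harness isn't shown).

```python
import torch
from torch.utils import benchmark

x = torch.rand((1, 1920)).requires_grad_()

variants = {
    "repeat": lambda: x.repeat(1280, 1),
    "repeat_interleave": lambda: x.repeat_interleave(1280, dim=0),
    "expand": lambda: x.expand(1280, 1920),
}

for name, fn in variants.items():
    # Reduce to a scalar so backward() can be called on each variant.
    t = benchmark.Timer(stmt="fn().sum().backward()", globals={"fn": fn})
    print(name, t.timeit(10000))
```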

cc: @zou3519

@ejguan ejguan requested review from zou3519 and removed request for apaszke October 29, 2020 14:12
@zou3519 zou3519 dismissed their stale review October 29, 2020 15:11

doing a re-review

ejguan added a commit that referenced this pull request Oct 29, 2020
ghstack-source-id: b86f13d
Pull Request resolved: #46726
@ejguan (Contributor, Author) commented Oct 29, 2020

> I have some suggestions on reframing the algorithm in terms of input_shape and repeats instead of using grad_shape and repeats that may make the logic easier to implement. Aside from that, I think there are some UBs in the code as-is, let me know what you think.

PR and the benchmark table are updated.

@zou3519 (Contributor) left a comment

lgtm!

ejguan added a commit that referenced this pull request Nov 2, 2020
ghstack-source-id: efcafb1
Pull Request resolved: #46726

@ejguan merged this pull request in 4e6f244.

@ejguan ejguan reopened this Nov 4, 2020
Review thread on the new backward implementation:

```cpp
// grad_size [4, 2, 3, 9, 4, 3, 5]
// sum_dims  [0, 3, 5]
grad = grad.reshape(grad_size);
grad = grad.sum(sum_dims);
```
@ejguan (Contributor, Author) commented:

When the repeat count is 1 along every dimension, sum_dims becomes empty, which leads to summing over the whole grad instead of leaving it unchanged.

A reviewer (Contributor) replied:
#29137 strikes again
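
For reference, a hedged sketch of the edge case being discussed: the helper below is an illustrative Python reconstruction (not the PR's C++ code) of the grad_size / sum_dims construction from the snippet above, with an explicit early return when every repeat is 1. Without that guard, sum_dims would be empty and grad.sum([]) would reduce over all dimensions instead of being a no-op, which is the behavior referenced as #29137.

```python
import torch

def repeat_backward_sketch(grad, input_shape, repeats):
    grad_size, sum_dims = [], []
    for r, s in zip(repeats, input_shape):
        if r == 1:
            grad_size.append(s)               # no extra repeat axis needed
        else:
            sum_dims.append(len(grad_size))   # position of the repeat axis
            grad_size += [r, s]
    if not sum_dims:
        # All repeats are 1: return grad unchanged rather than calling
        # grad.sum([]), which reduces over every dimension (issue #29137).
        return grad
    return grad.reshape(grad_size).sum(sum_dims)

# One input_shape / repeats combination that reproduces the values in the
# snippet above: grad_size [4, 2, 3, 9, 4, 3, 5] and sum_dims [0, 3, 5].
input_shape, repeats = (2, 3, 4, 5), (4, 1, 9, 3)
grad = torch.ones([r * s for r, s in zip(repeats, input_shape)])
print(repeat_backward_sketch(grad, input_shape, repeats).shape)   # (2, 3, 4, 5)
print(repeat_backward_sketch(torch.ones(input_shape), input_shape, (1, 1, 1, 1)).shape)
```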

ejguan added a commit that referenced this pull request Nov 4, 2020
ejguan added a commit that referenced this pull request Nov 5, 2020
@zou3519 zou3519 self-requested a review November 9, 2020 18:49
@zou3519 (Contributor) commented Nov 9, 2020

If "import to phabricator" or ghimport fails, you might have to unlink this github PR from the original diff.

@facebook-github-bot facebook-github-bot deleted the gh/ejguan/5/head branch November 13, 2020 15:22


Successfully merging this pull request may close these issues.

backward of torch.repeat slower than for torch.repeat_interleave
