Make btriunpack work for high dimensional batches and faster than before #15286
Conversation
Changelog:
- Optimize btriunpack by using torch.where instead of indexing, using in-place operations, and avoiding costly permutations.

Test plan:
- Added tests for btriunpack in test_torch.py (and a port to test_cuda.py)
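To illustrate the first changelog item, here is a minimal sketch (not the PR's actual code) of how torch.where can split packed LU data into its triangular factors with a single masked select per factor, instead of the indexed assignments the old implementation used:

```python
import torch

# Batch of packed LU factors (shapes and names are illustrative only).
LU = torch.randn(4, 3, 3)
eye = torch.eye(3).expand_as(LU)
# Strictly-lower-triangle mask, broadcast over the batch dimension.
lower_mask = torch.tril(torch.ones(3, 3, dtype=torch.bool), -1).expand_as(LU)

# Unit-lower factor: strictly lower part of LU, ones on the diagonal.
L = torch.where(lower_mask, LU, eye)
# Upper factor: upper triangle of LU including the diagonal, zeros below.
U = torch.where(lower_mask, torch.zeros_like(LU), LU)
```

Both `torch.where` calls produce their result in one pass over the data, which is where the speedup over advanced indexing comes from.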
Failures are unrelated.

@zou3519 is it possible to get someone to review this? Not high-pri, but some feedback would be helpful. Thanks.
@skipIfNoLapack
def test_btriunpack(self):
    self._test_btriunpack(self, lambda t: t)
Ah, the pleasure of making your code general, but then not making use of the generality :>
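The "generality" being teased here is the cast-function pattern: the test helper takes a conversion function so the same body can run on CPU and CUDA, but test_torch.py only ever passes the identity. A hypothetical pure-Python sketch of the pattern (run_body is a stand-in, not the real helper):

```python
# The same test body runs on any device by threading a conversion
# function through it.
def run_body(cast):
    data = cast([1.0, 2.0, 3.0])  # stand-in for building a test tensor
    return sum(data)

# CPU path: identity cast, exactly what `lambda t: t` does above.
cpu_result = run_body(lambda t: t)
# A test_cuda.py port would instead pass something like `lambda t: t.cuda()`.
```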
I’m sorry, did I do something wrong here?
Nope, just observing a quirk of the existing tests.
t = P[i, :, j].clone()
P[i, :, j] = P[i, :, k]
P[i, :, k] = t
P = torch.eye(sz, device=LU_data.device, dtype=LU_data.dtype).expand_as(LU_data).clone()
I guess we should probably add repeat_as at some point ;)
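For context on the expand_as().clone() idiom (and why a repeat_as might be nicer): expand_as returns a broadcast view whose expanded dimension has stride 0, so every batch "copy" aliases the same storage, and the clone() is what materializes writable memory. A small hedged sketch:

```python
import torch

base = torch.eye(3)
view = base.expand(4, 3, 3)   # no allocation: all 4 slices share storage
P = view.clone()              # materialize; in-place writes are now safe
P[0, 0, 0] = 5.0              # only the first batch element changes
```

A hypothetical repeat_as would fuse the two steps by allocating real copies up front.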
If you want to be super cool, copy paste the old implementation into the test suite and do some "reference implementation versus new optimized implementation" tests.
@ezyang I don't think that is necessary, because the reconstruction takes care of the correctness of the implementation. However, if you insist, I don't mind adding the old-impl vs. new-impl tests as well.
Nope, sounds good to me
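For readers unfamiliar with the testing pattern discussed above, here is a hedged sketch of a reference-vs-optimized equivalence test; both functions are hypothetical stand-ins, not the real btriunpack implementations:

```python
import torch

def reference_impl(x):
    # "old style": build the unit-lower factor with arithmetic on triangles
    return torch.tril(x, -1) + torch.eye(x.size(-1))

def optimized_impl(x):
    # "new style": a single torch.where select against a broadcast identity
    n = x.size(-1)
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool), -1)
    return torch.where(mask.expand_as(x), x, torch.eye(n).expand_as(x))

# The test: both implementations must agree on the same random batch.
x = torch.randn(5, 4, 4)
assert torch.equal(reference_impl(x), optimized_impl(x))
```

The value of such a test is that the slow, obviously-correct version pins down the semantics while the fast version is free to change.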
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Make btriunpack work for high dimensional batches and faster than before (pytorch#15286)

Summary:
Changelog:
- Optimize btriunpack by using `torch.where` instead of indexing, in-place operations instead of out-of-place operations, and avoiding costly permutations by computing the final permutation over a list.

Pull Request resolved: pytorch#15286
Differential Revision: D13562038
Pulled By: soumith
fbshipit-source-id: e2c94cfab5322bf1d24bf56d7b056619f553acc6
This should help unblock testing in #14964 . I created a separate PR so that reviewing can be done efficiently.
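For context on "computing the final permutation over a list": LAPACK's getrf reports, for each row i, the 1-based index of the row it was swapped with, and replaying those swaps on a plain Python list yields the final row permutation in one pass, without any tensor operations. A sketch (the helper name is hypothetical):

```python
def pivots_to_permutation(pivots):
    """Convert 1-based LAPACK pivot indices into a 0-based permutation list.

    Replaying each recorded swap on a plain Python list produces the
    final permutation before any tensor memory is touched.
    """
    perm = list(range(len(pivots)))
    for i, p in enumerate(pivots):
        j = p - 1  # LAPACK pivots are 1-based
        perm[i], perm[j] = perm[j], perm[i]
    return perm

# e.g. pivots (2, 3, 3): swap rows 0<->1, then 1<->2, then 2<->2 (no-op)
```

The resulting list can then be used to build or index the permutation matrix P in a single step.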