
gradgradcheck for torch.repeat and torch.tile is outrageously slow #49962

Closed
mruberry opened this issue Dec 30, 2020 · 2 comments
Labels
module: autograd (Related to torch.autograd and the autograd engine in general), module: tests (Issues related to tests, not the torch.testing module), triaged (This issue has been looked at by a team member and prioritized into an appropriate module)

Comments

mruberry (Collaborator) commented Dec 30, 2020

torch.repeat and torch.tile (which is implemented using torch.repeat) are relatively fast compared to NumPy's tile, but attempting to gradgradcheck them is incredibly slow in some cases. For example:

import torch
from torch.autograd import gradgradcheck

x = torch.randn(5, 5, 5, requires_grad=True, dtype=torch.double)

def partial(x):
    return x.repeat(5, 5, 5, 5)

gradgradcheck(partial, x)

takes 77.93s on my devfair! While not an apples-to-apples comparison, computing the function's Hessian is relatively fast:

import torch

x = torch.randn(5, 5, 5, requires_grad=True, dtype=torch.double)

def partial(x):
    return x.repeat(5, 5, 5, 5).sum()

torch.autograd.functional.hessian(partial, x)

takes only 0.07s to run. That is, it is over 1000x faster than the gradgradcheck.
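For reference, a minimal script that times both calls side by side might look like the sketch below (not part of the original report; the helper names repeat_fn and repeat_sum_fn are mine, and the absolute timings will vary by machine):

import time
import torch
from torch.autograd import gradgradcheck

x = torch.randn(5, 5, 5, requires_grad=True, dtype=torch.double)

def repeat_fn(t):
    # non-scalar output: gradgradcheck checks the full double-backward Jacobian
    return t.repeat(5, 5, 5, 5)

def repeat_sum_fn(t):
    # scalar output: suitable for torch.autograd.functional.hessian
    return t.repeat(5, 5, 5, 5).sum()

start = time.perf_counter()
gradgradcheck(repeat_fn, x)
print(f"gradgradcheck: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
torch.autograd.functional.hessian(repeat_sum_fn, x)
print(f"hessian:       {time.perf_counter() - start:.2f}s")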

gradgradcheck being so slow appears to have a real impact. See these two tests on ASAN:

Dec 30 02:18:32   test_tile_more_reps_dims_cpu (__main__.TestAutogradDeviceTypeCPU) ... ok (1023.734s)
Dec 30 02:20:21   test_tile_same_reps_dims_cpu (__main__.TestAutogradDeviceTypeCPU) ... ok (109.164s)

or on the clang build:

Dec 30 01:51:11   test_tile_more_reps_dims_cpu (__main__.TestAutogradDeviceTypeCPU) ... ok (115.829s)
Dec 30 01:51:24   test_tile_same_reps_dims_cpu (__main__.TestAutogradDeviceTypeCPU) ... ok (13.091s)

cc @ezyang @albanD @zou3519 @gqchen @pearu @nikitaved @soulitzer @mruberry @VitalyFedyunin @walterddr

mruberry added the module: autograd, module: tests, and triaged labels on Dec 30, 2020
albanD (Collaborator) commented Dec 30, 2020

Well, the .sum() makes all the difference.
You can reduce the size of the input/output to bring the runtime down.
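A minimal sketch of that suggestion (the smaller sizes below are illustrative only, not the values chosen in the eventual test fix):

import torch
from torch.autograd import gradgradcheck

# Illustrative smaller input and repeat counts; the sizes actually used to fix
# the tests may differ.
x = torch.randn(2, 2, 2, requires_grad=True, dtype=torch.double)

def partial(x):
    return x.repeat(2, 2, 2, 2)

gradgradcheck(partial, x)  # finishes quickly because the output has far fewer elements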

anjali411 (Contributor) commented

Reduced the input size for the tile tests and they don't time out anymore!
