torch.autograd.grad slow on tpu #2702

Closed

matthewchung74 opened this issue Dec 22, 2020 · 4 comments

@matthewchung74

I am using grad to calculate path lengths (PL) for StyleGAN, but beyond checking tensor shapes before calling grad, I'm not sure whether I'm doing something wrong.

On a GPU, my grad function takes about 0.02 seconds per call after the first.
Colab (calc_pl_lengths section):
https://colab.research.google.com/drive/1Pg-kKt6qhXz39PjiHDHTRhq5ViCcttra?usp=sharing

On a TPU, it takes about 9 seconds per call after the first.
Colab (calc_pl_lengths section):
https://colab.research.google.com/drive/1MEyQ2KMDn1IjxJ2FLHcEgmySBlWNuR5q?usp=sharing
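
For reference, here is a minimal sketch of the kind of path-length calculation involved (the actual calc_pl_lengths in the notebook may differ; the names and the stand-in generator below are assumptions):

    import math
    import torch

    def calc_pl_lengths(styles, images):
        # Project the generated images onto random noise scaled to unit variance,
        # so the result is a scalar we can differentiate.
        num_pixels = images.shape[2] * images.shape[3]
        pl_noise = torch.randn_like(images) / math.sqrt(num_pixels)
        outputs = (images * pl_noise).sum()
        # Gradient of that scalar w.r.t. the style codes; create_graph=True because
        # the resulting penalty is itself backpropagated during training.
        (pl_grads,) = torch.autograd.grad(
            outputs=outputs,
            inputs=styles,
            create_graph=True,
        )
        # Per-sample path lengths.
        return (pl_grads ** 2).sum(dim=-1).sqrt()

    # Tiny usage example with a stand-in for the generator.
    styles = torch.randn(4, 512, requires_grad=True)
    images = styles.view(4, 512, 1, 1).expand(4, 512, 8, 8)[:, :3]
    print(calc_pl_lengths(styles, images).shape)  # torch.Size([4])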

@taylanbil
Collaborator

There are aten::upsample_bilinear2d and aten::upsample_bilinear2d_backward counters in the metrics report. These ops need to be lowered to XLA, per https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#understand-the-metrics-report/
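
For reference, the metrics report can be dumped like this (minimal sketch, assuming torch_xla is installed):

    import torch_xla.debug.metrics as met

    # After running a few steps on the TPU, print the report and look for aten::*
    # counters, which indicate ops falling back to CPU rather than running as
    # lowered XLA ops.
    print(met.metrics_report())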

Are you using nightly?

@matthewchung74
Author

@taylanbil No, I'm not. I'm installing with:
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp36-cp36m-linux_x86_64.whl

I'm going to test with the nightly now.
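
(For reference, the nightly wheel at the time followed the same URL pattern with a nightly tag; the exact filename below is an assumption and may have changed:)

    !pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-nightly-cp36-cp36m-linux_x86_64.whl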

@matthewchung74
Author

Using the nightly, things are slower, at about 62 seconds per call. Looking at the metrics report, I see:

Counter: aten::conj_out
  Value: 70
Counter: aten::upsample_bilinear2d
  Value: 60
Counter: aten::upsample_bilinear2d_backward
  Value: 60
Counter: aten::view_as_real
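
For context, the upsample counters typically come from bilinear interpolation in the generator; where exactly they originate in this model is an assumption, but a call like the following dispatches to those ops:

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 64, 64)
    # Bilinear resizing dispatches to aten::upsample_bilinear2d (and its backward
    # during the grad call); ops without an XLA lowering fall back to CPU.
    y = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)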

To request that those ops be lowered to XLA, do I need to file a separate issue?

@taylanbil
Collaborator

Ugh, you can ignore the conj and view_as_real ops; those are tracked in #2688.

But yes, it looks like the upsample ops need to be lowered. You can close this issue and open a separate one for those lowerings.
