Foreach ops don't follow ATen-level debug asserts #93940
Comments
Setting high priority as this will prevent anyone with a debug build from using clip_grad_norm_ by default.
The implementation is based on apex's related code: pytorch/aten/src/ATen/native/cuda/ForeachReduceOp.cu, lines 180 to 184 at 98e1b3e
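For illustration, a rough Python analogue of that pattern (not the actual CUDA kernel; `foreach_norm_like` is a made-up name):

```python
import torch

def foreach_norm_like(tensors, ord=2):
    # One intermediate buffer holds every per-tensor norm.
    buf = torch.empty(len(tensors), dtype=tensors[0].dtype, device=tensors[0].device)
    for i, t in enumerate(tensors):
        buf[i] = t.norm(ord)
    # The "outputs" are views into that single buffer, so they all share one
    # storage even though each one indexes a disjoint element.
    return list(buf.unbind(0))
```

Each returned Tensor then has `storage().use_count() > 1`, which is exactly what the debug assert rejects.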
Ok, that's what I was expecting. We should remove the debug assert for this function then.
I'm not aware of any other foreach op with the "trick" :)
Should we instead fix the schema to mark the outputs as views?
Well, technically the outputs are not views as they look at independent parts of the Tensor. And (unless you do shady stuff) you cannot change the other Tensors from one Tensor. So they don't have to be marked as views.
Cool, @crcrpar can you please send a PR exempting _foreach_norm from that assert in autograd_not_implemented_fallback.cpp?
Updating the fallback kernel to special-case this one sounds ok to me: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/autograd_not_implemented_fallback.cpp#L189
I get nervous though because functionalization uses storage to determine views and these are definitely sharing storages.
Well, these are views from the point of view of functionalization but not from the point of view of autograd? :)
Here, the outputs are views into the intermediate tensor, so no one can modify the base, so for the purposes of functionalization it should be fine (unless someone does naughty stuff and reaches into a different tensor via the shared storage).
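For what it's worth, the sharing is easy to observe from Python. A sketch, assuming a recent build with CUDA so the fused path is taken (the slow fallback may return independent Tensors):

```python
import torch

tensors = [torch.randn(5, device="cuda") for _ in range(3)]
norms = torch._foreach_norm(tensors)

# All outputs point at the same underlying storage...
print({n.untyped_storage().data_ptr() for n in norms})  # expect a single pointer

# ...but writing through one of them does not change the others, since each
# views a disjoint element of the shared buffer.
norms[0].zero_()
print(norms[1])
```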
Since #91846 made torch.nn.utils.clip_grad_norm_() use foreach ops, it throws the following error when PyTorch is built with debug asserts:
*** RuntimeError: t.storage().use_count() == 1 INTERNAL ASSERT FAILED at "caffe2/torch/csrc/autograd/autograd_not_implemented_fallback.cpp":189, please report a bug to PyTorch.
Given the assert, the error means that this foreach op is expected to return a brand-new Tensor, but it actually returns a Tensor that shares storage with at least one other Tensor.
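A minimal repro sketch, assuming a debug build of PyTorch (compiled with `DEBUG=1`) and CUDA parameters so the foreach path is taken:

```python
import torch

params = [torch.randn(3, 4, device="cuda", requires_grad=True) for _ in range(2)]
loss = sum((p * p).sum() for p in params)
loss.backward()

# After #91846 this calls torch._foreach_norm on the gradients under the hood,
# which trips the storage().use_count() == 1 assert on debug builds.
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
```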
cc @ezyang @gchanan @zou3519 @crcrpar @mcarilli @ngimel