Move GradMode / AutoGradMode / NoGradGuard to ATen core #18573
Conversation
Note from offline discussion with @gchanan: we should look into how to avoid moving …
This would make a really good …
okey dokey
the note on the offline discussion says we don't actually need this, right?
#22473 requires that we check …
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: After the Variable/Tensor merge, code paths in ATen need to be able to check whether a tensor requires gradient, and throw errors in places where a `requires_grad=true` tensor is not allowed (such as https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Utils.h#L76-L78 and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/SparseTensorImpl.cpp#L86). Since the `GradMode` thread-local variable controls whether a tensor should accumulate gradients, we need to be able to check this variable from ATen when we determine whether a tensor requires gradient, hence this PR to move `GradMode` / `AutoGradMode` / `NoGradGuard` to ATen.

Note that we intentionally don't merge `at::GradMode` and `at::NonVariableTypeMode`, for the following reason: semantically, `at::GradMode` and `at::NonVariableTypeMode` mean different things. `at::GradMode` controls whether a tensor should accumulate gradients, while `at::NonVariableTypeMode` controls whether a Variable should be treated as a non-Variable tensor in type dispatches. There are places where we *don't* want the tensor to accumulate gradients, but *still* want the Variable to be treated as a Variable. Here is one example:

```python
# torch/tensor.py
with torch.no_grad():
    ...
    new_tensor = self.new()  # `at::GradMode` is false at this point
    ...
```

```cpp
// tools/autograd/templates/python_variable_methods.cpp
static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs) {
  ...
  // If we merged `at::GradMode` and `at::NonVariableTypeMode`, then since `at::GradMode` is false
  // and `self_.type()` checks `at::GradMode` to decide whether to return a non-Variable type, it
  // would return a non-Variable type here, which is not what we want (and throws a "Tensor that
  // was converted to Variable was not actually a Variable" error).
  return THPVariable_Wrap(torch::utils::legacy_tensor_new(self_.type(), args, kwargs));
  ...
}
```

For the above reason, we cannot merge `at::GradMode` and `at::NonVariableTypeMode`, as they serve different purposes.

Pull Request resolved: pytorch/pytorch#18573
Differential Revision: D16134413
Pulled By: yf225
fbshipit-source-id: 6140347e78bc54206506499c264818eb693cdb8a
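For readers unfamiliar with these classes, the sketch below illustrates the pattern being moved: a thread-local flag (`GradMode`) plus RAII guards (`AutoGradMode`, `NoGradGuard`) that flip the flag for the duration of a scope. It is a simplified, self-contained approximation of the pattern this PR moves into ATen core, not the actual ATen source.

```cpp
// Minimal sketch (not the actual ATen code) of the GradMode / AutoGradMode /
// NoGradGuard pattern this PR moves into ATen core.
#include <iostream>

struct GradMode {
  // Whether tensors created on this thread should accumulate gradients.
  static bool is_enabled() { return enabled; }
  static void set_enabled(bool e) { enabled = e; }
 private:
  static thread_local bool enabled;
};
thread_local bool GradMode::enabled = true;

// RAII guard: sets grad mode to the given value and restores the previous
// value when the guard goes out of scope.
struct AutoGradMode {
  explicit AutoGradMode(bool enabled) : prev_mode(GradMode::is_enabled()) {
    GradMode::set_enabled(enabled);
  }
  ~AutoGradMode() { GradMode::set_enabled(prev_mode); }
  bool prev_mode;
};

// Convenience guard that disables grad mode for its scope, mirroring
// `torch.no_grad()` on the Python side.
struct NoGradGuard : public AutoGradMode {
  NoGradGuard() : AutoGradMode(/*enabled=*/false) {}
};

int main() {
  std::cout << GradMode::is_enabled() << "\n";    // 1: grad mode on by default
  {
    NoGradGuard no_grad;                          // grad mode off inside this scope
    std::cout << GradMode::is_enabled() << "\n";  // 0
  }
  std::cout << GradMode::is_enabled() << "\n";    // 1: restored on scope exit
  return 0;
}
```

With the flag living in ATen, checks like the Utils.h and SparseTensorImpl.cpp ones linked above can presumably consult `GradMode::is_enabled()` directly when deciding whether a `requires_grad=true` tensor should be rejected, without depending on the autograd library.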