Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach() #20496
Conversation
Force-pushed from 2f2470f to 9769e08
I'm pretty confused. This is the second time this PR has added a parameter to the function and always passed in the same value. What was the point of adding the parameter, then? Is this setting up some future improvement?
aten/src/ATen/OpaqueTensorImpl.h
Outdated
@@ -80,13 +80,14 @@ struct CAFFE2_API OpaqueTensorImpl : public TensorImpl {
 // 1. the AutogradMeta pointer, because it is unique for each Variable.
 // 2. the version counter, because although it lives in TensorImpl, the version counter is managed
 // by autograd, and the call sites of `shallow_copy_and_detach()` (from autograd) should decide what
-// the version counter should be for each new TensorImpl. See NOTE [ Version Counter Sharing ] for details.
+// the version counter should be for each new TensorImpl, by passing the correct version counter as
I don't know what "the version counter should be for each new TensorImpl" is supposed to mean.
I still don't know what this means. Can you just say something like "the version counter is set to the passed-in `version_counter`"?
torch/csrc/autograd/variable.cpp
Outdated
@@ -170,7 +170,7 @@ void Variable::Impl::set_data(const at::Tensor &new_data) {
   device_opt_ = new_data.device();
   type_id_ = new_data.dispatch_type().type_id();

-  auto new_data_impl_copy = new_data.getIntrusivePtr()->shallow_copy_and_detach();
+  auto new_data_impl_copy = new_data.getIntrusivePtr()->shallow_copy_and_detach(/*version_counter=*/0);
I don't understand -- don't you want to pass in exactly what `saved_version_counter` is below?
Sorry it was an oversight -- fixed.
aten/src/ATen/OpaqueTensorImpl.h
Outdated
@@ -80,13 +80,14 @@ struct CAFFE2_API OpaqueTensorImpl : public TensorImpl {
 // 1. the AutogradMeta pointer, because it is unique for each Variable.
 // 2. the version counter, because although it lives in TensorImpl, the version counter is managed
 // by autograd, and the call sites of `shallow_copy_and_detach()` (from autograd) should decide what
-// the version counter should be for each new TensorImpl. See NOTE [ Version Counter Sharing ] for details.
+// the version counter should be for each new TensorImpl, by passing the correct version counter as
I still don't know what this means. Can you just say something like "the version counter is set to the passed-in `version_counter`"?
torch/csrc/autograd/variable.h
Outdated
@@ -608,7 +607,7 @@ inline Variable make_variable(
       !data.is_variable(),
       "Must not create a new variable from a variable, use its .data()");
   if (data.defined()) {
-    auto data_impl_copy = data.getIntrusivePtr()->shallow_copy_and_detach();
+    auto data_impl_copy = data.getIntrusivePtr()->shallow_copy_and_detach(/*version_counter=*/0);
     data_impl_copy->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
Are there cases where we need to not `set_allow_tensor_metadata_change` and then set it later? Or is it always "set" correctly at the `shallow_copy_and_detach` step?
It's always "set" correctly at the `shallow_copy_and_detach` step except for this one case (pytorch/torch/csrc/autograd/variable.h, lines 587 to 601 in da3e74b):
inline Variable make_variable_consuming(
    at::Tensor data,
    bool requires_grad = false,
    bool allow_tensor_metadata_change = true) {
  TORCH_CHECK(
      !data.is_variable(),
      "Must not create a new variable from a variable, use its .data()");
  if (data.defined()) {
    AT_ASSERT(data.getIntrusivePtr().use_count() == 1);
    data.unsafeGetTensorImpl()->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
    auto autograd_meta = c10::guts::make_unique<Variable::AutogradMeta>();
    return Variable(c10::make_intrusive<Variable::Impl>(std::move(data), std::move(autograd_meta), requires_grad));
  }
  return Variable();
}
I think we should add `allow_tensor_metadata_change` as another parameter to `shallow_copy_and_detach()`, to zip up this API further.
Update: I added the `allow_tensor_metadata_change` parameter to `shallow_copy_and_detach()`.
@@ -99,6 +100,7 @@ c10::intrusive_ptr<TensorImpl> shallow_copy_and_detach() const override {
   impl->is_contiguous_ = is_contiguous_;
   impl->is_wrapped_number_ = is_wrapped_number_;
   impl->reserved_ = reserved_;
+  impl->set_version_counter(version_counter);
Are there now calls to `set_version_counter` that aren't contained in `shallow_copy_and_detach`?
There are still two of them:

- https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.cpp#L196. For this one, because `diff_view_meta->base_` can be set to `base` or `base.base()` depending on whether `base` is already a view, we don't know the right value for the view's version counter when we call `shallow_copy_and_detach` in https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.h#L549, and we have to wait until this step to set the version counter to the correct value.
- https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/saved_variable.cpp#L87. For this one, since the `shallow_copy_and_detach` call is inside `make_variable()`, we can either allow passing `version_counter` as a parameter to `make_variable()`, or treat this as a one-off case and allow calling `set_version_counter` outside of `shallow_copy_and_detach`.
Ok, it sounds like both of these cases should go away soon:
- We shouldn't need make_variable, because everything will be a variable. And if we have a similar API -- we can just pass the version counter, as you mentioned.
- I'm not sure why we can't just save the variable instead of the tensor in the future -- maybe there is some infinite loop thing?
- Sounds great, yes we will be able to simplify this after we make everything a variable.
- I think currently we don't save the original variable because of https://bit.ly/2w367de (I wasn't able to use the Github link in this comment).
I added the task for looking into this in #13638.
Force-pushed from 35023e2 to 4549960
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…hallow_copy_and_detach() (#20496)

Summary: Previously, the caller of `shallow_copy_and_detach()` was responsible for deciding whether the shallow copy should share the source TensorImpl's version counter or have its own new version counter. However, since this decision is crucial for ensuring the correctness of the shallow copy's version counter, we want to require users of `shallow_copy_and_detach()` to pass a version counter to the function call, so that they make the decision at the time of API usage, not as an afterthought.

For similar reasons, we want to require users of `shallow_copy_and_detach()` to pass `allow_tensor_metadata_change` to the function call, so that they decide whether the TensorImpl shallow copy should allow tensor metadata change at the time of API usage, not as an afterthought.

Pull Request resolved: pytorch/pytorch#20496
Differential Revision: D15363620
Pulled By: yf225
fbshipit-source-id: a65e74738b10452668d6dc644b43aad5b3d8c9e6