
Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach() #20496

Closed

Conversation

@yf225 yf225 commented May 14, 2019

Previously, the caller of shallow_copy_and_detach() was responsible for deciding whether the shallow copy should share the source TensorImpl's version counter or receive its own new one. Because this decision is crucial to the correctness of the shallow copy's version counter, we now require users of shallow_copy_and_detach() to pass a version counter to the call, so that they must make the decision at the point of API usage rather than as an afterthought.

For similar reasons, we now require users of shallow_copy_and_detach() to pass allow_tensor_metadata_change to the call, so that they must decide whether the TensorImpl shallow copy should allow tensor metadata changes at the point of API usage rather than as an afterthought.

@yf225 yf225 requested a review from gchanan May 14, 2019 18:46
@pytorchbot pytorchbot added module: autograd Related to torch.autograd, and the autograd engine in general module: internals Related to internal abstractions in c10 and ATen oncall: quantization Quantization support in PyTorch labels May 14, 2019
@yf225 yf225 mentioned this pull request May 14, 2019
@yf225 yf225 force-pushed the shallow_copy_version_counter_param branch from 2f2470f to 9769e08 Compare May 15, 2019 12:16
Outdated review threads: c10/core/TensorImpl.h, aten/src/ATen/OpaqueTensorImpl.h, torch/csrc/autograd/variable.cpp
@yf225 yf225 changed the title Add ShallowCopyVersionCounterMode to shallow_copy_and_detach() Require passing version counter to shallow_copy_and_detach() May 15, 2019
@gchanan gchanan left a comment


I'm pretty confused. This is the second time this PR has added a parameter to the function and always passed in the same value. What was the point of adding the parameter, then? Is this setting up some future improvement?

@@ -80,13 +80,14 @@ struct CAFFE2_API OpaqueTensorImpl : public TensorImpl {
// 1. the AutogradMeta pointer, because it is unique for each Variable.
// 2. the version counter, because although it lives in TensorImpl, the version counter is managed
// by autograd, and the call sites of `shallow_copy_and_detach()` (from autograd) should decide what
// the version counter should be for each new TensorImpl. See NOTE [ Version Counter Sharing ] for details.
// the version counter should be for each new TensorImpl, by passing the correct version counter as
Contributor:

I don't know what "the version counter should be for each new TensorImpl" is supposed to mean.

Contributor:

I still don't know what this means. Can you just say something like "the version counter is set to the passed-in version_counter"?

@@ -170,7 +170,7 @@ void Variable::Impl::set_data(const at::Tensor &new_data) {
device_opt_ = new_data.device();
type_id_ = new_data.dispatch_type().type_id();

auto new_data_impl_copy = new_data.getIntrusivePtr()->shallow_copy_and_detach();
auto new_data_impl_copy = new_data.getIntrusivePtr()->shallow_copy_and_detach(/*version_counter=*/0);
Contributor:

I don't understand -- don't you want to pass in exactly what saved_version_counter is below?

Contributor Author:

Sorry it was an oversight -- fixed.


@@ -608,7 +607,7 @@ inline Variable make_variable(
!data.is_variable(),
"Must not create a new variable from a variable, use its .data()");
if (data.defined()) {
auto data_impl_copy = data.getIntrusivePtr()->shallow_copy_and_detach();
auto data_impl_copy = data.getIntrusivePtr()->shallow_copy_and_detach(/*version_counter=*/0);
data_impl_copy->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
Contributor:

are there cases where we need to not set_allow_tensor_metadata_change and then set it later? Or is it always "set" correctly at the shallow_copy_and_detach step?

Contributor Author:

It's always "set" correctly at the shallow_copy_and_detach step except for this one case:

inline Variable make_variable_consuming(
    at::Tensor data,
    bool requires_grad = false,
    bool allow_tensor_metadata_change = true) {
  TORCH_CHECK(
      !data.is_variable(),
      "Must not create a new variable from a variable, use its .data()");
  if (data.defined()) {
    AT_ASSERT(data.getIntrusivePtr().use_count() == 1);
    data.unsafeGetTensorImpl()->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
    auto autograd_meta = c10::guts::make_unique<Variable::AutogradMeta>();
    return Variable(c10::make_intrusive<Variable::Impl>(std::move(data), std::move(autograd_meta), requires_grad));
  }
  return Variable();
}

I think we should add allow_tensor_metadata_change as another parameter into shallow_copy_and_detach(), to zip up this API further.

Contributor Author:

Update: I added the allow_tensor_metadata_change parameter to shallow_copy_and_detach().

@@ -99,6 +100,7 @@ c10::intrusive_ptr<TensorImpl> shallow_copy_and_detach() const override {
impl->is_contiguous_ = is_contiguous_;
impl->is_wrapped_number_ = is_wrapped_number_;
impl->reserved_ = reserved_;
impl->set_version_counter(version_counter);
Contributor:

are there now calls to set_version_counter that aren't contained in shallow_copy_and_detach?

Contributor Author:

There are still two of them:

  1. https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.cpp#L196. For this one, because diff_view_meta->base_ can be set to base or base.base() depending on whether base is already a view, we don't know the right value for the view's version counter when we call shallow_copy_and_detach in https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/variable.h#L549, and we will have to wait until this step to set the version counter to the correct value.
  2. https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/saved_variable.cpp#L87. For this one, since the shallow_copy_and_detach call is inside make_variable(), we can either allow passing version_counter as parameter to make_variable(), or treat this as a one-off case and allow calling set_version_counter outside of shallow_copy_and_detach.

Contributor:

Ok, it sounds like both of these cases should go away soon:

  1. We shouldn't need make_variable, because everything will be a variable. And if we have a similar API -- we can just pass the version counter, as you mentioned.
  2. I'm not sure why we can't just save the variable, instead of the tensor in the future -- maybe there is some infinite loop thing?

Contributor Author:

  1. Sounds great, yes we will be able to simplify this after we make everything a variable.
  2. I think currently we don't save the original variable because of https://bit.ly/2w367de (I wasn't able to use the Github link in this comment).

I added the task for looking into this in #13638.

@yf225 yf225 force-pushed the shallow_copy_version_counter_param branch from 35023e2 to 4549960 Compare May 15, 2019 18:41
@yf225 yf225 changed the title Require passing version counter to shallow_copy_and_detach() Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach() May 15, 2019
@facebook-github-bot left a comment:

@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@yf225 merged this pull request in 456b889.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 16, 2019
Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach() (#20496)

Summary:
Previously, the caller of `shallow_copy_and_detach()` is responsible for deciding whether the shallow-copy should share the source TensorImpl's version counter, or have its own new version counter. However, since this decision is crucial for ensuring the correctness of the shallow-copy's version counter, we want to enforce users of `shallow_copy_and_detach()` to pass a version counter to the function call, so that they are required to make the decision at the time of API usage, not as an afterthought.

For similar reasons, we want to enforce users of `shallow_copy_and_detach()` to pass `allow_tensor_metadata_change` to the function call, so that they are required to decide "whether the TensorImpl shallow-copy should allow tensor metadata change" at the time of API usage, not as an afterthought.
Pull Request resolved: pytorch/pytorch#20496

Differential Revision: D15363620

Pulled By: yf225

fbshipit-source-id: a65e74738b10452668d6dc644b43aad5b3d8c9e6