Closed

Changes from all commits (61 commits)
4d65bce  Move version_counter_ to TensorImpl  (Mar 20, 2019)
cd378ec  Bump tensor version in non-Variable dispatch paths  (Mar 26, 2019)
bb95573  Merge branch 'master' of https://github.com/yf225/pytorch into move_v…  (Mar 26, 2019)
07c897a  fix bug  (Mar 26, 2019)
6a4c907  fix comment  (Mar 26, 2019)
c6d5609  Fix lint  (Mar 26, 2019)
7212564  preserve version in set_data  (Mar 26, 2019)
31581a6  Try to fix bug  (Mar 27, 2019)
919d522  clean up comment  (Mar 27, 2019)
8021735  clean up comment  (Mar 27, 2019)
6222ad0  DEBUG  (Mar 27, 2019)
03910ac  don't copy version_counter in shallow copy  (Mar 27, 2019)
a351fba  fix test_nn  (Mar 27, 2019)
f6e416e  fix test_einsum  (Mar 27, 2019)
9ce6b5a  nicer placeholder  (Mar 27, 2019)
bec5c06  Add version increment to MSNPUType.cpp and XLAType.cpp  (Mar 27, 2019)
b020120  fix cuda tests  (Mar 27, 2019)
2f466db  Merge branch 'master' of https://github.com/yf225/pytorch into move_v…  (Mar 28, 2019)
1c9151a  add test  (Mar 28, 2019)
27917a4  add guard for disabling version update; copy version counter in shall…  (Apr 2, 2019)
03e6486  clean up  (Apr 2, 2019)
9c54ad3  remove version API from Tensor  (Apr 2, 2019)
fdb5680  better docs  (Apr 2, 2019)
8750f5e  clean test  (Apr 2, 2019)
fb8f996  try to also share version in set_data()  (Apr 2, 2019)
35c0806  clean up  (Apr 2, 2019)
0128a17  make TensorImpl version counter methods non-virtual  (Apr 2, 2019)
c93251a  nit  (Apr 2, 2019)
92603b4  nit  (Apr 2, 2019)
9a568b7  better comments for get_numerical_jacobian  (Apr 2, 2019)
379da62  better comment  (Apr 2, 2019)
c65a081  better comment  (Apr 2, 2019)
2e94856  nit  (Apr 2, 2019)
1204b19  fix comment  (Apr 2, 2019)
940ab21  Merge branch 'master' of https://github.com/yf225/pytorch into move_v…  (Apr 3, 2019)
bbd8104  Don't expose version update mode to user  (Apr 3, 2019)
21cb700  lint  (Apr 3, 2019)
0e1f9f9  fix  (Apr 3, 2019)
eb8f1d5  set data  (Apr 3, 2019)
939349a  extra guards  (Apr 3, 2019)
60f520e  check is_variable() when doing shallow_copy_and_detach  (Apr 3, 2019)
9cadb1e  try to fix diff view meta  (Apr 3, 2019)
9778abd  Revert "set data"  (Apr 3, 2019)
e0f8e9d  DEBUG comment  (Apr 3, 2019)
7e19358  Revert "extra guards"  (Apr 3, 2019)
5ee8185  add torch._autograd_internal.no_version_update()  (Apr 3, 2019)
1983ca5  try to fix  (Apr 3, 2019)
5504057  better comment  (Apr 3, 2019)
75ee500  fix lint  (Apr 3, 2019)
d08f490  add _ to version_update_enabled functions  (Apr 4, 2019)
34c6e1e  don't bump version in TH functions  (Apr 4, 2019)
107b58d  try to check old_version + 1 instead of greater than old_version  (Apr 4, 2019)
f089c6e  better checks  (Apr 4, 2019)
54f79c4  don't do bump version in non-Variable functions  (Apr 4, 2019)
68814a3  Merge branch 'master' into move_version_counter_  (yf225, Apr 8, 2019)
1ba87d5  Remove is_variable() check in SparseTensorImpl shallow_copy_and_detach()  (Apr 9, 2019)
e573d35  Merge branch 'master' of https://github.com/yf225/pytorch into move_v…  (Apr 9, 2019)
6cdf7f4  improve comments  (Apr 9, 2019)
f424cf1  Improve comment  (Apr 9, 2019)
46e22e4  Improve comment  (Apr 10, 2019)
49b5104  improve comments  (Apr 11, 2019)
8 changes: 6 additions & 2 deletions aten/src/ATen/OpaqueTensorImpl.h
@@ -76,8 +76,12 @@ struct CAFFE2_API OpaqueTensorImpl : public TensorImpl {
AT_ERROR("opaque tensors do not have storage");
}

-  // NOTE: `shallow_copy_and_detach()` does not copy the AutogradMeta pointer
-  // because it is unique for each Variable.
+  // NOTE: `shallow_copy_and_detach()` does not copy the following TensorImpl fields:
+  // 1. the AutogradMeta pointer, because it is unique for each Variable.
+  // 2. the version counter, because although it lives in TensorImpl, the version counter is managed
+  // by autograd, and the call sites of `shallow_copy_and_detach()` (from autograd) should decide what
+  // the version counter should be for each new TensorImpl. See NOTE [ Version Counter Sharing ] for details.
+  //
// NOTE: We don't set `allow_tensor_metadata_change_` to false here, because there are call sites
// to this function that need to change the shallow copy's size or storage afterwards, and setting
// `allow_tensor_metadata_change_` to false would prevent those changes from happening and is
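The comment above leaves the version-counter decision to the caller. As a minimal sketch of what such an autograd-side caller could look like, using only the `shallow_copy_and_detach()` / `set_version_counter()` / `version_counter()` APIs added in this PR: `make_detached` is a hypothetical helper name, not code from this PR, and the snippet assumes it is compiled inside the PyTorch tree at this revision.

    // Hypothetical caller (illustration only): after a shallow copy, the
    // caller decides whether the copy shares the source's version counter.
    at::Tensor make_detached(const at::Tensor& src, bool share_version_counter) {
      auto impl = src.getIntrusivePtr()->shallow_copy_and_detach();
      if (share_version_counter) {
        // e.g. `x.detach()`: the result shares x's counter, so an in-place
        // write through either tensor is seen by saved-variable checks.
        impl->set_version_counter(src.getIntrusivePtr()->version_counter());
      }
      // otherwise the copy keeps the fresh counter it was created with (version 0)
      return at::Tensor(impl);
    }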
8 changes: 6 additions & 2 deletions aten/src/ATen/SparseTensorImpl.h
@@ -183,8 +183,12 @@ struct CAFFE2_API SparseTensorImpl : public TensorImpl {
// make it happen
void set_indices_and_values_unsafe(const Tensor& indices, const Tensor& values);

-  // NOTE: `shallow_copy_and_detach()` does not copy the AutogradMeta pointer
-  // because it is unique for each Variable.
+  // NOTE: `shallow_copy_and_detach()` does not copy the following TensorImpl fields:
+  // 1. the AutogradMeta pointer, because it is unique for each Variable.
+  // 2. the version counter, because although it lives in TensorImpl, the version counter is managed
+  // by autograd, and the call sites of `shallow_copy_and_detach()` (from autograd) should decide what
+  // the version counter should be for each new TensorImpl. See NOTE [ Version Counter Sharing ] for details.
+  //
// NOTE: We don't set `allow_tensor_metadata_change_` to false here, because there are call sites
// to this function that need to change the shallow copy's size or storage afterwards, and setting
// `allow_tensor_metadata_change_` to false would prevent those changes from happening and is
83 changes: 80 additions & 3 deletions c10/core/TensorImpl.h
@@ -138,6 +138,61 @@ struct C10_API AutogradMetaInterface {
virtual ~AutogradMetaInterface();
};

+// NOTE [ Version Counter Sharing ]
+//
+// Every Tensor has a version counter. Version counters are incremented whenever the
+// data or size of a tensor changes through in-place Variable operations. Version
+// counters are used to detect modifications to saved variables which would result in
+// incorrect gradient calculations. Version counters may be shared between Variables:
+//
+// 1. A view shares the version counter of the base Variable,
+// 2. `x.detach()` shares the version counter of `x`,
+// 3. Unpacked saved variables share the version counter of the source.
+//
+// Version counters are not shared in these scenarios:
+//
+// 1. When we replace a `Variable`'s underlying `Tensor` by calling `set_data(...)`,
+// 2. `x.data` does not share the version counter of `x`. (See discussion at
+// https://github.com/pytorch/pytorch/issues/5396)
+//
+// Question: Why do we put the version counter in TensorImpl instead of AutogradMeta?
+//
+// Answer: After the Variable/Tensor merge, a tensor will not have AutogradMeta when
+// its `requires_grad_` is false, but when we use this tensor in the forward pass of
+// a function that requires saving this tensor for backward, we need to keep track of
+// this tensor's version to make sure it's always valid in the autograd graph.
Review comment (Contributor): This note will be difficult for future readers to understand. The reason it is difficult to understand is that it is making a comment relative to a change, but in the future no one will see the change; they will see the code as is! Sometimes knowing the historical context is helpful to know why code is set up this way, but in this case the issue should be explained from first principles.

Reply (Contributor, Author): I will remove the "After the Variable/Tensor merge" phrase after the merge is completed, change the future tense to present tense, and rework the comment to make it easier to understand.
+//
+// To achieve this goal, we put the version counter in TensorImpl instead of AutogradMeta,
+// and have it always be available. This allows us to have the optimization of not
+// carrying AutogradMeta when a tensor doesn't require gradient.
+//
+// A hypothetical alternative way to achieve this goal is to initialize AutogradMeta and
+// create the version counter for the non-requires-grad tensor only when it's saved for
+// backward. However, since saving a tensor for backward happens in the forward pass, and
+// our invariant is that forward pass needs to be thread-safe, lazy-initializing AutogradMeta
+// when saving a tensor can introduce race conditions when we are running the forward
+// pass in multi-thread scenarios, thus making the forward pass not thread-safe anymore,
+// which breaks the invariant.
+struct C10_API VariableVersion {
+ public:
+  // NOTE: As of C++11 and 14, default-constructing a std::atomic variable
+  // leaves it in a persistently undefined state. See
+  // https://cplusplus.github.io/LWG/issue2334.
+  VariableVersion(uint32_t version = 0)
+      : version_block_(std::make_shared<std::atomic<uint32_t>>(version)) {}
+
+  void bump() noexcept {
+    version_block_->fetch_add(1);
+  }
+
+  uint32_t current_version() const noexcept {
+    return version_block_->load();
+  }
+
+ private:
+  std::shared_ptr<std::atomic<uint32_t>> version_block_;
+};
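All the sharing rules in NOTE [ Version Counter Sharing ] reduce to the copy semantics of this struct: copying a `VariableVersion` copies the `shared_ptr`, so both handles point at the same atomic block. A standalone, runnable sketch of just that behavior (the struct is restated locally, for illustration only):

    #include <atomic>
    #include <cassert>
    #include <cstdint>
    #include <memory>

    // Local re-statement of VariableVersion, stripped of C10_API.
    struct VariableVersion {
      VariableVersion(uint32_t version = 0)
          : version_block_(std::make_shared<std::atomic<uint32_t>>(version)) {}
      void bump() noexcept { version_block_->fetch_add(1); }
      uint32_t current_version() const noexcept { return version_block_->load(); }
     private:
      std::shared_ptr<std::atomic<uint32_t>> version_block_;
    };

    int main() {
      VariableVersion base;          // fresh block, version 0
      VariableVersion view = base;   // copy shares base's block (shared_ptr copy)
      base.bump();                   // an in-place op on the base...
      assert(view.current_version() == 1);  // ...is visible through the view
      VariableVersion other;         // a brand-new block: NOT shared with base/view
      assert(other.current_version() == 0);
    }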

/**
* The low-level representation of a tensor, which contains a pointer
* to a storage (which contains the actual data) and metadata (e.g., sizes and
@@ -845,13 +900,18 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target {
return std::move(autograd_meta_);
}

-  // NOTE: `shallow_copy_and_detach()` does not copy the AutogradMeta pointer
-  // because it is unique for each Variable.
+  // NOTE: `shallow_copy_and_detach()` does not copy the following TensorImpl fields:
+  // 1. the AutogradMeta pointer, because it is unique for each Variable.
+  // 2. the version counter, because although it lives in TensorImpl, the version counter is managed
+  // by autograd, and the call sites of `shallow_copy_and_detach()` (from autograd) should decide what
+  // the version counter should be for each new TensorImpl. See NOTE [ Version Counter Sharing ] for details.
+  //
// NOTE: We don't set `allow_tensor_metadata_change_` to false here, because there are call sites
// to this function that need to change the shallow copy's size or storage afterwards, and setting
// `allow_tensor_metadata_change_` to false would prevent those changes from happening and is
// undesirable.
virtual c10::intrusive_ptr<TensorImpl> shallow_copy_and_detach() const {
+    AT_ASSERT(!is_variable()); // TODO: remove this when Variable and Tensor are merged
Review comment (Contributor, Author): Same as above.
auto impl = c10::make_intrusive<TensorImpl>(Storage(storage()), type_id(), is_variable());
impl->set_sizes_and_strides(sizes(), strides());
impl->storage_offset_ = storage_offset_;
@@ -862,6 +922,19 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target {
return impl;
}

+  void set_version_counter(
+      const c10::VariableVersion& version_counter) noexcept {
+    version_counter_ = version_counter;
+  }
+
+  const c10::VariableVersion& version_counter() const noexcept {
+    return version_counter_;
+  }
+
+  void bump_version() noexcept {
+    version_counter_.bump();
+  }

inline void set_pyobj(PyObject* pyobj) noexcept {
pyobj_ = pyobj;
}
@@ -1384,6 +1457,8 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target {
// at a time).
std::unique_ptr<c10::AutogradMetaInterface> autograd_meta_ = nullptr;

+  c10::VariableVersion version_counter_;

PyObject* pyobj_ = nullptr; // weak reference

// We could save a word or two by combining the SmallVector structs,
Expand Down Expand Up @@ -1471,6 +1546,8 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target {
// weak refcount
// storage pointer
// autograd metadata pointer
+// version counter (word 0)
+// version counter (word 1)
// PyObject pointer
// sizes SmallVector (begin)
// sizes SmallVector (end)
@@ -1495,7 +1572,7 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target {
// miscellaneous bitfield
//
static_assert(sizeof(void*) != sizeof(int64_t) || // if 64-bit...
-              sizeof(TensorImpl) == sizeof(int64_t) * 27,
+              sizeof(TensorImpl) == sizeof(int64_t) * 29,
"You changed the size of TensorImpl on 64-bit arch."
"See Note [TensorImpl size constraints] on how to proceed.");

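A quick way to sanity-check the 27 to 29 jump in the static_assert above: `version_counter_` is one `std::shared_ptr`, and on the mainstream ABIs (libstdc++, libc++, MSVC) a `shared_ptr` is two words, an object pointer plus a control-block pointer. The standard does not guarantee this layout, so the assert below is an assumption that happens to hold on those implementations:

    #include <atomic>
    #include <cstdint>
    #include <memory>

    // Holds on common implementations, though not mandated by the standard:
    // a shared_ptr carries an object pointer plus a control-block pointer.
    static_assert(
        sizeof(std::shared_ptr<std::atomic<uint32_t>>) == 2 * sizeof(void*),
        "shared_ptr is expected to be two words, hence 27 + 2 == 29");

    int main() { return 0; }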
3 changes: 1 addition & 2 deletions torch/csrc/autograd/saved_variable.h
@@ -1,7 +1,6 @@
#pragma once

#include <torch/csrc/WindowsTorchApiMacro.h>
-#include <torch/csrc/autograd/variable_version.h>

#include <ATen/ATen.h>

@@ -47,7 +46,7 @@ class TORCH_API SavedVariable {
// passed in to the unpack function when reconstructing the Variable.
std::shared_ptr<Function> grad_fn_;
std::weak_ptr<Function> grad_accumulator_;
-  VariableVersion version_counter_;
+  c10::VariableVersion version_counter_;

uint32_t saved_version_ = 0;
uint32_t output_nr_ = 0;
16 changes: 10 additions & 6 deletions torch/csrc/autograd/variable.cpp
@@ -7,7 +7,6 @@
#include <torch/csrc/autograd/functions/tensor.h>
#include <torch/csrc/autograd/generated/Functions.h>
#include <torch/csrc/autograd/generated/VariableType.h>
-#include <torch/csrc/autograd/variable_version.h>

#include <ATen/ATen.h>
#include <c10/util/Exception.h>
@@ -172,8 +171,13 @@ void Variable::Impl::set_data(const at::Tensor &new_data) {
type_id_ = new_data.dispatch_type().type_id();
is_variable_ = true;

-  auto new_data_copy = at::Tensor(new_data.getIntrusivePtr()->shallow_copy_and_detach());
-  data_ = std::move(new_data_copy);
+  auto new_data_impl_copy = new_data.getIntrusivePtr()->shallow_copy_and_detach();
+  // Version counter is not shared when we replace a `Variable`'s underlying `Tensor`
+  // by calling `set_data(...)`. The original version of the `Variable` is always preserved.
+  // See NOTE [ Version Counter Sharing ] for details.
+  auto saved_version_ = data_.unsafeGetTensorImpl()->version_counter().current_version();
+  new_data_impl_copy->set_version_counter(saved_version_);
+  data_ = std::move(at::Tensor(new_data_impl_copy));
}

void Variable::Impl::release_resources() {
@@ -190,8 +194,8 @@ Variable::DifferentiableViewImpl::DifferentiableViewImpl(Variable base, at::Tens
diff_view_meta->base_ = diff_view_meta->base_.base();
}
diff_view_meta->is_view_ = true;
-  diff_view_meta->version_counter_ = diff_view_meta->base_.version_counter();
-  diff_view_meta->attr_version = diff_view_meta->version_counter_.current_version();
+  data_.unsafeGetTensorImpl()->set_version_counter(diff_view_meta->base_.version_counter());
+  diff_view_meta->attr_version = data_.unsafeGetTensorImpl()->version_counter().current_version();
}

const std::shared_ptr<Function>& Variable::grad_fn() const {
Expand All @@ -201,7 +205,7 @@ const std::shared_ptr<Function>& Variable::grad_fn() const {
if (!diff_view_meta->grad_fn_ && !diff_view_meta->base_.requires_grad()) {
return diff_view_meta->grad_fn_;
}
-    auto current_version = diff_view_meta->version_counter_.current_version();
+    auto current_version = this->current_version();
if (diff_view_meta->attr_version != current_version) {
AT_ASSERT(diff_view_meta->output_nr_ == 0);
auto fn = std::make_shared<generated::AsStridedBackward>();
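One subtlety in the `set_data()` hunk above is easy to miss: `saved_version_` is a plain `uint32_t`, so `set_version_counter(saved_version_)` implicitly constructs a brand-new `VariableVersion` seeded with the old value. The version number survives, but the sharing is severed. Modeled with the standalone `VariableVersion` restated in the earlier sketch (illustration, not PR code):

    // Append to main() of the earlier standalone VariableVersion sketch.
    VariableVersion old_counter;                 // the Variable's counter...
    old_counter.bump(); old_counter.bump();      // ...suppose it has reached 2
    // What set_data() effectively does for the replacement TensorImpl:
    VariableVersion fresh(old_counter.current_version());
    assert(fresh.current_version() == 2);        // value preserved
    old_counter.bump();
    assert(fresh.current_version() == 2);        // but no longer shared with old_counter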
18 changes: 8 additions & 10 deletions torch/csrc/autograd/variable.h
@@ -5,7 +5,6 @@
#include <torch/csrc/WindowsTorchApiMacro.h>
#include <torch/csrc/autograd/edge.h>
#include <torch/csrc/autograd/function_hook.h>
-#include <torch/csrc/autograd/variable_version.h>

#include <ATen/ATen.h>
#include <c10/util/Exception.h>
@@ -257,10 +256,10 @@ struct TORCH_API Variable : public at::Tensor {

/// Increments the version count of this `Variable`.
void bump_version() noexcept;
-  void set_version_counter(const VariableVersion& version_counter) noexcept;
+  void set_version_counter(const c10::VariableVersion& version_counter) noexcept;

/// Retrieves this `Variable`s version counter.
-  const VariableVersion& version_counter() const noexcept;
+  const c10::VariableVersion& version_counter() const noexcept;

/// Retrieves the current value of the `Variable`'s version counter.
/// Equivalent to calling `version_counter().current_version()`.
@@ -335,7 +334,6 @@ struct TORCH_API Variable::AutogradMeta : public c10::AutogradMetaInterface {
std::shared_ptr<Function> grad_fn_;
std::weak_ptr<Function> grad_accumulator_;

-  VariableVersion version_counter_;
std::vector<std::shared_ptr<FunctionPreHook>> hooks_;

// Only meaningful on leaf variables (must be false otherwise)
@@ -692,20 +690,20 @@ inline bool Variable::is_leaf() const noexcept {
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

inline void Variable::set_version_counter(
-    const VariableVersion& version_counter) noexcept {
-  get_autograd_meta()->version_counter_ = version_counter;
+    const c10::VariableVersion& version_counter) noexcept {
+  data().unsafeGetTensorImpl()->set_version_counter(version_counter);
}

inline void Variable::bump_version() noexcept {
-  get_autograd_meta()->version_counter_.bump();
+  data().unsafeGetTensorImpl()->bump_version();
}

inline uint32_t Variable::current_version() const noexcept {
-  return get_autograd_meta()->version_counter_.current_version();
+  return data().unsafeGetTensorImpl()->version_counter().current_version();
}

-inline const VariableVersion& Variable::version_counter() const noexcept {
-  return get_autograd_meta()->version_counter_;
+inline const c10::VariableVersion& Variable::version_counter() const noexcept {
+  return data().unsafeGetTensorImpl()->version_counter();
}

// Hooks
38 changes: 0 additions & 38 deletions torch/csrc/autograd/variable_version.h

This file was deleted.