Variable/Tensor Merge Proposal #13638

Closed · 16 of 22 tasks

yf225 opened this issue Nov 6, 2018 · 15 comments

@yf225 (Contributor) commented Nov 6, 2018

🚀 High-level changes:

  1. IMPORTANT: Both Variable and Variable::Impl are removed; at::Tensor is the only tensor type passed around in PyTorch, and it can record autograd history when its autograd metadata (AutogradMeta) is not null.
  2. IMPORTANT: Autograd-related function implementations in Variable will be moved to VariableType.
  3. Autograd metadata now lives in an AutogradMeta struct that TensorImpl has a pointer to, and the AutogradMeta is only populated when the at::Tensor requires gradient.
  4. We decide whether to dispatch to VariableType / non-VariableType functions using the at::AutoNonVariableTypeMode in appropriate places internally. (We only dispatch to VariableType functions if we need profiling/JIT-tracing/autograd)
  5. Common Tensor functions (e.g. numel() / sizes() / dim()) are de-virtualized in TensorImpl and have their runtime reduced by 43%-86%.
  6. tensor.is_variable() and options.is_variable() always return true, because every at::Tensor is a variable (and can record autograd history when its AutogradMeta is not null). (We keep options.is_variable(...) for backward compatibility, and raise a warning if it is set to false.)
  7. API behavior change: changing shape/storage on tensor.data in Python or tensor.data() in C++ will no longer update tensor.

Pitch

Currently, the distinction between at::Tensor and Variable (subclass of at::Tensor that contains autograd metadata and functions) creates unnecessary cognitive overhead for PyTorch core development. We want to remove this distinction and make it possible to use at::Tensor everywhere in PyTorch. After merging Variable into at::Tensor, here are the common end-user APIs:

  • When a C++ user wants to create a non-history-recording at::Tensor from another at::Tensor:
    Current API (unchanged):
auto t = torch::ones({2, 2}, torch::requires_grad()); // t is recording history
auto t_detached = t.detach();  // t_detached is the non-history-recording version of t

When the user calls t.detach(), we do the following under the hood:

  1. We make a shallow copy of t's TensorImpl, which copies the storage pointer and all other TensorImpl fields (e.g. size / stride).
    • Note that subclasses of TensorImpl (e.g. SparseTensorImpl) need to know how to make a shallow copy of themselves, and we dispatch this operation to each TensorImpl subclass' own shallow_copy_and_detach() function (by making the shallow_copy_and_detach() function virtual in TensorImpl and overriding it in TensorImpl subclasses).
  2. We set the AutogradMeta pointer to NULL, to indicate that it doesn't need to record history.
  3. We return an at::Tensor that wraps the new TensorImpl.
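A minimal sketch of the shallow-copy-and-detach mechanism described in the steps above (all class and member names here are simplified stand-ins, not the actual ATen definitions):

```cpp
// Hypothetical, simplified sketch of shallow_copy_and_detach() -- illustrative
// names only, not the real TensorImpl.
#include <cstdint>
#include <memory>
#include <vector>

struct Storage {};        // stand-in for the ref-counted tensor storage
struct AutogradMeta {};   // stand-in for grad_ / grad_fn_ / grad_accumulator_

struct TensorImpl {
  std::shared_ptr<Storage> storage_;             // shared by pointer
  std::vector<int64_t> sizes_, strides_;         // copied by value
  std::shared_ptr<AutogradMeta> autograd_meta_;  // null => no history recording

  virtual ~TensorImpl() = default;

  // Virtual so that subclasses (e.g. a sparse impl) can copy their own fields.
  virtual std::shared_ptr<TensorImpl> shallow_copy_and_detach() const {
    auto copy = std::make_shared<TensorImpl>();
    copy->storage_ = storage_;        // step 1: share storage, copy metadata
    copy->sizes_ = sizes_;
    copy->strides_ = strides_;
    copy->autograd_meta_ = nullptr;   // step 2: detached, records no history
    return copy;                      // step 3: wrapped in an at::Tensor by the caller
  }
};
```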

  • When a C++ user wants to enable/disable history-recording for an at::Tensor:
    Proposed API:
auto t = torch::ones({2, 2});  // t is not recording history (this already works)
t.requires_grad_(true);  // t is recording history now (new API)
t.requires_grad_(false); // t is not recording history anymore (new API)

When the user calls t.requires_grad_(true), we do the following under the hood:

  1. We initialize a struct called AutogradMeta, which stores autograd-specific fields (such as grad_/grad_fn_/grad_accumulator_).
  2. We assign the struct to the AutogradMeta pointer in t's TensorImpl.

When the user calls t.requires_grad_(false), we do the following under the hood:

  1. We set the AutogradMeta pointer in t's TensorImpl to NULL.
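A minimal sketch of this lazy AutogradMeta allocation, using simplified stand-in types (an assumption-based outline, not the actual implementation):

```cpp
// Hypothetical sketch of requires_grad_(true/false) -- illustrative names only.
#include <memory>

struct AutogradMeta {
  bool requires_grad_ = false;
  // grad_, grad_fn_, grad_accumulator_ would also live here.
};

struct TensorImpl {
  std::shared_ptr<AutogradMeta> autograd_meta_;  // null => not recording history
};

void set_requires_grad(TensorImpl& impl, bool requires_grad) {
  if (requires_grad) {
    if (!impl.autograd_meta_) {
      impl.autograd_meta_ = std::make_shared<AutogradMeta>();  // step 1: create
    }
    impl.autograd_meta_->requires_grad_ = true;                // step 2: attach
  } else {
    impl.autograd_meta_ = nullptr;  // dropping the struct stops history recording
  }
}
```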

  • When a C++ user wants to call non-Variable operations on an at::Tensor when dispatching through type():
    Proposed API:
{
  auto t_type = t.type();  // `t_type` is a Variable type if `t` contains AutogradMeta
}
{
  at::AutoNonVariableTypeMode grad_mode(false);  // thread-local guard (new API)
  auto non_var_type = t.type();  // "non_var_type" is a non-Variable type
}
{
  at::AutoNonVariableTypeMode grad_mode(true);  // thread-local guard (new API)
  auto var_type = t.type();  // "var_type" is a Variable type
}

Under the hood, type() checks whether the at::AutoNonVariableTypeMode thread-local guard is enabled when determining the type of the variable.
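For illustration, a thread-local RAII guard of this kind could be sketched as follows. This is illustrative only: the real at::AutoNonVariableTypeMode lives in ATen, and which polarity of the flag maps to the Variable vs. non-Variable path follows whatever convention the dispatcher adopts.

```cpp
// Hypothetical sketch of a thread-local RAII guard -- not the actual ATen code.
// The guard flips a thread-local flag for its lifetime and restores the
// previous value when the scope ends.
namespace sketch {

inline thread_local bool non_variable_type_mode = false;

struct AutoNonVariableTypeMode {
  explicit AutoNonVariableTypeMode(bool enabled) : prev_(non_variable_type_mode) {
    non_variable_type_mode = enabled;
  }
  ~AutoNonVariableTypeMode() {
    non_variable_type_mode = prev_;  // restore on scope exit
  }

 private:
  bool prev_;
};

}  // namespace sketch
```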


  • When a C++ user wants to change the content of an at::Tensor that has AutogradMeta, without affecting the tensor's grad_fn or version_counter_:
    Proposed behavior:
auto t = torch::ones({2, 2});
t.requires_grad_(true);
AT_ASSERT(t.current_version() == 0);
t.data().add_(1);  // This is consistent with Python `.data` behavior: changing `.data` of a tensor in Python doesn't affect the tensor's `grad_fn` or `version_counter_`
AT_ASSERT(t.current_version() == 0);

Motivation

  • Overly Complex OOP design: Currently the distinction between Variable and Tensor is hard to grasp: Variable::Impl is a subclass of TensorImpl, but it also has an at::Tensor data member which internally wraps another TensorImpl. This co-existence of "is-a" and "has-a" relationship makes the code complicated and adds cognitive overhead. In particular, it's difficult to track which functions we have overridden in Variable::Impl, and which functions are applicable to Tensor vs. Variable (e.g. is_wrapped_number() is only valid on Tensor, not Variable) (for more context, also see note: We regret making Variable hold a Tensor). Ideally, we want to use the same tensor type everywhere in PyTorch code.

  • Unused data members in Variable::Impl take up cache/memory space: Since Variable::Impl is a subclass of TensorImpl, it contains all of the data members that a normal TensorImpl would have (such as sizes_ / strides_ / etc.). However, the Variable::Impl functions always call into the underlying at::Tensor and ignore the rest of the fields, which causes a lot of wasted cache/memory space.

  • Virtual functions are slow: We care about how much time it takes to execute common Tensor functions such as numel() / sizes() / dim(). Currently, these functions are virtual in TensorImpl, so that Variable::Impl (a subclass of TensorImpl) can override them and dispatch those calls to the Variable::Impl's underlying at::Tensor. Virtual function calls are slow because they involve an extra vtable lookup. Specifically, we did the following comparison on the most common Tensor functions (all timings are in ns):

| Benchmark | Time (no flush) | Time (flush L1) | Time (flush L1+L2) | Time (flush L1+L2+L3) |
| --- | --- | --- | --- | --- |
| `Tensor.dim()` - non-virtual | 1.3 | 3.33 | 7.6 | 58 |
| `Variable.dim()` - virtual | 4.5 | 24.4 | 52 | 173.67 |
| Runtime Savings | -71.11111% | -86.35246% | -85.38462% | -66.60333% |
| `Tensor.numel()` - non-virtual | 22.6 | 63.89 | 109.22 | 294.5 |
| `Variable.numel()` - virtual | 80.33 | 133.1 | 192 | 810.9 |
| Runtime Savings | -71.86605% | -51.9985% | -43.11458% | -63.68233% |
| `Tensor.size(0)` - non-virtual | 30.4 | 60.1 | 100.44 | 384.3 |
| `Variable.size(0)` - virtual | 75.4 | 127.67 | 203.8 | 875.9 |
| Runtime Savings | -59.6817% | -52.92551% | -50.71639% | -56.12513% |
| `Tensor.sizes()` - non-virtual | 2 | 4.25 | 13.25 | 67.6 |
| `Variable.sizes()` - virtual | 5.2 | 28.44 | 62.1 | 254.78 |
| Runtime Savings | -61.53846% | -85.05626% | -78.66345% | -73.46731% |
| `Tensor.resize_({0})` no-op - non-virtual | 23.11 | 86.44 | 105.44 | 332.33 |
| `Variable.resize_({0})` no-op - virtual | 168.4 | 254.22 | 348.56 | 890.9 |
| Runtime Savings | -86.27672% | -65.99795% | -69.74983% | -62.69727% |
| `Tensor.resize_({64, 2048})` no-op - non-virtual | 33.4 | 102.56 | 129.56 | 407.22 |
| `Variable.resize_({64, 2048})` no-op - virtual | 193 | 278.1 | 364.9 | 936.6 |
| Runtime Savings | -82.6943% | -63.12118% | -64.49438% | -56.52146% |

Benchmarked commit: f000101
Benchmark script: https://github.com/yf225/benchmark/blob/tensor_functions/timing/cpp2/benchmarks/aten_overheads.cpp
Non-virtual code: master...yf225:nonvirtual_tensorimpl
Virtual code: master...yf225:virtual_tensorimpl

Based on our current implementation, the runtime difference for dim(), numel(), size(), sizes(), and no-op resize() comes from the virtual function call overhead and the at::Tensor data member indirection in Variable::Impl. If we de-virtualize those functions, we would be able to cut the runtime by 43%-86% on the most common Tensor functions.
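As a toy illustration of where the overhead comes from (not the actual ATen classes), the contrast is between a virtual accessor that forwards through a wrapped impl and a plain non-virtual accessor that reads its own field and can be inlined:

```cpp
// Toy illustration of the virtual-vs-devirtualized designs benchmarked above.
// All names are illustrative, not the real TensorImpl / Variable::Impl.
#include <cstdint>
#include <memory>
#include <vector>

// Old design: Variable::Impl overrides a virtual accessor and forwards to a
// wrapped "data" impl, paying a vtable lookup plus a pointer indirection.
struct VirtualImpl {
  virtual ~VirtualImpl() = default;
  virtual int64_t dim() const { return static_cast<int64_t>(sizes_.size()); }
  std::vector<int64_t> sizes_;
};

struct VariableLikeImpl : VirtualImpl {
  std::unique_ptr<VirtualImpl> data_;                     // wrapped impl
  int64_t dim() const override { return data_->dim(); }   // extra hop
};

// New design: dim() is a non-virtual read of the impl's own field, which the
// compiler can inline at every call site.
struct DevirtualizedImpl {
  int64_t dim() const { return static_cast<int64_t>(sizes_.size()); }
  std::vector<int64_t> sizes_;
};
```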

Breaking changes

Note that this change will break the current API in the following way:

In the old world, whenever we want to create a Variable that shares the same data with another Variable, we simply do auto var_new = make_variable(var.data()) or auto var_new = var.detach(), and any shape / data / storage pointer changes to var_new will be reflected in var automatically, because internally they share the same underlying at::Tensor.

However, in the new world, there is no concept of the "underlying at::Tensor" of a Variable, since the Variable itself is the Tensor. When we want to create an at::Tensor that shares the same data with another at::Tensor, we can still call auto t_new = t.detach(), but in this case only the tensor storage data is shared (via a ref-counted pointer) between t_new and t; the tensor size/stride information is copied by value. In other words, changing anything in the detached Tensor (t_new) that is not bits inside the tensor storage (e.g. size / stride / storage_ptr) won't update the original Tensor (t), and we should no longer expect that data to be shared.

This has implications for Python call sites that do

tensor.data.in_place_operation_()

or

tensor_detached = tensor.detach()
tensor_detached.in_place_operation_()

If in_place_operation_() only updates the data inside the tensor's storage (such as zero_()), such an operation will still work properly; if the in-place operation changes the size, stride, or storage pointer inside the TensorImpl (e.g. resize_ / resize_as_ / set_ / transpose_), such an operation on tensor.data or tensor_detached will no longer update the original tensor. We will address this inconsistency in the following ways:

  1. Add an allow_tensor_metadata_change_ flag to TensorImpl to disallow size/stride/storage_ptr changes from in-place operations such as resize_ / resize_as_ / set_ / transpose_, and set this flag to true when people call tensor.data in Python.
  2. Write text in the docs to actively discourage changing the shape or storage of tensor_detached and expecting tensor to also be updated.
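As a short, hypothetical C++ illustration of the new behavior (assuming the proposed semantics; the exact behavior of metadata-changing ops depends on the allow_tensor_metadata_change_ flag above):

```cpp
// Hypothetical illustration of post-merge detach() semantics, as proposed above.
#include <torch/torch.h>

void detach_semantics_example() {
  auto t = torch::ones({2, 2});
  auto t_detached = t.detach();

  t_detached.zero_();  // in-place data change: visible through `t`, because
                       // the storage bits are shared between the two tensors

  // A metadata-changing op on the detached tensor no longer touches `t`:
  // depending on the allow_tensor_metadata_change_ flag discussed above, this
  // either raises an error or resizes only `t_detached`; in neither case does
  // `t` lose its original {2, 2} sizes.
  t_detached.resize_({4, 4});
}
```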

Finished changes

  1. Add a flag to TensorImpl to disallow size/stride/storage_ptr changes from in-place operations such as resize_ / resize_as_ / set_ / transpose_, and set this flag to true when people call tensor.data in Python.
  2. Write text in the docs to actively discourage changing the shape or storage of tensor_detached and expecting tensor to also be updated.
  3. Move Variable::Impl data members into TensorImpl as AutogradMeta struct
  4. Change Variable::Impl functions to use data members in AutogradMeta struct
  5. Add shallow_copy() function to each subclass of TensorImpl
  6. Do a shallow copy when the user calls make_variable(tensor) / variable.detach() (Reason: now that autograd metadata lives in TensorImpl, in order to create a new history for the Variable returned from variable.detach() we not only need to create a new AutogradMeta struct, but we also need to create a new TensorImpl object that stores a pointer to the new AutogradMeta struct (which we obtain by shallow-copying the original TensorImpl). Otherwise, changing the history of the detached Variable would also change the history of the original Variable, which is not the correct behavior.)
  7. Add AutogradMetaInterface class, and make AutogradMeta a subclass of it, so that we can make autograd_meta_ a unique_ptr in TensorImpl
  1. Move set_requires_grad() / requires_grad() / grad() from Variable::Impl to AutogradMeta
  2. Move Variable::Impl functions such as backward() / rebase_history() / grad_accumulator() / grad_fn() out of Variable::Impl and into AutogradMeta.
  3. Note: we need to make these changes so that we can remove the Variable::Impl class in the next PR.
  1. Add thread-local guard (at::AutoNonVariableTypeMode) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters are now Variables
  1. Make gesv_out return the original input tensor instead of a new tensor (currently by copying the result tensor into the original input tensor, because a true in-place gesv is more difficult to implement. NOTE: also open an issue for this).
  2. In VariableType.cpp, after each in-place function on the "unpacked" tensor, check pointer address equality for storage in the original input variable's TensorImpl (check this for all arguments in unpacked_args)
  1. Remove .type() calls as much as possible, to reduce the need of using the at::AutoNonVariableTypeMode guard
  1. Make JIT attributes t_ and ts_ store Variable instead of Tensor (and at t_ and ts_ use sites, don't wrap the tensor into a Variable again) (globally search for make_variable( in jit/ to find places where we are double-wrapping the t_ and ts_ attributes)
  1. tril_ and triu_ should not change the input tensor's TensorImpl pointer
  1. Move pyobj_ to TensorImpl itself, because we always need to be able to convert to and from the Python representation.
  1. Move version_counter_ to storage or TensorImpl, because we may capture non-requires-grad variables inside an autograd function, and we need a working version counter in these cases.
  2. We should not share the version counter in shallow_copy_and_detach(), because a pure Tensor doesn't have a concept of a version counter; it is managed by autograd instead.
  3. We should preserve the API semantics of tensor.data in Python, and allow it as an escape route for in-place operations without bumping version counter.
  1. tensor.is_variable() should check whether the TensorImpl has AutogradMeta. is_variable_ should be removed.
  • PR: Fix version counter sharing in Variable.set_data(...) #20391

  • PR: Move at::NonVariableTypeMode to TensorImpl, and check it in TensorImpl is_variable() #20392

  • PR: Require passing version_counter and allow_tensor_metadata_change to shallow_copy_and_detach(): #20496

  • PR: Shallow-copy indices and values in sparse tensor constructor #20330

  • PR: Remove Variable::Impl (#17072)

  1. Remove the at::Tensor data member (data_) from Variable::Impl
  2. In Variable construction and in Variable.set_data(), copy all data from data.impl to the variable's TensorImpl.
  3. Make Variable.data() the same semantics as tensor.data in Python. Notice breakage in any Variable.data() call sites
  4. Remove the Variable::Impl class and the DifferentiableViewImpl class
  5. Remove mentions of Variable::Impl and DifferentiableViewImpl
  6. Fix comments in [Tensor versus Variable in C++], [We regret making Variable hold a Tensor], and [Autograd View Variables]. Go through all comments in variable.h and variable.cpp and fix any inconsistencies.
  7. NOTE: we don't need to add SparseVariableImpl that handles how to copy SparseTensorImpl, because SparseTensorImpl already implements the shallow_copy_and_detach() function that Variable factory functions can call.
  8. In places where we need to ensure the tensor is not requiring gradient, we should check !requires_grad() || at::NonVariableTypeMode::is_enabled(), instead of !requires_grad() || !at::GradMode::is_enabled(), because we don't want to move at::GradMode to ATen.

Changes remaining:

  • Make AutogradMeta optional, so that Variable and Tensor become the same. (Tracking issue: #23032)

  • Miscellaneous cleanup

  1. Remove unpack() in VariableType*.cpp.
  2. Clean up the unpack_args logic in gen_variable_type.py, since we are not doing unpack anymore.
  3. Fix comments for use_derived in gen_variable_type.py
  4. Remove requires_tensor: True in native_functions.yaml. Figure out how to fix _dimV, _dimS case (torch.randn(2, 3)._dimV() shouldn't hit that error)
  • TensorImpl de-virtualization (tracking issue: #22815)

  • Sparse invariant fix (tracking issue: #22778)

  • Remove tensor_data() API (@yf225 is working on it)

  • Python / C++ Tensor API parity (@yf225 is working on it)

  1. Any Python Tensor API should also work on C++ Tensor, without explicit casting to Variable
  • C++ API doc fix: (@yf225 is working on it)
  1. Remove the https://pytorch.org/cppdocs/#aten section, replace all at::Tensor with torch::Tensor, and remove/fix all mentions of ATen in the C++ docs and tutorials.
facebook-github-bot added a commit that referenced this issue Dec 27, 2018
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`

Changes moved from #13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.

This is the 1st+2nd PR mentioned in #13638.
Pull Request resolved: #13827

Differential Revision: D13507173

Pulled By: yf225

fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
zdevito pushed a commit to zdevito/ATen that referenced this issue Dec 27, 2018
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`

Changes moved from pytorch/pytorch#13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.

This is the 1st+2nd PR mentioned in pytorch/pytorch#13638.
Pull Request resolved: pytorch/pytorch#13827

Differential Revision: D13507173

Pulled By: yf225

fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
facebook-github-bot added a commit that referenced this issue Dec 28, 2018
Summary:
In this PR, we are moving all functions away from `Variable::Impl`, in order to get rid of `Variable::Impl` (and the `data_` Tensor in it) in the next PR. Some of the functions (such as `set_requires_grad` / `requires_grad` / `grad`) will be living in `AutogradMeta` class, while others (such as `backward()` / `rebase_history()` / `grad_accumulator()` / `grad_fn()`) will be living in `Variable` class.

This is the 2nd PR mentioned in #13638.
Pull Request resolved: #15487

Differential Revision: D13553173

Pulled By: yf225

fbshipit-source-id: 691f9432d0cd0640af380c757f3e3a2f64f8851c
@apaszke (Contributor) commented Jan 5, 2019

also see note: We regret making Variable hold a Tensor

The link gives a 404 😥

@yf225 (Contributor, Author) commented Jan 5, 2019

@apaszke Fixed with a permalink to the correct file =)

@ezyang (Contributor) commented Jan 16, 2019

@yf225: @zdevito, @smessmer and I were talking about at::AutoGradMode, and we think this is actually exactly the same thing as the existing at::NoGradGuard guard in C++. So we might not even need a separate thread-local for this one :)

@yf225 (Contributor, Author) commented Jan 17, 2019

@ezyang Currently torch::NoGradGuard only exists in libtorch, but for the base type dispatch to work we need to check the guard in ATen. I am planning to work on merging them as one of the last few PRs of the Variable/Tensor merge.

facebook-github-bot added a commit that referenced this issue Jan 24, 2019
Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.

This is part of the VariableImpl/TensorImpl merge work: #13638.
Pull Request resolved: #15939

Reviewed By: ezyang

Differential Revision: D13640980

Pulled By: yf225

fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f
zdevito pushed a commit to zdevito/ATen that referenced this issue Jan 24, 2019
Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.

This is part of the VariableImpl/TensorImpl merge work: pytorch/pytorch#13638.
Pull Request resolved: pytorch/pytorch#15939

Reviewed By: ezyang

Differential Revision: D13640980

Pulled By: yf225

fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f
facebook-github-bot added a commit that referenced this issue Feb 7, 2019
Summary:
Discussed with zdevito and we want to use Variable (with `set_requires_grad(false)`) instead of Tensor in all parts of JIT, to eliminate the distinction and the conceptual overhead when trying to figure out which one to use.

This also helps with the Variable/Tensor merge work tracked at #13638, which will make common functions (such as `numel()` / `sizes()` / `dim()`) on Variable much faster when finished.
Pull Request resolved: #16596

Differential Revision: D13979971

Pulled By: yf225

fbshipit-source-id: c69119deec5bce0c22809081115f1012fdbb7d5a
facebook-github-bot added a commit that referenced this issue Feb 11, 2019
Summary:
In VariableType.cpp, when a function modifies its input tensors, it should only change the input tensors' storage data in-place, and should never change the input tensors' storage pointers. This PR adds checks for this, and also fixes functions that fail this test.

This is part of the Variable/Tensor merge work (#13638).
Pull Request resolved: #16305

Differential Revision: D13897855

Pulled By: yf225

fbshipit-source-id: 0c4fc7eb530d30db88037b1f0981f6f8454d3b79
zdevito pushed a commit to zdevito/ATen that referenced this issue Feb 11, 2019
Summary:
In VariableType.cpp, when a function modifies its input tensors, it should only change the input tensors' storage data in-place, and should never change the input tensors' storage pointers. This PR adds checks for this, and also fixes functions that fail this test.

This is part of the Variable/Tensor merge work (pytorch/pytorch#13638).
Pull Request resolved: pytorch/pytorch#16305

Differential Revision: D13897855

Pulled By: yf225

fbshipit-source-id: 0c4fc7eb530d30db88037b1f0981f6f8454d3b79
@apaszke (Contributor) commented Mar 2, 2019

@gchanan For the last question, I think it will create a tensor that requires grad, but none of the operations performed before grad mode is enabled again will be differentiated. Once this block ends, it will look like any other tensor that requires grad.

@gchanan (Contributor) commented Mar 6, 2019

@apaszke that sounds correct to me, thanks!

@yf225 (Contributor, Author) commented Mar 13, 2019

> 1. For getType(TensorOptions), we check GradMode::is_enabled() to decide whether to choose Variable path. And we add NoGradGuard to all proper places to ensure we are dispatching to non-Variable path when appropriate.

> Why do we need getType(TensorOptions)? From what I can tell, is_variable() today is always false in this case, so we can just always return the "non-variable" version. AFAIK this is only called in construction and there is no difference in constructing a Tensor vs a Variable. Is that correct?

is_variable() in getType(TensorOptions) is actually not always false. For example:

// At build/aten/src/ATen/Functions.h:3590
static inline Tensor empty(IntArrayRef size, const TensorOptions & options) {
    return at::getType(options).empty(size, options);
}

This function can be called with options coming from a Variable, and we expect it to dispatch to VariableType::empty(...). If we let getType(TensorOptions) always return the "non-variable" version, it will dispatch to the wrong path:

Before:

#1  0x00007fffe341a30d in at::native::empty_cpu (size=..., options=...) at ../aten/src/ATen/native/TensorFactories.cpp:93
#2  0x00007fffe3602d8a in at::CPULongType::empty (this=0x555556257130, size=..., options=...) at aten/src/ATen/CPULongType.cpp:2070
#3  0x00007fffe0f18904 in torch::autograd::VariableType::<lambda()>::operator()(void) const (__closure=0x7fffffffb900)
    at ../torch/csrc/autograd/generated/VariableType_4.cpp:9375
#4  0x00007fffe0f18ba2 in torch::autograd::VariableType::empty (this=0x55555624f0c0, size=..., options=...)
    at ../torch/csrc/autograd/generated/VariableType_4.cpp:9376

/\ 
|| at::getType(options) returns variable version, which causes dispatch to torch::autograd::VariableType::empty(...), and it's correct

#5  0x00007fffe36b8c2a in at::empty (size=..., options=...) at aten/src/ATen/Functions.h:3590
#6  0x00007fffe36b8fd1 in at::TypeDefault::copy (this=0x55555624f0c0, src=..., non_blocking=false, to_device=...) at aten/src/ATen/TypeDefault.cpp:37
#7  0x00007fffe3416b50 in at::native::to_impl (self=..., options=..., non_blocking=false) at ../aten/src/ATen/native/TensorConversions.cpp:24
#8  0x00007fffe34173ee in at::native::to (self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../aten/src/ATen/native/TensorConversions.cpp:69
#9  0x00007fffe37025b5 in at::TypeDefault::to (this=0x55555624ee70, self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at aten/src/ATen/TypeDefault.cpp:4358
#10 0x00007fffe0d56979 in torch::autograd::VariableType::to (this=0x55555624ee70, self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../torch/csrc/autograd/generated/VariableType_2.cpp:15946
#11 0x00007fffe83ce9f9 in at::Tensor::to (this=0x7fffdd9d7490, dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../aten/src/ATen/core/TensorMethods.h:788
#12 0x00007fffe8337e03 in torch::autograd::dispatch_to (self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../torch/csrc/autograd/generated/python_variable_methods.cpp:275
#13 0x00007fffe8338e4f in torch::autograd::THPVariable_to_type (self=0x7fffdd9d7480, scalarType=c10::ScalarType::Long)
    at ../torch/csrc/autograd/generated/python_variable_methods.cpp:311
#14 0x00007fffe833941f in torch::autograd::THPVariable_long (self=0x7fffdd9d7480, args=0x0)
    at ../torch/csrc/autograd/generated/python_variable_methods.cpp:339

If we let getType(TensorOptions) always return the "non-variable" version:

#0  __cxxabiv1::__cxa_throw (obj=0x5555569489a0, tinfo=0x7fffe8ce6d10 <typeinfo for c10::Error>, dest=0x7fffe31c824a <c10::Error::~Error()>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:80
#1  0x00007fffe341a2d2 in at::native::empty_cpu (size=..., options=...) at ../aten/src/ATen/native/TensorFactories.cpp:93
#2  0x00007fffe3602cc2 in at::CPULongType::empty (this=0x555556257130, size=..., options=...) at aten/src/ATen/CPULongType.cpp:2070

/\ 
|| at::getType(options) returns non-variable version, which causes dispatch to at::CPULongType::empty, and it's incorrect

#3  0x00007fffe36b8b62 in at::empty (size=..., options=...) at aten/src/ATen/Functions.h:3590
#4  0x00007fffe36b8efc in at::TypeDefault::copy (this=0x55555624f0c0, src=..., non_blocking=false, to_device=...) at aten/src/ATen/TypeDefault.cpp:36
#5  0x00007fffe3416a8e in at::native::to_impl (self=..., options=..., non_blocking=false) at ../aten/src/ATen/native/TensorConversions.cpp:24
#6  0x00007fffe341732c in at::native::to (self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../aten/src/ATen/native/TensorConversions.cpp:69
#7  0x00007fffe37024e1 in at::TypeDefault::to (this=0x55555624ee70, self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at aten/src/ATen/TypeDefault.cpp:4357
#8  0x00007fffe0d56979 in torch::autograd::VariableType::to (this=0x55555624ee70, self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../torch/csrc/autograd/generated/VariableType_2.cpp:15946
#9  0x00007fffe83ce9f9 in at::Tensor::to (this=0x7fffdd7209e8, dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../aten/src/ATen/core/TensorMethods.h:788
#10 0x00007fffe8337e03 in torch::autograd::dispatch_to (self=..., dtype=c10::ScalarType::Long, non_blocking=false, copy=false)
    at ../torch/csrc/autograd/generated/python_variable_methods.cpp:275
#11 0x00007fffe8338e4f in torch::autograd::THPVariable_to_type (self=0x7fffdd7209d8, scalarType=c10::ScalarType::Long)
    at ../torch/csrc/autograd/generated/python_variable_methods.cpp:311
#12 0x00007fffe833941f in torch::autograd::THPVariable_long (self=0x7fffdd7209d8, args=0x0)
    at ../torch/csrc/autograd/generated/python_variable_methods.cpp:339

I think dispatching based on "variable" vs. "non-variable" version of TensorOptions still matters after we merge the internals of Variable and Tensor and remove options.is_variable(), since the dispatch actually controls whether we should use the autograd history recording path or not. If we check at::NonVariableTypeMode::is_enabled() in getType(TensorOptions), we can control this dispatch by using at::AutoNonVariableTypeMode in the right places, which I think results in pretty clear semantics, and it's the same reason why we need to check !at::NonVariableTypeMode::is_enabled() in https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Context.cpp#L118.
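In other words, the proposed dispatch rule could be summarized by a check along these lines (a minimal sketch, assuming the names used in this thread; not the actual Context.cpp code):

```cpp
// Hypothetical sketch of the dispatch rule discussed above -- illustrative only.
#include <ATen/ATen.h>

bool should_use_variable_path(const at::TensorOptions& options) {
  // Take the history-recording (VariableType) path only when the options came
  // from a Variable and the thread-local non-Variable guard is not active.
  return options.is_variable() && !at::NonVariableTypeMode::is_enabled();
}
```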

@yf225 (Contributor, Author) commented Mar 14, 2019

Update: Based on the offline discussion, we will want to check !at::NonVariableTypeMode::is_enabled() in getType(TensorOptions) for doing the correct dispatch based on variable/non-variable type, similar to our implementation of getType(TensorImpl).

yf225 changed the title from "VariableImpl/TensorImpl Merge Proposal" to "Variable/Tensor Merge Proposal" on Apr 10, 2019
facebook-github-bot added a commit that referenced this issue Apr 11, 2019
Summary:
According to #13638 (comment), after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: #18223

Differential Revision: D14735123

Pulled By: yf225

fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
zdevito pushed a commit to zdevito/ATen that referenced this issue Apr 11, 2019
Summary:
According to pytorch/pytorch#13638 (comment), after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: pytorch/pytorch#18223

Differential Revision: D14735123

Pulled By: yf225

fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this issue May 6, 2019
Summary:
According to pytorch#13638 (comment), after the Variable/Tensor merge, we may capture variables without autograd metadata inside an autograd function, and we need a working version counter in these cases. This PR makes it possible by moving `version_counter_` out of autograd metadata and into TensorImpl, so that variables without autograd metadata still have version counters.
Pull Request resolved: pytorch#18223

Differential Revision: D14735123

Pulled By: yf225

fbshipit-source-id: 15f690311393ffd5a53522a226da82f5abb6c65b
zdevito pushed a commit to zdevito/ATen that referenced this issue May 24, 2019
Summary:
As part of the Variable/Tensor merge work: pytorch/pytorch#13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.

After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as its `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.

**Note that this PR is BC-breaking in the following use cases:**

**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.

**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))

grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: pytorch/pytorch#17072

Differential Revision: D14075257

Pulled By: yf225

fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
@ezyang (Contributor) commented Dec 4, 2019

Most of the work here is done; the rest is cleanup and making good on the de-virtualization promise.
