This repository has been archived by the owner on Feb 7, 2024. It is now read-only.

Check why shallow_copy_from is called on a wrong object #2

Open
nunoplopes opened this issue Jun 29, 2021 · 1 comment

Comments

@nunoplopes
Owner

It required this patch:

diff --git a/c10/core/TensorImpl.cpp b/c10/core/TensorImpl.cpp
index a0c7673641..3c027a4a17 100644
--- a/c10/core/TensorImpl.cpp
+++ b/c10/core/TensorImpl.cpp
@@ -480,9 +480,12 @@ void TensorImpl::copy_tensor_metadata_except_version_counter(
     const TensorImpl* src_impl,
     TensorImpl* dest_impl,
     bool allow_tensor_metadata_change) {
-  dest_impl->storage_ = src_impl->storage_;
-  dest_impl->sizes_and_strides_ = src_impl->sizes_and_strides_;
-  dest_impl->storage_offset_ = src_impl->storage_offset_;
+  dest_impl->storage_ = src_impl->storage();
+  dest_impl->sizes_and_strides_.set_sizes(src_impl->sizes());
+  auto strides = src_impl->strides();
+  memcpy(dest_impl->sizes_and_strides_.strides_data(), strides.begin(),
+         sizeof(int64_t) * strides.size());
+  dest_impl->storage_offset_ = src_impl->storage_offset();
   dest_impl->data_type_ = src_impl->data_type_;
   dest_impl->device_opt_ = src_impl->device_opt_;
   dest_impl->key_set_ = src_impl->key_set_;

But ideally this patch wouldn't be needed, since shallow_copy_from would only be called between Torchy objects. So why did pickle use different objects?
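
To make the effect of the patch concrete, here is a toy Python model of the difference between copying raw fields and copying through accessors. It is only a sketch: `PlainImpl`, `LazyImpl`, and both copy helpers are hypothetical names, not PyTorch or Torchy classes, and it assumes (the issue does not state this) that a lazy tensor's accessors materialize pending work before returning metadata.

```python
class PlainImpl:
    """Stand-in for an eager tensor implementation: its field is always valid."""

    def __init__(self, sizes):
        self._sizes = tuple(sizes)

    def sizes(self):
        return self._sizes


class LazyImpl(PlainImpl):
    """Hypothetical lazy implementation: _sizes is a placeholder until materialized."""

    def __init__(self, compute_sizes):
        super().__init__(())  # placeholder metadata
        self._compute_sizes = compute_sizes
        self._materialized = False

    def sizes(self):
        # Assumption: the accessor forces the pending computation first.
        if not self._materialized:
            self._sizes = tuple(self._compute_sizes())
            self._materialized = True
        return self._sizes


def copy_metadata_via_fields(src, dest):
    # Analogous to the removed lines of the diff: reads the raw field.
    dest._sizes = src._sizes


def copy_metadata_via_accessors(src, dest):
    # Analogous to the added lines of the diff: goes through the accessor.
    dest._sizes = src.sizes()


stale = PlainImpl([1])
copy_metadata_via_fields(LazyImpl(lambda: [2, 3]), stale)

fresh = PlainImpl([1])
copy_metadata_via_accessors(LazyImpl(lambda: [2, 3]), fresh)

print(stale.sizes(), fresh.sizes())
```

In this model the field-based copy picks up the placeholder, while the accessor-based copy picks up the materialized sizes. Whether Torchy's accessors behave exactly this way is an assumption of the sketch.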

This is the backtrace, while executing a TorchVision model:

(gdb) bt
#0  c10::TensorImpl::shallow_copy_from (this=0x555558e79400, impl=...)
    at ../c10/core/TensorImpl.h:1270
#1  0x00007fffbc5610d6 in torch::autograd::VariableHooks::set_data (
    this=<optimized out>, self=..., new_data=...)
    at ../torch/csrc/autograd/variable.cpp:440
#2  0x00007fffc2eeac63 in THPVariable_set_data (self=0x7fff7d5b7840,
    data=0x7fff7d5b4980, unused=<optimized out>)
    at ../torch/csrc/autograd/python_variable.cpp:316
#3  0x00005555556cf597 in _PyObject_GenericSetAttrWithDict ()
    at /tmp/build/80754af9/python_1599203911753/work/Objects/object.c:1366
#4  0x00005555556cf687 in PyObject_GenericSetAttr (value=0x7fff7d5b4980,
    name=<optimized out>, obj=0x7fff7d5b7840)
    at /tmp/build/80754af9/python_1599203911753/work/Objects/object.c:1416
#5  PyObject_SetAttr ()
    at /tmp/build/80754af9/python_1599203911753/work/Objects/object.c:1045
#6  0x00005555557156b7 in _PyEval_EvalFrameDefault ()
    at /tmp/build/80754af9/python_1599203911753/work/Python/ceval.c:2372
#7  0x00005555556df86b in function_code_fastcall (globals=<optimized out>,
    nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:283
#8  _PyFunction_Vectorcall.localalias.355 ()
    at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:410
#9  0x00005555556dfe79 in _PyObject_Vectorcall (kwnames=0x0, nargsf=2,
    args=0x7fffffffb610, callable=0x7fff848710d0)
    at /tmp/build/80754af9/python_1599203911753/work/Include/cpython/abstract.h:127
#10 method_vectorcall ()
    at /tmp/build/80754af9/python_1599203911753/work/Objects/classobject.c:89
#11 0x00005555555d22d6 in _PyObject_Vectorcall (kwnames=0x0, nargsf=1,
    args=0x7fffffffb6b0, callable=0x7fff7f353a40)
    at /tmp/build/80754af9/python_1599203911753/work/Include/cpython/abstract.h:127
#12 _PyObject_FastCall ()
    at /tmp/build/80754af9/python_1599203911753/work/Include/cpython/abstract.h:147
#13 object_vacall (base=<optimized out>, callable=0x7fff7f353a40,
    vargs=<optimized out>)
    at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:1186
#14 0x0000555555691e1e in PyObject_CallFunctionObjArgs (
    callable=<optimized out>)
    at /tmp/build/80754af9/python_1599203911753/work/Objects/call.c:1259
#15 0x00007fff84762615 in _Pickle_FastCall (obj=0x7fff7d5b0770,
    func=0x7fff7f353a40)
    at /usr/local/src/conda/python-3.8.5/Modules/_pickle.c:362
#16 load_build.isra.38 ()
    at /usr/local/src/conda/python-3.8.5/Modules/_pickle.c:6707
#17 load () at /usr/local/src/conda/python-3.8.5/Modules/_pickle.c:6961
#18 0x00005555556c3e6a in method_vectorcall_NOARGS ()
@nunoplopes
Owner Author

I've investigated this issue and there's no way around it without changing PyTorch itself.
A Python program may do non_torchy_tensor.set_(torchy_tensor). If the Torchy tensor isn't materialized, we don't store non_torchy_tensor as a shared tensor on the trace. So when the trace is flushed, that tensor's metadata isn't updated.
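
The scenario above can be sketched with a small self-contained Python model. Everything here is hypothetical (`Trace`, `LazyTensor`, `EagerTensor`, and the `sizes` attribute are illustrative names, not Torchy's data structures); the point is only to show how a destination that the trace does not know about ends up with stale metadata after a flush.

```python
class Trace:
    """Hypothetical trace: remembers pending ops and the tensors to update on flush."""

    def __init__(self):
        self.pending = []  # list of (lazy_tensor, thunk computing its sizes)

    def add(self, tensor, thunk):
        self.pending.append((tensor, thunk))

    def flush(self):
        # Only tensors registered on the trace get their metadata updated.
        for tensor, thunk in self.pending:
            tensor.sizes = tuple(thunk())
            tensor.materialized = True
        self.pending.clear()


class LazyTensor:
    """Stand-in for a non-materialized Torchy tensor."""

    def __init__(self, trace, thunk):
        self.sizes = ()  # placeholder until the trace is flushed
        self.materialized = False
        trace.add(self, thunk)


class EagerTensor:
    """Stand-in for a regular (non-Torchy) tensor."""

    def __init__(self, sizes):
        self.sizes = tuple(sizes)

    def set_(self, other):
        # Snapshot of the source metadata; the trace never learns about `self`.
        self.sizes = other.sizes
        return self


trace = Trace()
lazy = LazyTensor(trace, lambda: [4, 5])
eager = EagerTensor([1]).set_(lazy)  # non_torchy_tensor.set_(torchy_tensor)
trace.flush()

print(lazy.sizes, eager.sizes)
```

After the flush, `lazy` has its real sizes, while `eager` still holds the placeholder it copied earlier, because nothing on the trace refers back to it.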
