Possible use-after-free of Tensor in JIT generated code

### 🐛 Describe the bug

I get occasional crashes in `test_cpp_extensions_jit`, which I can easily trigger with `python test_cpp_extensions_jit.py -k test_warning`. Digging deeper, I found the likely cause to be a use-after-free that corrupts the heap and leads to a later crash in a `malloc` call (seen in GDB).

Using Valgrind I got the following trace:
```
==113540== Invalid read of size 8
==113540==    at 0xB2C51E25C: c10::detail::atomic_refcount_decrement(std::atomic<unsigned long>&) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C53F547: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C535CF7: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::~intrusive_ptr() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E6C7: at::TensorBase::~TensorBase() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E913: at::Tensor::~Tensor() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C5346CF: torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}::operator()(at::Tensor, int) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C5621C3: at::Tensor pybind11::detail::argument_loader<at::Tensor, int>::call_impl<at::Tensor, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, 0ul, 1ul, pybind11::detail::void_type>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55AFB7: std::enable_if<!std::is_void<at::Tensor>::value, at::Tensor>::type pybind11::detail::argument_loader<at::Tensor, int>::call<at::Tensor, pybind11::detail::void_type, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C551E7B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55236B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C53191F: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0x420315F: cfunction_call (methodobject.c:543)
==113540==  Address 0xbab4968 is 8 bytes inside a block of size 192 free'd
==113540==    at 0x408A5D8: operator delete(void*, unsigned long) (vg_replace_malloc.c:1072)
==113540==    by 0xF64858F: c10::TensorImpl::~TensorImpl() (in /torchinstall/lib/python3.10/site-packages/torch/lib/libc10.so)
==113540==    by 0xB2C53F69F: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C535CF7: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::~intrusive_ptr() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E6C7: at::TensorBase::~TensorBase() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E913: at::Tensor::~Tensor() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C534527: torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}::operator()(at::Tensor, int) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C5621C3: at::Tensor pybind11::detail::argument_loader<at::Tensor, int>::call_impl<at::Tensor, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, 0ul, 1ul, pybind11::detail::void_type>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55AFB7: std::enable_if<!std::is_void<at::Tensor>::value, at::Tensor>::type pybind11::detail::argument_loader<at::Tensor, int>::call<at::Tensor, pybind11::detail::void_type, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C551E7B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55236B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C53191F: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==  Block was alloc'd at
==113540==    at 0x40866D0: operator new(unsigned long) (vg_replace_malloc.c:472)
==113540==    by 0x11386383: at::TensorBase at::detail::make_tensor_base<c10::TensorImpl, c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::DispatchKeySet&, caffe2::TypeMeta&>(c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >&&, c10::DispatchKeySet&, caffe2::TypeMeta&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x113867D3: at::TensorBase at::detail::_empty_generic<long>(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380A9F: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380B57: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380C07: at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380D87: at::detail::empty_cpu(c10::ArrayRef<long>, c10::TensorOptions const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x128DC06F: at::(anonymous namespace)::create_out(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x129F94BB: at::(anonymous namespace)::structured_cos_out_functional::set_output_strided(long, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions, c10::ArrayRef<at::Dimname>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x1143C29B: at::TensorIteratorBase::fast_set_up(at::TensorIteratorConfig const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11443363: at::TensorIteratorBase::build(at::TensorIteratorConfig&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11445B17: at::TensorIteratorBase::build_borrowing_unary_float_op(at::TensorBase const&, at::TensorBase const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540== 
==113540== Invalid write of size 8
==113540==    at 0xB2C51E264: c10::detail::atomic_refcount_decrement(std::atomic<unsigned long>&) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C53F547: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C535CF7: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::~intrusive_ptr() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E6C7: at::TensorBase::~TensorBase() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E913: at::Tensor::~Tensor() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C5346CF: torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}::operator()(at::Tensor, int) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C5621C3: at::Tensor pybind11::detail::argument_loader<at::Tensor, int>::call_impl<at::Tensor, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, 0ul, 1ul, pybind11::detail::void_type>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55AFB7: std::enable_if<!std::is_void<at::Tensor>::value, at::Tensor>::type pybind11::detail::argument_loader<at::Tensor, int>::call<at::Tensor, pybind11::detail::void_type, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C551E7B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55236B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C53191F: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0x420315F: cfunction_call (methodobject.c:543)
==113540==  Address 0xbab4968 is 8 bytes inside a block of size 192 free'd
==113540==    at 0x408A5D8: operator delete(void*, unsigned long) (vg_replace_malloc.c:1072)
==113540==    by 0xF64858F: c10::TensorImpl::~TensorImpl() (in /torchinstall/lib/python3.10/site-packages/torch/lib/libc10.so)
==113540==    by 0xB2C53F69F: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C535CF7: c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::~intrusive_ptr() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E6C7: at::TensorBase::~TensorBase() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C51E913: at::Tensor::~Tensor() (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C534527: torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}::operator()(at::Tensor, int) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C5621C3: at::Tensor pybind11::detail::argument_loader<at::Tensor, int>::call_impl<at::Tensor, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, 0ul, 1ul, pybind11::detail::void_type>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55AFB7: std::enable_if<!std::is_void<at::Tensor>::value, at::Tensor>::type pybind11::detail::argument_loader<at::Tensor, int>::call<at::Tensor, pybind11::detail::void_type, torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&>(torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}&) && (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C551E7B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C55236B: pybind11::cpp_function::initialize<torch::detail::wrap_pybind_function_impl_<at::Tensor (&)(at::Tensor, int), 0ul, 1ul>(at::Tensor (&)(at::Tensor, int), std::integer_sequence<unsigned long, 0ul, 1ul>, bool)::{lambda(at::Tensor, int)#1}, at::Tensor, at::Tensor, int, pybind11::name, pybind11::scope, pybind11::sibling, char [4]>(at::Tensor (&)(at::Tensor, int), at::Tensor (*)(at::Tensor, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [4])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==    by 0xB2C53191F: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (in /torchcache/torch_extensions/py310_cpu/warn_mod/warn_mod_v1.so)
==113540==  Block was alloc'd at
==113540==    at 0x40866D0: operator new(unsigned long) (vg_replace_malloc.c:472)
==113540==    by 0x11386383: at::TensorBase at::detail::make_tensor_base<c10::TensorImpl, c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >, c10::DispatchKeySet&, caffe2::TypeMeta&>(c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >&&, c10::DispatchKeySet&, caffe2::TypeMeta&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x113867D3: at::TensorBase at::detail::_empty_generic<long>(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380A9F: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380B57: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380C07: at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11380D87: at::detail::empty_cpu(c10::ArrayRef<long>, c10::TensorOptions const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x128DC06F: at::(anonymous namespace)::create_out(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x129F94BB: at::(anonymous namespace)::structured_cos_out_functional::set_output_strided(long, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions, c10::ArrayRef<at::Dimname>) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x1143C29B: at::TensorIteratorBase::fast_set_up(at::TensorIteratorConfig const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11443363: at::TensorIteratorBase::build(at::TensorIteratorConfig&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540==    by 0x11445B17: at::TensorIteratorBase::build_borrowing_unary_float_op(at::TensorBase const&, at::TensorBase const&) (in /torchinstall/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
==113540== 
UserWarning: Error with torch.DoubleTensor (Triggered internally at /torchcache/torch_extensions/py310_cpu/warn_mod/main.cpp:12.)
.
----------------------------------------------------------------------
Ran 1 test in 53.951s

OK
```

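Note that Valgrind detects the use-after-free even though this run does not crash, most likely because nothing else runs or allocates after it. In the trace, both the free and the later invalid read/write come from `at::Tensor::~Tensor()` called inside the same `wrap_pybind_function_impl_` lambda.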

I reduced the test code to the following, which still reproduces the bug:
```
import warnings
import torch
import torch.utils.cpp_extension

# C++ source: emit a warning via TORCH_WARN, then return a new tensor.
source = '''
at::Tensor foo(at::Tensor x, int error_type) {
    std::ostringstream err_stream;
    err_stream << "Error with "  << x.type();

    TORCH_WARN(err_stream.str());
    return x.cos();
}
'''

t = torch.rand(2).double()

# Build the extension with PyTorch's error/warning handling enabled for `foo`.
warn_mod = torch.utils.cpp_extension.load_inline(
    name='warnmod',
    cpp_sources=[source],
    functions=['foo'],
    with_pytorch_error_handling=True,
)

with warnings.catch_warnings(record=True) as w:
    # Turn warnings into errors so the TORCH_WARN above raises in Python.
    warnings.simplefilter("error")
    warn_mod.foo(t, 0)
```

The issue seems to be triggered by the warning being converted to an error in combination with `with_pytorch_error_handling=True`. That is, if any one of `TORCH_WARN`, `warnings.simplefilter("error")`, or `with_pytorch_error_handling=True` is removed, the bug is not triggered.
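For reference, here is a minimal pure-Python sketch of what the `"error"` filter changes on the Python side. It says nothing about the C++ wrapper itself. `emit_warning` and `call_with_filter` are hypothetical helpers, with `emit_warning` standing in for the extension's `TORCH_WARN` call.

```python
import warnings


def emit_warning():
    # Hypothetical stand-in for the extension function that calls TORCH_WARN.
    warnings.warn("Error with torch.DoubleTensor", UserWarning)
    return "returned normally"


def call_with_filter(action):
    # Run emit_warning() under the given warnings filter action and report
    # whether it returned or raised, plus how many warnings were recorded.
    with warnings.catch_warnings(record=True) as recorded:
        warnings.simplefilter(action)
        try:
            result = emit_warning()
        except UserWarning as exc:
            return ("raised", str(exc), len(recorded))
        return (result, None, len(recorded))


if __name__ == "__main__":
    print(call_with_filter("always"))
    print(call_with_filter("error"))
```

With `"always"` the warning is recorded and the call returns. With `"error"` the warning is raised as an exception in the middle of the call, which is the path the reproducer exercises.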

### Versions

PyTorch version: 2.0.1
GCC version: (GCC) 12.2.0
Clang version: Could not collect
CMake version: version 3.24.3
Libc version: glibc-2.17

Python version: 3.10.8 (main, Jul 25 2023, 10:52:38) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-4.14.0-115.19.1.el7a.ppc64le-ppc64le-with-glibc2.17
CPU:
Architecture:                    ppc64le
Byte Order:                      Little Endian


cc @malfet @zou3519
