From cec4b7b6a7a566270b9f5bb9f15f7fcba511cd20 Mon Sep 17 00:00:00 2001
From: Lucy Qiu <lfq@meta.com>
Date: Tue, 11 Nov 2025 10:44:16 -0800
Subject: [PATCH] Fix write-heap-buffer-overflow in copy_out (#15584)

Summary:

Also add a dtype check to make sure `out` and `src` have the same dtype. Otherwise the fast path may copy raw bytes of the wrong dtype without conversion.

And fix the same issue in `copy_`.
---

The crash is a write-heap-buffer-overflow in `torch::executor::native::copy_out`. The root cause is that the `std::memcpy` on the fast path does not check whether the destination buffer `out` is large enough to hold the data from the source tensor `src`. The condition `internal::sizes_match_ignoring_leading_1s(out.sizes(), src.sizes())` only checks that the sizes of `out` and `src` match, ignoring any leading dimensions of size 1 in `out`. It does not guarantee that `out.nbytes()` is greater than or equal to `src.nbytes()`, which can fail even for matching sizes when the two tensors have different dtypes.

The patch fixes the crash by adding the check `out.nbytes() >= src.nbytes()` before the `std::memcpy`, so the fast path is taken only when the destination buffer `out` is large enough to hold the data from `src`. It also requires `tensors_have_same_dtype(src, out)`, so that raw bytes are never copied between tensors of different dtypes.

```cpp
if (internal::sizes_match_ignoring_leading_1s(out.sizes(), src.sizes()) &&
    src.numel() > 0 && out.nbytes() >= src.nbytes() &&
    tensors_have_same_dtype(src, out)) {
  std::memcpy(out.mutable_data_ptr(), src.const_data_ptr(), src.nbytes());
}
```
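
To illustrate why a shape check alone is insufficient, here is a minimal standalone sketch. It does not use the ExecuTorch API: `FakeTensor` and `fast_path_copy` are hypothetical names, plain shape equality stands in for `sizes_match_ignoring_leading_1s`, and the element size stands in for the dtype. Two tensors of shape {2, 3} with 4-byte and 8-byte elements have equal sizes but 24 and 48 bytes of storage, so an unguarded `memcpy` of `src.nbytes()` would write past the end of `out`.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor: a shape, an element size (used here as
// a proxy for the dtype), and raw byte storage.
struct FakeTensor {
  std::vector<std::size_t> sizes;
  std::size_t element_size;
  std::vector<unsigned char> storage;

  FakeTensor(std::vector<std::size_t> s, std::size_t elem)
      : sizes(std::move(s)), element_size(elem), storage(numel() * elem) {}

  // Number of elements: the product of all dimensions.
  std::size_t numel() const {
    std::size_t n = 1;
    for (std::size_t d : sizes) {
      n *= d;
    }
    return n;
  }

  // Size of the underlying buffer in bytes.
  std::size_t nbytes() const {
    return numel() * element_size;
  }
};

// Returns true if the guarded memcpy fast path was taken. A real kernel
// would fall back to an element-wise converting copy when this returns false.
bool fast_path_copy(FakeTensor& out, const FakeTensor& src) {
  const bool same_shape = out.sizes == src.sizes;
  const bool same_dtype = out.element_size == src.element_size;
  if (same_shape && src.numel() > 0 && out.nbytes() >= src.nbytes() &&
      same_dtype) {
    std::memcpy(out.storage.data(), src.storage.data(), src.nbytes());
    return true;
  }
  return false;
}
```

With these definitions, copying an 8-byte-element `src` into a 4-byte-element `out` of the same shape is rejected by the guard, while a copy between tensors with the same shape and element size takes the fast path.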

When validating the patch, reviewers should verify that the additional checks do not introduce performance regressions and that edge cases are handled correctly, such as when `src` is empty or when `out` and `src` have different data types. Reviewers should also confirm that `copy_out` behaves as before in other scenarios and that the fix is consistent with the function's existing error handling and checks.

NOTE: This diff is entirely auto-generated by an LLM-based patch generator.
Reviewers should carefully examine this diff as Lionhead does not guarantee the
correctness of the patch beyond fixing the crash and passing existing tests.
Please commandeer this diff and revise as needed. Our bot does not respond to
comments or revision requests (yet).

Reviewed By: JacobSzwejbka

Differential Revision: D80885980
---
 kernels/portable/cpu/op_copy.cpp | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernels/portable/cpu/op_copy.cpp b/kernels/portable/cpu/op_copy.cpp
index 968231fc42e..8164d1ebb02 100644
--- a/kernels/portable/cpu/op_copy.cpp
+++ b/kernels/portable/cpu/op_copy.cpp
@@ -49,7 +49,8 @@ Tensor& copy_out(
   // Use direct copy fast path if broadcast is not needed and tensors are
   // non-empty
   if (internal::sizes_match_ignoring_leading_1s(out.sizes(), src.sizes()) &&
-      src.numel() > 0) {
+      src.numel() > 0 && out.nbytes() >= src.nbytes() &&
+      tensors_have_same_dtype(src, out)) {
     std::memcpy(out.mutable_data_ptr(), src.const_data_ptr(), src.nbytes());
   } else {
     ET_SWITCH_REALHBBF16_TYPES(in.scalar_type(), ctx, op_name, CTYPE, [&]() {
@@ -91,8 +92,9 @@ Tensor& copy_(
   // Use direct copy fast path if broadcast is not needed and tensors are
   // non-empty
   if (internal::sizes_match_ignoring_leading_1s(in.sizes(), src.sizes()) &&
-      src.numel() > 0) {
-    std::memcpy(in.mutable_data_ptr(), src.const_data_ptr(), in.nbytes());
+      src.numel() > 0 && in.nbytes() >= src.nbytes() &&
+      tensors_have_same_dtype(src, in)) {
+    std::memcpy(in.mutable_data_ptr(), src.const_data_ptr(), src.nbytes());
   } else {
     ET_SWITCH_REALHBBF16_TYPES(in.scalar_type(), ctx, op_name, CTYPE, [&]() {
       utils::apply_bitensor_elementwise_fn<