Missed optimization: Clang keeps large fixed local array alive across recursive calls, wasting stack space

### Summary

Clang keeps a large fixed-size local array in the stack frame of a recursive function alive across recursive calls, even though the array is only used after both recursive calls return and its address does not escape into them.

This leads to stack usage of roughly `depth * sizeof(array)` instead of something close to `sizeof(array) + O(depth)` for a classic mergesort implementation, and seems to be a missed optimization in Clang/LLVM's lifetime analysis / stack allocation.

### Environment

* Clang version: 21.1.0
* Platform: x86_64 (as used by Compiler Explorer / godbolt.org)
* Command line used on Compiler Explorer:

```sh
  clang++ -std=c++23 -O3 -S mgsort.cpp -o mgsort.s
```

(On Compiler Explorer this corresponds to choosing Clang and passing `-std=c++23 -O3`.)

* Reproducer link (Compiler Explorer): [https://godbolt.org/z/9bG6jGqjj](https://godbolt.org/z/9bG6jGqjj)

### Test case

```c++
#include <algorithm>
#include <cstring>

constexpr int N = 100000 + 1;

int a[N];

void mgsort(int l, int r) {
    if (l == r) {
        return ;
    }
    int mid = (l + r) >> 1;
    mgsort(l, mid);
    mgsort(mid + 1, r);

    int b[N];
    std::merge(a + l, a + mid + 1, a + mid + 1, a + r + 1, b + l);
    std::memcpy(a + l, b + l, sizeof(int) * (r - l + 1));
}
```

This is intended to be standard C++ (no VLA).

### Actual behavior

For non‑leaf calls, Clang allocates the full `int b[N]` array in the prologue of `mgsort` before making any recursive calls and keeps this stack space reserved until the function returns.

On x86_64 with `-O3`, the generated assembly looks like (simplified):

```asm
mgsort(int, int):
    mov     eax, esi
    sub     eax, edi
    je      .LBB0_14               # base case: l == r

    push    rbp
    push    r15
    push    r14
    push    r13
    push    r12
    push    rbx
    sub     rsp, 400024            # reserve ~400k of stack for b[N]

    ... compute mid, save l / r ...

    call    mgsort                 # recursive call on left half
    ...
    call    mgsort                 # recursive call on right half

    ... later, compute the address of b + l on the stack ...
    lea     r14, [rsp + 4*rax]
    add     r14, 16                # r14 = b + l (b is based at rsp+16)

    ... merge + memcpy ...

    add     rsp, 400024
    pop     rbx
    pop     r12
    pop     r13
    pop     r14
    pop     r15
    pop     rbp
.LBB0_14:
    ret
```

So every recursive frame reserves a full `b[N]` array on its stack frame, and this space is live across the recursive calls.  With `N = 100000 + 1`, `sizeof(b)` is about 400 kB; the recursion depth is `O(log N)` (around 17 here), so the total stack consumed by this single variable is roughly `17 * 400kB ≈ 6–7 MB`.

However, `b` is only accessed after both recursive calls have returned, and there is no way for a recursive call to access its caller's `b`.

### Expected behavior

Since the first use of `b` is strictly after both recursive calls, and no pointer to `b` is passed into those calls (nor stored anywhere that
they could access), it should be legal for the compiler to shrink the allocation lifetime of `b` so that its storage does not overlap the recursive calls.

Conceptually, the compiler could:

* move the actual stack allocation for `b` from the function prologue to just before the `std::merge` / `memcpy` code, and
* free it immediately after `memcpy` returns.

With such an optimization, at run time there would only be a single active instance of `b` on the stack at any time, reducing stack usage from approximately `depth * sizeof(b)` to about `sizeof(b) + O(depth)`.

This appears to be a missed optimization in Clang/LLVM's handling of large fixed-size local arrays in recursive functions.

### Additional notes

* The issue reproduces for me on Compiler Explorer (see the link above) with `-std=c++23 -O3`. It also appears with other high optimization levels (e.g. `-O3` vs `-Ofast`) where stack allocation is similar.

* A related GCC bug report with the same test case is: [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123082](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123082)

In that report, GCC developers classify this as a `middle-end` `missed-optimization` and suggest a workaround using a GNU C++ VLA extension.  The intention here is similar: to see whether LLVM/Clang could perform a comparable lifetime-shrinking optimization for the standard C++ fixed-array case, without relying on extensions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Missed optimization: Clang keeps large fixed local array alive across recursive calls, wasting stack space #171606

Summary

Environment

Test case

Actual behavior

Expected behavior

Additional notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Missed optimization: Clang keeps large fixed local array alive across recursive calls, wasting stack space #171606

Description

Summary

Environment

Test case

Actual behavior

Expected behavior

Additional notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions