diff --git a/benchmarks/sql/LINQ.md b/benchmarks/sql/LINQ.md
index 80739e273..2d9170dc3 100644
--- a/benchmarks/sql/LINQ.md
+++ b/benchmarks/sql/LINQ.md
@@ -75,6 +75,7 @@ Notation: `—` means the variant is not applicable for this benchmark (operator
 | groupby_where_sum | `where → group_by → select((K, select → sum))` | 86 | 80 | **23** (PR-B upstream where + inner-select-sum fused) |
 | groupby_select_sum | `select → group_by → select((K, sum))` | — | 110 | **58** (PR-B upstream select fused via intermediate var bind) |
 | groupby_having_count | `group_by → having(len>=5) → select((K, length))` | 141 | 78 | **36** (PR-D having predicate rewritten to slot ref) |
+| groupby_having_hidden_sum | `group_by → having(select(price) → sum > N) → select((K, length))` | 175 | 109 | **40** (PR-E synthesized hidden accumulator slot for the having-only inner-select-sum) |
 | groupby_average | `group_by → select((K, select → average))` | 173 | 106 | **52** (PR-A2 follow-up: 2-slot acc + post-process divide) |
 | chained_where | `where → where → count` | 36 | 45 | 17 |
 | zip_dot_product | `zip → select → sum` | — | 53 | 37 |
@@ -507,7 +508,7 @@ Closes the SQL `HAVING` shape against the splice path: any `_having(pred)` betwe
 
 ### Deferred for follow-ups
 
-- Having predicates that reference a reducer absent from the select (would need a hidden accumulator slot in the table value type).
+- Having predicates that reference a reducer absent from the select on the **bare-form** select shape (the bare table layout uses `tuple<KeyT; AccT>` synthesized inside a qmacro with embedded `typedecl(invoke(...))` for the key type — can't be extended with a dynamic count of additional acc slots; named-tuple form is handled by PR-E below).
 - Having predicates that touch the raw bucket array (would require materializing the per-bucket array, defeating the splice).
 
 ## Phase 3+ groupby reducer expansion — average reducer (PR-A2 follow-up)
@@ -582,22 +583,112 @@ Only the last `min(N, length)` indices are visited; the entire `reverse_inplace`
 
 \* m1 SQL: `ORDER BY id DESC LIMIT 10` collapses to 0 ns/op via index. The m3f 0 ns/op result means sub-nanosecond per element averaged across 10/100K visited elements — the backward index loop runs only 10 iterations regardless of N.
 
+## Phase 3+ groupby reducer expansion — hidden-slot reducer-in-having (PR-E)
+
+Closes the explicit "having predicate references a reducer absent from the select" cascade case deferred since PR-D. The named-tuple select-shape now extends its internal table value type with one extra acc slot per having-only reducer; the per-element loop updates the hidden slot alongside the user-visible ones; result-build re-synthesizes the user's tuple via `ExprMakeTuple` (omitting hidden slots) while the having predicate substitutes hidden-slot references via the same `mk_slot_output_expr` helper PR-D already used for matched slots.
+
+**How it lands:**
+
+1. **Scan-then-rewrite the having predicate.** A new `extend_specs_for_missing_having_reducers` walks the having pred AST first. For each `<reducer>(<hb>._1)` or `<reducer>(select(<hb>._1, <lam>))` call whose reducer name doesn't match any existing select-side spec, it appends a new hidden `ReducerSpec` to the specs array with a fresh slot index (starting at `userVisibleSlotCount + 1`). Then the existing PR-D `rewrite_having_pred` runs unchanged — every reducer reference now has a matching spec to bind to.
+
+2. **`mk_hidden_acc_type` helper.** Derives the acc TypeDecl for hidden slots from the reducer name + bucket-element type (and the peeled inner-body type for inner-select forms). Mirrors the per-reducer-name logic of `slot_acc_type` but takes the source types directly (hidden slots have no user-visible group_proj body to drill into). `length`/`count` → `int`; `long_count` → `int64`; `average` → `tuple<double; uint64>` via `mk_avg_acc_type`; `sum`/`min`/`max`/`first` → bucket element type (or inner-body type).
+
+3. **Extend `tabValueType.argTypes` (and `argNames`).** After the existing avg-slot type swap, the named-tuple's tabValueType gets each hidden spec's accType appended at the end. argNames grow in lockstep with auto-generated `_hidden_{slot}` names to keep the named-tuple shape valid.
+
+4. **Unify result-build ExprMakeTuple synthesis.** What was `elif (hasAvg)` becomes `elif (hasAvg || hasHidden)` — the same loop that re-synthesizes the user's tuple slot-by-slot (dividing avg slots, copying others) naturally omits hidden slots because it iterates `1 .. groupProjBody._type.argTypes |> length` (the user's slot count, NOT the internal table's extended count).
+
+5. **Bare-form bail.** When the user's group_proj is a single bare reducer (not a named tuple) AND having needs a hidden slot, the planner returns null and the chain cascades. The bare table layout uses `tuple<KeyT; AccT>` synthesized inside a qmacro with embedded `typedecl(invoke(...))` for the key type, which can't be dynamically extended with additional acc slots. Pinned by `test_group_by_having_hidden_bare_form_cascades_to_tier2`.
+
+**Bail cases (cascade to tier 2):**
+- Bare-form select with hidden-slot reducer in having (per above).
+- Inner-select reducer in having whose lambda is non-peelable (peel failure returns null).
+- Two same-named inner-select reducer references in the same having clause (e.g., `_._1|>select(λ1)|>sum + _._1|>select(λ2)|>sum > N`). The splice can't tell `λ1` and `λ2` apart without structural compare; routing both to the first hidden slot would silently conflate them. Cascade tier 2 evaluates each `select()` separately and stays correct.
+- All upstream bail cases from PR-A1/A2/B/D still apply.
+
+**Edge cases worth noting:**
+- Bare reducer with same name in select AND having reuses the select slot (existing PR-D match-by-name behavior preserved).
+- Inner-select reducer in having that name-matches a select-side spec routes to that visible slot (PR-D limitation: v1 trusts the user wrote identical lambdas; structural lambda compare deferred).
+- Multiple references to the SAME bare reducer (e.g., `sum`) in the predicate are deduplicated naturally: the first reference adds the spec; later references find the matching spec and skip.
+- 2+ DIFFERENT-named hidden reducers fall into the same scan pass; each gets its own slot.
+
+### Headline (PR-E)
+
+| Benchmark | Shape | m1 (sql) | m3 (linq) | m3f (this PR) | Win |
+|---|---|---:|---:|---:|---:|
+| groupby_having_hidden_sum | `each(arr) → group_by(brand) → having(sum(price) > 50000) → select((K, length)) → to_array` | 175 | 109 | **40** | **2.7× over m3 / 4.4× over m1 SQL** |
+| groupby_having_count (regression check) | `group_by → having(len>=5) → select((K, length))` | 141 | 78 | **36** | parity — PR-E scan-then-rewrite preserves the matching-slot splice |
+
+`groupby_having_hidden_sum` lands at ~40 ns/op vs `groupby_having_count`'s ~36 — the extra ~4 ns/op is the hidden inner-select-sum's per-element `entry._{hidden} += c.price` (one extra add per source element) alongside the visible `length++`.
+
+### Deferred for follow-ups
+
+- Bare-form select with hidden slot (requires restructuring the bare-table key-type qmacro to programmatic TypeDecl construction).
+- Inner-select reducer in having that uses a different lambda from a same-named select-side inner-select reducer — currently match-by-name conflates them (already a v1 limitation since PR-D; structural lambda compare TBD).
+- Two same-named inner-select reducers in the same having clause currently bail to cascade — a structural lambda compare would let us splice both correctly (sharing a slot when identical, splitting into two hidden slots when different).
+
+## Phase 3+ terminal-walk lane: last / single / element_at / aggregate (PR-F)
+
+Closes the `test_linq_element.das` (last / last_or_default / single / single_or_default / element_at / element_at_or_default) and `test_linq_aggregation.das` (aggregate(seed, fn)) gaps. Builds on the existing EARLY_EXIT lane infrastructure (`emit_early_exit_lane`): the function name is now a misnomer (last/single/aggregate walk the entire source), but the structural shape — one for-loop + prelude/per-match/tail statements + optional skip/take wrap — fits all seven new operators perfectly. The lane classifier is extended; `emit_early_exit_lane` gains 7 per-op arms.
+
+**How it lands:**
+
+1. **`classify_terminator` extension.** Adds `last`, `last_or_default`, `single`, `single_or_default`, `element_at`, `element_at_or_default`, `aggregate` to the EARLY_EXIT bucket. The bucket name covers any single-return terminator whose emission shape is "one for-loop + tail return".
+
+2. **`fold_linq_cond2` helper.** 2-arg sibling of `fold_linq_cond` that peels `block<(acc, x):AGG>` bodies — single-return blocks get their formals renamed via `Template.renameVariable` on both `acc` and `x`. Non-peelable bodies (multi-stmt blocks) return `null` so the caller falls back to `invoke(fn, acc, val)`. Mirrors PR-A1's inner-select-sum peel philosophy.
+
+3. **Per-op emission arms (in `emit_early_exit_lane`):**
+   - **`last` / `last_or_default`**: prelude `var found = false; var lastBind : T`; per-match `found = true; lastBind := val`; tail `if (!found) panic | return default; return <- lastBind`.
+   - **`single` / `single_or_default`**: prelude `var found = false; var bind : T`; per-match `if (found) panic | return default; found = true; bind := val`; tail `if (!found) panic | return default; return <- bind`. Note: on the SECOND match `single_or_default` returns the default and `single` panics; neither can decide before reaching that second match, and with at most one match both walk the entire source.
+   - **`element_at` / `element_at_or_default`**: prelude binds idx, validates `idx < 0` (panic / return default), then `var counter = 0`; per-match `if (counter == idx) return val; counter ++`; tail panic / return default.
+   - **`aggregate`**: prelude `var acc = seed` (workhorse) or `var acc <- seed` (non-workhorse); per-match `acc = peeledBody` (workhorse) or `acc <- peeledBody` (non-workhorse), where `peeledBody` is the inlined block body or `invoke(fn, acc, val)` if peel failed; tail `return acc` / `return <- acc`. Workhorse-vs-non-workhorse switching mirrors the user-side `aggregate`'s static_if (linq.das:1466).
+
+4. **Daslib bugfix (collateral).** `aggregate_impl_const` (linq.das:1458) had a static_if checking `is_workhorse(type<TT>)` where TT is the element type — but move-vs-return is decided by AGG (return/accumulator type), not TT. Surfaced when adding a parity benchmark with non-workhorse element type (`Car`) and workhorse seed (`int`): the impl tried `return <- int_from_const` which fails. Fixed by checking `is_workhorse(type<AGG>)` to match the public `aggregate(array, ...)` overload's static_if.
+
+**Bail cases (cascade to tier 2):**
+- Aggregate with a non-peelable block body (multi-statement). Cascade preserves correctness via runtime `invoke`.
+- All upstream bail cases from Phase 2A/2B still apply (`is_buffer_required_op` source, skip/take ordering).
+
+**Edge cases worth noting:**
+- `aggregate` with a peelable single-return block emits zero per-element invokes (proven by `test_aggregate_splices_peeled`'s `count_invoke_nodes` assertion).
+- `element_at` const-folds away the `idx < 0` pre-loop panic when the index is a positive literal — `panic` call count drops from 2 to 1. AST test asserts `>= 1`.
+- `single_or_default` stops at the second match and returns the default, where `single` panics at the same point (matches the user-side `single_or_default(iterator)` semantics at linq.das:2713).
+- `last` / `last_or_default` work cleanly with `_select` projection — the bind captures the projected value, not the source element.
+
+### Headline (PR-F)
+
+| Benchmark | Shape | m1 (sql) | m3 (linq) | m3f (this PR) | Win |
+|---|---|---:|---:|---:|---:|
+| last_match | `each(arr) → where(price > T) → last` | 0\* | 29 | **5** | **5.8× over m3** |
+| single_match | `each(arr) → where(id == K) → single` | 0\* | 19 | **2** | **9.5× over m3** |
+| element_at_match | `each(arr) → where(price > T) → element_at(100)` | 0\* | 29 | **0\*\*** | **early-exit at ~100 source elements** |
+| aggregate_match | `each(arr) → where(price > T) → aggregate(0, $(a, c) => a + c.price)` | 34 | 51 | **5** | **10.2× over m3 / 6.8× over m1 SQL** |
+
+\* m1 SQL benchmarks here have a primary-key (`id`) lookup or LIMIT-1 fast path that bottoms out below the dastest timer resolution.
+\*\* m3f `element_at_match` bottoms out at 0 ns/op because the splice exits after visiting `INDEX + matching_density` source elements (~100 of 100K), so total time divided by source size is effectively 0 — this is the early-exit win.
+
+`aggregate_match` is the standout: peeling the block body inline + fusing the upstream where filter into the same per-element loop eliminates BOTH the per-element block invoke AND the where-iterator allocation. m3 pays for both; m3f pays for neither.
+
+### Deferred for follow-ups
+
+- Aggregate with non-peelable block body cascades to tier 2 (correct but slower) — a `return $b(stmts)` pattern recognizer would let multi-statement blocks splice too.
+- Skip-family (`skip_last` / `take_last`) — these need buffer state similar to PR-C's reverse_take; separate follow-up.
+
 ## Operator-coverage checklist (parity tests)
 
-The 24 benchmarks above cover the most common shapes. The end-game target is one benchmark per `_fold`-applicable scenario in the broader `tests/linq/` operator suite. Tracking the long-tail coverage below; PRs that add splice support for new operators should add a benchmark here if not already present.
+The benchmarks above cover the most common shapes. The end-game target is one benchmark per `_fold`-applicable scenario in the broader `tests/linq/` operator suite. Tracking the long-tail coverage below; PRs that add splice support for new operators should add a benchmark here if not already present.
 
 | Source test file | Operator group | Covered by benchmark | Status |
 |---|---|---|---|
 | `test_linq.das` | comprehension basics | count_aggregate, sum_aggregate | ✅ |
-| `test_linq_aggregation.das` | count/sum/min/max/avg/aggregate | count/sum/min/max/average_aggregate, sum_where, long_count_aggregate | ✅ core; `aggregate(seed, fn)` ⏳ |
+| `test_linq_aggregation.das` | count/sum/min/max/avg/aggregate | count/sum/min/max/average_aggregate, sum_where, long_count_aggregate, aggregate_match | ✅ core; ✅ `aggregate(seed, fn)` with peeled block body + workhorse/non-workhorse seed split (PR-F) |
 | `test_linq_querying.das` | any/all/contains | any_match, all_match, contains_match | ✅ core |
 | `test_linq_transform.das` | select/select_many/zip | to_array_filter, zip_dot_product | ✅ select/zip; `select_many` ⏳ |
 | `test_linq_sorting.das` | order/order_by/reverse | sort_first, sort_take, select_where_order_take, reverse_take | ✅ ascending + `order_descending` (Phase 3); ✅ `reverse` (Phase 3+); ✅ `reverse \|> take(N)` backward index loop on array sources (PR-C — closes the prior regression) |
-| `test_linq_group_by.das` | group_by/group_by_lazy/having | groupby_count, groupby_sum, groupby_min, groupby_max, groupby_first, groupby_multi_reducer, groupby_where_count, groupby_where_sum, groupby_select_sum, groupby_having_count, groupby_average | ✅ count/long_count/sum + inner-select-sum (PR-A1); ✅ min/max/first + inner-select-{min,max,first} + multi-reducer (PR-A2); ✅ upstream where_/select* fusion (PR-B); ✅ `having_` with matching slot (PR-D); ✅ `average` + inner-select-average + multi-reducer-with-average (PR-A2 follow-up) |
+| `test_linq_group_by.das` | group_by/group_by_lazy/having | groupby_count, groupby_sum, groupby_min, groupby_max, groupby_first, groupby_multi_reducer, groupby_where_count, groupby_where_sum, groupby_select_sum, groupby_having_count, groupby_having_hidden_sum, groupby_average | ✅ count/long_count/sum + inner-select-sum (PR-A1); ✅ min/max/first + inner-select-{min,max,first} + multi-reducer (PR-A2); ✅ upstream where_/select* fusion (PR-B); ✅ `having_` with matching slot (PR-D); ✅ `average` + inner-select-average + multi-reducer-with-average (PR-A2 follow-up); ✅ hidden-slot reducer-in-having on named-tuple select shape (PR-E) |
 | `test_linq_join.das` | join/left_join/right_join/full_outer/cross | join_count | ✅ inner; outer joins + cross ⏳ |
 | `test_linq_partition.das` | take/skip/take_while/skip_while/chunk | take_count, skip_take, take_sum_aggregate, take_count_filtered | ✅ take/skip in splice lanes; `_while` + `chunk` ⏳ |
 | `test_linq_set.das` | distinct/union/except/intersect/unique | distinct_count, distinct_take | ✅ distinct + distinct_by (streaming dedup, this PR); union/except/intersect/unique ⏳ |
-| `test_linq_element.das` | first/last/single/element_at + _or_default | first_match, first_or_default_match | ✅ first/first_or_default; last/single/element_at ⏳ |
+| `test_linq_element.das` | first/last/single/element_at + _or_default | first_match, first_or_default_match, last_match, single_match, element_at_match | ✅ first/first_or_default; ✅ last/last_or_default/single/single_or_default/element_at/element_at_or_default (PR-F terminal-walk lane) |
 | `test_linq_concat.das` | concat/prepend/append | — | ⏳ |
 | `test_linq_generation.das` | range/repeat/etc. | — | ⏳ |
 | `test_linq_bugs.das` | regression cases | — | ⏳ as bugs surface |
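
Note on the PR-E scan step described in LINQ.md above: the sketch below is a language-neutral Python model of the scan-then-extend idea, not the daslang macro itself. `ReducerSpec`, `extend_specs`, and the field names are hypothetical stand-ins; the only behavior taken from the notes is match-by-name, deduplication of repeated references, and fresh slot indices starting at the user-visible slot count plus one.

```python
from dataclasses import dataclass


@dataclass
class ReducerSpec:
    name: str     # reducer name, e.g. "length" or "sum"
    slot: int     # 1-based slot index (slot 0 is assumed to hold the key)
    hidden: bool  # True when only the having predicate needs this slot


def extend_specs(select_specs, having_reducers):
    """Append one hidden spec per having-only reducer name (match-by-name)."""
    specs = list(select_specs)
    known = {s.name for s in specs}
    next_slot = len(specs) + 1  # user-visible slot count + 1
    for name in having_reducers:
        if name in known:
            continue  # already has a slot: reuse it, add nothing
        specs.append(ReducerSpec(name, next_slot, hidden=True))
        known.add(name)
        next_slot += 1
    return specs


# select((K, length)) with a having clause that references sum twice and max once.
specs = extend_specs([ReducerSpec("length", 1, False)], ["sum", "sum", "max"])
```

After this pass every reducer reference in the predicate has a spec to bind to, which is why the notes can say the PR-D rewrite runs unchanged.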
diff --git a/benchmarks/sql/aggregate_match.das b/benchmarks/sql/aggregate_match.das
new file mode 100644
index 000000000..eb4b65170
--- /dev/null
+++ b/benchmarks/sql/aggregate_match.das
@@ -0,0 +1,60 @@
+options gen2
+options persistent_heap
+
+require _common public
+
+// User-supplied binary reducer: sum of price over a where-filtered slice. SQL
+// can express this as `SELECT SUM(price)`. m3 traverses the array
+// with the user block invoked per element; m3f peels the block body and inlines it
+// alongside the where predicate: a single pass with no per-element block invoke.
+
+let THRESHOLD = 200
+
+def run_m1(b : B?; n : int) {
+    with_sqlite(":memory:") $(db) {
+        fixture_db(db, n)
+        b |> run("m1_sql/{n}", n) {
+            let total = _sql(db |> select_from(type<Car>) |> _where(_.price > THRESHOLD)
+                                |> _select(_.price) |> sum())
+            if (total == 0) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+def run_m3(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3_array/{n}", n) {
+        let total = (arr |> _where(_.price > THRESHOLD)
+                         |> aggregate(0, $(acc : int, c : Car) => acc + c.price))
+        if (total == 0) {
+            b->failNow()
+        }
+    }
+}
+def run_m3f(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3f_array_fold/{n}", n) {
+        let total = _fold(each(arr)._where(_.price > THRESHOLD)
+            .aggregate(0, $(acc : int, c : Car) => acc + c.price))
+        if (total == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def aggregate_match_m1(b : B?) {
+    run_m1(b, 100000)
+}
+
+[benchmark]
+def aggregate_match_m3(b : B?) {
+    run_m3(b, 100000)
+}
+
+[benchmark]
+def aggregate_match_m3f(b : B?) {
+    run_m3f(b, 100000)
+}
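
Note on what this benchmark compares: a rough Python model of the two `aggregate` shapes, under the assumption that cars are plain dicts with a `price` field. `aggregate_invoke` and `aggregate_peeled` are hypothetical names; the first mimics a filtered intermediate plus a per-element callback, the second mimics the peeled body fused into the filter loop.

```python
def aggregate_invoke(cars, threshold, seed, fn):
    """Unfused shape: build the filtered sequence, then call fn per element."""
    filtered = [c for c in cars if c["price"] > threshold]
    acc = seed
    for c in filtered:
        acc = fn(acc, c)  # one callback invocation per surviving element
    return acc


def aggregate_peeled(cars, threshold, seed):
    """Fused shape: the body `acc + c.price` is inlined next to the predicate."""
    acc = seed
    for c in cars:
        if c["price"] > threshold:
            acc = acc + c["price"]  # inlined body, no callback, no intermediate
    return acc


cars = [{"price": 100}, {"price": 250}, {"price": 300}]
total_invoke = aggregate_invoke(cars, 200, 0, lambda acc, c: acc + c["price"])
total_peeled = aggregate_peeled(cars, 200, 0)
```

Both forms return the same total; only the amount of per-element work differs.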
diff --git a/benchmarks/sql/element_at_match.das b/benchmarks/sql/element_at_match.das
new file mode 100644
index 000000000..61c51fa02
--- /dev/null
+++ b/benchmarks/sql/element_at_match.das
@@ -0,0 +1,58 @@
+options gen2
+options persistent_heap
+
+require _common public
+
+let THRESHOLD = 500
+let INDEX = 100
+
+// SQL: SELECT ... WHERE price > T LIMIT 1 OFFSET N — engine skips N then takes 1.
+// m3 materializes full filtered array then indexes [N].
+// m3f counts matches and early-exits when counter hits N.
+
+def run_m1(b : B?; n : int) {
+    with_sqlite(":memory:") $(db) {
+        fixture_db(db, n)
+        b |> run("m1_sql/{n}", n) {
+            let row = _sql(db |> select_from(type<Car>) |> _where(_.price > THRESHOLD)
+                              |> skip(INDEX) |> _first())
+            if (row.price == 0) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+def run_m3(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3_array/{n}", n) {
+        let row = arr |> _where(_.price > THRESHOLD) |> element_at(INDEX)
+        if (row.price == 0) {
+            b->failNow()
+        }
+    }
+}
+def run_m3f(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3f_array_fold/{n}", n) {
+        let row = _fold(each(arr)._where(_.price > THRESHOLD).element_at(INDEX))
+        if (row.price == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def element_at_match_m1(b : B?) {
+    run_m1(b, 100000)
+}
+
+[benchmark]
+def element_at_match_m3(b : B?) {
+    run_m3(b, 100000)
+}
+
+[benchmark]
+def element_at_match_m3f(b : B?) {
+    run_m3f(b, 100000)
+}
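
Note on the `element_at` shape exercised here: a minimal Python sketch of the counter-and-early-exit loop, assuming a generic iterable source. The function name and the error messages are hypothetical; only the structure (index check before the loop, a match counter, return on `counter == idx`, failure in the tail) follows the description in LINQ.md.

```python
def element_at_where(source, pred, idx):
    """Return the idx-th element matching pred, stopping as soon as it is found."""
    if idx < 0:
        raise IndexError("index out of range")  # pre-loop validation
    counter = 0
    for val in source:
        if pred(val):
            if counter == idx:
                return val  # early exit: the rest of the source is never visited
            counter += 1
    raise IndexError("index out of range")  # tail: fewer than idx + 1 matches


# Use an explicit iterator so the early exit is observable afterwards.
it = iter(range(1000))
hit = element_at_where(it, lambda v: v % 2 == 0, 3)
next_unvisited = next(it)
```

Here the fourth even number is 6, and the iterator resumes at 7, so only seven of the thousand source elements were visited.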
diff --git a/benchmarks/sql/groupby_having_hidden_sum.das b/benchmarks/sql/groupby_having_hidden_sum.das
new file mode 100644
index 000000000..ab0e2cb69
--- /dev/null
+++ b/benchmarks/sql/groupby_having_hidden_sum.das
@@ -0,0 +1,66 @@
+options gen2
+options persistent_heap
+
+require _common public
+
+// _group_by(_.brand) |> _having(_._1 |> select(price) |> sum > 50000) |> _select((Brand=key, N=length)).
+// SQL: SELECT brand, COUNT(*) FROM cars GROUP BY brand HAVING SUM(price) > 50000.
+// Splice mode (PR-E) recognizes that the having predicate's inner-select-sum on price has
+// NO matching select-side slot — the planner synthesizes a hidden accumulator slot on the
+// internal table value type; the per-element loop updates it alongside `length`; result-build
+// projects only the user-visible (Brand, N) and applies the having filter against the hidden
+// slot via `kv._{hidden} > 50000`. Without PR-E this chain cascaded to tier 2 (full bucket-
+// array materialization).
+
+def run_m1(b : B?; n : int) {
+    with_sqlite(":memory:") $(db) {
+        fixture_db(db, n)
+        b |> run("m1_sql/{n}", n) {
+            let groups <- _sql(db |> select_from(type<Car>)
+                                  |> _group_by(_.brand)
+                                  |> _having(_._1 |> select($(c : Car) => c.price) |> sum > 50000)
+                                  |> _select((Brand = _._0, N = _._1 |> length)))
+            if (empty(groups)) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+def run_m3(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3_array/{n}", n) {
+        let groups <- (arr._group_by(_.brand)._having(_._1 |> select($(c : Car) => c.price) |> sum > 50000)._select((Brand = _._0, N = _._1 |> length)))
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+def run_m3f(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3f_array_fold/{n}", n) {
+        let groups <- _fold(each(arr)
+                            ._group_by(_.brand)
+                            ._having(_._1 |> select($(c : Car) => c.price) |> sum > 50000)
+                            ._select((Brand = _._0, N = _._1 |> length))
+                            .to_array())
+        if (empty(groups)) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def groupby_having_hidden_sum_m1(b : B?) {
+    run_m1(b, 100000)
+}
+
+[benchmark]
+def groupby_having_hidden_sum_m3(b : B?) {
+    run_m3(b, 100000)
+}
+
+[benchmark]
+def groupby_having_hidden_sum_m3f(b : B?) {
+    run_m3f(b, 100000)
+}
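
Note on the hidden accumulator slot described in the header comment of this file: a small Python model of the single-pass table, assuming cars are `(brand, price)` pairs. `groupby_having_hidden_sum` here is an illustrative function, not generated code; the list `[length, hidden_sum]` stands in for the extended table value type.

```python
def groupby_having_hidden_sum(cars, limit):
    """One pass: count per brand, plus a hidden price sum used only by having."""
    table = {}  # brand -> [length, hidden_sum]
    for brand, price in cars:
        entry = table.setdefault(brand, [0, 0])
        entry[0] += 1      # user-visible slot: length
        entry[1] += price  # hidden slot: sum(price), needed only by the filter
    # Result build: filter on the hidden slot, project only the visible slots.
    return [(brand, n) for brand, (n, total) in table.items() if total > limit]


cars = [("a", 30000), ("a", 30000), ("b", 10000)]
groups = groupby_having_hidden_sum(cars, 50000)
```

The hidden sum never appears in the output rows; it only decides which rows survive.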
diff --git a/benchmarks/sql/last_match.das b/benchmarks/sql/last_match.das
new file mode 100644
index 000000000..cc33f9a93
--- /dev/null
+++ b/benchmarks/sql/last_match.das
@@ -0,0 +1,57 @@
+options gen2
+options persistent_heap
+
+require _common public
+
+let THRESHOLD = 500
+
+// SQL: `SELECT ... WHERE price > THRESHOLD ORDER BY id DESC LIMIT 1` — engine reverses
+// then takes one. m3 materializes the full filtered array then indexes [-1].
+// m3f walks the source once, overwriting a single bind on each match.
+
+def run_m1(b : B?; n : int) {
+    with_sqlite(":memory:") $(db) {
+        fixture_db(db, n)
+        b |> run("m1_sql/{n}", n) {
+            let row = _sql(db |> select_from(type<Car>) |> _where(_.price > THRESHOLD)
+                              |> _order_by_descending(_.id) |> _first())
+            if (row.price == 0) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+def run_m3(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3_array/{n}", n) {
+        let row = arr |> _where(_.price > THRESHOLD) |> last()
+        if (row.price == 0) {
+            b->failNow()
+        }
+    }
+}
+def run_m3f(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3f_array_fold/{n}", n) {
+        let row = _fold(each(arr)._where(_.price > THRESHOLD).last())
+        if (row.price == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def last_match_m1(b : B?) {
+    run_m1(b, 100000)
+}
+
+[benchmark]
+def last_match_m3(b : B?) {
+    run_m3(b, 100000)
+}
+
+[benchmark]
+def last_match_m3f(b : B?) {
+    run_m3f(b, 100000)
+}
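
Note on the `last` / `last_or_default` shape: a hedged Python sketch of the found-flag loop, assuming a generic iterable and predicate. Function names are hypothetical; the "sequence contains no elements" message matches the panic string in the `linq_fold.das` hunk of this diff.

```python
def last_where(source, pred):
    """Full walk; each match overwrites a single bind."""
    found = False
    last_bind = None
    for val in source:
        if pred(val):
            found = True
            last_bind = val
    if not found:
        raise ValueError("sequence contains no elements")
    return last_bind


def last_or_default_where(source, pred, default):
    """Same walk, but an empty result yields the eagerly bound default."""
    found = False
    last_bind = None
    for val in source:
        if pred(val):
            found = True
            last_bind = val
    return last_bind if found else default


last_hit = last_where([1, 5, 2, 8, 3], lambda v: v > 4)
fallback = last_or_default_where([1, 2], lambda v: v > 4, -1)
```

No filtered array is materialized in either function; the state is one flag and one value.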
diff --git a/benchmarks/sql/single_match.das b/benchmarks/sql/single_match.das
new file mode 100644
index 000000000..37bab8394
--- /dev/null
+++ b/benchmarks/sql/single_match.das
@@ -0,0 +1,58 @@
+options gen2
+options persistent_heap
+
+require _common public
+
+// Fixture car ids run 1..n so id=42 is unique. `single` must walk the full source
+// to prove there is exactly one match, but the splice still fuses the upstream where.
+//
+// SQL: SELECT * FROM Cars WHERE id = 42 LIMIT 2 — read at most 2 to assert uniqueness.
+// m3/m3f traverse the array filtering and asserting one survivor.
+
+let TARGET_ID = 42
+
+def run_m1(b : B?; n : int) {
+    with_sqlite(":memory:") $(db) {
+        fixture_db(db, n)
+        b |> run("m1_sql/{n}", n) {
+            let row = _sql(db |> select_from(type<Car>) |> _where(_.id == TARGET_ID) |> _first())
+            if (row.id == 0) {
+                b->failNow()
+            }
+        }
+    }
+}
+
+def run_m3(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3_array/{n}", n) {
+        let row = arr |> _where(_.id == TARGET_ID) |> single()
+        if (row.id == 0) {
+            b->failNow()
+        }
+    }
+}
+def run_m3f(b : B?; n : int) {
+    let arr <- fixture_array(n)
+    b |> run("m3f_array_fold/{n}", n) {
+        let row = _fold(each(arr)._where(_.id == TARGET_ID).single())
+        if (row.id == 0) {
+            b->failNow()
+        }
+    }
+}
+
+[benchmark]
+def single_match_m1(b : B?) {
+    run_m1(b, 100000)
+}
+
+[benchmark]
+def single_match_m3(b : B?) {
+    run_m3(b, 100000)
+}
+
+[benchmark]
+def single_match_m3f(b : B?) {
+    run_m3f(b, 100000)
+}
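
Note on the `single` / `single_or_default` shape: a minimal Python sketch, assuming a generic iterable and predicate. Function names are hypothetical; the two error messages match the panic strings in the `linq_fold.das` hunk of this diff.

```python
def single_where(source, pred):
    """Exactly one match expected; a second match fails immediately."""
    found = False
    bind = None
    for val in source:
        if pred(val):
            if found:
                raise ValueError("sequence contains more than one element")
            found = True
            bind = val
    if not found:
        raise ValueError("sequence contains no elements")
    return bind


def single_or_default_where(source, pred, default):
    """Returns default on zero matches, and also as soon as a second match appears."""
    found = False
    bind = None
    for val in source:
        if pred(val):
            if found:
                return default  # early return on the second match
            found = True
            bind = val
    return bind if found else default


only = single_where(range(1, 101), lambda v: v == 42)
two_matches = single_or_default_where([1, 42, 42, 7], lambda v: v == 42, -1)
no_matches = single_or_default_where([1, 2, 3], lambda v: v == 42, -1)
```

With a unique id, as in this benchmark, both functions visit every source element before returning.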
diff --git a/daslib/linq.das b/daslib/linq.das
index 20788b168..f4635a65f 100644
--- a/daslib/linq.das
+++ b/daslib/linq.das
@@ -1456,7 +1456,10 @@ def private aggregate_impl(var src; tt : auto(TT); seed : auto(AGG); func : bloc
 
 [unused_argument(tt)]
 def private aggregate_impl_const(src : auto(ARGT); tt : auto(TT); seed : auto(AGG); func : block<(acc : AGG -&, x : TT -&) : AGG -&>) : AGG -& -const {
-    static_if (typeinfo is_workhorse(type<TT>)) {
+    // Move-vs-return is decided by the accumulator type AGG (the return type), NOT the
+    // element type TT — those differ for aggregates that produce a workhorse acc from a
+    // non-workhorse element (e.g. summing prices off `array<Car>` with int seed).
+    static_if (typeinfo is_workhorse(type<AGG>)) {
         return aggregate_impl(unsafe(reinterpret<ARGT -const>(src)), type<TT -const -&>, seed, func)
     } else {
         return <- aggregate_impl(unsafe(reinterpret<ARGT -const>(src)), type<TT -const -&>, seed, func)
diff --git a/daslib/linq_fold.das b/daslib/linq_fold.das
index 100ff5840..c94c3516e 100644
--- a/daslib/linq_fold.das
+++ b/daslib/linq_fold.das
@@ -87,6 +87,27 @@ def private fold_linq_cond_peel(var expr : Expression?; var replacement : Expres
     return qmacro(invoke($e(expr), $e(replacement)))
 }
 
+[macro_function]
+def private fold_linq_cond2(var expr : Expression?; arg0Name, arg1Name : string) : Expression? {
+    // 2-arg peel for aggregate's block<(acc, x):AGG>; returns null if non-peelable so caller falls back to invoke.
+    if (expr is ExprMakeBlock) {
+        var mblk = expr as ExprMakeBlock
+        var blk = mblk._block as ExprBlock
+        if (blk.arguments |> length == 2 && blk.list |> length == 1 && blk.list[0] is ExprReturn) {
+            var ret = blk.list[0] as ExprReturn
+            if (ret.subexpr != null) {
+                var res = clone_expression(ret.subexpr)
+                var rules : Template
+                rules |> renameVariable(string(blk.arguments[0].name), arg0Name)
+                rules |> renameVariable(string(blk.arguments[1].name), arg1Name)
+                apply_template(rules, res.at, res)
+                return res
+            }
+        }
+    }
+    return null
+}
+
 struct private LinqCall {
     name : string
     moduleName : string = "linq"
@@ -359,7 +380,12 @@ def private classify_terminator(name : string) : LinqLane {
     // take/skip trailing (after to_array strip) → ARRAY lane with implicit materialization.
     if (name == "where_" || name == "select" || name == "take" || name == "skip") return LinqLane.ARRAY
     if (name == "sum" || name == "min" || name == "max" || name == "average" || name == "long_count") return LinqLane.ACCUMULATOR
-    if (name == "first" || name == "first_or_default" || name == "any" || name == "all" || name == "contains") return LinqLane.EARLY_EXIT
+    // EARLY_EXIT is also the dispatch lane for full-walk single-return terminators (last/single/element_at/aggregate) — same emit_early_exit_lane shape, different per-op state.
+    if (name == "first" || name == "first_or_default" || name == "any" || name == "all" || name == "contains"
+            || name == "last" || name == "last_or_default"
+            || name == "single" || name == "single_or_default"
+            || name == "element_at" || name == "element_at_or_default"
+            || name == "aggregate") return LinqLane.EARLY_EXIT
     return LinqLane.UNKNOWN
 }
 
@@ -846,6 +872,219 @@ def private emit_early_exit_lane(
         tailStmts |> push <| qmacro_expr() {
             return false
         }
+    } elif (opName == "last" || opName == "last_or_default") {
+        // Full walk; per-match overwrites a single bind; tail returns it after found-check (panic/default for empty).
+        var retType : TypeDeclPtr
+        if (projection != null) {
+            retType = clone_type(projection._type)
+        } else {
+            retType = clone_type(elementType)
+        }
+        if (retType != null) {
+            retType.flags.constant = false
+            retType.flags.ref = false
+        }
+        let foundName = "`found`{at.line}`{at.column}"
+        let lastBindName = "`lst`{at.line}`{at.column}"
+        preludeStmts |> push <| qmacro_expr() {
+            var $i(foundName) = false
+        }
+        preludeStmts |> push <| qmacro_expr() {
+            var $i(lastBindName) : $t(retType)
+        }
+        perMatchStmts |> push <| qmacro_expr() {
+            $i(foundName) = true
+        }
+        perMatchStmts |> push <| qmacro_expr() {
+            $i(lastBindName) := $i(valueName)
+        }
+        if (opName == "last") {
+            tailStmts <- qmacro_block_to_array() {
+                if (!$i(foundName)) {
+                    panic("sequence contains no elements")
+                }
+                return <- $i(lastBindName)
+            }
+        } else {
+            // Bind `d` once at the top (eager, matches linq.das line 2621).
+            let defaultName = "`dval`{at.line}`{at.column}"
+            var defaultExpr = clone_expression(terminatorCall.arguments[1])
+            preludeStmts |> push <| qmacro_expr() {
+                let $i(defaultName) = $e(defaultExpr)
+            }
+            tailStmts <- qmacro_block_to_array() {
+                if (!$i(foundName)) {
+                    return $i(defaultName)
+                }
+                return <- $i(lastBindName)
+            }
+        }
+    } elif (opName == "single" || opName == "single_or_default") {
+        // Full walk; on the SECOND match `single` panics and `single_or_default` returns the default; same for empty source.
+        var retType : TypeDeclPtr
+        if (projection != null) {
+            retType = clone_type(projection._type)
+        } else {
+            retType = clone_type(elementType)
+        }
+        if (retType != null) {
+            retType.flags.constant = false
+            retType.flags.ref = false
+        }
+        let foundName = "`found`{at.line}`{at.column}"
+        let bindName = "`sng`{at.line}`{at.column}"
+        preludeStmts |> push <| qmacro_expr() {
+            var $i(foundName) = false
+        }
+        preludeStmts |> push <| qmacro_expr() {
+            var $i(bindName) : $t(retType)
+        }
+        if (opName == "single") {
+            perMatchStmts |> push <| qmacro_expr() {
+                if ($i(foundName)) {
+                    panic("sequence contains more than one element")
+                }
+            }
+            perMatchStmts |> push <| qmacro_expr() {
+                $i(foundName) = true
+            }
+            perMatchStmts |> push <| qmacro_expr() {
+                $i(bindName) := $i(valueName)
+            }
+            tailStmts <- qmacro_block_to_array() {
+                if (!$i(foundName)) {
+                    panic("sequence contains no elements")
+                }
+                return <- $i(bindName)
+            }
+        } else {
+            let defaultName = "`dval`{at.line}`{at.column}"
+            var defaultExpr = clone_expression(terminatorCall.arguments[1])
+            preludeStmts |> push <| qmacro_expr() {
+                let $i(defaultName) = $e(defaultExpr)
+            }
+            // `single_or_default` early-returns default on the SECOND match (subsequent matches don't matter).
+            perMatchStmts |> push <| qmacro_expr() {
+                if ($i(foundName)) {
+                    return $i(defaultName)
+                }
+            }
+            perMatchStmts |> push <| qmacro_expr() {
+                $i(foundName) = true
+            }
+            perMatchStmts |> push <| qmacro_expr() {
+                $i(bindName) := $i(valueName)
+            }
+            tailStmts <- qmacro_block_to_array() {
+                if (!$i(foundName)) {
+                    return $i(defaultName)
+                }
+                return <- $i(bindName)
+            }
+        }
+    } elif (opName == "element_at" || opName == "element_at_or_default") {
+        // Counter-style early-exit: return the N-th element that survives upstream where/select. Negative index panics or returns default pre-loop.
+        var retType : TypeDeclPtr
+        if (projection != null) {
+            retType = clone_type(projection._type)
+        } else {
+            retType = clone_type(elementType)
+        }
+        if (retType != null) {
+            retType.flags.constant = false
+            retType.flags.ref = false
+        }
+        let idxName = "`idx`{at.line}`{at.column}"
+        let cntName = "`ec`{at.line}`{at.column}"
+        var idxExpr = clone_expression(terminatorCall.arguments[1])
+        preludeStmts |> push <| qmacro_expr() {
+            let $i(idxName) = $e(idxExpr)
+        }
+        if (opName == "element_at") {
+            preludeStmts |> push <| qmacro_expr() {
+                if ($i(idxName) < 0) {
+                    panic("element index out of range")
+                }
+            }
+        } else {
+            preludeStmts |> push <| qmacro_expr() {
+                if ($i(idxName) < 0) {
+                    return default<$t(retType)>
+                }
+            }
+        }
+        preludeStmts |> push <| qmacro_expr() {
+            var $i(cntName) = 0
+        }
+        perMatchStmts |> push <| qmacro_expr() {
+            if ($i(cntName) == $i(idxName)) {
+                return $i(valueName)
+            }
+        }
+        perMatchStmts |> push <| qmacro_expr() {
+            $i(cntName) ++
+        }
+        if (opName == "element_at") {
+            tailStmts <- qmacro_block_to_array() {
+                panic("element index out of range")
+                return default<$t(retType)>
+            }
+        } else {
+            tailStmts |> push <| qmacro_expr() {
+                return default<$t(retType)>
+            }
+        }
+    } elif (opName == "aggregate") {
+        // Peel single-return `block<(acc, x):AGG>` body inline; fall back to invoke on multi-stmt. Workhorse seed picks `=` / `return`; non-workhorse uses `<-` / `return <-` (matches linq.das:1466 static_if).
+        var seedExpr = clone_expression(terminatorCall.arguments[1])
+        var aggFn = clone_expression(terminatorCall.arguments[2])
+        var accType = clone_type(seedExpr._type)
+        if (accType != null) {
+            accType.flags.constant = false
+            accType.flags.ref = false
+        }
+        let aggAccName = "`agg`{at.line}`{at.column}"
+        let aggIsWorkhorse = (seedExpr._type != null && seedExpr._type.isWorkhorseType)
+        if (aggIsWorkhorse) {
+            preludeStmts |> push <| qmacro_expr() {
+                var $i(aggAccName) : $t(accType) = $e(seedExpr)
+            }
+        } else {
+            preludeStmts |> push <| qmacro_expr() {
+                var $i(aggAccName) : $t(accType) <- $e(seedExpr)
+            }
+        }
+        var peeled = fold_linq_cond2(aggFn, aggAccName, valueName)
+        if (peeled != null) {
+            if (aggIsWorkhorse) {
+                perMatchStmts |> push <| qmacro_expr() {
+                    $i(aggAccName) = $e(peeled)
+                }
+            } else {
+                perMatchStmts |> push <| qmacro_expr() {
+                    $i(aggAccName) <- $e(peeled)
+                }
+            }
+        } else {
+            if (aggIsWorkhorse) {
+                perMatchStmts |> push <| qmacro_expr() {
+                    $i(aggAccName) = invoke($e(aggFn), $i(aggAccName), $i(valueName))
+                }
+            } else {
+                perMatchStmts |> push <| qmacro_expr() {
+                    $i(aggAccName) <- invoke($e(aggFn), $i(aggAccName), $i(valueName))
+                }
+            }
+        }
+        if (aggIsWorkhorse) {
+            tailStmts |> push <| qmacro_expr() {
+                return $i(aggAccName)
+            }
+        } else {
+            tailStmts |> push <| qmacro_expr() {
+                return <- $i(aggAccName)
+            }
+        }
     } else {
         return null
     }
@@ -2015,6 +2254,109 @@ def private mk_slot_output_expr(at : LineInfo; entryName : string; spec : Reduce
     return mk_slot_ref(at, entryName, spec.slot)
 }
 
+[macro_function]
+def private mk_hidden_acc_type(reducerName : string; isInnerSelect : bool;
+                               bucketElemType, innerBodyType : TypeDeclPtr; at : LineInfo) : TypeDeclPtr {
+    // Hidden-slot acc type: like slot_acc_type but takes source types directly (hidden slots have no group_proj body to drill into).
+    if (reducerName == "length" || reducerName == "count") return new TypeDecl(baseType = Type.tInt, at = at)
+    if (reducerName == "long_count") return new TypeDecl(baseType = Type.tInt64, at = at)
+    if (reducerName == "average" || reducerName == "average_inner_select") return mk_avg_acc_type(at)
+    let src = isInnerSelect ? innerBodyType : bucketElemType
+    if (src == null) return null
+    return strip_const_ref(clone_type(src))
+}
+
+[macro_function]
+def private having_reducer_append_if_missing(var expr : Expression?; hbName, itName : string;
+                                             var specs : array<ReducerSpec>; userVisibleSlotCount : int;
+                                             bucketElemType : TypeDeclPtr; at : LineInfo) : bool {
+    // Append a hidden ReducerSpec when `expr` is a bare or inner-select reducer on the having bind that has no name-match in `specs`. Returns false only when the inner-select peel fails (no body, no type, or no acc type) or when a second same-named hidden inner-select reducer appears. The caller then cascades, because the splice can't distinguish two different inner lambdas and would otherwise silently route both predicate terms to the first hidden slot.
+    if (expr == null || !(expr is ExprCall)) return true
+    let c = expr as ExprCall
+    if (c.func == null || (c.arguments |> length) != 1) return true
+    var fnName = string(c.func.name)
+    if (c.func.fromGeneric != null) {
+        fnName = string(c.func.fromGeneric.name)
+    }
+    let isReducer = (fnName == "length" || fnName == "count" || fnName == "long_count"
+            || fnName == "sum" || fnName == "min" || fnName == "max"
+            || fnName == "first" || fnName == "average")
+    if (!isReducer) return true
+    if (is_bind_field_access(c.arguments[0], hbName, 1)) {
+        for (s in specs) {
+            if (s.innerBody == null && s.name == fnName) return true
+        }
+        var accType = mk_hidden_acc_type(fnName, false, bucketElemType, null, at)
+        if (accType == null) return true
+        let newSlot = (specs |> length) + 1
+        specs |> push <| ReducerSpec(slot = newSlot, name = fnName, innerBody = null, accType = accType)
+        return true
+    }
+    let canInnerSelect = (fnName == "sum" || fnName == "min" || fnName == "max"
+            || fnName == "first" || fnName == "average")
+    if (!canInnerSelect) return true
+    let inner = c.arguments[0]
+    if (inner == null || !(inner is ExprCall)) return true
+    let ic = inner as ExprCall
+    if (ic.func == null || (ic.arguments |> length) != 2 || !is_bind_field_access(ic.arguments[0], hbName, 1)) return true
+    var innerName = string(ic.func.name)
+    if (ic.func.fromGeneric != null) {
+        innerName = string(ic.func.fromGeneric.name)
+    }
+    if (innerName != "select") return true
+    let comboName = "{fnName}_inner_select"
+    for (s in specs) {
+        // A match against a select-side spec (visible slot) is accepted: v1 trusts that the user wrote identical lambdas (pre-existing PR-D limitation, documented in LINQ.md). A match against a hidden spec already added by this having scan would silently route two different inner lambdas to the same slot, so bail to tier 2 instead, where each select() is evaluated separately.
+        if (s.name == comboName) return s.slot <= userVisibleSlotCount
+    }
+    var innerBody = fold_linq_cond(clone_expression(ic.arguments[1]), itName)
+    if (innerBody == null || innerBody._type == null) return false
+    var accType = mk_hidden_acc_type(comboName, true, bucketElemType, innerBody._type, at)
+    if (accType == null) return false
+    let newSlot = (specs |> length) + 1
+    specs |> push <| ReducerSpec(slot = newSlot, name = comboName, innerBody = innerBody, accType = accType)
+    return true
+}
+
+[macro_function]
+def private extend_specs_for_missing_having_reducers(var expr : Expression?; hbName, itName : string;
+                                                     var specs : array<ReducerSpec>; userVisibleSlotCount : int;
+                                                     bucketElemType : TypeDeclPtr; at : LineInfo) : bool {
+    // Pre-rewrite walker that appends hidden ReducerSpecs for having-only reducers; unrecognized shapes leak to rewrite_having_pred (which bails on any surviving raw `<hb>`).
+    if (expr == null) return true
+    while (expr is ExprRef2Value) {
+        expr = (expr as ExprRef2Value).subexpr
+    }
+    if (expr is ExprCall) {
+        var c = expr as ExprCall
+        if (!having_reducer_append_if_missing(expr, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)) return false
+        for (i in 0 .. (c.arguments |> length)) {
+            if (!extend_specs_for_missing_having_reducers(c.arguments[i], hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)) return false
+        }
+        return true
+    }
+    if (expr is ExprField) {
+        var f = expr as ExprField
+        return extend_specs_for_missing_having_reducers(f.value, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)
+    }
+    if (expr is ExprOp1) {
+        var op = expr as ExprOp1
+        return extend_specs_for_missing_having_reducers(op.subexpr, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)
+    }
+    if (expr is ExprOp2) {
+        var op = expr as ExprOp2
+        if (!extend_specs_for_missing_having_reducers(op.left, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)) return false
+        return extend_specs_for_missing_having_reducers(op.right, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)
+    }
+    if (expr is ExprOp3) {
+        var op = expr as ExprOp3
+        return (extend_specs_for_missing_having_reducers(op.subexpr, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)
+                && extend_specs_for_missing_having_reducers(op.left, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at)
+                && extend_specs_for_missing_having_reducers(op.right, hbName, itName, specs, userVisibleSlotCount, bucketElemType, at))
+    }
+    return true
+}
+
 [macro_function]
 def private rewrite_having_pred(var expr : Expression?; hbName, kvName : string; specs : array<ReducerSpec>) : Expression? {
     // Substitutes `<hb>._0` → `kv._0` and `<reducer>(<hb>._1)` / `<reducer>(select(<hb>._1, <lam>))` → `kv._{slot}` (or divide expr for average) against matching specs; any other use of `<hb>` returns null and cascades.
@@ -2250,22 +2592,27 @@ def private plan_group_by(var expr : Expression?) : Expression? {
     if (groupProjBody == null || groupProjBody._type == null) return null
     var (specs, usesNamedTuple) = recognize_reducer_specs(groupProjBody, bindName, itName)
     if (empty(specs)) return null
-    var hasAvg = false
-    for (s in specs) {
-        if (is_average_spec(s)) {
-            hasAvg = true
-        }
-    }
-    // Rewrite optional having predicate against accumulator slots; bails on anything other than _._0 / reducer(_._1) matched to a select slot.
+    let userVisibleSlotCount = specs |> length
+    // Rewrite optional having predicate against accumulator slots; bails on anything other than _._0 / reducer(_._1) matched to a select-side slot or a synthesized hidden slot (PR-E).
     var havingPred : Expression?
     if (havingCall != null) {
         let hbName = "`hb`{at.line}`{at.column}"
         var rawPred = fold_linq_cond(havingCall.arguments[1], hbName)
-        if (rawPred == null) return null
+        // PR-E: scan first to add hidden slots for reducers referenced by having but absent from select.
+        if (rawPred == null || !extend_specs_for_missing_having_reducers(rawPred, hbName, itName, specs, userVisibleSlotCount, elemType, at)) return null
         havingPred = rewrite_having_pred(rawPred, hbName, kvName, specs)
         // Safety net: a node type rewrite_having_pred didn't recurse into would leak hbName through to the splice (compile error at the splice site). Bail.
         if (havingPred == null || expr_uses_var(havingPred, hbName)) return null
     }
+    let hasHidden = (specs |> length) > userVisibleSlotCount
+    // Bare form + hidden slot would require extending `tuple<KeyT; AccT>` synthesized inside a qmacro with embedded `typedecl(invoke(...))` for the key type — can't dynamically grow that. Cascade.
+    if (hasHidden && !usesNamedTuple) return null
+    var hasAvg = false
+    for (s in specs) {
+        if (is_average_spec(s)) {
+            hasAvg = true
+        }
+    }
     // Named-tuple wrap reuses the user's body type (preserves field-name hints); bare reducer synthesizes tuple<KeyT; AccT>.
     var keyBlk1 = clone_expression(keyBlock)
     var keyBlk2 = clone_expression(keyBlock)
@@ -2278,11 +2625,25 @@ def private plan_group_by(var expr : Expression?) : Expression? {
         // Average slots store the 2-tuple <sum;count> internally; result-build divides at output.
         if (hasAvg) {
             for (s in specs) {
-                if (is_average_spec(s) && s.slot < (tabValueType.argTypes |> length)) {
+                if (is_average_spec(s) && s.slot <= userVisibleSlotCount && s.slot < (tabValueType.argTypes |> length)) {
                     tabValueType.argTypes[s.slot] = mk_avg_acc_type(at)
                 }
             }
         }
+        // PR-E: append hidden slots (synthesized for having-only reducers) after the user-visible slots; result-build re-synthesizes the user's tuple shape via ExprMakeTuple, omitting them.
+        if (hasHidden) {
+            let preserveNames = (tabValueType.argNames |> length) == (tabValueType.argTypes |> length)
+            for (s in specs) {
+                if (s.slot > userVisibleSlotCount) {
+                    tabValueType.argTypes |> push <| clone_type(s.accType)
+                    if (preserveNames) {
+                        let nameIdx = tabValueType.argNames |> length
+                        tabValueType.argNames |> resize(nameIdx + 1)
+                        tabValueType.argNames[nameIdx] := "_hidden_{s.slot}"
+                    }
+                }
+            }
+        }
         var tabValueType2 = clone_type(tabValueType)
         stmts |> push <| qmacro_expr() {
             var inscope $i(tabName) : table<typedecl(_::unique_key(invoke($e(keyBlk1), default<$t(elemType)>))); $t(tabValueType)>
@@ -2402,8 +2763,8 @@ def private plan_group_by(var expr : Expression?) : Expression? {
                     $i(kvName)._1
                 }
             }
-        } elif (hasAvg) {
-            // Tuple with avg slot(s): table value type stores tuple<double;uint64> per avg slot, so re-synthesize the user's tuple at result-build, dividing avg slots. Positional tuples (no field names) and named tuples both pass through recognize_reducer_specs — slot count comes from argTypes; recordNames only get filled when argNames is populated.
+        } elif (hasAvg || hasHidden) {
+            // Re-synthesize the user's tuple slot-by-slot: avg slots divide, hidden slots are omitted by iterating only user-visible argTypes; recordNames are filled only when argNames are populated.
             var mt = new ExprMakeTuple(at = at)
             let slotCount = (groupProjBody._type.argTypes |> length)
             let hasNames = (groupProjBody._type.argNames |> length) == slotCount
diff --git a/tests/linq/test_linq_fold.das b/tests/linq/test_linq_fold.das
index 8d4e6971a..410a218db 100644
--- a/tests/linq/test_linq_fold.das
+++ b/tests/linq/test_linq_fold.das
@@ -1059,6 +1059,188 @@ def test_contains_early_exit(t : T?) {
     }
 }
 
+[test]
+def test_last_terminal_walk(t : T?) {
+    t |> run("last: many returns final element") @(t : T?) {
+        let arr <- [7, 8, 9]
+        let l = _fold(each(arr).last())
+        t |> equal(9, l)
+    }
+    t |> run("last: where matches returns final survivor") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        let l = _fold(each(arr)._where(_ < 4).last())
+        t |> equal(3, l)
+    }
+    t |> run("last: select projection returns projected last") @(t : T?) {
+        let arr <- [1, 2, 3]
+        let l = _fold(each(arr)._select(_ * 10).last())
+        t |> equal(30, l)
+    }
+    t |> run("last: parity vs plain linq") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5, 6]
+        let spliced = _fold(each(arr)._where(_ % 2 == 0).last())
+        let reference = arr |> where_($(x : int) => x % 2 == 0) |> last()
+        t |> equal(reference, spliced)
+    }
+}
+
+[test]
+def test_last_or_default_terminal_walk(t : T?) {
+    t |> run("last_or_default: empty returns default") @(t : T?) {
+        let arr : array<int>
+        let l = _fold(each(arr).last_or_default(99))
+        t |> equal(99, l)
+    }
+    t |> run("last_or_default: non-empty returns last") @(t : T?) {
+        let arr <- [7, 8, 9]
+        let l = _fold(each(arr).last_or_default(99))
+        t |> equal(9, l)
+    }
+    t |> run("last_or_default: where no match returns default") @(t : T?) {
+        let arr <- [1, 2, 3]
+        let l = _fold(each(arr)._where(_ > 100).last_or_default(-1))
+        t |> equal(-1, l)
+    }
+    t |> run("last_or_default: where matches returns last survivor") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        let l = _fold(each(arr)._where(_ < 4).last_or_default(-1))
+        t |> equal(3, l)
+    }
+}
+
+[test]
+def test_single_terminal_walk(t : T?) {
+    t |> run("single: exactly one match returns it") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        let s = _fold(each(arr)._where(_ == 3).single())
+        t |> equal(3, s)
+    }
+    t |> run("single: select projection on single element") @(t : T?) {
+        let arr <- [4]
+        let s = _fold(each(arr)._select(_ * 10).single())
+        t |> equal(40, s)
+    }
+}
+
+[test]
+def test_single_or_default_terminal_walk(t : T?) {
+    t |> run("single_or_default: empty returns default") @(t : T?) {
+        let arr : array<int>
+        let s = _fold(each(arr).single_or_default(99))
+        t |> equal(99, s)
+    }
+    t |> run("single_or_default: exactly one match returns it") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        let s = _fold(each(arr)._where(_ == 3).single_or_default(-1))
+        t |> equal(3, s)
+    }
+    t |> run("single_or_default: more than one match returns default") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        let s = _fold(each(arr)._where(_ > 2).single_or_default(-1))
+        t |> equal(-1, s)
+    }
+    t |> run("single_or_default: where no match returns default") @(t : T?) {
+        let arr <- [1, 2, 3]
+        let s = _fold(each(arr)._where(_ > 100).single_or_default(-1))
+        t |> equal(-1, s)
+    }
+}
+
+[test]
+def test_element_at_terminal_walk(t : T?) {
+    t |> run("element_at: index 0 returns first") @(t : T?) {
+        let arr <- [10, 20, 30]
+        let e = _fold(each(arr).element_at(0))
+        t |> equal(10, e)
+    }
+    t |> run("element_at: index N returns N-th") @(t : T?) {
+        let arr <- [10, 20, 30, 40, 50]
+        let e = _fold(each(arr).element_at(3))
+        t |> equal(40, e)
+    }
+    t |> run("element_at: with where filter") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5, 6, 7]
+        // Evens: [2,4,6]. element_at(1) → 4.
+        let e = _fold(each(arr)._where(_ % 2 == 0).element_at(1))
+        t |> equal(4, e)
+    }
+    t |> run("element_at: with select projection") @(t : T?) {
+        let arr <- [1, 2, 3]
+        // Projected: [10,20,30]. element_at(2) → 30.
+        let e = _fold(each(arr)._select(_ * 10).element_at(2))
+        t |> equal(30, e)
+    }
+    t |> run("element_at: parity vs plain linq") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5, 6, 7]
+        let spliced = _fold(each(arr)._where(_ > 2).element_at(2))
+        let reference = arr |> where_($(x : int) => x > 2) |> element_at(2)
+        t |> equal(reference, spliced)
+    }
+}
+
+[test]
+def test_element_at_or_default_terminal_walk(t : T?) {
+    t |> run("element_at_or_default: empty returns default") @(t : T?) {
+        let arr : array<int>
+        let e = _fold(each(arr).element_at_or_default(0))
+        t |> equal(0, e)
+    }
+    t |> run("element_at_or_default: in-range returns it") @(t : T?) {
+        let arr <- [10, 20, 30]
+        let e = _fold(each(arr).element_at_or_default(1))
+        t |> equal(20, e)
+    }
+    t |> run("element_at_or_default: out-of-range returns default") @(t : T?) {
+        let arr <- [10, 20, 30]
+        let e = _fold(each(arr).element_at_or_default(100))
+        t |> equal(0, e)
+    }
+    t |> run("element_at_or_default: negative index returns default (no panic)") @(t : T?) {
+        let arr <- [10, 20, 30]
+        let e = _fold(each(arr).element_at_or_default(-5))
+        t |> equal(0, e)
+    }
+}
+
+[test]
+def test_aggregate_terminal_walk(t : T?) {
+    t |> run("aggregate: empty returns seed") @(t : T?) {
+        let arr : array<int>
+        let a = _fold(each(arr).aggregate(42, $(acc : int, x : int) => acc + x))
+        t |> equal(42, a)
+    }
+    t |> run("aggregate: sum of doubles") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        let a = _fold(each(arr).aggregate(0, $(acc : int, x : int) => acc + x * 2))
+        t |> equal(30, a)
+    }
+    t |> run("aggregate: with where filter") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5]
+        // Odds: 1,3,5. sum-of-squares: 1+9+25 = 35.
+        let a = _fold(each(arr)._where(_ % 2 == 1).aggregate(0, $(acc : int, x : int) => acc + x * x))
+        t |> equal(35, a)
+    }
+    t |> run("aggregate: with select chain") @(t : T?) {
+        let arr <- [1, 2, 3]
+        // After select(*10): [10,20,30]. sum: 60.
+        let a = _fold(each(arr)._select(_ * 10).aggregate(0, $(acc : int, x : int) => acc + x))
+        t |> equal(60, a)
+    }
+    t |> run("aggregate: non-workhorse string seed") @(t : T?) {
+        let arr <- [1, 2, 3]
+        // String join — exercises the non-workhorse `<-` init/update/return path.
+        let a = _fold(each(arr).aggregate("",
+            $(acc : string, x : int) => acc == "" ? "{x}" : "{acc},{x}"))
+        t |> equal("1,2,3", a)
+    }
+    t |> run("aggregate: parity vs plain linq") @(t : T?) {
+        let arr <- [1, 2, 3, 4, 5, 6]
+        let spliced = _fold(each(arr)._where(_ % 2 == 0).aggregate(1, $(acc : int, x : int) => acc * x))
+        let reference = arr |> where_($(x : int) => x % 2 == 0) |> aggregate(1, $(acc : int, x : int) => acc * x)
+        t |> equal(reference, spliced)
+    }
+}
+
 [test]
 def test_take_skip_counter_lane(t : T?) {
     t |> run("counter: where.take.count") @(t : T?) {
@@ -2217,19 +2399,48 @@ def test_group_by_having_unhandled_node_cascades(t : T?) {
 }
 
 [test]
-def test_group_by_having_unmatched_reducer_cascades(t : T?) {
-    // PR-D bail: having references a reducer not present in select. Splice cascades to tier 2;
-    // semantics must still be correct via the array-shape pipeline.
-    t |> run("having uses sum but select only has length") @(tt : T?) {
-        let arr = [1, 2, 3, 4, 5, 6]
-        // Buckets %2: 0=[2,4,6](sum=12, len=3), 1=[1,3,5](sum=9, len=3). Having sum > 10 → 0.
-        let got <- _fold(arr._group_by_lazy(_ % 2)._having(_._1 |> sum > 10)._select((K = _._0, N = _._1 |> length)))
+def test_group_by_having_two_same_named_inner_select_diff_lambdas(t : T?) {
+    // PR-E + Copilot review: when having has TWO same-named inner-select reducers with
+    // DIFFERENT lambdas, naive dedup-by-name would route both predicate terms to the same
+    // accumulator slot, which is silently wrong. This test pins the correct semantics.
+    t |> run("two hidden inner-select sums (different lambdas) compute distinctly") @(tt : T?) {
+        let arr = [1, 2, 3, 4, 5, 6, 7, 8]
+        // Bucket %3: 0=[3,6]: sum(x*2)=18, sum(x*3)=27, total=45.
+        //            1=[1,4,7]: sum(x*2)=24, sum(x*3)=36, total=60.
+        //            2=[2,5,8]: sum(x*2)=30, sum(x*3)=45, total=75.
+        // Having: sum(x*2) + sum(x*3) > 50 → buckets 1 (60>50) and 2 (75>50) pass; bucket 0 (45) fails.
+        let got <- _fold(arr._group_by_lazy(_ % 3)
+                            ._having((_._1 |> select($(x : int) => x * 2) |> sum)
+                                + (_._1 |> select($(x : int) => x * 3) |> sum) > 50)
+                            ._select((K = _._0, N = _._1 |> length)))
         var seen : table<int; int>
         for (kv in got) {
             seen[kv.K] = kv.N
         }
-        tt |> equal(1, seen |> length)
-        tt |> equal(3, seen[0])
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[1])
+        tt |> equal(3, seen[2])
+    }
+}
+
+[test]
+def test_group_by_having_bare_form_with_hidden_cascades(t : T?) {
+    // PR-E bail: bare-form group_proj combined with a having reducer that needs a hidden slot
+    // falls back to tier 2 (the bare table layout uses `tuple<KeyT; AccT>` synthesized inside a
+    // qmacro with embedded `typedecl(invoke(...))` for the key type, which we can't extend with
+    // a dynamic count of additional acc slots). Semantics must still be correct via cascade.
+    t |> run("bare-form select with hidden sum in having cascades") @(tt : T?) {
+        let arr = [1, 2, 3, 4, 5, 6]
+        // Buckets %2: 0=[2,4,6](sum=12, len=3), 1=[1,3,5](sum=9, len=3). Having sum > 10 → 0.
+        let got <- _fold(arr._group_by_lazy(_ % 2)._having(_._1 |> sum > 10)._select(_._1 |> length))
+        var cnt = 0
+        var total = 0
+        for (v in got) {
+            cnt ++
+            total += v
+        }
+        tt |> equal(1, cnt)
+        tt |> equal(3, total)
     }
 }
 
@@ -2284,6 +2495,123 @@ def test_reverse_first_or_default_parity(t : T?) {
     }
 }
 
+[test]
+def test_group_by_having_hidden_slot_fold_parity(t : T?) {
+    // PR-E: having references a reducer not present in select — splice synthesizes a hidden
+    // accumulator slot in the internal table value type; result-build projects only the
+    // user-visible slots.
+    t |> run("hidden sum: having sum>N, select length") @(tt : T?) {
+        let arr = [1, 2, 3, 4, 5, 6]
+        // Buckets %2: 0=[2,4,6](sum=12, len=3), 1=[1,3,5](sum=9, len=3). Having sum > 10 → 0.
+        let got <- _fold(arr._group_by_lazy(_ % 2)._having(_._1 |> sum > 10)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(1, seen |> length)
+        tt |> equal(3, seen[0])
+    }
+    t |> run("hidden min: having min<K, select length") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12, 22, 32]
+        // Buckets %3: 0=[30,21,12](min=12), 1=[10,31,22](min=10), 2=[20,11,32](min=11). Having min<12 → 1,2.
+        let got <- _fold(arr._group_by_lazy(_ % 3)._having(_._1 |> min < 12)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[1])
+        tt |> equal(3, seen[2])
+    }
+    t |> run("hidden max: having max>=N, select length") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12]
+        // Buckets %3: 0=[30,21,12](max=30), 1=[10,31](max=31), 2=[20,11](max=20). Having max>=30 → 0,1.
+        let got <- _fold(arr._group_by_lazy(_ % 3)._having(_._1 |> max >= 30)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[0])
+        tt |> equal(2, seen[1])
+    }
+    t |> run("hidden average: having average>F, select length") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12]
+        // Buckets %3: 0=[30,21,12](avg=21.0), 1=[10,31](avg=20.5), 2=[20,11](avg=15.5). Having avg>16 → 0,1.
+        let got <- _fold(arr._group_by_lazy(_ % 3)._having(_._1 |> average > 16.0lf)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[0])
+        tt |> equal(2, seen[1])
+    }
+    t |> run("hidden sum + visible length used in same predicate") @(tt : T?) {
+        let arr = [1, 2, 3, 4, 5, 6]
+        // 0=[2,4,6](sum=12, len=3), 1=[1,3,5](sum=9, len=3). Having sum>=9 && len>=3 → both pass.
+        let got <- _fold(arr._group_by_lazy(_ % 2)._having((_._1 |> sum) >= 9 && (_._1 |> length) >= 3)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[0])
+        tt |> equal(3, seen[1])
+    }
+    t |> run("two hidden reducers: sum + min, select length") @(tt : T?) {
+        let arr = [10, 20, 30, 11, 21, 31, 12]
+        // Buckets %3: 0=[30,21,12](sum=63, min=12), 1=[10,31](sum=41, min=10), 2=[20,11](sum=31, min=11). Having sum>40 && min<15 → 0,1.
+        let got <- _fold(arr._group_by_lazy(_ % 3)._having((_._1 |> sum) > 40 && (_._1 |> min) < 15)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[0])
+        tt |> equal(2, seen[1])
+    }
+    t |> run("hidden inner-select sum (lambda peelable)") @(tt : T?) {
+        let arr = [1, 2, 3, 4, 5, 6, 7, 8]
+        // Buckets %3: 0=[3,6] doubled=[6,12](sum=18), 1=[1,4,7] doubled=[2,8,14](sum=24), 2=[2,5,8] doubled=[4,10,16](sum=30). Having sum>20 → 1,2.
+        let got <- _fold(arr._group_by_lazy(_ % 3)._having(_._1 |> select($(x : int) => x * 2) |> sum > 20)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[1])
+        tt |> equal(3, seen[2])
+    }
+    t |> run("count terminator with hidden sum") @(tt : T?) {
+        let arr = [1, 2, 3, 4, 5, 6]
+        // Buckets %2: 0=[2,4,6](sum=12), 1=[1,3,5](sum=9). Having sum>10 → only bucket 0 passes, so count is 1.
+        let c = _fold(arr._group_by_lazy(_ % 2)._having(_._1 |> sum > 10)._select((K = _._0, N = _._1 |> length)) |> count())
+        tt |> equal(1, c)
+    }
+    t |> run("upstream where + hidden sum in having") @(tt : T?) {
+        let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+        // where >2 then group %3: 0=[3,6,9](sum=18), 1=[4,7,10](sum=21), 2=[5,8](sum=13). Having sum>=18 → 0,1.
+        let got <- _fold(arr._where(_ > 2)._group_by_lazy(_ % 3)._having(_._1 |> sum >= 18)._select((K = _._0, N = _._1 |> length)))
+        var seen : table<int; int>
+        for (kv in got) {
+            seen[kv.K] = kv.N
+        }
+        tt |> equal(2, seen |> length)
+        tt |> equal(3, seen[0])
+        tt |> equal(3, seen[1])
+    }
+    t |> run("empty source") @(tt : T?) {
+        var empty : array<int>
+        let got <- _fold(empty._group_by_lazy(_)._having(_._1 |> sum > 0)._select((K = _._0, N = _._1 |> length)))
+        var cnt = 0
+        for (_x in got) {
+            cnt ++
+        }
+        tt |> equal(0, cnt)
+    }
+}
+
 [test]
 def test_top_n_mid_chain_iterator_source(t : T?) {
     // Regression — `top_n_*` registered in linqCalls but always returns array regardless of
diff --git a/tests/linq/test_linq_fold_ast.das b/tests/linq/test_linq_fold_ast.das
index 5b96061f5..d56e195de 100644
--- a/tests/linq/test_linq_fold_ast.das
+++ b/tests/linq/test_linq_fold_ast.das
@@ -2403,17 +2403,71 @@ def test_group_by_having_count_terminator_splices(t : T?) {
 }
 
 [export, marker(no_coverage)]
-def target_group_by_having_unmatched_cascades() : array<tuple<K : int; N : int>> {
-    // BAIL: having uses sum but the select has only length — no matching slot, so the planner
-    // returns null and the chain cascades to tier 2.
+def target_group_by_having_hidden_sum_fold() : array<tuple<K : int; N : int>> {
+    // PR-E: having uses sum but select has only length — planner synthesizes a hidden sum slot
+    // in the internal table value type; result-build projects only the user-visible (K, N).
     return <- _fold([1, 2, 3, 4, 5, 6]._group_by_lazy(_ % 2)._having(_._1 |> sum > 10)._select((K = _._0, N = _._1 |> length)))
 }
 
 [test]
-def test_group_by_having_unmatched_cascades_to_tier2(t : T?) {
+def test_group_by_having_hidden_sum_splices(t : T?) {
+    // PR-E splice fingerprint: no group_by_lazy or having_ runtime calls; single-hash hot path.
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_group_by_having_hidden_sum_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched, "expected splice wrapper")
+        t |> equal(0, count_call(body_expr, "group_by_lazy"))
+        t |> equal(0, count_call(body_expr, "having_"))
+        t |> equal(0, count_call(body_expr, "sum"), "hidden sum inlined into += hit branch")
+        t |> equal(0, count_call(body_expr, "key_exists"), "single-hash hot path")
+        t |> success(count_call(body_expr, "unique_key") >= 1, "per-element unique_key")
+    }
+}
+
+[export, marker(no_coverage)]
+def target_group_by_having_two_same_named_inner_select_cascades() : array<tuple<K : int; N : int>> {
+    // PR-E bail: two same-named inner-select reducers in the same having clause can't be
+    // distinguished by name alone — splicing would silently conflate the two lambdas onto
+    // one hidden slot. Bail to tier 2 so each select() is evaluated separately.
+    return <- _fold([1, 2, 3, 4, 5, 6, 7, 8]._group_by_lazy(_ % 3)
+                       ._having((_._1 |> select($(x : int) => x * 2) |> sum)
+                           + (_._1 |> select($(x : int) => x * 3) |> sum) > 50)
+                       ._select((K = _._0, N = _._1 |> length)))
+}
+
+[test]
+def test_group_by_having_two_same_named_inner_select_cascades_to_tier2(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_group_by_having_two_same_named_inner_select_cascades)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return <- $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper from cascade")
+        t |> success(count_outer_let_vars(body_expr) >= 2,
+            "cascade fingerprint: multiple pass_N outer vars")
+    }
+}
+
+[export, marker(no_coverage)]
+def target_group_by_having_hidden_bare_form_cascades() : array<int> {
+    // PR-E bail: bare-form group_proj (`_._1 |> length`, not a named tuple) combined with a
+    // having reducer that needs a hidden slot — the bare table layout uses `tuple<KeyT; AccT>`
+    // synthesized inside a qmacro with embedded `typedecl(invoke(...))`, which can't be
+    // extended with a dynamic count of additional acc slots. Falls back to tier 2.
+    return <- _fold([1, 2, 3, 4, 5, 6]._group_by_lazy(_ % 2)._having(_._1 |> sum > 10)._select(_._1 |> length))
+}
+
+[test]
+def test_group_by_having_hidden_bare_form_cascades_to_tier2(t : T?) {
     // Cascade fingerprint: multiple pass_N let-vars (group, having, select all separate).
     ast_gc_guard() {
-        var func = find_module_function_via_rtti(compiling_module(), @@target_group_by_having_unmatched_cascades)
+        var func = find_module_function_via_rtti(compiling_module(), @@target_group_by_having_hidden_bare_form_cascades)
         if (func == null) return
         var body_expr : ExpressionPtr
         let r = qmatch_function(func) $() {
@@ -2574,3 +2628,244 @@ def test_reverse_select_pushes_projection_not_source(t : T?) {
     }
 }
 
+// ── Phase 3+ terminal-walk lane: last / single / element_at / aggregate ─────
+
+[export, marker(no_coverage)]
+def target_last_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._where(_ < 4).last())
+}
+
+[export, marker(no_coverage)]
+def target_last_or_default_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._where(_ > 99).last_or_default(-1))
+}
+
+[export, marker(no_coverage)]
+def target_single_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._where(_ == 3).single())
+}
+
+[export, marker(no_coverage)]
+def target_single_or_default_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._where(_ > 99).single_or_default(-1))
+}
+
+[export, marker(no_coverage)]
+def target_element_at_fold() : int {
+    return _fold(each([10, 20, 30, 40, 50]).element_at(3))
+}
+
+[export, marker(no_coverage)]
+def target_element_at_or_default_fold() : int {
+    return _fold(each([10, 20, 30]).element_at_or_default(100))
+}
+
+[export, marker(no_coverage)]
+def target_aggregate_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5]).aggregate(0, $(acc : int, x : int) => acc + x * 2))
+}
+
+[export, marker(no_coverage)]
+def target_aggregate_with_where_fold() : int {
+    return _fold(each([1, 2, 3, 4, 5])._where(_ % 2 == 1).aggregate(0, $(acc : int, x : int) => acc + x * x))
+}
+
+[test]
+def test_last_splices(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_last_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "last: one for-loop (full walk)")
+        // `last` tail panics on empty source; verify the splice emitted it.
+        t |> success(count_call(body_expr, "panic") >= 1, "last: panic call in tail for empty case")
+    }
+}
+
+[test]
+def test_last_or_default_no_panic(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_last_or_default_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "last_or_default: one for-loop")
+        // `last_or_default` returns the bound default; never panics.
+        t |> equal(0, count_call(body_expr, "panic"), "last_or_default: no panic call")
+    }
+}
+
+[test]
+def test_single_splices(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_single_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "single: one for-loop")
+        // `single` emits two panic call sites in the splice: one for >1 match (in-loop), one for empty (tail).
+        t |> success(count_call(body_expr, "panic") >= 2, "single: two panic call sites")
+    }
+}
+
+[test]
+def test_single_or_default_no_panic(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_single_or_default_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "single_or_default: one for-loop")
+        t |> equal(0, count_call(body_expr, "panic"), "single_or_default: no panic")
+    }
+}
+
+[test]
+def test_element_at_splices(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_element_at_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "element_at: one for-loop")
+        // `element_at` emits two panic call sites in the splice (pre-loop for `idx < 0`,
+        // tail for "index out of range"); const-fold may eliminate the pre-loop branch
+        // when the literal index is non-negative, so >= 1 is the guarantee.
+        t |> success(count_call(body_expr, "panic") >= 1, "element_at: at least the tail panic survives")
+        // Counter `++` increments the in-flight match counter.
+        t |> success(count_op1(body_expr, "++") >= 1, "element_at: counter increment present")
+    }
+}
+
+[test]
+def test_element_at_or_default_no_panic(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_element_at_or_default_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "element_at_or_default: one for-loop")
+        t |> equal(0, count_call(body_expr, "panic"), "element_at_or_default: no panic")
+    }
+}
+
+// Count ExprInvoke nodes in the subtree, descending only through the node kinds handled
+// below. The outer `_fold` splice wrapper is an ExprInvoke; per-element block calls (when
+// peel fails) also emit ExprInvoke. Peeled aggregate bodies inline the block and emit none.
+def count_invoke_nodes(expr : Expression?) : int {
+    if (expr == null) return 0
+    var n = 0
+    if (expr is ExprInvoke) {
+        n ++
+    }
+    if (expr is ExprBlock) {
+        let b = expr as ExprBlock
+        for (s in b.list) {
+            n += count_invoke_nodes(s)
+        }
+        for (s in b.finalList) {
+            n += count_invoke_nodes(s)
+        }
+    } elif (expr is ExprFor) {
+        let f = expr as ExprFor
+        for (s in f.sources) {
+            n += count_invoke_nodes(s)
+        }
+        n += count_invoke_nodes(f.body)
+    } elif (expr is ExprIfThenElse) {
+        let i = expr as ExprIfThenElse
+        n += count_invoke_nodes(i.cond)
+        n += count_invoke_nodes(i.if_true)
+        n += count_invoke_nodes(i.if_false)
+    } elif (expr is ExprOp2) {
+        let o = expr as ExprOp2
+        n += count_invoke_nodes(o.left)
+        n += count_invoke_nodes(o.right)
+    } elif (expr is ExprCall) {
+        let c = expr as ExprCall
+        for (a in c.arguments) {
+            n += count_invoke_nodes(a)
+        }
+    } elif (expr is ExprMakeBlock) {
+        let mb = expr as ExprMakeBlock
+        n += count_invoke_nodes(mb._block)
+    } elif (expr is ExprInvoke) {
+        let inv = expr as ExprInvoke
+        for (a in inv.arguments) {
+            n += count_invoke_nodes(a)
+        }
+    } elif (expr is ExprReturn) {
+        let r = expr as ExprReturn
+        n += count_invoke_nodes(r.subexpr)
+    } elif (expr is ExprOp1) {
+        let o = expr as ExprOp1
+        n += count_invoke_nodes(o.subexpr)
+    } elif (expr is ExprLet) {
+        let l = expr as ExprLet
+        for (v in l.variables) {
+            if (v != null && v.init != null) {
+                n += count_invoke_nodes(v.init)
+            }
+        }
+    }
+    return n
+}
+
+[test]
+def test_aggregate_splices_peeled(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_aggregate_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "aggregate: one for-loop")
+        // Single-return block body `acc + x * 2` peeled into `+` and `*` ops; the only
+        // ExprInvoke is the outer `_fold` splice wrapper. A non-peeled splice would emit
+        // a per-element ExprInvoke against the user's block.
+        t |> equal(1, count_invoke_nodes(body_expr),
+            "aggregate: only the outer invoke wrapper, NOT a per-element block invoke")
+        t |> success(count_op2(body_expr, "+") >= 1, "aggregate: `+` from peeled body")
+        t |> success(count_op2(body_expr, "*") >= 1, "aggregate: `*` from peeled body")
+    }
+}
+
+[test]
+def test_aggregate_with_where_fuses(t : T?) {
+    ast_gc_guard() {
+        var func = find_module_function_via_rtti(compiling_module(), @@target_aggregate_with_where_fold)
+        if (func == null) return
+        var body_expr : ExpressionPtr
+        let r = qmatch_function(func) $() {
+            return $e(body_expr)
+        }
+        t |> success(r.matched && body_expr is ExprInvoke, "expected invoke wrapper")
+        t |> equal(1, count_inner_for_loops(body_expr), "aggregate+where: one for-loop")
+        // Where predicate (`_ % 2 == 1`) inlined alongside peeled aggregate body. The
+        // `where_` library helper must NOT appear as a runtime call.
+        t |> equal(0, count_call(body_expr, "where_"), "aggregate+where: where_ helper inlined")
+        t |> equal(1, count_invoke_nodes(body_expr), "aggregate+where: only outer invoke wrapper")
+    }
+}
+