[pull] master from GaijinEntertainment:master by pull[bot] · Pull Request #1010 · forksnd/daScript

pull · 2026-05-19T02:58:21Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

Three new planners slot into the _fold cascade between plan_order_family and plan_loop_or_count: - plan_reverse: src [|> where_(p)]* [|> select(proj)]? |> reverse [|> take(N)]? [|> {to_array, count, first, first_or_default}] Buffer-required terminators emit single prefilter pass into buffer + reverse_inplace + optional resize(min(N, length(buf))). count elides the buffer (counter loop only); first / first_or_default emit a last-survivor tracker. - plan_distinct: src [|> where_(p)]* [|> select(proj)]? |> distinct[_by(key)] [|> take(N)]? [|> terminator] Streaming dedup via table<unique_key> integrated into the prefilter loop. take(N) → break-after-N early-exit (guard placed BEFORE the consume site so non-positive N short-circuits). count terminator elides the buffer and returns length(seen); sum / long_count fold inline. - plan_group_by: src |> group_by[_lazy](key) |> select(group_proj) [|> {to_array, count}]? Incremental-aggregate fusion via table<unique_key; tuple<KT; AccT>>. Recognized reducers: _._1 |> {length, count, sum, long_count}. Bare + named-tuple wrap (K=_._0, V=_._1|>reducer) both supported; named-tuple preserves user field-name hints by using group_proj._type as the table value type. Bails to tier 2 for inner-select-sum, multi-reducer, having_, side-effectful keys. Headlines (100K rows INTERP): - distinct_count m3f: 33 → 15 ns/op (2.2× over prev; beats SQL m1=41) - distinct_take m3f: new — 0 ns/op (early-exit at 3 unique brands) - groupby_count m3f: 76 → 33 ns/op (2.3× over prev; 4.3× over SQL m1=141) - reverse_take m3f: regression vs m3 (lazy reverse iterator wins for small N) - groupby_sum m3f: unchanged (inner-select-sum deferred — see follow-up plan) is_buffer_required_op trimmed to {zip, join, left_join, group_join} — the named-marker arm in plan_loop_or_count is now pure defense-in-depth because the three new planners catch their ops first. New benchmarks: benchmarks/sql/reverse_take.das, benchmarks/sql/distinct_take.das. LINQ.md updated with phase status + new section + coverage table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds `options _comment_hygiene = true` so daslib/style_lint enforces STYLE014 (3-line cap on module/public //) and STYLE015 (1-line cap on private //) at compile time. Stale comment blocks accumulated across the three-tier cascade landings (PRs #2687, #2697, #2712, #2714, #2721). Cleanup performed by utils/hygiene/ for //! private-docstring stripping, then a lint-driven script that keeps the first WHY line of each multi-line `//` block flagged by STYLE014/STYLE015 and drops the rest. Net: 383 lines removed, 10 added — file shrunk 2294 → 1919 lines. All 867 tests/linq still pass (INTERP + AOT). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds four small helpers used by plan_reverse / plan_distinct / plan_group_by: - merge_where_cond(curr, pred) — collapses the 5-line "if-null-else-&&" pattern that fused chained where_ predicates into a one-liner. - strip_const_ref(t) — clears constant + ref flags on a cloned TypeDecl; collapses the 3-line guarded assignment pattern (was repeated 5×). - finalize_emission(top, srcName, at, body) — wraps a single body Expression in the canonical `invoke($($i(src) : T) { $e(body) }, src)` emission. Used by plan_reverse where the body is a qmacro_block. - finalize_emission_stmts(top, srcName, at, stmts) — same wrap but splices a stmts array via $b. Used by plan_distinct + plan_group_by (their nested stmt construction is parse-fragile inside qmacro_block). - buffer_return(bufName, isIterator) — emits the `return <- buf` / `return <- buf.to_sequence_move()` tail; collapses the 7-line if/else pattern (was repeated 3×). plan_reverse also gets a structural refactor to qmacro_block bodies per branch (count, first/first_or_default, to_array), reading top-to- bottom as the emission shape instead of N × "stmts |> push <| qmacro_expr". plan_distinct + plan_group_by keep their stmts-array structure (cross- qmacro $i tag resolution fails on nested qmacro_block in those shapes — the inner-block tag scope doesn't unify with the outer block's), but adopt the shared helpers where applicable. Net: 180 lines removed, 102 added — file shrunk 1919 → 1841 lines. All 867 tests/linq pass interpret + AOT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…fixes Addresses Copilot review on PR #2721: 1. plan_distinct: select(impure) → distinct evaluated projection twice per source element (once for the unique_key, once for buffer push or += acc), firing observable side effects twice vs once in the original LINQ chain. Fix: when projection has side effects, bind projected value once per element to a fresh `pv` local; key + push/acc share the bind. Pure projections still inline (matches plan_loop_or_count's discipline). 2. plan_distinct: take(N) limit was cloned into the consume body, so a side-effecting take arg fired once per fresh distinct key. Fix: bind the limit once at outer scope (alongside `taken = 0`); guard reads the local. Same single-eval discipline as the projection bind. 3. LINQ.md: corrected stale text claiming `is_buffer_required_op` was trimmed to {zip, join, left_join, group_join} — it retains all buffer- required names, but distinct/reverse/group_by are now defense-in-depth (caught earlier by dedicated planners). 4. test_linq_fold_ast.das: trimmed stale comment claiming named-tuple field names collapse to positional access — splice DOES preserve K/N field names via the user's group_proj body type as table value. New test: test_select_distinct_impure_correctness pins the single-eval behavior (projection fires N times for N source elements, not 2N). All 869 tests/linq pass interpret + AOT (+2 new vs prior). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ening Bundle addressing Copilot review pass on PR #2721: - Formatter fix (3 lines in linq_fold.das): `- const - &` → `-const -&` (no spaces). darwin15 extended_checks formatter verify pass was failing CI. - Per-key group_by test assertions: `test_group_by_count_named_tuple_preserves_keys` and `test_group_by_sum_named_tuple_parity` now check per-bucket count/sum (not just total). Catches wrong-bucket regressions that preserve aggregate totals. - LINQ.md doc tightening: - distinct terminator list: drop `first`/`first_or_default` (not implemented; those cascade to tier 2) - distinct pseudo-code: take-guard BEFORE seen.insert/buf.push_clone (matches actual emission order — `take(0)` doesn't mutate seen-table or buffer) - reverse_take regression explanation: rewrite root cause. Both m3 and m3f materialize the full source buffer; m3's iterator-form `reverse()` yields backwards lazily via index so `take(N)` only triggers N yields; m3f does a full reverse_inplace before resize(N). For N << length the lazy approach wins. Future optimization: backward-index loop bounded to last N — deferred. - reverse_take.das header comment: drop "headline win" wording (it's a regression vs m3, not a win) - BufferGroupBy phase row + bail list: narrow to `src |> group_by_lazy(...)` (no upstream where_/select* fusion in v1 — deferred). Also clarify that side-effectful key bodies are NOT a bail — the emitted code evaluates the key once per element (same as plain group_by_lazy), so no re-eval hazard exists. Verification: 95/95 AST tests, 234/234 functional tests, 329/329 AOT, lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d comments Round-2 Copilot review on PR #2721 — 4 doc/comment-only fixes: - LINQ.md:341 (distinct buffer-not-required terminators): clarify long_count is count-shaped (returns int64(length(seen))), not an inline accumulator — only sum folds inline at the freshly-inserted site with acc += <projection>. - LINQ.md:345 (plan_group_by recognizer description): drop reference to a non-existent `try_recognize_group_reducer` rewriter and return tuple. Actual recognizer is is_bucket_reducer_call (linq_fold.das:1666), inline accumulator update — no separate two-pass rewriter. - linq_fold.das:1694 — finish truncated "yields bucket" comment ("yields raw buckets (no fusion)"). - linq_fold.das:1755 — finish truncated "UniqKeyType wraps with" comment ("key is typedecl(keyBlock(default<elemT>)) wrapped in _::unique_key for the hash slot"). No code changes. Lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d comment Round-3 Copilot review on PR #2721 — one real bug + one trailing nit: - plan_group_by previously assigned entry._0 = k on EVERY element. Reference group_by_lazy_impl (linq.das:1946-1950) only stores the key when the bucket is first created. For non-workhorse keys where unique_key (= "{a}") happens to collapse two observationally-different objects to the same string, the splice was storing the LAST key per bucket instead of the FIRST — divergent from reference semantics. Fix: gate the assignment on !key_exists before the tab[uk] subscript materializes the entry. One extra hash lookup per element, but correctness over speed for the rare-collision case. For workhorse keys (int/float/etc. — the canonical _ % N idiom) unique_key is a bijection so the assignment was harmless overwriting; behavior unchanged. - linq_fold.das:1817 — finish truncated "(it's already" comment ("already shaped as the named-tuple / reducer result"). Verification: 234 functional + 95 AST + 329 AOT all pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-wins regression Per Boris's suggestion: replace `key_exists(uk) + tab[uk]` (2 hash ops per element) with `tab?[uk] ?? dummy` (1 hash op per element) + addr-compare for new-bucket detection. Pattern: var dummy : <tab value type> # OUT OF LOOP for it in src: let k = invoke(keyBlk, it) let uk = _::unique_key(k) unsafe: var entry & = tab?[uk] ?? dummy # one hash op: ref to existing entry OR dummy <accUpdate> # entry._1++ or entry._1 += it if addr(entry) == addr(dummy): # dummy path = new bucket dummy._0 = k tab[uk] = dummy dummy = default<...> For canonical _ % N int key with small bucket count (groupby_count's 5 brands in 100K rows = ~99.995% hits), this is ~1 hash op/element vs the prior 2 ops/element. Headlines (100K rows INTERP): groupby_count m3f: 43 ns/op (prev fix) -> 37 ns/op (this fix) groupby_sum m3f: 105 -> 108 (within noise; sum cascades to tier 2 via inner-select-sum) distinct_count m3f: 15 -> 16 (within noise) Still slightly above the pre-correctness-fix 33 ns/op baseline due to the addr-compare, the predictable if-branch, and the dummy reset on miss. Net: 3.8x over SQL m1 (141), 1.9x over m3 (71). All 234 functional + 95 AST + 329 AOT tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…h clamp CI lint chain follows my new test_linq_fold.das generic instantiations into linq.das:1071-1072 and trips PERF015 ("ternary min — use 'min(a, b)' from math module"). Pre-existing on master but per the standing "fix all lint issues, even pre-existing" rule, swap the clamping ternaries to math::clamp which is exactly equivalent (max(lo, min(hi, value))). linq.das already requires math. Unblocks extended_checks (linux, 64). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address Copilot round-3 review on PR #2721: - distinct_take.das: m3 and m3f now BOTH use each(arr) (iterator source) for apples-to-apples comparison. Prior bench had m3 on arr (eager distinct_impl) vs m3f on arr.to_sequence() (lazy distinct generator). The "0 ns/op" m3f win partly reflected lazy iterator stop-on-take semantics, not the splice. Updated numbers: m3 30 -> m3f 0 ns/op. The real win is splice eliminating per-yield generator-dispatch overhead in the lazy chain. - LINQ.md:35 BufferDistinct row: drop trailing "..." from terminator list; spell out exact spliced set {to_array, count, sum, long_count} and note that first/first_or_default/min/max/average cascade to tier 2. - LINQ.md:343 plan_distinct section: add new "Intentional divergence" note explaining that distinct[_by] |> take(N) is streaming (splice breaks on first N distinct keys) vs the runtime's eager materialize-then-take. For pure code observationally identical; for side-effecting keys/upstream, splice fires side-effects on fewer source elements. This is by design — the splice gives array sources the same iterator-form semantics already provided by linq.das:668 (the iterator-form distinct() is a lazy generator that stops under take). - LINQ.md headline table: refresh distinct_take (0 ns/op, "splice eliminates generator-dispatch overhead"), groupby_count (37 ns/op from Boris's dummy-bind pattern), and benchmark "Shape" column to show each(arr) source. 234 functional tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…treaming exit Per-element take-guard was previously inside the `if (!seen.key_exists(uk))` fresh-key branch, which meant the loop only checked `taken >= takeLim` when scanning a fresh key. For inputs like `[0,1,2,0,0,0,...,3]` (first N unique, then many duplicates of seen brands, then an (N+1)th fresh at the end), the loop kept firing key_exists / where / projection on every duplicate until the next fresh key triggered the inner break. Move the guard to the top of the for-loop. Now `taken >= takeLim` fires at the start of any iteration past the Nth distinct consume — duplicates after the Nth distinct skip the key computation entirely. True streaming-take semantics with O(N)-source upper bound. Headline benchmark numbers unchanged (distinct_take.das uses BRAND_COUNT=5 / TAKE_N=3 with cycling brands, so the Nth fresh and (N+1)th fresh are back-to-back at source positions 2 and 3 — both inner and outer guards fire at position 3). The fix is for the worst case (many duplicates between Nth and (N+1)th fresh). Also in this commit: - LINQ.md: "should never fire" wording — corrected to describe `is_buffer_required_op` as a fallback guard for shapes the dedicated planners don't cover (e.g. `distinct().first()`, `order_by(...).first()` cascade via the marker since the spliced terminator set is finite). - linq_fold.das: two truncated "Inner-select" / "Named-tuple wrap" comments finished. - distinct_take.das: header comment synced with the each(arr) source change from db3c392 (was still describing the old eager-array shape). 234 functional + 95 AST + 329 AOT tests pass. Lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…uffer-distinct-groupby-reverse linq_fold: buffer-required splice arms — reverse, distinct, group_by

borisbat and others added 12 commits May 18, 2026 14:56

Merge pull request #2721 from GaijinEntertainment/bbatkin/linq-fold-b…

f9ba7a4

…uffer-distinct-groupby-reverse linq_fold: buffer-required splice arms — reverse, distinct, group_by

pull Bot locked and limited conversation to collaborators May 19, 2026

pull Bot added the ⤵️ pull label May 19, 2026

pull Bot merged commit f9ba7a4 into forksnd:master May 19, 2026
1 of 5 checks passed

pull Bot had a problem deploying to github-pages May 19, 2026 02:58 Error

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] master from GaijinEntertainment:master#1010

[pull] master from GaijinEntertainment:master#1010
pull[bot] merged 12 commits into
forksnd:masterfrom
GaijinEntertainment:master

pull Bot commented May 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pull Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pull Bot commented May 19, 2026 •

edited

Loading