Skip to content
This repository was archived by the owner on Apr 29, 2026. It is now read-only.

Flip#367

Merged
navicore merged 5 commits intomainfrom
flip
Mar 23, 2026
Merged

Flip#367
navicore merged 5 commits intomainfrom
flip

Conversation

@navicore
Copy link
Copy Markdown
Owner

No description provided.

…up fast path copying heap-boxed float pointers without cloning. We fixed that in the

  is_trivially_copyable and prev_stmt_is_trivial_literal checks. Specialization operates entirely in registers and was safe all along.
  1. Flipped tagged-ptr to default in all three crates (core, runtime, compiler)
  2. Fixed feature propagation — tagged-ptr must be in defaults of all crates, not just the compiler
  3. Re-enabled specialization (was never broken — the crashes were from dup fast path)
  4. Switched heap values from Box<Value> to Arc<Value> — clone is now O(1) refcount bump instead of O(n) allocation

  Benchmark results are transformative:
  - Compute: fib 122x faster, primes 84x faster — now competitive with Go/Rust
  - Channels: pingpong 7.7x faster, fanout 24x faster
  - Collections: 15% slower (Arc overhead — COW will fix this)

  Also found a pre-existing bug in the benchmark script (cut -d: -f5 should be -f4), which was misreporting times as message counts.
@claude
Copy link
Copy Markdown

claude Bot commented Mar 23, 2026

PR Review posted via Claude Code - see full review below

@claude
Copy link
Copy Markdown

claude Bot commented Mar 23, 2026

PR Review: Flip (tagged-ptr default + Arc + specialization re-enable). This is a significant correctness and performance PR. Full analysis follows. WHAT THIS PR DOES: (1) Promotes tagged-ptr from opt-in WIP to default in all three crates. (2) Replaces Box with Arc for O(1) clone. (3) Re-enables specialization, which was blanket-disabled under tagged-ptr. (4) Updates benchmark results (fanout only). --- CORRECTNESS - stack_new.rs Arc migration: The unsafe code is correct. clone_stack_value uses the canonical Arc::from_raw + Arc::clone + std::mem::forget + Arc::into_raw pattern. drop_stack_value correctly frees only when the last reference drops. Concern: stack_value_to_value changed semantics under sharing. Under Box, pop() was always O(1). Under Arc, if refcount > 1, try_unwrap silently falls back to a deep clone -- pop is no longer always O(1) for heap types. Document this in the Safety block. Also: tagged_stack_new.rs and stack_new.rs line 9 both still reference Box in encoding table comments; update to Arc. --- CORRECTNESS - Re-enabling specialization: Removing the early-return guard in can_specialize is well-reasoned. Specialization works in registers on Int/Float/Bool only. is_trivially_copyable_at_current_stmt in types.rs correctly excludes Float in tagged-ptr mode, so the fast-path dup only fires for Int and Bool. Open question: when a specialized word returns a Float to a non-specialized caller, does the ABI transition correctly box it into a tagged Arc pointer? A test for the specialized-to-interpreter Float handoff would confirm this. --- BENCHMARK RESULTS - incomplete run and format bug: Only fanout was benchmarked. CI check-bench-freshness validates only the timestamp, not which suites ran. Result files for fibonacci, primes, pingpong, skynet, and collections were deleted without replacements. The commit claims 122x speedup for fibonacci and 84x for primes -- please run 'just bench' and commit complete results before merging. Column-order bug in run.sh line 115 (cut -d: -f5): Go/Rust/Python output is BENCH:fanout:throughput-100k:COUNT:TIME_MS so field 5 is time_ms (correct). But Seq fanout_seq.txt shows BENCH:fanout:throughput-100k:23:100000 where field 4 is the time (23ms) and field 5 is the count (100000). So cut -f5 reads 100000 for Seq instead of 23ms, making the comparison table wrong. Fix seq.seq to emit count:time matching the other languages, or update run.sh. --- FEATURE FLAG PROPAGATION: The chain is now consistent across all three crates. Adding tagged-ptr to seq-runtime defaults (not only seq-compiler) is the key fix -- without it, seq-runtime built in isolation would use the 40-byte layout while the compiler expected 8-byte. --- TEST COVERAGE: Existing roundtrip tests are solid. Missing Arc-specific coverage: (1) Clone under sharing: dup a value, pop both copies, verify no double-free. (2) Specialization-to-interpreter Float handoff test. (3) Miri run to confirm no use-after-free in the clone_stack_value + drop_stack_value path. --- SUMMARY: Box -> Arc is the right design and the unsafe code is correct. Main concerns before merging: (1) Run full benchmark suite and commit all result files (five suites deleted). (2) Fix Seq benchmark column order -- fanout_seq.txt has time:count while others have count:time, so cut -f5 reads the wrong field for Seq. (3) Add a clone/drop test for the shared-Arc fallback in stack_value_to_value.

  - Fanout benchmark had swapped output fields (time:result instead of result:time)
  - Cleared stale results, established new baseline
  - Updated README with accurate numbers

  Seq is now faster than Go on fanout and faster than Rust on fibonacci. The only remaining weakness is collection building (Arc overhead per element) and
  skynet (May spawn overhead).
@claude
Copy link
Copy Markdown

claude Bot commented Mar 23, 2026

PR Review: Flip (tagged-ptr as default + Box→Arc)

This is a well-motivated change. Switching Box<Value> to Arc<Value> is the right call for enabling O(1) clones on the stack, and the benchmark numbers bear it out — collections speedup is significant (43s→18s on build-100k). Here are my observations:

Code quality

crates/core/src/stack_new.rs

The clone_stack_value implementation is correct:

let arc = Arc::from_raw(sv as *const Value);
let cloned = Arc::clone(&arc);
std::mem::forget(arc); // Don't decrement the original
Arc::into_raw(cloned) as u64

The std::mem::forget to avoid decrementing the original's refcount is the right pattern here.

The stack_value_to_value fallback is also correct — when try_unwrap fails (multiple Arc references exist), cloning the inner value is the only safe option. Worth a comment explaining when this can happen (e.g. a value duplicated on the stack that hasn't been dropped yet).

crates/compiler/src/codegen/specialization.rs

The removal of the specialization guard is the crux of this PR. The old comment was:

Raw copies create double-frees (Box can't be aliased)

With Arc that concern goes away — refcount bumps make aliasing safe. However, the guard was also labeled WIP implying specialization may not have been fully tested with tagged-ptr. Are there existing tests covering specialization paths with heap-allocated types (Float, String, etc.) in tagged-ptr mode? If not, a targeted test would strengthen confidence here.

Potential concern: Value: Send + Sync

Arc<T> requires T: Send + Sync. If Value contains any Rc, Cell, RefCell, or raw pointers without explicit Send/Sync impls, this will fail to compile — which is good (compile-time enforcement). Just worth confirming the build passes cleanly across all feature combinations.

Benchmark files

Bug fix in benchmarks/fanout/seq.seq: The swap removal is correct. The old code was outputting BENCH:fanout:throughput-100k:{time}:{result} (swapped), but the format should be result:time_ms. The old baseline fanout_seq.txt had 100000:2 flipped to 2:100000 — this fix aligns it with the correct format. The new Seq fanout result of 3ms (vs Go 33ms, Rust 8ms) looks legitimately good and now trustworthy.

New baseline files: The Go and Rust baseline files appear to match the results files exactly (same run?). That's fine for initial cross-language baselines, but worth noting that baseline/ and results/ are now diverged in intent — baseline/ seems to represent "reference implementation" numbers while results/ tracks Seq progress. A brief note in benchmarks/README.md clarifying the distinction between the two directories would help future contributors.

skynet_seq.txt: The Seq skynet time increased (3241ms → 3918ms) between baseline and results. The README correctly attributes this to mmap-per-coroutine spawn overhead, which is a known issue separate from this PR.

Minor

  • The README.md note "May coroutine spawn overhead" appears to be a typo — likely "May's coroutine spawn overhead" (May being the coroutine library/scheduler used).

Summary

The core change (Box→Arc) is sound and the performance results are compelling. The main thing to verify before merging is that specialization correctness with heap types in tagged-ptr mode is covered by tests. Everything else looks good.

…──────────────────────────────────────────────────────────────────────────────────┐

  │             Item             │                                                        Action                                                        │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Comment on try_unwrap        │ Added explanation of when clone fallback happens (dup'd values with multiple Arc refs)                               │
  │ fallback                     │                                                                                                                      │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Specialization with heap     │ Covered by integration tests (fmath tests exercise float ops through specialized/non-specialized paths) — no new     │
  │ types                        │ test needed                                                                                                          │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Value: Send + Sync           │ Already enforced by explicit unsafe impl in value.rs — build confirms                                                │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Baseline vs results          │ Added clarifying note to README                                                                                      │
  │ directory                    │                                                                                                                      │
  ├──────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ "May" capitalization         │ Fixed to "May's"                                                                                                     │
  └──────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
@claude
Copy link
Copy Markdown

claude Bot commented Mar 23, 2026

PR Review: Flip (tagged-ptr default + Box→Arc migration)

This is a significant, well-executed change. Flipping tagged-ptr to default and switching heap values from Box<Value> to Arc<Value> delivers transformative performance gains while fixing a real correctness issue (the dup fast-path double-free). Overall the implementation looks sound. A few notes below:


✅ What's well done

Arc ownership model is correct throughout:

  • clone_stack_value: Arc::from_raw + Arc::clone + std::mem::forget correctly bumps the refcount without decrementing the original. No double-free.
  • drop_stack_value: let _ = Arc::from_raw(...) correctly decrements and frees on last reference.
  • stack_value_to_value: Arc::try_unwrap with clone fallback is the right pattern — cheap unwrap when sole owner, graceful clone when dup'd refs exist.

Feature propagation fix is important. Before this PR, tagged-ptr was default in compiler but not core/runtime, which could cause subtle mismatches between crates. Making it uniform is correct.

Specialization re-enablement is justified. The guard that disabled specialization in tagged-ptr mode was added defensively after the dup fast-path crashes. Now that the dup path is fixed (Box→Arc), the original concern is resolved and specialization (which operates in registers) was always safe.

Fanout benchmark bug fix (swapped result:timetime:result fields) was a real pre-existing bug that was masking the actual numbers. Good catch.


🔴 Issues

Stale module-level doc comment (stack_new.rs:9):

//! - Even > 2: Heap pointer to Box<Value>

Should be Arc<Value> now. Minor, but it's the first thing a reader sees.


🟡 Observations worth noting

Arc::try_unwrap clone behavior on dup'd values:
When a value is dup'd (refcount = 2) and then pop'd, try_unwrap fails and falls back to cloning the inner value. This is correct and the comment explains it well. The tradeoff is that every pop of a dup'd heap value pays a clone. For workloads that heavily dup/pop heap values in tight loops, this could be a hidden cost — worth monitoring if channel-heavy benchmarks regress in the future.

Collections regression is acknowledged, but the number is notable:
build-100k went from 15,664ms → 18,110ms (+15.7%), and this was already a known weak area. The commit message notes "COW optimization planned" — it would help to have a tracking issue linked so this doesn't slip.

Value: Send + Sync assumption:
The Arc<Value> approach requires Value: Send + Sync for safe use across May coroutines. The commit message says this is "enforced by explicit unsafe impl in value.rs — build confirms." Worth verifying this also covers any newly-reachable code paths through specialization (since specialization was just re-enabled).


🟢 Test coverage

Existing roundtrip tests in stack_new.rs cover Float, String, Symbol, Variant, Map, Quotation, Closure, Channel, WeaveCtx — all the heap types. There's no explicit test for the clone_stack_value + stack_value_to_value refcount path (i.e., push a float, dup it, pop both, verify no double-free or corruption), but the integration benchmark results serve as a functional stress test. A unit test for the dup→pop→pop flow on a heap type would be a good addition to guard the Arc invariants going forward.


Summary

The core Arc migration is correct and the performance numbers are impressive. Fix the stale doc comment, and consider adding a link to a follow-up issue for the COW optimization. Otherwise this looks ready to merge.

…─────────────────────────────────────────────────────────────────────┐

  │                    Item                     │                                          Action                                          │
  ├─────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
  │ Stale doc comment (Box<Value> → Arc<Value>) │ Fixed in module header                                                                   │
  ├─────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
  │ Dup→pop→pop unit test for Arc invariants    │ Added test_dup_pop_pop_heap_type — pushes Float, dups, pops both, verifies no corruption │
  ├─────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
  │ Baseline vs results directory distinction   │ Added clarifying note to README                                                          │
  ├─────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
  │ "May" → "May's"                             │ Fixed                                                                                    │
  └─────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────┘
@navicore navicore merged commit 502151c into main Mar 23, 2026
2 checks passed
@navicore navicore deleted the flip branch March 23, 2026 23:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant