perf: add VM and GC telemetry notes #127

Merged
ianm199 merged 7 commits into main from perf/vm-value-telemetry
Jun 2, 2026

@ianm199 ianm199 commented Jun 2, 2026

Summary

  • add harness/bench/value-layout.sh plus a lua-vm example to measure Rust value/frame/object layout against reference Lua 5.4.7 C structs
  • add harness/bench/profile-inventory.sh to inventory available host profilers and repo telemetry probes
  • teach vm-execute-attribution.py to report execute_source_nodes, warn when a sample lacks source-line data for lua_vm::vm::execute, and split opaque/line-0 VM samples by source file plus compact address-offset bundles
  • surface that warning from profile-hotspots.sh and document the required debug/frame-pointer release build
  • extend gc-profile.sh with a start snapshot, gc-delta.tsv, and gc-rates.tsv so collector profiles show cadence deltas and per-run/per-second rates
  • update performance docs with post-v0.0.27 VM attribution artifacts, the corrected layout finding, a CppCXY architecture comparison, GC/table pressure attribution, the rejected spikes (call/frame, dispatch cold-path, and table raw-set codegen), and current tooling gaps
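
As an illustration of the cadence deltas and per-run/per-second rates mentioned for gc-profile.sh, the arithmetic can be sketched as below. This is a minimal sketch under assumptions: `GcSnapshot`, its fields, `delta`, and `rates` are hypothetical names, not the columns of gc-delta.tsv or gc-rates.tsv, and the numbers in `main` are made up.

```rust
/// Hypothetical collector counters captured at the start and end of a profile.
#[derive(Clone, Copy, Debug, PartialEq)]
struct GcSnapshot {
    collections: u64,
    bytes_allocated: u64,
}

/// Counter change between two snapshots (end minus start).
fn delta(start: GcSnapshot, end: GcSnapshot) -> GcSnapshot {
    GcSnapshot {
        collections: end.collections.saturating_sub(start.collections),
        bytes_allocated: end.bytes_allocated.saturating_sub(start.bytes_allocated),
    }
}

/// Normalizes the collection delta by run count and by wall-clock seconds.
fn rates(d: GcSnapshot, runs: u32, seconds: f64) -> (f64, f64) {
    let per_run = d.collections as f64 / f64::from(runs);
    let per_second = d.collections as f64 / seconds;
    (per_run, per_second)
}

fn main() {
    // Made-up numbers, for illustration only.
    let start = GcSnapshot { collections: 10, bytes_allocated: 1_000 };
    let end = GcSnapshot { collections: 70, bytes_allocated: 9_000 };
    let d = delta(start, end);
    let (per_run, per_second) = rates(d, 3, 2.0);
    println!("delta={d:?} per_run={per_run} per_second={per_second}");
}
```

Taking a start snapshot matters because counters accumulated before the measured region would otherwise be charged to the profile.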

Key findings

  • LuaValue currently measures 16 bytes, matching C Lua 5.4.7 TValue; the old "24-byte LuaValue" ceiling claim was wrong.
  • The measured layout gap is elsewhere: Rust StackValue is 24 bytes vs C StackValue 16 bytes, Rust CallInfo is 72 bytes vs C CallInfo 64 bytes, and Rust table/upvalue heap objects are larger.
  • The current VM profiles still point at call/frame/dispatch work: closure_ops_x40 has dispatch fetch 26.8% of VM self-samples, OP_CALL 13.3%, frame setup 9.7%; fibonacci_x2 has dispatch fetch 22.6%, OP_CALL 14.1%, frame setup 7.5%.
  • The new opaque-source split shows whether UNKNOWN_INLINED samples are truly VM-local (vm.rs:0) or come from inlined standard-library/value code (result.rs:0, value.rs:0), so that question can be settled before escalating to heavier tooling. It also preserves the raw address-offset bundle, which reveals when a line-0 bucket aggregates multiple code addresses.
  • GC/table profiles now split the next work cleanly: gc_pressure is collection cadence/fixed-step cost, binarytrees is cohort/old-revisit volume, and table_hash_pressure is string-key write path plus intern/concat allocation with GC stopped.
  • Rejected safe call/frame spikes showed that cached frame data has to avoid per-call write cost; the promising direction is still architecture, not duplicated fast paths.
  • A #[cold] #[inline(never)] split for trace_call/trace_exec failed the hook-heavy db.lua official test with a debug CLI segfault, so dispatch cold-path work needs correctness coverage before benchmark claims.
  • Removing the cold/out-of-line hint from LuaTable::try_raw_set_generic did not produce a keeper signal in focused best-of-5 table/GC A/B runs; leave that path alone unless a longer controlled run says otherwise.
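
To make the layout numbers above concrete, here is a hedged sketch of one way a 16-byte value can grow into a 24-byte stack slot. All types are hypothetical stand-ins, not the lua-vm definitions, and the PR does not say this is the cause of the measured gap. The sketch assumes a 64-bit target where `i64` and `f64` are 8-byte aligned. The overlay mirrors how the C `StackValue` union places its extra `unsigned short` in the value's trailing padding.

```rust
use std::mem::size_of;

/// Stand-in for the 8-byte payload of a Lua value (hypothetical).
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
union Payload {
    integer: i64,
    number: f64,
}

/// Payload plus a one-byte tag: 8 + 1 + 7 bytes of padding = 16 bytes.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct ValueLike {
    payload: Payload,
    tag: u8,
}

/// Same prefix as `ValueLike`, with a u16 stored inside the trailing padding.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct TbcLike {
    payload: Payload,
    tag: u8,
    delta: u16,
}

/// Overlay in the style of the C union: both views share 16 bytes.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
union StackOverlay {
    value: ValueLike,
    tbc: TbcLike,
}

/// Wrapper that appends the u16 after a whole value: 16 + 2 + 6 padding = 24 bytes.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct StackWrapper {
    value: ValueLike,
    delta: u16,
}

/// Sizes of the plain value, the overlay slot, and the wrapper slot.
fn sizes() -> [usize; 3] {
    [
        size_of::<ValueLike>(),
        size_of::<StackOverlay>(),
        size_of::<StackWrapper>(),
    ]
}

fn main() {
    let [value, overlay, wrapper] = sizes();
    println!("value={value} overlay={overlay} wrapper={wrapper}");
}
```

A probe like this only reports sizes; deciding whether an overlay is worth its unsafe field access is a separate design question.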

Evidence

  • Matrix: harness/bench/results/20260602T183215Z-98bd6bd-compare.tsv
  • VM attribution: harness/bench/profiles/20260602T184054Z-18c5f24-closure_ops_x40/vm-execute.txt
  • VM attribution: harness/bench/profiles/20260602T184108Z-18c5f24-fibonacci_x2/vm-execute.txt
  • Opaque-source split check: python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T184054Z-18c5f24-closure_ops_x40/sample.txt --source crates/lua-vm/src/vm.rs
  • Opaque-source split check: python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T192236Z-e1483a6-table_hash_pressure_x100/sample.txt --source crates/lua-vm/src/vm.rs (vm.rs:0 = 460,32356,...; result.rs:0 = 17208; value.rs:0 = 6536)
  • GC/table hotspot: harness/bench/profiles/20260602T191942Z-e1483a6-gc_pressure_x300/summary.txt
  • GC/table rates: harness/bench/profiles/gc-profile/20260602T192753Z-e1483a6-gc_pressure_x300/gc-rates.tsv
  • GC/table hotspot: harness/bench/profiles/20260602T191955Z-e1483a6-binarytrees_x15/summary.txt
  • GC/table rates: harness/bench/profiles/gc-profile/20260602T192812Z-e1483a6-binarytrees_x15/gc-rates.tsv
  • GC/table hotspot: harness/bench/profiles/20260602T192236Z-e1483a6-table_hash_pressure_x100/summary.txt
  • GC/table rates: harness/bench/profiles/gc-profile/20260602T192829Z-e1483a6-table_hash_pressure_x100/gc-rates.tsv
  • Rejected table raw-set codegen A/B: harness/bench/results/20260602T193855Z-4558473-compare.tsv, harness/bench/results/20260602T193949Z-4558473-compare.tsv
  • Layout probe: bash harness/bench/value-layout.sh
  • Tool inventory: bash harness/bench/profile-inventory.sh
  • Opaque-profile warning: PROFILE_REPEAT=2 bash harness/bench/profile-hotspots.sh fibonacci 2 on a normal release build
  • Positive control: same profile after CARGO_PROFILE_RELEASE_DEBUG=true RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release -p lua-cli
  • Dispatch cold-path reject: bash harness/run_official_test.sh reference/lua-c/testes/db.lua after rebuilding debug CLI with cold trap wrappers
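
For readers unfamiliar with the cold-path split named in the dispatch cold-path reject, the general shape is sketched below. This is an assumption-laden illustration: `Vm`, `hook_mask`, and `trace_exec_slow` are hypothetical names, not the repo's trace_call/trace_exec code, and the sketch says nothing about why the real attempt segfaulted.

```rust
/// Hypothetical interpreter state; `hook_mask` is nonzero only when a debug hook is installed.
struct Vm {
    hook_mask: u8,
    hook_calls: u64,
    executed: u64,
}

impl Vm {
    /// Rare path kept out of line so the dispatch loop stays compact.
    #[cold]
    #[inline(never)]
    fn trace_exec_slow(&mut self) {
        self.hook_calls += 1;
    }

    /// Hot loop: a single branch guards the call into the cold function.
    fn run(&mut self, instructions: u64) {
        for _ in 0..instructions {
            if self.hook_mask != 0 {
                self.trace_exec_slow();
            }
            self.executed += 1;
        }
    }
}

fn main() {
    let mut hooks_off = Vm { hook_mask: 0, hook_calls: 0, executed: 0 };
    let mut hooks_on = Vm { hook_mask: 1, hook_calls: 0, executed: 0 };
    hooks_off.run(1_000);
    hooks_on.run(1_000);
    // Both configurations must execute the same work; only the hook count differs.
    println!("off: {} executed, {} hook calls", hooks_off.executed, hooks_off.hook_calls);
    println!("on:  {} executed, {} hook calls", hooks_on.executed, hooks_on.hook_calls);
}
```

Running the same workload with hooks on and off, as `main` does, is the kind of correctness check that should pass before any benchmark comparison of such a split.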

Validation

  • bash -n harness/bench/value-layout.sh
  • bash -n harness/bench/profile-inventory.sh
  • bash -n harness/bench/profile-hotspots.sh
  • bash -n harness/bench/gc-profile.sh
  • python3 -m py_compile harness/bench/vm-execute-attribution.py
  • python3 -m py_compile harness/bench/gc-profile-summary.py
  • python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T184054Z-18c5f24-closure_ops_x40/sample.txt --source crates/lua-vm/src/vm.rs
  • python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T192236Z-e1483a6-table_hash_pressure_x100/sample.txt --source crates/lua-vm/src/vm.rs
  • bash harness/bench/profile-inventory.sh
  • cargo run --quiet -p lua-vm --example value_layout
  • bash harness/bench/value-layout.sh
  • cargo build --release -p lua-cli
  • PROFILE_REPEAT=1 bash harness/bench/gc-profile.sh gc_pressure
  • git diff --check
  • make test

Stacked on #126. This PR is documentation/tooling telemetry only; it makes no runtime speed claim.

@ianm199 ianm199 changed the title from "perf: add VM layout telemetry notes" to "perf: add VM and GC telemetry notes" Jun 2, 2026
ianm199 and others added 2 commits June 2, 2026 15:35
Add attribution for the upstream project this is an AI-assisted derivative
(translation) of, satisfying the permissive-license requirement to retain
the original copyright notice.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ianm199 ianm199 force-pushed the perf/vm-value-telemetry branch from b441fca to b67d54d June 2, 2026 19:46
@ianm199 ianm199 changed the base branch from perf/post-027-telemetry-wave to main June 2, 2026 20:53
@ianm199 ianm199 merged commit 318e288 into main Jun 2, 2026
3 checks passed