perf: add VM and GC telemetry notes by ianm199 · Pull Request #127 · ianm199/lua-rs

ianm199 · 2026-06-02T18:54:22Z

Summary

add harness/bench/value-layout.sh plus a lua-vm example to measure Rust value/frame/object layout against reference Lua 5.4.7 C structs
add harness/bench/profile-inventory.sh to inventory available host profilers and repo telemetry probes
teach vm-execute-attribution.py to report execute_source_nodes, warn when a sample lacks source-line data for lua_vm::vm::execute, and split opaque/line-0 VM samples by source file, with compact address-offset bundles for each bucket
surface that warning from profile-hotspots.sh and document the required debug/frame-pointer release build
extend gc-profile.sh with a start snapshot, gc-delta.tsv, and gc-rates.tsv so collector profiles show cadence deltas and per-run/per-second rates
update performance docs with post-v0.0.27 VM attribution artifacts, the corrected layout finding, a CppCXY architecture comparison, GC/table pressure attribution, the rejected spikes (call/frame, dispatch cold path, and table raw-set codegen), and current tooling gaps
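The start-snapshot, delta, and rate arithmetic described for gc-profile.sh can be sketched as below. This is a minimal illustration under stated assumptions, not the script's actual code: the `GcSnapshot` struct, its two counters, and the sample numbers are hypothetical, and the real gc-delta.tsv and gc-rates.tsv columns may differ.

```rust
/// Hypothetical collector counters captured at one point in time.
/// Field names are illustrative; the real gc-profile.sh columns may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct GcSnapshot {
    collections: u64,
    bytes_allocated: u64,
}

/// Counter change between a start snapshot and an end snapshot.
/// Saturating subtraction keeps a reset counter from underflowing.
fn delta(start: GcSnapshot, end: GcSnapshot) -> GcSnapshot {
    GcSnapshot {
        collections: end.collections.saturating_sub(start.collections),
        bytes_allocated: end.bytes_allocated.saturating_sub(start.bytes_allocated),
    }
}

/// Per-run and per-second rates for one counter delta.
/// Returns None when a divisor is zero, so callers never see inf or NaN.
fn rates(counter_delta: u64, runs: u32, elapsed_secs: f64) -> Option<(f64, f64)> {
    if runs == 0 || elapsed_secs <= 0.0 {
        return None;
    }
    let d = counter_delta as f64;
    Some((d / f64::from(runs), d / elapsed_secs))
}

fn main() {
    // Made-up numbers, only to exercise the arithmetic.
    let start = GcSnapshot { collections: 10, bytes_allocated: 1_000 };
    let end = GcSnapshot { collections: 310, bytes_allocated: 61_000 };
    let d = delta(start, end);
    let (per_run, per_sec) = rates(d.collections, 300, 2.0).expect("non-zero divisors");

    // Tab-separated rows, mirroring the .tsv naming used in the PR.
    println!("counter\tdelta\tper_run\tper_sec");
    println!("collections\t{}\t{:.3}\t{:.3}", d.collections, per_run, per_sec);
}
```

Taking the snapshot at the start rather than assuming zeroed counters is what makes the delta meaningful when the process has already done collector work before the measured region.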

Key findings

LuaValue currently measures 16 bytes, matching C Lua 5.4.7 TValue; the old "24-byte LuaValue" ceiling claim was wrong.
The measured layout gap is elsewhere: Rust StackValue is 24 bytes vs C StackValue 16 bytes, Rust CallInfo is 72 bytes vs C CallInfo 64 bytes, and Rust table/upvalue heap objects are larger.
The current VM profiles still point at call/frame/dispatch work: closure_ops_x40 shows dispatch fetch at 26.8% of VM self-samples, OP_CALL at 13.3%, and frame setup at 9.7%; fibonacci_x2 shows dispatch fetch at 22.6%, OP_CALL at 14.1%, and frame setup at 7.5%.
The new opaque-source split shows whether UNKNOWN_INLINED is truly VM-local (vm.rs:0) or standard-library/value inlining (result.rs:0, value.rs:0) before escalating to heavier tooling. It also preserves the raw address-offset bundle, which shows when a line-0 bucket is aggregating multiple code addresses.
GC/table profiles now separate the next work items cleanly: gc_pressure points at collection cadence and fixed-step cost, binarytrees at cohort/old-revisit volume, and table_hash_pressure at the string-key write path plus intern/concat allocation with GC stopped.
Rejected safe call/frame spikes showed that cached frame data has to avoid per-call write cost; the promising direction is still architecture, not duplicated fast paths.
A #[cold] #[inline(never)] split for trace_call/trace_exec failed the hook-heavy db.lua official test with a debug CLI segfault, so dispatch cold-path work needs correctness coverage before benchmark claims.
Removing the cold/out-of-line hint from LuaTable::try_raw_set_generic did not produce a keeper signal in focused best-of-5 table/GC A/B runs; leave that path alone unless a longer controlled run says otherwise.
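The kind of gap reported above (a 16-byte value next to a 24-byte stack slot) can be reproduced with `std::mem::size_of` on stand-in types. The sketch below is an assumption-laden illustration: `DemoValue`, `DemoStackSlot`, `DemoPackedSlot`, and the `tbc_delta` field are hypothetical and are not the lua-vm definitions, and the sizes hold on typical 64-bit targets where `u64` is 8-byte aligned.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for a tagged value: an 8-byte payload plus a 1-byte tag.
/// NOT the real lua-vm LuaValue, only an illustration.
#[repr(C)]
struct DemoValue {
    payload: u64,
    tag: u8,
}

/// Stand-in for a stack slot that wraps the value and adds one more field.
/// The extra field lands after the value's tail padding, so the slot grows.
#[repr(C)]
struct DemoStackSlot {
    value: DemoValue,
    tbc_delta: u16,
}

/// Same fields, but flattened so the extra field reuses the padding
/// that follows the tag byte.
#[repr(C)]
struct DemoPackedSlot {
    payload: u64,
    tag: u8,
    tbc_delta: u16,
}

/// Print one row of the layout table for type `T`.
fn report<T>(name: &str) {
    println!("{name}\tsize={}\talign={}", size_of::<T>(), align_of::<T>());
}

fn main() {
    report::<DemoValue>("DemoValue");
    report::<DemoStackSlot>("DemoStackSlot");
    report::<DemoPackedSlot>("DemoPackedSlot");
}
```

Comparing the wrapped and flattened slots shows one way an extra per-slot field can cost a full alignment unit. Whether that is the cause of the measured StackValue difference is not established here; value-layout.sh is the source of the real numbers.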

Evidence

Matrix: harness/bench/results/20260602T183215Z-98bd6bd-compare.tsv
VM attribution: harness/bench/profiles/20260602T184054Z-18c5f24-closure_ops_x40/vm-execute.txt
VM attribution: harness/bench/profiles/20260602T184108Z-18c5f24-fibonacci_x2/vm-execute.txt
Opaque-source split check: python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T184054Z-18c5f24-closure_ops_x40/sample.txt --source crates/lua-vm/src/vm.rs
Opaque-source split check: python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T192236Z-e1483a6-table_hash_pressure_x100/sample.txt --source crates/lua-vm/src/vm.rs (vm.rs:0 = 460,32356,...; result.rs:0 = 17208; value.rs:0 = 6536)
GC/table hotspot: harness/bench/profiles/20260602T191942Z-e1483a6-gc_pressure_x300/summary.txt
GC/table rates: harness/bench/profiles/gc-profile/20260602T192753Z-e1483a6-gc_pressure_x300/gc-rates.tsv
GC/table hotspot: harness/bench/profiles/20260602T191955Z-e1483a6-binarytrees_x15/summary.txt
GC/table rates: harness/bench/profiles/gc-profile/20260602T192812Z-e1483a6-binarytrees_x15/gc-rates.tsv
GC/table hotspot: harness/bench/profiles/20260602T192236Z-e1483a6-table_hash_pressure_x100/summary.txt
GC/table rates: harness/bench/profiles/gc-profile/20260602T192829Z-e1483a6-table_hash_pressure_x100/gc-rates.tsv
Rejected table raw-set codegen A/B: harness/bench/results/20260602T193855Z-4558473-compare.tsv, harness/bench/results/20260602T193949Z-4558473-compare.tsv
Layout probe: bash harness/bench/value-layout.sh
Tool inventory: bash harness/bench/profile-inventory.sh
Opaque-profile warning: PROFILE_REPEAT=2 bash harness/bench/profile-hotspots.sh fibonacci 2 on a normal release build
Positive control: same profile after CARGO_PROFILE_RELEASE_DEBUG=true RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release -p lua-cli
Dispatch cold-path reject: bash harness/run_official_test.sh reference/lua-c/testes/db.lua after rebuilding debug CLI with cold trap wrappers
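For readers unfamiliar with the shape of the rejected dispatch cold-path experiment, the sketch below shows what a `#[cold] #[inline(never)]` split generally looks like. It is not the lua-vm code: `DemoVm`, `hook_mask`, `traced`, `trace_exec_slow`, and `step` are hypothetical names, and the hook body only counts calls. Since the real split failed db.lua, treat this as a description of the technique rather than a recommended change.

```rust
/// Hypothetical VM state; `hook_mask` and `traced` are illustrative names only.
struct DemoVm {
    hook_mask: u8,
    traced: u64,
}

impl DemoVm {
    /// Rarely taken hook path, kept out of line so it stays out of the hot loop.
    #[cold]
    #[inline(never)]
    fn trace_exec_slow(&mut self, pc: usize) {
        // A real hook could re-enter the VM; this stand-in only counts calls.
        let _ = pc;
        self.traced += 1;
    }

    /// Hot path: one branch on the hook mask, then advance the program counter.
    #[inline(always)]
    fn step(&mut self, pc: usize) -> usize {
        if self.hook_mask != 0 {
            self.trace_exec_slow(pc);
        }
        pc + 1
    }
}

fn main() {
    let mut vm = DemoVm { hook_mask: 0, traced: 0 };
    let mut pc = 0;

    // Hooks off: the cold function is never called.
    for _ in 0..4 {
        pc = vm.step(pc);
    }

    // Hooks on: every step takes the out-of-line path.
    vm.hook_mask = 1;
    for _ in 0..3 {
        pc = vm.step(pc);
    }

    println!("pc={} traced={}", pc, vm.traced);
}
```

The attributes only influence code placement and inlining decisions, so a correctness failure after such a split suggests the surrounding code depended on something the refactor changed, which is why the PR asks for hook-heavy test coverage before any benchmark claim.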

Validation

bash -n harness/bench/value-layout.sh
bash -n harness/bench/profile-inventory.sh
bash -n harness/bench/profile-hotspots.sh
bash -n harness/bench/gc-profile.sh
python3 -m py_compile harness/bench/vm-execute-attribution.py
python3 -m py_compile harness/bench/gc-profile-summary.py
python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T184054Z-18c5f24-closure_ops_x40/sample.txt --source crates/lua-vm/src/vm.rs
python3 harness/bench/vm-execute-attribution.py harness/bench/profiles/20260602T192236Z-e1483a6-table_hash_pressure_x100/sample.txt --source crates/lua-vm/src/vm.rs
bash harness/bench/profile-inventory.sh
cargo run --quiet -p lua-vm --example value_layout
bash harness/bench/value-layout.sh
cargo build --release -p lua-cli
PROFILE_REPEAT=1 bash harness/bench/gc-profile.sh gc_pressure
git diff --check
make test

Stacked on #126. This PR is documentation/tooling telemetry only; it makes no runtime speed claim.

Add attribution for the upstream project this is an AI-assisted derivative (translation) of, satisfying the permissive-license requirement to retain the original copyright notice. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ianm199 added 5 commits June 2, 2026 14:54

perf: add VM layout telemetry notes

89161f2

docs: record rejected call frame spikes

816baf5

perf: warn on opaque VM attribution

c0c0ff2

docs: record rejected dispatch cold split

e1483a6

perf: add GC profile cadence telemetry

e6df252

ianm199 changed the title from "perf: add VM layout telemetry notes" to "perf: add VM and GC telemetry notes" Jun 2, 2026

ianm199 and others added 2 commits June 2, 2026 15:35

perf: split opaque VM attribution by source

b67d54d

ianm199 force-pushed the perf/vm-value-telemetry branch from b441fca to b67d54d Compare June 2, 2026 19:46

ianm199 mentioned this pull request Jun 2, 2026

perf: skip no-op intern retain during GC #128

Merged

ianm199 changed the base branch from perf/post-027-telemetry-wave to main June 2, 2026 20:53

ianm199 merged commit 318e288 into main Jun 2, 2026
3 checks passed

ianm199 merged 7 commits into main from perf/vm-value-telemetry
