Observation
On the desktop app (`toolpath-desktop`), deriving a single ~1737-step / 609-turn Claude session takes ~335ms of Rust-side work, dominating total click-to-painted latency.
Measured end-to-end using the perf tracer (`frontend/src/lib/perf.svelte.ts`) in the main window's Select → flow:
```text
derive claude (total 472ms)
  dispatch               0.0ms   (+0.0ms)
  invoke-start           0.0ms   (+0.0ms)
  invoke-end           335.0ms (+335.0ms)  ← Rust derive
  model-updated        335.0ms   (+0.0ms)
  buildTree            360.0ms  (+25.0ms)
  buildTree cache-hit  376.0ms  (+16.0ms)
  flattenChatHead      428.0ms  (+52.0ms)
  preview-mounted      470.0ms  (+42.0ms)
  dom-painted          471.0ms   (+1.0ms)
```
The JS side is ~136ms (already optimized with WeakMap memos for `buildTree` / `flattenChatHead`). The Rust derive is 71% of total latency (335ms of 472ms), at ~193µs/step.
Scope
The Tauri IPC command is `derive_claude`, which calls:
- `toolpath_claude::ClaudeConvo::read_conversation(project, session_id)` — reads + merges JSONL segments
- `toolpath_claude::derive::derive_path(&convo, &config)` — maps entries → steps
We don't know yet which of those two dominates. First step is to add profiling marks inside each to see whether time is spent on JSONL parsing, ConversationView assembly, or the step-construction mapping itself.
Likely suspects worth checking once we have sub-step timing:
- Per-step allocations (actor strings, change-artifact keys) — lots of small `String` ownership transitions
- `git_head_content` shell-outs for file-diff before-state (one per file artifact on a step that touches files; could batch or cache)
- Diff generation for `raw` perspectives
- Markdown rendering of text content (if any happens Rust-side)
Possible approaches (to evaluate once profiled)
- Profile first. Add `perf_mark`-equivalent timing inside `derive_path` to split by phase (read, view build, step mapping, file-diff generation).
- Streaming derive. Return steps incrementally via Tauri events so the UI can start rendering head-path turns before the full derive completes. Good for perceived latency even if total work stays the same.
- Parallelize per-step work. File-diff generation looks parallelisable if it's a non-trivial portion of the time.
- Batch `git show` calls. If a session touches many files, one `git cat-file --batch` pipe beats N spawn calls.
Out of scope
- Pre-derive caching (tried and reverted — added complexity for a smaller win than optimising the derive itself).
Acceptance
Sub-100ms derive for a 1737-step session on a typical laptop, or a streaming-derive path that paints the first visible turn in under ~100ms.