Slow derive for large traces: ~335ms for a 1737-step Claude session #53

@eliothedeman

Description

Observation

On the desktop app (toolpath-desktop), deriving a single ~1737-step / 609-turn Claude session takes ~335ms of Rust-side work, dominating total click-to-painted latency.

Measured end-to-end using the perf tracer (frontend/src/lib/perf.svelte.ts) in the main window's Select → flow:

derive claude  (total 472ms)
  dispatch                0.0ms  (+0.0ms)
  invoke-start            0.0ms  (+0.0ms)
  invoke-end            335.0ms  (+335.0ms)   ← Rust derive
  model-updated         335.0ms  (+0.0ms)
  buildTree             360.0ms  (+25.0ms)
  buildTree cache-hit   376.0ms  (+16.0ms)
  flattenChatHead       428.0ms  (+52.0ms)
  preview-mounted       470.0ms  (+42.0ms)
  dom-painted           471.0ms  (+1.0ms)

The JS side is ~136ms (already optimized with WeakMap memos for buildTree / flattenChatHead). The Rust derive is 71% of total latency at ~193µs/step.

Scope

The Tauri IPC command is derive_claude, which calls:

  1. toolpath_claude::ClaudeConvo::read_conversation(project, session_id) — reads + merges JSONL segments
  2. toolpath_claude::derive::derive_path(&convo, &config) — maps entries → steps

We don't know yet which of those two dominates. First step is to add profiling marks inside each to see whether time is spent on JSONL parsing, ConversationView assembly, or the step-construction mapping itself.
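Those phase marks could look something like the sketch below (a minimal illustration — `PhaseTimer` and the phase labels are hypothetical, not an existing toolpath API; the real version would want to feed into whatever tracing the desktop app already uses):

```rust
use std::time::Instant;

// Hypothetical helper for splitting derive_claude into timed phases.
// Each mark() records the time elapsed since the previous mark.
struct PhaseTimer {
    start: Instant,
    last: Instant,
    marks: Vec<(String, u128)>,
}

impl PhaseTimer {
    fn new() -> Self {
        let now = Instant::now();
        Self { start: now, last: now, marks: Vec::new() }
    }

    fn mark(&mut self, label: &str) {
        let now = Instant::now();
        self.marks.push((label.to_string(), (now - self.last).as_micros()));
        self.last = now;
    }

    fn report(&self) -> String {
        let total = (self.last - self.start).as_micros();
        let mut out = format!("derive total {}us\n", total);
        for (label, us) in &self.marks {
            out.push_str(&format!("  {:<16} {}us\n", label, us));
        }
        out
    }
}
```

In `derive_claude` this would be `mark("read")` after `read_conversation`, `mark("view_build")` after assembling the `ConversationView`, `mark("step_map")` after `derive_path`, with the report logged or returned alongside the result.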

Likely suspects worth checking once we have sub-step timing:

  • Per-step allocations (actor strings, change-artifact keys) — lots of small String ownership transitions
  • git_head_content shell-outs for file-diff before-state (one per file artifact on a step that touches files; could batch or cache)
  • Diff generation for raw perspectives
  • Markdown rendering of text content (if any happens Rust-side)
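On the first suspect: if each derived step currently takes ownership of fresh `String`s for actor names and artifact keys, interning them as `Arc<str>` turns per-step clones into refcount bumps. A sketch of the idea (the `Step` struct and `derive_steps` here are illustrative, not the real toolpath types):

```rust
use std::sync::Arc;

// Illustrative step type: the actor name is shared rather than
// cloned into an owned String per step.
#[derive(Clone)]
struct Step {
    actor: Arc<str>,
}

// Intern the actor name once; every step shares the same allocation.
fn derive_steps(actor_name: &str, n: usize) -> Vec<Step> {
    let actor: Arc<str> = Arc::from(actor_name);
    (0..n).map(|_| Step { actor: actor.clone() }).collect()
}
```

Whether this matters depends on what profiling shows — at ~193µs/step there is room for allocation churn, but it could equally be dominated by I/O.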

Possible approaches (to evaluate once profiled)

  • Profile first. Add perf_mark-equivalent timing inside derive_path to split by phase (read, view build, step mapping, file-diff generation).
  • Streaming derive. Return steps incrementally via Tauri events so the UI can start rendering head-path turns before the full derive completes. Good for perceived latency even if total work stays the same.
  • Parallelize per-step work. File-diff generation looks parallelisable if it's a non-trivial portion of the time.
  • Batch git show calls. If a session touches many files, one git cat-file --batch pipe beats N spawn calls.
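The batching idea in the last bullet could be sketched like this — one long-lived `git cat-file --batch` child fed all `<rev>:<path>` requests over stdin, instead of one `git show` spawn per file. The function names and the idea of returning raw batch output are assumptions; the real version would parse the `<oid> <type> <size>` header preceding each blob:

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Build the request stream for `git cat-file --batch`:
// one "<rev>:<path>" line per blob we want.
fn batch_requests(rev: &str, paths: &[&str]) -> String {
    paths.iter().map(|p| format!("{rev}:{p}\n")).collect()
}

// Hypothetical batch reader: spawns git once, writes all requests,
// and returns the raw batch output (headers + blob contents) for
// the caller to parse.
fn read_blobs(repo_dir: &str, rev: &str, paths: &[&str]) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new("git")
        .args(["cat-file", "--batch"])
        .current_dir(repo_dir)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Dropping stdin after the write closes the pipe, so git sees EOF
    // and exits once it has answered every request.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(batch_requests(rev, paths).as_bytes())?;
    let out = child.wait_with_output()?;
    Ok(out.stdout)
}
```

For a session touching N files this is one process spawn instead of N, which is usually where the per-file `git show` cost lives.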

Out of scope

  • Pre-derive caching (tried and reverted — added complexity for a smaller win than optimising the derive itself).

Acceptance

Sub-100ms derive for a 1737-step session on a typical laptop, or a streaming-derive path that paints the first visible turn in under ~100ms.
