Fix API-key and OAuth auth mode persistence — conflict resolution + H1 fix (#395)#397
Merged
Conversation
…ed provider_key vocabularies Claude (and OpenAI) sessions could silently shift from an API key onto the OAuth subscription. Root cause: two divergent provider_key vocabularies persist into sessions, and the session-reconstruction helpers only understood one of them. - The structured model-route picker (RPC) persists RuntimeKey::stable_id() values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key. - The legacy /model + login path persists: claude / claude-api / openai / openai-api. model_switch_request_for_session_model and session_provider_key_matches_provider_name only matched the legacy vocabulary. A session whose provider_key was 'anthropic-api-key' (without a separately-persisted route_api_method, e.g. a forked/child/ambient/ overnight session) therefore reconstructed a bare model with no auth prefix, leaving the Anthropic provider in Auto mode -- which now prefers OAuth (commit 00e9b9f) -- silently moving an API-key user onto the subscription. Fix: - Add canonical_session_provider_key() to fold the picker vocabulary back onto the canonical keys, and apply it in the reconstruction/match helpers so either vocabulary recovers the exact OAuth-vs-API-key route. - Carry route_api_method alongside provider_key when copying a parent session to a child (ambient, overnight, fork, selfdev, crash recovery) so children reconstruct the full route even without the canonicalizer. Adds a regression test proving anthropic-api-key/openai-api-key/-oauth provider keys preserve the auth route without route_api_method.
The worker previously only accepted POST /v1/event; there was no visual dashboard (just SQL files run by hand). Add a real one. Headline metric (users.sql + stats.js): total_users = distinct, non-CI telemetry_id that ever installed OR did meaningful work. Validated with sqlite edge-case repros (install-only, turn_end-only with lost session_end, empty open/close, CI). Reported alongside broader tiers (reached) and narrower tiers (core, installed) plus raw CI-inclusive totals so no signal is removed. - src/stats.js: read-only aggregation (counts only, never raw rows) over users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality, per-turn, errors, feature adoption, transport, version/os/channel/ provider/auth/onboarding breakdowns, 60d timeseries, recent feedback. One shared MEANINGFUL_SQL predicate so every window agrees. - src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event unchanged. CORS widened to GET. - src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN, works under Cloudflare). Tiered layout: hero total-users number, active funnel + chart, 'how the number is built' transparency band, then acquisition/retention, engagement, reliability, breakdowns, features, feedback. Importance shown via hero/key tags/muted diagnostics. - README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run users; type:module to silence ESM warning. Validated: node --check on all modules, getStats end-to-end against a seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a real browser (token gate + every section + charts).
The native tool smoke only ever drove a single tool-call round-trip, so it
always replayed exactly one thought_signature and passed even when an earlier
function call would drop its signature. The Antigravity/Cloud Code backend
validates *every* functionCall in the replayed history, so the field 400
("Function call is missing a thought_signature ... position N") only
reproduces with a multi-call transcript.
- Extend run_live_native_provider_tool_smoke into two phases: the historical
single round-trip (gating) plus a best-effort multi-call replay that rebuilds
a history of two assistant tool_use blocks, each carrying its own signature.
- Delegate run_live_antigravity_native_tool_smoke to the shared probe so
Antigravity (the runtime that hit this) gets the multi-call coverage too.
- Add an always-on unit guard (build_contents_replays_every_signature_across_
multi_tool_history) so the serialization regression is caught for free,
without spending live tokens.
…end-design skill
Two things the prior dashboard commit missed.
1) Restore every metric the old SQL surface (README queries, health.sql,
dau.sql) exposed that had been dropped:
- os/arch platform breakdown (was os-only)
- session starts by UTC hour (usage-timing histogram)
- pipeline-health diagnostics: lifecycle_ids, session_start_ids,
lifecycle_ids_without_install, heaviest/top5/total session events
- meaningful_sessions_30d count
stats.js gains hours, arch, health, skew, meaningfulSessions queries;
all validated end-to-end against a seeded sqlite D1 shim.
2) Redesign dashboard.js using the installed anthropics/frontend-design
skill. The previous version used system fonts and the exact
purple-gradient-on-dark the skill warns against. New 'Terminal
Observatory' aesthetic, true to jcode being a CLI agent: JetBrains
Mono instrument typography (Sora for prose), warm phosphor-amber
signal color with a single cyan accent, scanline texture, station-
clock hero number, numbered hairline section dividers, KEY/alert
accent rails, an amber UTC-hour bar histogram, and a filled cyan
active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so
the total-users number dominates while every figure stays visible.
Verified in a real browser: token gate, hero, all 8 sections, both
chart types render correctly. node --check passes on all modules.
…s catalog Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt, cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the endorsed-skills list, sourced from the official NVIDIA-verified catalog at github.com/NVIDIA/skills. - EndorsedSkill gains category + optional install hint fields. - /skills now groups endorsed skills by category with per-category installed counts and shows the 'npx skills add nvidia/skills' install command for missing skills, plus the catalog URL. - Tests cover the new fields and the NVIDIA catalog entries.
Adds an integration test that drives the REAL update-detection core (newer_binary_available, the function behind server_has_update) and the reload-target resolver after a normal (non-self-dev) /update channel swap. Models a shipped user: shared-server tracking stable, daemon running the old release, /update installs a newer release and advances stable/current/ shared-server. Asserts both that the old daemon reports an update and that it reloads into the freshly installed release. This documents that normal users are covered by advance_shared_server_if_tracking_stable + the cross-flavor reload target.
…y, full active tiers Surface every remaining signal the original SQL surface (and the schema) exposed but the dashboard had not yet shown: - User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume with sessions/turns/tokens/tool_calls, version and last-seen. CI and non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs' query, made visual). - Token usage (sec 04): full breakdown - input/output/cache_read/ cache_creation/total, both 30d and all-time (was a single combined number). - Agent autonomy (sec 05): spawned agents, subagent/swarm/background tasks + successes, user cancellations, and where agent time goes (active/model/tool/blocked/idle), time-to-first-action, avg max concurrency. These schema columns were never surfaced before. - Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues, not just the headline number. - Engagement: added time-to-first-tool-success. stats.js gains tokens, agent, and leaderboard queries (26 queries total, all validated against the real schema via a seeded sqlite D1 shim). dashboard.js renumbered to 11 sections with a new leaderboardPanel renderer and CI/dev tag styling. Verified end-to-end in a real browser: all sections, the leaderboard table, and both chart types render.
…parts Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real decode abort: the Antigravity/Cloud Code generateContent response occasionally omits `role` (and sometimes `parts`) on a candidate's `content`, but the struct required `role`, so the whole turn failed with "missing field `role`". The response-side role is never read, so default both fields rather than aborting. Adds two decode regression tests.
The first cut of the multi-call phase only nudged a 2nd tool call after the model had already answered, so live runs reported multi_tool_replay=skipped and never actually exercised the multi-functionCall history. Replace it with an agentic loop driven by a two-file read prompt: each emitted tool call is replayed (carrying its captured thought_signature) and answered with a synthetic result, so by the final turn we send two assistant functionCall blocks and assert the backend accepts the transcript. Surface the verified/skipped status in the doctor report detail. Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full now reports 'multi-call signature replay verified'.
Cold /resume and onboarding/catch-up pickers were dominated by serial per-file IO+JSON parsing over large session histories (87k jcode snapshots + hundreds of Codex/Claude transcripts here). - Add a bounded scoped-thread parallel_map helper in the session picker loader and use it for: candidate mtime stat (readdir then parallel stat), the jcode summary parse pass (two-phase: parallel fill to scan_limit, then parallel saved-gate over the tail), and the external Codex/pi/opencode stub parsers. - Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of re-reading catchup_seen.json per session. - Onboarding transcript picker now loads only the relevant external CLI (load_external_cli_sessions_grouped) instead of the full load_sessions_grouped on the UI thread. - Catch-up picker now opens from cache and refreshes off-thread via the shared async picker-load path instead of blocking the live session. Measured on real data (idle, 4 runs each): load_sessions ~660ms -> ~434ms (~34%) load_sessions_grouped ~685ms -> ~465ms (~32%) onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load
Minor bump covering the 44 commits since v0.21.0, including: - Eager token-by-token reasoning streaming and per-line multi-line thinking rendering in the TUI. - Provider fixes: Gemini schema/thought_signature handling, Kimi reasoning_content, OpenRouter empty-message guard, Anthropic 1M context + split-cache cost accounting, API-key vs OAuth auth mode. - Swarm: route messages by target, broadcast to whole swarm, inherit coordinator model/auth route on spawn. - Self-dev reload correctness (daemon reloads into advertised binary), reload-trace OOM cap, and provider-doctor generic native suites. - Served telemetry dashboard with accurate user/install metrics and /skills + endorsed NVIDIA CUDA-X skills.
Add Anthropic's official frontend-design skill (the best design-focused agent skill) to the endorsed list under a new 'Anthropic Design' category, sourced from github.com/anthropics/skills with an install hint.
The guided first-run onboarding flow auto-imports existing external CLI logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via run_external_auth_auto_import_candidates, which bypasses the manual pending_login path that record_auth_success was wired into. As a result every auto-imported login -- the happy path of the new onboarding -- was invisible to the activation funnel, making auth_success undercount badly (observed: more users reaching first_assistant_response than auth_success in post-0.17 install cohorts, which is impossible without auth). Surface coarse (provider, method="import") telemetry labels from the import outcome and record auth_success for each imported provider in both the onboarding and manual /login auto-import callers. Domain logic in jcode-app-core stays telemetry-free; the TUI layer emits the event, matching existing call sites.
Codex/Claude Code preview loaders parsed the entire JSONL transcript (often multiple MB, up to tens of MB) on every selection change just to show the last ~20 messages. In the onboarding resume menu this made arrow-key navigation lag badly, since each selection spawned a fresh full-file parse thread. Normal /resume (jcode native sessions) avoids this path, which is why only onboarding felt slow. Read only the trailing 512 KiB of the file instead: drop the partial first line, skip malformed boundary records, and parse the rest. This turns each preview load from ~140ms into ~1ms regardless of transcript size. Adds regression tests covering large (>cap) Codex and Claude transcripts.
The native gmail tool keeps its interface, confirmation gating, access tiers, and token-lean formatting, but its auth/transport is now pluggable via GmailBackend (Direct | Composio). - Direct: existing local Google OAuth tokens. - Composio: routes the same Gmail REST calls through Composio's proxy-execute endpoint, brokered by a Google-verified app. No unverified-app warning and no 7-day testing-mode token expiry. Backend is selected via JCODE_GMAIL_BACKEND=composio + COMPOSIO_API_KEY. Capability checks (is_configured/can_send/can_delete) are now backend-aware. Adds unit tests and docs/GMAIL_COMPOSIO_BACKEND.md.
The desktop render loop re-requested a redraw immediately after every animated frame (welcome-hero reveal, focus pulse, spinners, smooth scroll, streaming) in both the RedrawRequested handler and the AboutToWait fallback. Because the surface uses non-blocking Mailbox presentation, present() returns instantly, so the loop rendered as fast as the CPU allowed (~300fps on a 60Hz panel) and pinned the main thread near 100% CPU. That starved input handling and compositor scheduling, which is the root cause of the laggy/janky animations and scrolling, and made streaming events queue for 200ms-1s before the UI could process them. Schedule a paced redraw (DESKTOP_ANIMATION_FRAME_INTERVAL = 16ms, serviced via ControlFlow::WaitUntil in AboutToWait) instead of an immediate request. Measured idle main-thread CPU on the welcome screen dropped from ~99% to ~0-3%, frame rate from ~305fps to display refresh, while the stream-e2e benchmark still passes all interaction/no-paint budgets (max no-paint gap 71ms vs 250ms budget).
The visual dashboard (dashboard.js + stats.js + GET / and GET /v1/stats routing) was a separately-deployed Cloudflare Worker UI that does not belong in the jcode repo. Remove it and restore the worker to its POST /v1/event ingest-only surface. The telemetry accuracy work it was built on (turn_end meaningfulness, CI exclusion, the daily_active_users rollup) stays. users.sql remains as a CLI query alongside dau.sql / health.sql.
The runtime welcome-hero mask ("Hello there") is built once on the first
single-session frame, but build_hero_reveal_texture runs a per-lit-pixel
nearest-stroke search (O(pixels x segments)) on the UI thread, costing
~600ms and stalling the start of the reveal animation.
Split the per-pixel fill across worker threads via std::thread::scope
(rows are independent and read-only over glyph_rgba/segments), reducing
the one-time build cost. Output is bit-identical to the serial path;
small images fall back to serial to avoid spawn overhead. Added parity
and worker-count tests.
…329) The header and info widget hard-coded 'OpenRouter' for any model routed through the OpenRouter slot, even when the user switched to a direct OpenAI-compatible profile such as NVIDIA NIM at runtime. The display name was resolved from process env vars that only reflect the startup profile, so a runtime '/model' switch never updated the label. Add a runtime-aware Provider::display_name() (default = name()) overridden by OpenRouterProvider (maps profile_id -> 'NVIDIA NIM', etc.) and MultiProvider (delegates to the active execution runtime). name() stays the stable machine id ('openrouter') that billing/routing keys off. format_model_name() in the header now uses the active provider's display name instead of a fixed 'OpenRouter:' prefix. Adds regression tests.
…on real transcripts
The copy-selection drag edge auto-scroll "hot zone" (top/bottom few rows of the chat pane) fired unconditionally whenever a drag entered the band. When the transcript was already pinned to the bottom (the common case), dragging into the bottom rows snapped the selection cursor to the very last visible line and armed a downward autoscroll, even though there was nothing more below to scroll into. This made it impossible to precisely highlight the bottom rows of the transcript: the selection kept jumping to the end. Gate each directional hot zone on whether there is actually more transcript to scroll into that direction (scroll > 0 for up, visible_end < line_count for down). When there is nothing to scroll, the edge band stays inert so the selection lands on the exact cell under the cursor. Adds a regression test that drags into the bottom hot zone while pinned to the bottom and asserts no autoscroll arms and the selection lands on the targeted line.
Adds a 'connect' action to the gmail tool that drives Composio's hosted Connect Link flow: it creates an auth-link session, opens the Google consent screen in the browser, polls until the connection is ACTIVE, and persists the connected account to ~/.jcode/composio_gmail.json so future sessions are already authorized. - ComposioConfig gains auth_config_id + persisted-connection fallback. - GmailClient: connect(), needs_connection(), supports_connect(), create_link()/wait_for_connection() against /connected_accounts. - tool/gmail.rs handles 'connect' before the config gate and hints the agent to connect when no account exists yet. - Tests for connect/needs_connection/effective_user_id; docs updated.
The StreamBuffer previously revealed text the instant a provider delta arrived (only capping bursts >96 chars per frame). OpenAI emits many tiny token deltas so this looked smooth, but Anthropic coalesces deltas into 20-40 char bursts with gaps, so each burst popped in at once and the UI stair-stepped. Replace the burst cap with a time-paced proportional reveal: text accumulates in a backlog and drips out at base + gain*backlog chars/sec, with the per-step elapsed time clamped so idle gaps cannot bank budget that dumps the next burst. This smooths bursty providers while keeping fast steady feeds responsive. Remote tick now reveals via flush_smooth_frame() to match; flush() still drains fully at finalize.
…picker The first-run onboarding 'continue where you left off' picker previously surfaced only ONE external CLI: when a user was logged into both Codex and Claude Code, it picked whichever had the most recent transcript and hid the other CLI's history entirely. Now the onboarding picker loads and displays every detected external CLI's transcripts together in one combined, recency-sorted list: - Add SessionFilterMode::ExternalClis (Codex OR Claude Code). - Add load_external_cli_sessions_grouped_multi to load several CLIs. - onboarding_open_transcript_picker now takes the full detected CLI set; the banner reads 'We found your Codex and Claude Code sessions' when both are present. Resume still works off each session's own id/source, so selecting either CLI's transcript resumes correctly. Adds a regression test seeding both a Codex and a Claude Code transcript and asserting both appear.
Show the model's live reasoning out of the box. DisplayConfig now defaults reasoning_display to Current (with show_thinking=true to keep the provider request + streaming display paths in sync), and the generated default config documents reasoning_display = "current".
…arallel tool-call probe - New REASONING_CAPABILITY checkpoint (taxonomy v3), never required for user-readiness and excluded from strict coverage. A reasoning word problem is sent and the turn is classified streamed/opaque/none from StreamEvent signals (ThinkingDelta text, ThinkingSignatureDelta, OpenAIReasoning, and Gemini-3 tool thought_signature). Absence records 'none' and passes. - Shared native tool smoke gains a Phase 3 that asks for two tool calls in a single assistant message, replays both tool_use blocks (each with its own thought_signature) in one assistant turn and answers both results, recording parallel_tool_calls: verified|skipped (best-effort, never fails). - Wired reasoning into the antigravity, generic-native, and claude drivers; skipped on non-full tiers; surfaced in the doctor report detail. - Unit tests for classification, parallel replay shape, detail strings, and the observe-only contract (probe error -> skipped, never failed).
…as reasoning signal A Gemini-3 thoughtSignature that was not consumed by a following functionCall (e.g. a pure-text reasoning turn) was silently dropped. Emit it as a ThinkingSignatureDelta instead so reasoning-aware consumers (and the new provider-doctor reasoning probe) can observe that the model reasoned even when no reasoning text and no tool call were produced.
…add scroll diag instrumentation Building a fresh FontSystem every frame (rescanning all system fonts) inside the inline-code/math pill geometry builder caused multi-ms per-frame scroll spikes over code blocks. Reuse a thread-local measurement FontSystem instead.
Standalone pictographic emoji/symbols (🔄 ⬜ → ✓ etc.) render identically under Basic and Advanced cosmic-text shaping, so escalating the whole visible-window buffer to Advanced shaping for them was pure per-frame scroll overhead on emoji-rich transcripts. Only sequences that truly need shaping (variation selectors, ZWJ, regional-indicator flag pairs) and lines carrying inline-code/ math spans still use Advanced. Cuts worst-case scroll-frame shaping cost.
'jcode server reload' (run by installers and the TUI's stale-server reload path) now repairs the shared-server channel before sending the forced reload. The running daemon resolves its reload target from that channel; if it still points at the daemon's own old binary (the 'current client, stale server' state after a no-op /update), a forced reload would just re-exec the same old binary. Repairs shared-server -> stable when stable is strictly newer (never downgrades, preserves a fresher self-dev pin). Adds scripts/stale_server_upgrade_sandbox.sh: a live end-to-end sandbox that starts a REAL released v0.14.6 daemon and runs the new client's 'jcode server reload', asserting the daemon upgrades to the new release. Verified locally: v0.14.6 daemon -> v0.22 after reload, deterministic across runs, fully isolated from the real global daemon via JCODE_SOCKET.
Configuring agents.swarm_model with an explicit auth-route prefix (e.g. openai-api:gpt-5.5, openai-oauth:..., claude-api:..., claude-oauth:...) now pins spawned swarm agents to that exact model + provider + auth route instead of inheriting the coordinator's model. The prefix is split into a bare model plus stable provider_key/route_api_method ids that round-trip through ModelRouteApiMethod::parse on session restore. Lets users force spawned agents onto a specific API-key route (e.g. GPT-5.5 via the OpenAI API) regardless of what the coordinator is running.
cosmic-text/rustybuzz/ttf-parser/swash/yazi/fontdb do all desktop transcript glyph shaping and are 15-40x slower at opt-level=0, making debug/selfdev scrolling of real emoji/markdown-heavy transcripts janky (p99 ~238ms) even though release was smooth. Pin these stable third-party crates to opt-level=3 in dev/selfdev/test (same one-time-compile trick already used for jcode-tui-anim). Debug-build scroll p99 drops 238ms -> 8.4ms with no impact on recompile speed of jcode's own crates.
…iling Profiles a realistic mix of user actions (smooth/whole-line scroll, selection drag, composer typing, model-picker and session-switcher toggles, window resize, and streaming growth) against the user's largest real on-disk transcripts, each phase measured as per-frame CPU p50/p95/p99/max with a 120fps budget check. Complements --real-transcript-scroll-benchmark for broad interaction coverage.
…emental wrap The streaming_growth phase re-wrapped the entire transcript every frame, which production avoids by caching the wrapped static base and only appending the wrapped streaming tail. Mirror that here: wrap the static body once, then per frame truncate to the static base and append the tail. Drops measured streaming_growth p99 ~72ms -> ~18ms, reflecting the real production path.
…ines Production caches the raw (unwrapped) styled body lines across resizes and only re-runs the width-dependent wrap, via single_session_rendered_body_lines_from_raw_ref. Mirror that in the resize phase instead of regenerating raw markdown lines every frame. Measured window_resize p99 ~64ms -> ~28ms, matching the real path.
…aming The streaming text loop re-ran text_content.find(...) over the ENTIRE accumulated response on every TextDelta until a wrapped-tool-call marker was found. For normal answers (no marker) that scanned everything every token: O(response) per delta, O(response^2) over a full streamed answer. Scan only the newly appended delta plus a short overlap window (so a marker straddling the append boundary is still detected), giving O(delta) per token. Add unit tests asserting equivalence to a full rescan across chunk sizes, unicode, and the boundary-straddle case.
…uffer parse_next_event reassigned self.buffer = self.buffer[pos+2..].to_string() for every SSE event, copying and reallocating the entire remaining buffer each time. When one network chunk batches many SSE events this is O(buffer^2). Use String::drain(..pos+2) to remove the consumed prefix in place. Pure behavior-preserving refactor.
prepare_body_incremental recounted user messages in messages[..prev_msg_count] on every incremental append to seed prompt_num. Appending one message at a time over a long session made that cumulative O(n^2). prev.user_prompt_texts is extended in lockstep with each rendered user message, so its length already is the prior user-prompt count; use it directly for O(1) seeding.
rebuild_items scanned every filtered session ref once per server group to collect that group's sessions: O(groups * filtered_refs). With many remote server groups and many sessions this scaled poorly on every search keystroke. Bucket the filtered refs by group_idx in a single O(filtered_refs) pass, then emit groups in order (O(groups)). Behavior (grouping, ordering, saved-id filtering) is preserved.
Follow-up to the previous fix that stopped the edge auto-scroll hot zone from snapping the selection to the last line while pinned. That left a gap: dragging *past* the last line (down into the empty area below the content-sized chat pane) no longer extended the selection at all, because that overshoot row maps to no line and copy_point_from_screen returned None. Native terminal/browser selection treats dragging past the last line as "select through the end of that line". Add copy_pane_drag_point(), which clamps vertical overshoot to the nearest in-bounds line edge: a drag below the last visible line snaps to the end of that line, and a drag above the first visible line snaps to its start. A direct hit on a real line still yields precise per-cell selection. Use it for both Drag and Up so the boundary line is fully covered during the drag and on release. Adds a regression test that anchors on the last content line, drags straight down past the bottom of the pane with the cursor x only partway through the line, and asserts the whole last line (through its end) is selected without arming autoscroll or scrolling.
In 'current' reasoning-display mode the live dim/italic reasoning used to vanish in a single frame when the answer committed or a tool ran, snapping the transcript upward. Instead, on close the reasoning block is sliced out of the streaming buffer into a dedicated collapsing 'reasoning' display message that height-collapses (ease-out, oldest line first) toward a one-line '▸ thought for Xs' summary, leaving a trace behind. - New ReasoningCollapse state + begin/advance/finalize on App. - Renders via a new 'reasoning' display role (dim+italic, sentinel-stripped). - Redraw loop (local + remote tick, turn loop) advances the animation; redraw policy keeps frames live while collapsing. - Reduced-motion / low-power tiers snap straight to the summary. - Guards drop the animation safely on transcript reset/replace. - Tests: block parsing, summary labels, monotone collapse, finalize, reduced-motion snap, and end-to-end dim/italic render of the role.
bump_display_messages_version recomputed display_user_message_count and display_edit_tool_message_count by scanning all display messages twice on every mutation. Appending one message at a time over a long session made counter maintenance cumulatively O(M^2). The hot append path now folds the single new message into the cached counters (O(1)) and bumps the version without a full rescan; rarer bulk/remove/replace paths still recompute fully. Add a test asserting the incrementally-maintained counters match a full recompute after interleaved pushes and removes.
…_FUNCTION_CALL Gemini-3 thinking models intermittently emit Python-style pseudo-code (e.g. print(default_api.read(...))) instead of a clean functionCall, which the Cloud Code backend rejects with finish_reason=MALFORMED_FUNCTION_CALL and empty content. Previously the runtime ended the turn with a silent empty MessageEnd, so the agent looked like it stalled with no answer. For gemini-3.1-pro-high this hit roughly half of tool turns. Three layered mitigations (per Gemini function-calling guidance / field reports): 1. Prevention: when tools are advertised, append a 'Function calling' guard to the Gemini system prompt forbidding code/namespaces (build_system_instruction_with_tool_guard). 2. Transparent retry: detect a malformed empty turn (is_retryable_empty_turn) and re-request up to twice before surfacing anything, so the agent never sees the blip. Retries force function-calling mode ANY so the model must emit a real functionCall instead of pseudo-code. 3. Surfacing: if output is still empty after retries, emit an actionable error (with the finish_reason and finishMessage) instead of a silent empty turn. Also surfaces the previously-hidden finishMessage for diagnosis. Measured on the live Antigravity backend: gemini-3.1-pro-high tool-call success went from ~50% to ~7/8 (remaining miss was a probe-deadline timeout, not malformed). Unit tests cover the guard and the retry classifier.
…pt per mouse move Selection hit-testing (single_session_visible_body -> body viewport -> single_session_rendered_body_lines_for_tick) re-parsed markdown and re-wrapped the ENTIRE transcript on every selection mouse-move during a drag, an O(transcript) cost per pointer event. Add a thread-local single-entry memo keyed by the existing body cache key and return the wrapped lines as a shared Rc, so the viewport only clones the visible slice instead of the whole transcript. The render hot path keeps its separate Canvas-side cache, so this only accelerates input/scroll-metric/geometry callers. Measured on real transcripts (debug build): per-mouse-move selection hit-test p99 ~121ms -> ~0.06ms. Adds a selection_input_hittest benchmark phase that isolates this cost, plus a debug-only env gate for A/B measurement.
gather_recent_sessions fully parsed every session JSON file (the sessions dir can hold tens of thousands) just to drop those older than the 24h cutoff and keep the 20 most recent: O(all_sessions * parse) per ambient cycle. Pre-filter candidate files by filesystem mtime (with a 1h margin for write/clock skew) before loading, sort newest-first, and only parse up to a bounded budget (4x the limit) before the existing id-based sort/truncate. Behavior is preserved; work drops from O(all_sessions) to O(recent_sessions).
picker_fuzzy_score re-lowercased and re-collected the filter pattern into a Vec<char> on every call, i.e. once per entry inside the per-keystroke filter loop (O(entries * pattern)). Hoist pattern normalization out via picker_fuzzy_pattern + picker_fuzzy_score_with_pattern and normalize once per filter pass. Scoring behavior is unchanged.
search_matched_session_refs cloned the cached match set into candidates and then cloned the new matches back into the cache: two full-list clones per narrowing keystroke. Take the cached refs in place via mem::take (it is about to be overwritten anyway), eliminating the candidates clone. Behavior unchanged.
…ection text copy_selection_status built the entire selected string via current_copy_selection_text just to report char/line counts in the status line. This ran on every render frame while in copy mode, including every drag move, so a large selection (e.g. select-all) re-allocated and re-joined the whole transcript text each frame. Add copy_selection_metrics (and a raw-lines fast path) that counts chars/lines using the same slicing logic without allocating the joined string, and use it for the status line. Add a test asserting the metrics exactly match the built selection text's char/line counts.
…t O(L^2)) For each image/mermaid placeholder, body prep scanned forward through all following blank lines to compute the placeholder height. A message with many placeholders each followed by long blank runs made this O(wrapped_lines^2). Precompute, in a single reverse pass, the blank-run length starting at every line; the placeholder height is then an O(1) lookup. Extracted into a shared compute_image_regions helper used by both wrap_lines and wrap_lines_with_map. Behavior is identical (height = 1 + trailing blank run).
…todo progress swarm list previously returned only a shallow roster (name, role, status, files, age). Enrich each agent row with: - live activity (processing + current tool name) - provider/model - token churn over a recent ~10s window + cumulative tokens - turn count - todo progress (completed/total) - contextual idle/active duration label (idle Ns when ready, Ns when running) - completion report when finished Token churn and turn count are tracked in a new lock-free per-session metrics registry (jcode-base::session_metrics) rather than on the Agent struct, because swarm list reads stats while an agent may hold its own Mutex<Agent> lock mid-turn (try_lock fails exactly when churn is most interesting). Metrics are recorded from the streaming turn loop and run_turn, and forgotten on session disconnect. handle_comm_list now joins swarm membership with live session state and todos, sharing the runtime-extras gathering helper in comm_sync.
…story
The 'current' reasoning collapse only ran in the live streaming path. When
the transcript was re-rendered from stored history (self-dev reload, resume,
remote sync, compaction-window expand), the shared history renderer replayed
every persisted reasoning trace in full regardless of reasoning_display mode,
so after the collapse animation finished a reload would bring all the
reasoning back.
format_reasoning_markup now honors the active mode:
- Off: persisted reasoning is hidden entirely.
- Current: the block folds to a single '▸ thought (N lines)' trace line,
matching the live collapse end state.
- Full: classic full replay (unchanged).
Adds reasoning_summary_line_markup helper + tests for all three modes.
This was referenced Jun 6, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Supersedes #395 — resolves 9 merge conflicts, fixes H1 (route_api_method propagation in SubagentTool), adapts loading.rs to use CASR-based loader.
Changes:
session.route_api_method = parent_session.route_api_method.clone()in task.rs SubagentTool — subagents now inherit auth route from parentload_external_cli_sessions_grouped_multiusing CASR