Enhance session management and prompt navigation features #329
Merged
Enhance prompt navigation and session management features
Resolves two conflicts vs current master:
- crates/jcode-provider-metadata/src/catalog.rs: LOGIN_PROVIDERS array length conflict between master's +4 providers (BigModel, Cohere, GitLab Duo, Vertex AI = 49) and PR #323's +1 (Anthropic API direct = 46). Final length: 50.
- src/tui/app/state_ui_input_helpers.rs: Master appended dollar_token_tests module at EOF; PR #323 appended ExternalCliSuggestionCandidate + helpers. Both are end-of-file additions to disjoint scopes, kept both.
Every workflow run on master and recent branches has been failing at
startup with:
Invalid Argument - failed to parse workflow:
Unrecognized named-value: 'secrets'. Located at position 1 within
expression: secrets.DEPLOY_KEY != '' (Lines 30, 84, 126, 216, 388
in .github/workflows/ci.yml; Line 63 in .github/workflows/release.yml)
GitHub Actions' expression validator does not permit referencing the
`secrets` context inside an `if:` expression at step level in this
configuration — runs were being marked "completed/failure" with zero
jobs before any work could happen, so master CI has effectively been
non-functional.
Workaround: lift `secrets.DEPLOY_KEY` into a job-level `env:`
binding (`DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}`), then check
`env.DEPLOY_KEY != ''` in the step `if:`. The secret is still
accessed via `secrets.DEPLOY_KEY` inside `with:` blocks (which is
allowed). Behavior is unchanged when the secret is present or absent.
Applied to every job in ci.yml that uses webfactory/ssh-agent
(quality, mobile-simulator, build, windows-build-test,
windows-cross-check) and to release.yml's build-linux-macos job.
- Remove unused_mut (input.rs, turborag.rs)
- Gate test-only functions with #[cfg(test)] (hash_window, active_at_token, suggest_at_path)
- Fix ClipboardCommand visibility (private_interfaces warning)
- Allow dead_code on AtPicker public API stubs kept for upcoming wiring
- Fix unused variable _end in test assertions
- Apply cargo clippy --fix for needless_borrow, manual_char_comparison, etc.
- Fix manual_clamp in acp.rs
- Fix doc_overindented_list_items in args.rs, terminal.rs
- Replace vec_init_then_push with vec![] literal in export.rs
- Allow too_many_arguments, enum_variant_names on anthropic.rs internals
- Refresh code_size, test_size, panic, swallowed_error budget baselines
- Add #[allow(clippy::await_holding_lock)] to test fns that hold lock_test_env() across await (intentional test serialization)
- Fix if_same_then_else in state_ui_input_helpers.rs (simplify redundant conditional to direct assignment)
…ets-context fix(ci): mirror DEPLOY_KEY into env to unblock workflow startup
Fix clippy lints, dead-code warnings, and refresh quality budgets
Merge PR #323 with conflicts resolved (macOS shortcuts + UI fixes)
quangdang46 added a commit that referenced this pull request Jun 6, 2026
…1 fix (#395)
* fix(provider): keep API-key vs OAuth auth mode across the two persisted provider_key vocabularies
Claude (and OpenAI) sessions could silently shift from an API key onto the OAuth subscription. Root cause: two divergent provider_key vocabularies persist into sessions, and the session-reconstruction helpers only understood one of them.
- The structured model-route picker (RPC) persists RuntimeKey::stable_id() values: claude-oauth / anthropic-api-key / openai-oauth / openai-api-key.
- The legacy /model + login path persists: claude / claude-api / openai / openai-api.
model_switch_request_for_session_model and session_provider_key_matches_provider_name only matched the legacy vocabulary. A session whose provider_key was 'anthropic-api-key' (without a separately-persisted route_api_method, e.g. a forked/child/ambient/overnight session) therefore reconstructed a bare model with no auth prefix, leaving the Anthropic provider in Auto mode -- which now prefers OAuth (commit 00e9b9f) -- silently moving an API-key user onto the subscription.
Fix:
- Add canonical_session_provider_key() to fold the picker vocabulary back onto the canonical keys, and apply it in the reconstruction/match helpers so either vocabulary recovers the exact OAuth-vs-API-key route.
- Carry route_api_method alongside provider_key when copying a parent session to a child (ambient, overnight, fork, selfdev, crash recovery) so children reconstruct the full route even without the canonicalizer.
Adds a regression test proving anthropic-api-key/openai-api-key/-oauth provider keys preserve the auth route without route_api_method.
* telemetry: add served dashboard with accurate 'total users' headline
The worker previously only accepted POST /v1/event; there was no visual dashboard (just SQL files run by hand). Add a real one.
Headline metric (users.sql + stats.js): total_users = distinct, non-CI telemetry_id that ever installed OR did meaningful work.
Validated with sqlite edge-case repros (install-only, turn_end-only with lost session_end, empty open/close, CI). Reported alongside broader tiers (reached) and narrower tiers (core, installed) plus raw CI-inclusive totals so no signal is removed.
- src/stats.js: read-only aggregation (counts only, never raw rows) over users, DAU/WAU/MAU rollup, installs, D7 retention, engagement quality, per-turn, errors, feature adoption, transport, version/os/channel/provider/auth/onboarding breakdowns, 60d timeseries, recent feedback. One shared MEANINGFUL_SQL predicate so every window agrees.
- src/worker.js: GET / serves the dashboard, GET /v1/stats serves JSON gated behind DASHBOARD_TOKEN (deny-by-default), POST /v1/event unchanged. CORS widened to GET.
- src/dashboard.js: self-contained HTML/CSS/inline-SVG dashboard (no CDN, works under Cloudflare). Tiered layout: hero total-users number, active funnel + chart, 'how the number is built' transparency band, then acquisition/retention, engagement, reliability, breakdowns, features, feedback. Importance shown via hero/key tags/muted diagnostics.
- README + package.json: dashboard usage, DASHBOARD_TOKEN setup, npm run users; type:module to silence ESM warning.
Validated: node --check on all modules, getStats end-to-end against a seeded sqlite D1 shim (total_users=3 with CI excluded), and rendered in a real browser (token gate + every section + charts).
* test(provider-doctor): cover multi-call thought_signature replay
The native tool smoke only ever drove a single tool-call round-trip, so it always replayed exactly one thought_signature and passed even when an earlier function call would drop its signature. The Antigravity/Cloud Code backend validates *every* functionCall in the replayed history, so the field 400 ("Function call is missing a thought_signature ... position N") only reproduces with a multi-call transcript.
- Extend run_live_native_provider_tool_smoke into two phases: the historical single round-trip (gating) plus a best-effort multi-call replay that rebuilds a history of two assistant tool_use blocks, each carrying its own signature.
- Delegate run_live_antigravity_native_tool_smoke to the shared probe so Antigravity (the runtime that hit this) gets the multi-call coverage too.
- Add an always-on unit guard (build_contents_replays_every_signature_across_multi_tool_history) so the serialization regression is caught for free, without spending live tokens.
* telemetry dashboard: restore all legacy metrics + redesign with frontend-design skill
Two things the prior dashboard commit missed.
1) Restore every metric the old SQL surface (README queries, health.sql, dau.sql) exposed that had been dropped:
- os/arch platform breakdown (was os-only)
- session starts by UTC hour (usage-timing histogram)
- pipeline-health diagnostics: lifecycle_ids, session_start_ids, lifecycle_ids_without_install, heaviest/top5/total session events
- meaningful_sessions_30d count
stats.js gains hours, arch, health, skew, meaningfulSessions queries; all validated end-to-end against a seeded sqlite D1 shim.
2) Redesign dashboard.js using the installed anthropics/frontend-design skill. The previous version used system fonts and the exact purple-gradient-on-dark the skill warns against. New 'Terminal Observatory' aesthetic, true to jcode being a CLI agent: JetBrains Mono instrument typography (Sora for prose), warm phosphor-amber signal color with a single cyan accent, scanline texture, station-clock hero number, numbered hairline section dividers, KEY/alert accent rails, an amber UTC-hour bar histogram, and a filled cyan active-users area chart. Tiered HEADLINE/SIGNAL/DIAGNOSTIC layout so the total-users number dominates while every figure stays visible.
Verified in a real browser: token gate, hero, all 8 sections, both chart types render correctly. node --check passes on all modules.
* feat(skills): endorse NVIDIA CUDA-X skills from official NVIDIA/skills catalog
Add NVIDIA's CUDA-X / GPU accelerated-computing agent skills (cuOpt, cuPyNumeric, cuDF, CUDA-Q, and the cuTile/TileGym GPU-dev skill) to the endorsed-skills list, sourced from the official NVIDIA-verified catalog at github.com/NVIDIA/skills.
- EndorsedSkill gains category + optional install hint fields.
- /skills now groups endorsed skills by category with per-category installed counts and shows the 'npx skills add nvidia/skills' install command for missing skills, plus the catalog URL.
- Tests cover the new fields and the NVIDIA catalog entries.
* test(reload): prove normal-user /update upgrades the daemon end-to-end
Adds an integration test that drives the REAL update-detection core (newer_binary_available, the function behind server_has_update) and the reload-target resolver after a normal (non-self-dev) /update channel swap.
Models a shipped user: shared-server tracking stable, daemon running the old release, /update installs a newer release and advances stable/current/shared-server. Asserts both that the old daemon reports an update and that it reloads into the freshly installed release.
This documents that normal users are covered by advance_shared_server_if_tracking_stable + the cross-flavor reload target.
* telemetry dashboard: add user leaderboard, token usage, agent autonomy, full active tiers
Surface every remaining signal the original SQL surface (and the schema) exposed but the dashboard had not yet shown:
- User leaderboard (sec 09): top 20 anonymous ids by lifecycle volume with sessions/turns/tokens/tool_calls, version and last-seen. CI and non-release ids are tagged and dimmed (the old 'Heavy telemetry IDs' query, made visual).
- Token usage (sec 04): full breakdown - input/output/cache_read/cache_creation/total, both 30d and all-time (was a single combined number).
- Agent autonomy (sec 05): spawned agents, subagent/swarm/background tasks + successes, user cancellations, and where agent time goes (active/model/tool/blocked/idle), time-to-first-action, avg max concurrency. These schema columns were never surfaced before.
- Active-user tiers: DAU/WAU/MAU now show meaningful + raw subvalues, not just the headline number.
- Engagement: added time-to-first-tool-success.
stats.js gains tokens, agent, and leaderboard queries (26 queries total, all validated against the real schema via a seeded sqlite D1 shim). dashboard.js renumbered to 11 sections with a new leaderboardPanel renderer and CI/dev tag styling.
Verified end-to-end in a real browser: all sections, the leaderboard table, and both chart types render.
* fix(gemini): tolerate generateContent candidate content without role/parts
Live multi-call provider-doctor against gemini-3.1-pro-high surfaced a real decode abort: the Antigravity/Cloud Code generateContent response occasionally omits `role` (and sometimes `parts`) on a candidate's `content`, but the struct required `role`, so the whole turn failed with "missing field `role`". The response-side role is never read, so default both fields rather than aborting. Adds two decode regression tests.
* test(provider-doctor): drive a real multi-call signature-replay loop
The first cut of the multi-call phase only nudged a 2nd tool call after the model had already answered, so live runs reported multi_tool_replay=skipped and never actually exercised the multi-functionCall history. Replace it with an agentic loop driven by a two-file read prompt: each emitted tool call is replayed (carrying its captured thought_signature) and answered with a synthetic result, so by the final turn we send two assistant functionCall blocks and assert the backend accepts the transcript. Surface the verified/skipped status in the doctor report detail.
Verified live: provider-doctor antigravity -m gemini-3.1-pro-high --tier full now reports 'multi-call signature replay verified'.
* perf(resume): parallelize session loading; scope onboarding picker
Cold /resume and onboarding/catch-up pickers were dominated by serial per-file IO+JSON parsing over large session histories (87k jcode snapshots + hundreds of Codex/Claude transcripts here).
- Add a bounded scoped-thread parallel_map helper in the session picker loader and use it for: candidate mtime stat (readdir then parallel stat), the jcode summary parse pass (two-phase: parallel fill to scan_limit, then parallel saved-gate over the tail), and the external Codex/pi/opencode stub parsers.
- Load the catch-up 'seen' state once (CatchupSeenSnapshot) instead of re-reading catchup_seen.json per session.
- Onboarding transcript picker now loads only the relevant external CLI (load_external_cli_sessions_grouped) instead of the full load_sessions_grouped on the UI thread.
- Catch-up picker now opens from cache and refreshes off-thread via the shared async picker-load path instead of blocking the live session.
Measured on real data (idle, 4 runs each):
load_sessions ~660ms -> ~434ms (~34%)
load_sessions_grouped ~685ms -> ~465ms (~32%)
onboarding picker load ~685ms (UI thread) -> ~14ms scoped CLI load
* chore(release): bump version to 0.22.0
Minor bump covering the 44 commits since v0.21.0, including:
- Eager token-by-token reasoning streaming and per-line multi-line thinking rendering in the TUI.
- Provider fixes: Gemini schema/thought_signature handling, Kimi reasoning_content, OpenRouter empty-message guard, Anthropic 1M context + split-cache cost accounting, API-key vs OAuth auth mode.
- Swarm: route messages by target, broadcast to whole swarm, inherit coordinator model/auth route on spawn.
- Self-dev reload correctness (daemon reloads into advertised binary), reload-trace OOM cap, and provider-doctor generic native suites.
- Served telemetry dashboard with accurate user/install metrics and /skills + endorsed NVIDIA CUDA-X skills.
* feat(skills): endorse Anthropic frontend-design skill
Add Anthropic's official frontend-design skill (the best design-focused agent skill) to the endorsed list under a new 'Anthropic Design' category, sourced from github.com/anthropics/skills with an install hint.
* fix(onboarding): record auth_success for auto-imported logins
The guided first-run onboarding flow auto-imports existing external CLI logins (Claude/Codex/Gemini/Copilot/Cursor/OpenRouter) via run_external_auth_auto_import_candidates, which bypasses the manual pending_login path that record_auth_success was wired into. As a result every auto-imported login -- the happy path of the new onboarding -- was invisible to the activation funnel, making auth_success undercount badly (observed: more users reaching first_assistant_response than auth_success in post-0.17 install cohorts, which is impossible without auth).
Surface coarse (provider, method="import") telemetry labels from the import outcome and record auth_success for each imported provider in both the onboarding and manual /login auto-import callers. Domain logic in jcode-app-core stays telemetry-free; the TUI layer emits the event, matching existing call sites.
* perf(session-picker): tail-read external transcript previews
Codex/Claude Code preview loaders parsed the entire JSONL transcript (often multiple MB, up to tens of MB) on every selection change just to show the last ~20 messages. In the onboarding resume menu this made arrow-key navigation lag badly, since each selection spawned a fresh full-file parse thread. Normal /resume (jcode native sessions) avoids this path, which is why only onboarding felt slow.
Read only the trailing 512 KiB of the file instead: drop the partial first line, skip malformed boundary records, and parse the rest. This turns each preview load from ~140ms into ~1ms regardless of transcript size.
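The tail-read approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `read_tail_lines` and `looks_like_record` are hypothetical names, and the brace check merely stands in for a real JSON parse so the sketch needs no external crates.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Cap on how much of the transcript is read (512 KiB, per the commit message).
const TAIL_CAP_BYTES: u64 = 512 * 1024;

/// Stand-in for a real JSON parse (assumption): accept lines shaped like
/// a single JSON object and skip everything else.
fn looks_like_record(line: &str) -> bool {
    line.starts_with('{') && line.ends_with('}')
}

/// Hypothetical helper: returns the usable JSONL records found in the
/// trailing `cap` bytes of the file at `path`.
fn read_tail_lines(path: &Path, cap: u64) -> io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();
    let start = len.saturating_sub(cap);
    file.seek(SeekFrom::Start(start))?;

    let mut bytes = Vec::new();
    file.read_to_end(&mut bytes)?;

    // When the read starts mid-file, the first line is assumed to be
    // partial and is dropped up to and including its newline. This is
    // conservative: a complete first line is dropped as well.
    let body: &[u8] = if start > 0 {
        match bytes.iter().position(|&b| b == b'\n') {
            Some(i) => &bytes[i + 1..],
            None => &[],
        }
    } else {
        &bytes
    };

    let text = String::from_utf8_lossy(body);
    let records: Vec<String> = text
        .lines()
        .map(str::trim)
        .filter(|line| looks_like_record(line))
        .map(str::to_owned)
        .collect();
    Ok(records)
}
```

Because the read size is bounded by the cap rather than by the file length, the cost of a preview stays roughly constant as transcripts grow, which matches the motivation given in the commit message.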
Adds regression tests covering large (>cap) Codex and Claude transcripts.
* skill_manage: include endorsed skill catalog in list output
* feat(gmail): add pluggable Composio managed-OAuth backend
The native gmail tool keeps its interface, confirmation gating, access tiers, and token-lean formatting, but its auth/transport is now pluggable via GmailBackend (Direct | Composio).
- Direct: existing local Google OAuth tokens.
- Composio: routes the same Gmail REST calls through Composio's proxy-execute endpoint, brokered by a Google-verified app. No unverified-app warning and no 7-day testing-mode token expiry.
Backend is selected via JCODE_GMAIL_BACKEND=composio + COMPOSIO_API_KEY. Capability checks (is_configured/can_send/can_delete) are now backend-aware. Adds unit tests and docs/GMAIL_COMPOSIO_BACKEND.md.
* desktop: pace animation redraws to ~60fps instead of busy-spinning
The desktop render loop re-requested a redraw immediately after every animated frame (welcome-hero reveal, focus pulse, spinners, smooth scroll, streaming) in both the RedrawRequested handler and the AboutToWait fallback. Because the surface uses non-blocking Mailbox presentation, present() returns instantly, so the loop rendered as fast as the CPU allowed (~300fps on a 60Hz panel) and pinned the main thread near 100% CPU. That starved input handling and compositor scheduling, which is the root cause of the laggy/janky animations and scrolling, and made streaming events queue for 200ms-1s before the UI could process them.
Schedule a paced redraw (DESKTOP_ANIMATION_FRAME_INTERVAL = 16ms, serviced via ControlFlow::WaitUntil in AboutToWait) instead of an immediate request. Measured idle main-thread CPU on the welcome screen dropped from ~99% to ~0-3%, frame rate from ~305fps to display refresh, while the stream-e2e benchmark still passes all interaction/no-paint budgets (max no-paint gap 71ms vs 250ms budget).
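The pacing idea in the commit above can be illustrated without a windowing library. The sketch below is an assumption-laden stand-in: `FramePacer` and `Wake` are hypothetical names, and `Wake::SleepUntil` only models the role that a wait-until control-flow setting plays in the real event loop.

```rust
use std::time::{Duration, Instant};

/// Frame interval named in the commit message (16ms, roughly 60fps).
const DESKTOP_ANIMATION_FRAME_INTERVAL: Duration = Duration::from_millis(16);

/// What the event loop should do next (hypothetical stand-in for a
/// windowing library's control-flow setting).
#[derive(Debug, PartialEq)]
enum Wake {
    /// A paced frame is due: request a redraw now.
    RedrawNow,
    /// A frame is scheduled: sleep until this deadline.
    SleepUntil(Instant),
    /// No animation is running: wait for the next input event.
    Idle,
}

/// Hypothetical pacer holding the deadline of the next animated frame.
struct FramePacer {
    next_frame_at: Option<Instant>,
}

impl FramePacer {
    fn new() -> Self {
        Self { next_frame_at: None }
    }

    /// Call after drawing an animated frame, instead of requesting an
    /// immediate redraw.
    fn schedule_after_frame(&mut self, drawn_at: Instant) {
        self.next_frame_at = Some(drawn_at + DESKTOP_ANIMATION_FRAME_INTERVAL);
    }

    /// Call from the loop's idle hook to decide whether to draw or sleep.
    fn poll(&mut self, now: Instant) -> Wake {
        match self.next_frame_at {
            Some(deadline) if now >= deadline => {
                self.next_frame_at = None;
                Wake::RedrawNow
            }
            Some(deadline) => Wake::SleepUntil(deadline),
            None => Wake::Idle,
        }
    }
}
```

The point of the design is that the thread sleeps between deadlines rather than rendering again the moment a non-blocking present call returns.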
* telemetry-worker: remove served web dashboard
The visual dashboard (dashboard.js + stats.js + GET / and GET /v1/stats routing) was a separately-deployed Cloudflare Worker UI that does not belong in the jcode repo. Remove it and restore the worker to its POST /v1/event ingest-only surface.
The telemetry accuracy work it was built on (turn_end meaningfulness, CI exclusion, the daily_active_users rollup) stays. users.sql remains as a CLI query alongside dau.sql / health.sql.
* desktop: parallelize hero reveal texture build to cut welcome stutter
The runtime welcome-hero mask ("Hello there") is built once on the first single-session frame, but build_hero_reveal_texture runs a per-lit-pixel nearest-stroke search (O(pixels x segments)) on the UI thread, costing ~600ms and stalling the start of the reveal animation.
Split the per-pixel fill across worker threads via std::thread::scope (rows are independent and read-only over glyph_rgba/segments), reducing the one-time build cost. Output is bit-identical to the serial path; small images fall back to serial to avoid spawn overhead. Added parity and worker-count tests.
* fix(provider): show active OpenAI-compatible profile name in header (#329)
The header and info widget hard-coded 'OpenRouter' for any model routed through the OpenRouter slot, even when the user switched to a direct OpenAI-compatible profile such as NVIDIA NIM at runtime. The display name was resolved from process env vars that only reflect the startup profile, so a runtime '/model' switch never updated the label.
Add a runtime-aware Provider::display_name() (default = name()) overridden by OpenRouterProvider (maps profile_id -> 'NVIDIA NIM', etc.) and MultiProvider (delegates to the active execution runtime). name() stays the stable machine id ('openrouter') that billing/routing keys off. format_model_name() in the header now uses the active provider's display name instead of a fixed 'OpenRouter:' prefix. Adds regression tests.
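The split between a stable machine id and a runtime display name might look roughly like the sketch below. The trait and type names come from the commit message, but every signature, the `profile_id` field, the `"nvidia-nim"` profile id, and the `"multi"` machine id are assumptions made for illustration.

```rust
/// Sketch of the provider trait: `name()` is the stable machine id,
/// `display_name()` is the human-facing label and defaults to it.
trait Provider {
    fn name(&self) -> &str;

    fn display_name(&self) -> String {
        self.name().to_string()
    }
}

/// Assumed shape: the active profile is tracked at runtime.
struct OpenRouterProvider {
    profile_id: Option<String>,
}

impl Provider for OpenRouterProvider {
    fn name(&self) -> &str {
        "openrouter"
    }

    fn display_name(&self) -> String {
        match self.profile_id.as_deref() {
            // "nvidia-nim" is a hypothetical profile id.
            Some("nvidia-nim") => "NVIDIA NIM".to_string(),
            _ => "OpenRouter".to_string(),
        }
    }
}

/// Assumed shape: a wrapper that delegates to the active runtime.
struct MultiProvider {
    active: Box<dyn Provider>,
}

impl Provider for MultiProvider {
    fn name(&self) -> &str {
        "multi" // hypothetical machine id
    }

    fn display_name(&self) -> String {
        self.active.display_name()
    }
}

/// Header label built from the active provider's display name rather
/// than a fixed prefix (signature is an assumption).
fn format_model_name(provider: &dyn Provider, model: &str) -> String {
    format!("{}: {}", provider.display_name(), model)
}
```

Keeping `name()` untouched means anything keyed on the machine id keeps working, while the label follows whatever profile is active at call time.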
* desktop: add --real-transcript-scroll-benchmark to profile scrolling on real transcripts
* fix(tui): don't snap selection to bottom edge when already pinned
The copy-selection drag edge auto-scroll "hot zone" (top/bottom few rows of the chat pane) fired unconditionally whenever a drag entered the band. When the transcript was already pinned to the bottom (the common case), dragging into the bottom rows snapped the selection cursor to the very last visible line and armed a downward autoscroll, even though there was nothing more below to scroll into. This made it impossible to precisely highlight the bottom rows of the transcript: the selection kept jumping to the end.
Gate each directional hot zone on whether there is actually more transcript to scroll into that direction (scroll > 0 for up, visible_end < line_count for down). When there is nothing to scroll, the edge band stays inert so the selection lands on the exact cell under the cursor.
Adds a regression test that drags into the bottom hot zone while pinned to the bottom and asserts no autoscroll arms and the selection lands on the targeted line.
* feat(gmail): add in-agent Composio connect (OAuth) action
Adds a 'connect' action to the gmail tool that drives Composio's hosted Connect Link flow: it creates an auth-link session, opens the Google consent screen in the browser, polls until the connection is ACTIVE, and persists the connected account to ~/.jcode/composio_gmail.json so future sessions are already authorized.
- ComposioConfig gains auth_config_id + persisted-connection fallback.
- GmailClient: connect(), needs_connection(), supports_connect(), create_link()/wait_for_connection() against /connected_accounts.
- tool/gmail.rs handles 'connect' before the config gate and hints the agent to connect when no account exists yet.
- Tests for connect/needs_connection/effective_user_id; docs updated.
* tui: pace streaming text reveal to fix Anthropic choppiness
The StreamBuffer previously revealed text the instant a provider delta arrived (only capping bursts >96 chars per frame). OpenAI emits many tiny token deltas so this looked smooth, but Anthropic coalesces deltas into 20-40 char bursts with gaps, so each burst popped in at once and the UI stair-stepped.
Replace the burst cap with a time-paced proportional reveal: text accumulates in a backlog and drips out at base + gain*backlog chars/sec, with the per-step elapsed time clamped so idle gaps cannot bank budget that dumps the next burst. This smooths bursty providers while keeping fast steady feeds responsive. Remote tick now reveals via flush_smooth_frame() to match; flush() still drains fully at finalize.
* feat(onboarding): show both Codex and Claude Code sessions in resume picker
The first-run onboarding 'continue where you left off' picker previously surfaced only ONE external CLI: when a user was logged into both Codex and Claude Code, it picked whichever had the most recent transcript and hid the other CLI's history entirely.
Now the onboarding picker loads and displays every detected external CLI's transcripts together in one combined, recency-sorted list:
- Add SessionFilterMode::ExternalClis (Codex OR Claude Code).
- Add load_external_cli_sessions_grouped_multi to load several CLIs.
- onboarding_open_transcript_picker now takes the full detected CLI set; the banner reads 'We found your Codex and Claude Code sessions' when both are present.
Resume still works off each session's own id/source, so selecting either CLI's transcript resumes correctly. Adds a regression test seeding both a Codex and a Claude Code transcript and asserting both appear.
* feat(display): default reasoning display to current
Show the model's live reasoning out of the box.
DisplayConfig now defaults reasoning_display to Current (with show_thinking=true to keep the provider request + streaming display paths in sync), and the generated default config documents reasoning_display = "current".
* provider-doctor: add observe-only reasoning_capability checkpoint + parallel tool-call probe
- New REASONING_CAPABILITY checkpoint (taxonomy v3), never required for user-readiness and excluded from strict coverage. A reasoning word problem is sent and the turn is classified streamed/opaque/none from StreamEvent signals (ThinkingDelta text, ThinkingSignatureDelta, OpenAIReasoning, and Gemini-3 tool thought_signature). Absence records 'none' and passes.
- Shared native tool smoke gains a Phase 3 that asks for two tool calls in a single assistant message, replays both tool_use blocks (each with its own thought_signature) in one assistant turn and answers both results, recording parallel_tool_calls: verified|skipped (best-effort, never fails).
- Wired reasoning into the antigravity, generic-native, and claude drivers; skipped on non-full tiers; surfaced in the doctor report detail.
- Unit tests for classification, parallel replay shape, detail strings, and the observe-only contract (probe error -> skipped, never failed).
* fix(gemini/antigravity): surface leftover Gemini-3 thought signature as reasoning signal
A Gemini-3 thoughtSignature that was not consumed by a following functionCall (e.g. a pure-text reasoning turn) was silently dropped. Emit it as a ThinkingSignatureDelta instead so reasoning-aware consumers (and the new provider-doctor reasoning probe) can observe that the model reasoned even when no reasoning text and no tool call were produced.
* docs(provider-doctor): correct reasoning probe answer comment (4 cows)
* desktop: cache measurement FontSystem for inline-code pill geometry; add scroll diag instrumentation
Building a fresh FontSystem every frame (rescanning all system fonts) inside the inline-code/math pill geometry builder caused multi-ms per-frame scroll spikes over code blocks. Reuse a thread-local measurement FontSystem instead.
* fix(reload): newer client drags a stale older server forward on attach
Fixes the 'current client (v0.22), stale older server, /update only updates the client' report. Root cause: the decision to upgrade the server runs in the OLD server process, which a newer client cannot retroactively fix. Two gaps:
1. Detection: the client deferred + reloaded an older server only when the server self-reported server_has_update. An old daemon whose shared-server channel still points at its own binary legitimately reports Some(false) ('nothing newer to reload into'), which the client trusted -> stuck forever. Now a client-proven-older release (server_version < client_version, clean semver) always wins and defers, regardless of the server's self-report.
2. Reload target: even after deferring, a forced reload re-execs whatever the shared-server channel points at -- still the old binary. The new client now repairs the shared-server channel client-side before reloading (repair_stale_shared_server_channel): repoint shared-server -> stable when stable is strictly newer by mtime. Never downgrades, and preserves a deliberately-pinned self-dev build that is fresher than stable.
This is version-agnostic (no per-version allowlist): any server that is a strictly-older clean release than the connected client gets dragged forward. Existing recover_reloading_server + 3-attempt loop cap handle the case where a reload still does not take (fresh spawn self-heals via the candidate logic).
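The detection rule in gap 1 above can be sketched as a pure function. This is an illustration under stated assumptions: `parse_clean_semver` and `should_defer_for_server_reload` are hypothetical names, and "clean semver" is taken to mean exactly three dot-separated numeric components with an optional leading `v`.

```rust
/// Parses "0.22.0" or "v0.22.0" into (major, minor, patch). Anything
/// else (pre-release suffixes, git hashes, missing parts) yields None.
fn parse_clean_semver(version: &str) -> Option<(u64, u64, u64)> {
    fn component(part: &str) -> Option<u64> {
        if part.is_empty() || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        part.parse().ok()
    }

    let trimmed = version.trim();
    let bare = trimmed.strip_prefix('v').unwrap_or(trimmed);
    let mut parts = bare.split('.');
    let major = component(parts.next()?)?;
    let minor = component(parts.next()?)?;
    let patch = component(parts.next()?)?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Hypothetical decision helper: defer and reload when the client can
/// prove the server is an older clean release, or when the server itself
/// reports that an update is available.
fn should_defer_for_server_reload(
    client_version: &str,
    server_version: &str,
    server_has_update: Option<bool>,
) -> bool {
    let client_proven_older = match (
        parse_clean_semver(server_version),
        parse_clean_semver(client_version),
    ) {
        // Tuples compare lexicographically: major, then minor, then patch.
        (Some(server), Some(client)) => server < client,
        _ => false,
    };
    client_proven_older || server_has_update == Some(true)
}
```

With this shape, a `Some(false)` self-report from an old daemon no longer blocks the upgrade, while same-version or newer servers are still trusted.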
Tests:
- build-support: repair repoints stale->stable, no-op when current, preserves a fresher self-dev pin, never downgrades when stable is older.
- tui: client-proven-older server with server_has_update=Some(false) now defers; same/newer server still trusted; full-path sandbox drives the real handle_server_event History handler against a temp JCODE_HOME in the field state and asserts the shared-server channel is repaired to the new release.
* desktop: don't force Advanced text shaping for standalone emoji
Standalone pictographic emoji/symbols (🔄 ⬜ → ✓ etc.) render identically under Basic and Advanced cosmic-text shaping, so escalating the whole visible-window buffer to Advanced shaping for them was pure per-frame scroll overhead on emoji-rich transcripts. Only sequences that truly need shaping (variation selectors, ZWJ, regional-indicator flag pairs) and lines carrying inline-code/math spans still use Advanced. Cuts worst-case scroll-frame shaping cost.
* fix(reload): repair stale shared-server channel in 'jcode server reload'
'jcode server reload' (run by installers and the TUI's stale-server reload path) now repairs the shared-server channel before sending the forced reload. The running daemon resolves its reload target from that channel; if it still points at the daemon's own old binary (the 'current client, stale server' state after a no-op /update), a forced reload would just re-exec the same old binary. Repairs shared-server -> stable when stable is strictly newer (never downgrades, preserves a fresher self-dev pin).
Adds scripts/stale_server_upgrade_sandbox.sh: a live end-to-end sandbox that starts a REAL released v0.14.6 daemon and runs the new client's 'jcode server reload', asserting the daemon upgrades to the new release. Verified locally: v0.14.6 daemon -> v0.22 after reload, deterministic across runs, fully isolated from the real global daemon via JCODE_SOCKET.
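The repair rule (repoint shared-server to stable only when stable is strictly newer) reduces to a single comparison. The sketch below assumes the comparison is between the modification times of the two channel targets; `ChannelRepair` and `decide_shared_server_repair` are hypothetical names, not the commit's API.

```rust
use std::time::SystemTime;

/// Outcome of the repair check (hypothetical type).
#[derive(Debug, PartialEq)]
enum ChannelRepair {
    /// Stable is strictly newer: point shared-server at stable.
    RepointToStable,
    /// Shared-server is current or fresher (for example a pinned
    /// self-dev build): leave it alone.
    LeaveAlone,
}

/// Decides whether to repoint, given the mtime of the binary that
/// shared-server currently targets and the mtime of the stable binary.
fn decide_shared_server_repair(
    shared_server_mtime: SystemTime,
    stable_mtime: SystemTime,
) -> ChannelRepair {
    if stable_mtime > shared_server_mtime {
        ChannelRepair::RepointToStable
    } else {
        ChannelRepair::LeaveAlone
    }
}
```

The strict inequality is what yields both guarantees named in the commit message: an equal or older stable never replaces the current target, and a self-dev pin that is fresher than stable is preserved.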
* swarm: honor explicit auth-route prefix in agents.swarm_model
Configuring agents.swarm_model with an explicit auth-route prefix (e.g. openai-api:gpt-5.5, openai-oauth:..., claude-api:..., claude-oauth:...) now pins spawned swarm agents to that exact model + provider + auth route instead of inheriting the coordinator's model. The prefix is split into a bare model plus stable provider_key/route_api_method ids that round-trip through ModelRouteApiMethod::parse on session restore.
Lets users force spawned agents onto a specific API-key route (e.g. GPT-5.5 via the OpenAI API) regardless of what the coordinator is running.
* build: optimize text-shaping deps in dev/selfdev/test profiles
cosmic-text/rustybuzz/ttf-parser/swash/yazi/fontdb do all desktop transcript glyph shaping and are 15-40x slower at opt-level=0, making debug/selfdev scrolling of real emoji/markdown-heavy transcripts janky (p99 ~238ms) even though release was smooth. Pin these stable third-party crates to opt-level=3 in dev/selfdev/test (same one-time-compile trick already used for jcode-tui-anim). Debug-build scroll p99 drops 238ms -> 8.4ms with no impact on recompile speed of jcode's own crates.
* desktop: add --real-transcript-action-benchmark for multi-action profiling
Profiles a realistic mix of user actions (smooth/whole-line scroll, selection drag, composer typing, model-picker and session-switcher toggles, window resize, and streaming growth) against the user's largest real on-disk transcripts, each phase measured as per-frame CPU p50/p95/p99/max with a 120fps budget check. Complements --real-transcript-scroll-benchmark for broad interaction coverage.
* desktop: make action-benchmark streaming phase mirror production incremental wrap
The streaming_growth phase re-wrapped the entire transcript every frame, which production avoids by caching the wrapped static base and only appending the wrapped streaming tail.
Mirror that here: wrap the static body once, then per frame truncate to the static base and append the tail. Drops measured streaming_growth p99 ~72ms -> ~18ms, reflecting the real production path.
* desktop: make action-benchmark resize phase reuse cached raw styled lines
Production caches the raw (unwrapped) styled body lines across resizes and only re-runs the width-dependent wrap, via single_session_rendered_body_lines_from_raw_ref. Mirror that in the resize phase instead of regenerating raw markdown lines every frame. Measured window_resize p99 ~64ms -> ~28ms, matching the real path.
* perf(agent): scan only new delta for wrapped-tool markers during streaming
The streaming text loop re-ran text_content.find(...) over the ENTIRE accumulated response on every TextDelta until a wrapped-tool-call marker was found. For normal answers (no marker) that scanned everything every token: O(response) per delta, O(response^2) over a full streamed answer.
Scan only the newly appended delta plus a short overlap window (so a marker straddling the append boundary is still detected), giving O(delta) per token. Add unit tests asserting equivalence to a full rescan across chunk sizes, unicode, and the boundary-straddle case.
* perf(openrouter): drain consumed SSE prefix instead of reallocating buffer
parse_next_event reassigned self.buffer = self.buffer[pos+2..].to_string() for every SSE event, copying and reallocating the entire remaining buffer each time. When one network chunk batches many SSE events this is O(buffer^2). Use String::drain(..pos+2) to remove the consumed prefix in place. Pure behavior-preserving refactor.
* perf(tui): avoid rescanning transcript prefix in incremental body prep
prepare_body_incremental recounted user messages in messages[..prev_msg_count] on every incremental append to seed prompt_num. Appending one message at a time over a long session made that cumulative O(n^2).
prev.user_prompt_texts is extended in lockstep with each rendered user message, so its length already is the prior user-prompt count; use it directly for O(1) seeding.

* perf(session-picker): partition filtered refs by group in one pass

rebuild_items scanned every filtered session ref once per server group to collect that group's sessions: O(groups * filtered_refs). With many remote server groups and many sessions this scaled poorly on every search keystroke. Bucket the filtered refs by group_idx in a single O(filtered_refs) pass, then emit groups in order (O(groups)). Behavior (grouping, ordering, saved-id filtering) is preserved.

* fix(tui): fully select the last line when dragging past the bottom

Follow-up to the previous fix that stopped the edge auto-scroll hot zone from snapping the selection to the last line while pinned. That left a gap: dragging *past* the last line (down into the empty area below the content-sized chat pane) no longer extended the selection at all, because that overshoot row maps to no line and copy_point_from_screen returned None. Native terminal/browser selection treats dragging past the last line as "select through the end of that line".

Add copy_pane_drag_point(), which clamps vertical overshoot to the nearest in-bounds line edge: a drag below the last visible line snaps to the end of that line, and a drag above the first visible line snaps to its start. A direct hit on a real line still yields precise per-cell selection. Use it for both Drag and Up so the boundary line is fully covered during the drag and on release.

Adds a regression test that anchors on the last content line, drags straight down past the bottom of the pane with the cursor x only partway through the line, and asserts the whole last line (through its end) is selected without arming autoscroll or scrolling.
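
The overshoot clamping described for copy_pane_drag_point() can be illustrated with a minimal sketch. The Pane and CopyPoint types, their fields, and the method names below are hypothetical stand-ins for illustration, not the actual jcode types; only the clamping rule (below the last line snaps to its end, above the first line snaps to its start, direct hits stay per-cell) is taken from the commit message.

```rust
/// A selection endpoint: visible line index plus cell column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CopyPoint {
    line: usize,
    col: usize,
}

/// Hypothetical pane geometry: the screen row of the first visible line
/// and the width (in cells) of every visible line.
struct Pane {
    top: i32,
    line_widths: Vec<usize>,
}

impl Pane {
    /// Direct hit-test. Returns None when the row maps to no line,
    /// which is the case that used to drop the selection update.
    fn point_from_screen(&self, row: i32, col: usize) -> Option<CopyPoint> {
        let idx = row.checked_sub(self.top)?;
        if idx < 0 {
            return None;
        }
        let line = idx as usize;
        let width = *self.line_widths.get(line)?;
        Some(CopyPoint { line, col: col.min(width) })
    }

    /// Drag-aware variant: vertical overshoot is clamped to the nearest
    /// in-bounds line edge instead of returning None.
    fn drag_point(&self, row: i32, col: usize) -> Option<CopyPoint> {
        if let Some(point) = self.point_from_screen(row, col) {
            return Some(point); // precise per-cell hit on a real line
        }
        // An empty pane has nothing to clamp to.
        let last = self.line_widths.len().checked_sub(1)?;
        if row < self.top {
            // Above the first visible line: snap to its start.
            Some(CopyPoint { line: 0, col: 0 })
        } else {
            // Below the last visible line: snap to its end.
            Some(CopyPoint { line: last, col: self.line_widths[last] })
        }
    }
}
```

With a three-line pane, a drag to a row far below the content resolves to the end of the third line regardless of the cursor x, which is the behavior the regression test in the commit asserts.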
* feat(tui): collapse 'current' reasoning with a height animation

In 'current' reasoning-display mode the live dim/italic reasoning used to vanish in a single frame when the answer committed or a tool ran, snapping the transcript upward. Instead, on close the reasoning block is sliced out of the streaming buffer into a dedicated collapsing 'reasoning' display message that height-collapses (ease-out, oldest line first) toward a one-line '▸ thought for Xs' summary, leaving a trace behind.

- New ReasoningCollapse state + begin/advance/finalize on App.
- Renders via a new 'reasoning' display role (dim+italic, sentinel-stripped).
- Redraw loop (local + remote tick, turn loop) advances the animation; redraw policy keeps frames live while collapsing.
- Reduced-motion / low-power tiers snap straight to the summary.
- Guards drop the animation safely on transcript reset/replace.
- Tests: block parsing, summary labels, monotone collapse, finalize, reduced-motion snap, and end-to-end dim/italic render of the role.

* perf(tui): maintain display-message counters incrementally on append

bump_display_messages_version recomputed display_user_message_count and display_edit_tool_message_count by scanning all display messages twice on every mutation. Appending one message at a time over a long session made counter maintenance cumulatively O(M^2). The hot append path now folds the single new message into the cached counters (O(1)) and bumps the version without a full rescan; rarer bulk/remove/replace paths still recompute fully. Add a test asserting the incrementally-maintained counters match a full recompute after interleaved pushes and removes.

* fix(antigravity/gemini): recover from intermittent Gemini-3 MALFORMED_FUNCTION_CALL

Gemini-3 thinking models intermittently emit Python-style pseudo-code (e.g. print(default_api.read(...))) instead of a clean functionCall, which the Cloud Code backend rejects with finish_reason=MALFORMED_FUNCTION_CALL and empty content.
Previously the runtime ended the turn with a silent empty MessageEnd, so the agent looked like it stalled with no answer. For gemini-3.1-pro-high this hit roughly half of tool turns.

Three layered mitigations (per Gemini function-calling guidance / field reports):

1. Prevention: when tools are advertised, append a 'Function calling' guard to the Gemini system prompt forbidding code/namespaces (build_system_instruction_with_tool_guard).
2. Transparent retry: detect a malformed empty turn (is_retryable_empty_turn) and re-request up to twice before surfacing anything, so the agent never sees the blip. Retries force function-calling mode ANY so the model must emit a real functionCall instead of pseudo-code.
3. Surfacing: if output is still empty after retries, emit an actionable error (with the finish_reason and finishMessage) instead of a silent empty turn.

Also surfaces the previously-hidden finishMessage for diagnosis. Measured on the live Antigravity backend: gemini-3.1-pro-high tool-call success went from ~50% to ~7/8 (remaining miss was a probe-deadline timeout, not malformed). Unit tests cover the guard and the retry classifier.

* perf(desktop): memoize rendered body lines; stop re-wrapping transcript per mouse move

Selection hit-testing (single_session_visible_body -> body viewport -> single_session_rendered_body_lines_for_tick) re-parsed markdown and re-wrapped the ENTIRE transcript on every selection mouse-move during a drag, an O(transcript) cost per pointer event. Add a thread-local single-entry memo keyed by the existing body cache key and return the wrapped lines as a shared Rc, so the viewport only clones the visible slice instead of the whole transcript. The render hot path keeps its separate Canvas-side cache, so this only accelerates input/scroll-metric/geometry callers.

Measured on real transcripts (debug build): per-mouse-move selection hit-test p99 ~121ms -> ~0.06ms.
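
As a rough illustration of the thread-local single-entry memo described above, here is a minimal sketch. BodyKey, BODY_MEMO, wrap_transcript, and rendered_body_lines are hypothetical names chosen for this example, and the wrapping function is a toy stand-in for the real markdown parse + wrap; only the shape (one cached entry, keyed, handed out as a shared Rc) follows the commit message.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical cache key: anything whose change invalidates the wrapped output.
#[derive(Clone, PartialEq, Eq)]
struct BodyKey {
    transcript_version: u64,
    width: u16,
}

thread_local! {
    /// Single-entry memo: the most recent key and its wrapped lines.
    static BODY_MEMO: RefCell<Option<(BodyKey, Rc<Vec<String>>)>> = RefCell::new(None);
}

/// Toy stand-in for the expensive parse + wrap step: hard-wraps by char count.
fn wrap_transcript(text: &str, width: u16) -> Vec<String> {
    let width = width.max(1) as usize;
    let chars: Vec<char> = text.chars().collect();
    chars.chunks(width).map(|chunk| chunk.iter().collect()).collect()
}

/// Returns the wrapped lines for `key`, recomputing only when the key changes.
/// Callers receive a shared Rc, so a cache hit costs a refcount bump rather
/// than a copy of the whole transcript.
fn rendered_body_lines(key: BodyKey, text: &str) -> Rc<Vec<String>> {
    BODY_MEMO.with(|memo| {
        let mut slot = memo.borrow_mut();
        if let Some((cached_key, lines)) = &*slot {
            if *cached_key == key {
                return Rc::clone(lines);
            }
        }
        let lines = Rc::new(wrap_transcript(text, key.width));
        *slot = Some((key, Rc::clone(&lines)));
        lines
    })
}
```

A viewport caller would then slice only the visible range out of the returned Rc, for example lines[start..end].to_vec(), instead of cloning every wrapped line per pointer event.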
Adds a selection_input_hittest benchmark phase that isolates this cost, plus a debug-only env gate for A/B measurement.

* perf(ambient): pre-filter recent sessions by file mtime before parsing

gather_recent_sessions fully parsed every session JSON file (the sessions dir can hold tens of thousands) just to drop those older than the 24h cutoff and keep the 20 most recent: O(all_sessions * parse) per ambient cycle. Pre-filter candidate files by filesystem mtime (with a 1h margin for write/clock skew) before loading, sort newest-first, and only parse up to a bounded budget (4x the limit) before the existing id-based sort/truncate. Behavior is preserved; work drops from O(all_sessions) to O(recent_sessions).

* perf(tui): normalize inline-picker fuzzy pattern once per keystroke

picker_fuzzy_score re-lowercased and re-collected the filter pattern into a Vec<char> on every call, i.e. once per entry inside the per-keystroke filter loop (O(entries * pattern)). Hoist pattern normalization out via picker_fuzzy_pattern + picker_fuzzy_score_with_pattern and normalize once per filter pass. Scoring behavior is unchanged.

* perf(session-picker): avoid cloning cached search refs when narrowing

search_matched_session_refs cloned the cached match set into candidates and then cloned the new matches back into the cache: two full-list clones per narrowing keystroke. Take the cached refs in place via mem::take (it is about to be overwritten anyway), eliminating the candidates clone. Behavior unchanged.

* perf(tui): compute copy-selection status metrics without building selection text

copy_selection_status built the entire selected string via current_copy_selection_text just to report char/line counts in the status line. This ran on every render frame while in copy mode, including every drag move, so a large selection (e.g. select-all) re-allocated and re-joined the whole transcript text each frame.
Add copy_selection_metrics (and a raw-lines fast path) that counts chars/lines using the same slicing logic without allocating the joined string, and use it for the status line. Add a test asserting the metrics exactly match the built selection text's char/line counts.

* perf(tui): compute mermaid/image regions in one reverse pass (O(L) not O(L^2))

For each image/mermaid placeholder, body prep scanned forward through all following blank lines to compute the placeholder height. A message with many placeholders each followed by long blank runs made this O(wrapped_lines^2). Precompute, in a single reverse pass, the blank-run length starting at every line; the placeholder height is then an O(1) lookup. Extracted into a shared compute_image_regions helper used by both wrap_lines and wrap_lines_with_map. Behavior is identical (height = 1 + trailing blank run).

* feat(swarm): enrich swarm list with live activity, churn, turns, and todo progress

swarm list previously returned only a shallow roster (name, role, status, files, age). Enrich each agent row with:

- live activity (processing + current tool name)
- provider/model
- token churn over a recent ~10s window + cumulative tokens
- turn count
- todo progress (completed/total)
- contextual idle/active duration label (idle Ns when ready, Ns when running)
- completion report when finished

Token churn and turn count are tracked in a new lock-free per-session metrics registry (jcode-base::session_metrics) rather than on the Agent struct, because swarm list reads stats while an agent may hold its own Mutex<Agent> lock mid-turn (try_lock fails exactly when churn is most interesting). Metrics are recorded from the streaming turn loop and run_turn, and forgotten on session disconnect.

handle_comm_list now joins swarm membership with live session state and todos, sharing the runtime-extras gathering helper in comm_sync.
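
The reason for keeping metrics outside the Agent struct can be shown with a small sketch. Everything here (SessionMetrics, MetricsRegistry, and their methods) is a hypothetical illustration rather than the jcode-base::session_metrics API, and it omits the ~10s churn window. The counters are atomics; the map of sessions is guarded by an RwLock purely for brevity, which is a simplification and may differ from the real registry.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Per-session counters updated from the turn loop without taking any agent lock.
#[derive(Default)]
struct SessionMetrics {
    total_tokens: AtomicU64,
    turns: AtomicU64,
}

impl SessionMetrics {
    fn record_tokens(&self, n: u64) {
        self.total_tokens.fetch_add(n, Ordering::Relaxed);
    }

    fn record_turn(&self) {
        self.turns.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical registry keyed by session id.
#[derive(Default)]
struct MetricsRegistry {
    sessions: RwLock<HashMap<String, Arc<SessionMetrics>>>,
}

impl MetricsRegistry {
    /// Returns the shared counters for a session, creating them on first use.
    fn handle(&self, session_id: &str) -> Arc<SessionMetrics> {
        {
            let map = self.sessions.read().unwrap();
            if let Some(metrics) = map.get(session_id) {
                return Arc::clone(metrics);
            }
        }
        let mut map = self.sessions.write().unwrap();
        Arc::clone(map.entry(session_id.to_string()).or_default())
    }

    /// Reads (total_tokens, turns) without touching the agent's own mutex.
    fn snapshot(&self, session_id: &str) -> Option<(u64, u64)> {
        let map = self.sessions.read().unwrap();
        let metrics = map.get(session_id)?;
        Some((
            metrics.total_tokens.load(Ordering::Relaxed),
            metrics.turns.load(Ordering::Relaxed),
        ))
    }

    /// Drops a session's counters, e.g. on disconnect.
    fn forget(&self, session_id: &str) {
        self.sessions.write().unwrap().remove(session_id);
    }
}
```

Because the turn loop only holds an Arc<SessionMetrics>, a list handler can call snapshot() while the agent's Mutex is locked mid-turn, which is exactly the situation where try_lock on the agent would fail.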
* test(swarm): cover enriched swarm list rendering (activity, churn, turns, idle label)

* fix(tui): honor reasoning_display mode when re-rendering persisted history

The 'current' reasoning collapse only ran in the live streaming path. When the transcript was re-rendered from stored history (self-dev reload, resume, remote sync, compaction-window expand), the shared history renderer replayed every persisted reasoning trace in full regardless of reasoning_display mode, so after the collapse animation finished a reload would bring all the reasoning back.

format_reasoning_markup now honors the active mode:

- Off: persisted reasoning is hidden entirely.
- Current: the block folds to a single '▸ thought (N lines)' trace line, matching the live collapse end state.
- Full: classic full replay (unchanged).

Adds reasoning_summary_line_markup helper + tests for all three modes.

* chore(release): bump version to 0.23.0

* fix: propagate route_api_method in SubagentTool + resolve loading.rs ForeignSession

---------

Co-authored-by: jeremy <94247773+1jehuang@users.noreply.github.com>
Co-authored-by: quangdang46 <quangdang46@users.noreply.github.com>