v0.0.4

Latest

github-actions released this 10 Jun 17:22

· 8 commits to main since this release

f586506

0.0.4

Added

Added a think tool — a no-op scratchpad the model calls to reason in the open mid-task (plan the next move after a batch of tool results, weigh a trade-off and state the rejected option, check a plan against a record_decision before editing). It writes nothing to disk, fetches nothing, and returns nothing new — the thought just lands in the conversation so the reasoning is in the record and survives into later turns. Read-only (so it never needs approval) and kept for sub-agents (it survives the wildcard filter and is whitelistable by name); the system prompt's tool-discipline list teaches the timing — use it to think, not to narrate, and skip it when the next step is already known. Aliased (scratchpad) and tolerant of the field spellings models reach for (thought/text/note/reasoning).
Added MCP resources — list_mcp_resources and read_mcp_resource: when a connected MCP server advertises the resources capability at handshake, tomte registers one shared pair of tools so the model can pull a server's files/docs/configs into context by URI (parity with Claude Code's ListMcpResources/ReadMcpResource). list_mcp_resources with no server aggregates every resource-capable server under a header; with a server it lists just that one. read_mcp_resource reads one URI, disambiguating with server only when several servers expose resources (a sole server is used automatically; an unknown name errors with the known servers listed). Both are read-only (auto-approvable), and the server output is wrapped in the same <untrusted-mcp-output> provenance fence as tool results — a compromised server can't forge the close tag or smuggle a framework marker. The tools are never deferred behind tool_search (they aren't mcp__-namespaced), and they only appear when a server can actually serve them, so they don't clutter the tool list otherwise.
Added TOMTE_MAX_CONTEXT_TOKENS — an env override for the resolved context-window size, mirroring Claude Code's CLAUDE_CODE_MAX_CONTEXT_TOKENS. Pin the window for a gateway/proxy whose real limit tomte can't infer from the model name, or shrink it to force earlier compaction. Accepts a bare integer or a k/m suffix (200000, 200k, 1m; case-insensitive, underscores tolerated); a valid value is clamped to [8k, 20m] so a typo can't make every turn read as ≥100% full and thrash the compaction path, and an unset or unparseable value leaves the catalog/provider value untouched. It flows through the single effective_context_limit choke point, so the status-bar gauge, /context, the warn/auto-compact thresholds, and microcompaction all honor it together.
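The accepted spellings and the clamp can be illustrated with a small parser. This is a minimal sketch, not tomte's actual code: the function name parse_context_tokens is hypothetical, and reading k/m as decimal multipliers (1,000 and 1,000,000) is an assumption that also fixes the clamp bounds at 8,000 and 20,000,000.

```rust
// Assumed bounds for the documented [8k, 20m] band (decimal multipliers).
const MIN_TOKENS: u64 = 8_000;
const MAX_TOKENS: u64 = 20_000_000;

/// Parse values such as "200000", "200k", "1M", or "200_000".
/// Returns None for empty or unparseable input, so the caller can keep
/// the catalog/provider value untouched.
fn parse_context_tokens(raw: &str) -> Option<u64> {
    // Underscores are tolerated; the suffix is case-insensitive.
    let cleaned: String = raw.trim().chars().filter(|c| *c != '_').collect();
    let lower = cleaned.to_ascii_lowercase();
    let (digits, multiplier) = if let Some(d) = lower.strip_suffix('k') {
        (d, 1_000u64)
    } else if let Some(d) = lower.strip_suffix('m') {
        (d, 1_000_000u64)
    } else {
        (lower.as_str(), 1u64)
    };
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Overflow is treated as unparseable rather than wrapped.
    let n = digits.parse::<u64>().ok()?.checked_mul(multiplier)?;
    // A valid value is clamped so a typo cannot produce an absurd window.
    Some(n.clamp(MIN_TOKENS, MAX_TOKENS))
}
```

Under these assumptions, "200k" and "200_000" resolve to the same window, a stray "5" is raised to the lower bound, and "lots" yields None.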
Made the collapsed "Thought for Xs" line click-to-expand. Once the model's live reasoning collapses into the compact Thought for Xs line, you can now left-click it to re-show the full thought (and click again to hide it) — the same click-target affordance as the "Jump to bottom" bar and the fleet rows. The reasoning text is retained instead of discarded on collapse, the line carries a dim (click to show) / (click to hide) hint, and expanding works even with live thinking display turned off (it's an explicit request). Implemented without disturbing the render cache's fast path: each visible thought line's screen rect is mapped per frame (append-only marks tracked alongside the cached lines), and toggling re-wraps only the affected block. Alt-screen renderer; the opt-in inline viewport keeps the collapsed line without the click target.
Made the busy spinner narrate what tomte is actually doing. It already borrowed an in-progress todo's active_form as the live word (Claude-parity), but most short turns carry no todo, so the line fell straight to a whimsical pool word (Pottering…). Now, when there's no in-progress todo, the spinner shows the running tool's plain action verb — Reading…, Running…, Searching…, Editing…, Delegating… — derived from the most recent tool call whose result hasn't arrived yet. The precedence is todo active_form → running-tool verb → the drifting pool word (kept for the gaps between tool calls, so tomte's voice survives). Meta tools (todo_write/goal_update/wait), MCP tools, and unknown names fall through to the pool rather than reading something unhelpful.
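The precedence chain reads naturally as a chain of Option fallbacks. The sketch below is illustrative only: spinner_word and tool_verb are hypothetical names, and apart from read_file and grep the tool names in the verb table are assumptions rather than tomte's real tool list.

```rust
/// Map a tool name to a plain action verb.
/// The table is an illustrative assumption, not tomte's real mapping.
fn tool_verb(tool: &str) -> Option<&'static str> {
    match tool {
        "read_file" => Some("Reading"),
        "bash" => Some("Running"),
        "grep" => Some("Searching"),
        "edit_file" => Some("Editing"),
        "task" => Some("Delegating"),
        // Meta tools, MCP tools, and unknown names yield no verb.
        _ => None,
    }
}

/// Precedence: todo active_form, then running-tool verb, then pool word.
fn spinner_word(
    todo_active_form: Option<&str>,
    running_tool: Option<&str>,
    pool_word: &str,
) -> String {
    todo_active_form
        .map(str::to_owned)
        .or_else(|| running_tool.and_then(tool_verb).map(str::to_owned))
        .unwrap_or_else(|| pool_word.to_owned())
}
```

A meta tool such as todo_write has no entry in the table, so it falls through to the pool word exactly as an idle gap would.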
Added a per-read output ceiling to read_file, matching how Claude Code keeps a single file read token-bounded. The 2000-line and 2000-char/line caps bounded line count and width but not total size, so a file of long-but-under-2000-char lines could dump hundreds of KB — hundreds of thousands of tokens — in one read, burning context and the user's limit. A read now stops once its rendered output passes ~25k tokens (Claude Code's CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS default; tomte estimates ≈4 bytes/token) — even under 2000 lines — and emits the same continue with offset=N notice as the line cap, so nothing is lost, just paged. A TOMTE_READ_MAX_TOKENS env override mirrors Claude Code's knob (clamped to a sane band). Normal small/medium reads are unchanged; the description now nudges toward reading only the slice you need (offset/limit, or grep first), which is cheaper than a capped full read. Applies to both the in-memory and large-file (>5 MB streamed) read paths.
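The paging behavior can be sketched as a loop that stops on whichever cap is hit first and reports where to resume. This is a simplified model under stated assumptions: capped_read is a hypothetical name, the offset is treated as a 0-based line index, and the 2000-char per-line cap is left out.

```rust
const MAX_LINES: usize = 2000;
const BYTES_PER_TOKEN: usize = 4; // the estimate quoted in the notes

/// Returns the lines that fit, plus the line index a follow-up read
/// should start from when the read was cut short (None when complete).
fn capped_read<'a>(
    lines: &[&'a str],
    offset: usize,
    max_tokens: usize,
) -> (Vec<&'a str>, Option<usize>) {
    let byte_budget = max_tokens.saturating_mul(BYTES_PER_TOKEN);
    let mut out = Vec::new();
    let mut used = 0usize;
    for (i, line) in lines.iter().enumerate().skip(offset) {
        // Stop on the line cap, or once the output has passed the budget.
        // At least one line is always emitted so paging makes progress.
        if out.len() == MAX_LINES || (!out.is_empty() && used >= byte_budget) {
            return (out, Some(i));
        }
        used += line.len() + 1; // +1 for the newline
        out.push(*line);
    }
    (out, None)
}
```

With the default of roughly 25k tokens this gives a budget near 100 KB per read; the returned index plays the role of the continue-with-offset notice.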
Added tomte receipt — the work receipt: one Markdown / HTML / JSON artifact that proves a stretch of work instead of transcribing it, ready to attach to a PR. It bundles a fresh Proof Capsule (the files git reports changed plus the real exit codes of the project's own test/typecheck/lint/build, run by the CLI right now), whether HEAD carries a verified Commit Seal (checked with the same binding rules as tomte seal verify), what the session actually did — the shell commands run and files edited, read from the persisted session log (the CLI's own record of executed tool calls, never a model's recollection) with the per-model token/cost receipt — and the newest recorded decisions with the drift-watch counts. Sections degrade gracefully (outside a repo, with no saved sessions, with an empty trail, the receipt says so), lists are capped with "and N more" pointers at the full stores, and the HTML page is standalone with all interpolated text escaped. --session <id> picks an older session (default: the project's newest), --json/--html pick the format, --out writes a file. It always renders, red or green — the gates remain tomte prove and tomte seal verify.
Added the official GitHub Action — "Done means verified" as a PR gate. uses: ryan-mt/tomte@v0.0.4 installs the released binary (checksum-verified against the published .sha256), runs tomte prove (the project's own test/typecheck/lint/build, real exit codes) and tomte rounds (drift watch, risk risers, hot-and-untested files; --no-proof automatically when prove already paid for the check suite), optionally requires a bound Commit Seal on HEAD (seal-verify: "true"), and fails the job when any selected gate is red. The full evidence lands in the job's step summary (the PR check page) with long outputs truncated to a stated tail — never silently — and comment: "true" posts it as one self-updating PR comment (needs pull-requests: write). Inputs select the gates, the release version, and the working-directory; the action exposes a verified output for downstream steps. It deliberately brings no toolchain — the project's own setup steps run first, and tomte measures that project's checks.
Added monorepo coverage to the Proof Capsule (tomte prove, and everything built on it — /prove, seal, rounds, receipt, the race judge, the GitHub Action). The capsule used to verify only the primary ecosystem at the root (a Cargo.toml beat everything), so a repo carrying a second stack — a Next.js site beside a Rust workspace, a src-tauri/ beside a Node app — could ship that stack completely unverified while the card read green. Now an immediate sub-directory holding a different ecosystem's manifest gets its own checks too, named <dir>:<check> (e.g. tomte-website:lint), run inside that directory, and folded into the same verdict — a red website lint fails tomte prove exactly like a red cargo test. Same-kind sub-projects are deliberately left to the root toolchain (a cargo workspace or npm workspaces already run their members), so nothing runs or bills twice; hidden directories and dependency/build dirs (node_modules, target, dist, …) are never treated as project roots, sub-project discovery is sorted so equal trees plan equal capsules, and the reproduce line wraps each sub-check as (cd <dir> && …) so the pasted command still runs everything exactly where the capsule did.
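The naming and reproduce-line rules can be shown with a short sketch. The Check struct and both function names are hypothetical, and joining the commands with && is an assumption about how the reproduce line is assembled.

```rust
/// One planned check; `dir` is None for the root ecosystem.
struct Check {
    dir: Option<&'static str>,
    name: &'static str,
    command: &'static str,
}

/// Sub-project checks are labeled <dir>:<check>; root checks keep their name.
fn check_label(c: &Check) -> String {
    match c.dir {
        Some(d) => format!("{}:{}", d, c.name),
        None => c.name.to_owned(),
    }
}

/// Wrap each sub-check as (cd <dir> && ...) so a pasted line runs every
/// command in the directory where the capsule ran it.
fn reproduce_line(checks: &[Check]) -> String {
    checks
        .iter()
        .map(|c| match c.dir {
            Some(d) => format!("(cd {} && {})", d, c.command),
            None => c.command.to_owned(),
        })
        .collect::<Vec<_>>()
        .join(" && ")
}
```

The subshell parentheses keep the directory change local to one check, so later commands still run from the root. Directory names containing spaces would need quoting, which this sketch omits.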
CI now covers the website: a website job runs npm ci, npm run lint, and npm run build for tomte-website/ on every push and PR, so a broken site can no longer ride in under green Rust checks.
Added tomte why diff [base] (and /why diff [base] in a session) — review the reasoning, not just the code. A PR review reads the diff; this reads the decision trail against the same range (the merge-base with base, defaulting to the first of origin/main/main/origin/master/master that resolves) and answers what the diff can't: which decisions are new in this range (each flagged when it points outside the changed files), which earlier decisions were superseded here — promises deliberately broken, shown as was → now — and which changed files carry no recorded why at all (the reviewer's gap list; committed, uncommitted, and untracked files all count). Everything is computed from real state — git for the range, the project's own trail for the decisions — the analysis core is pure and fully unit-tested, and --json emits the report for scripting. This is the view a clone can't fake quickly: it needs the store, the anchors, and the supersede links the trail has been accumulating all along.
Added tomte models (and /models in a session) — the model lineup from real state. One card answers what was previously scattered across the docs, the auth store, and config.json: every model tomte can drive with its context window and thinking capabilities (adaptive/extended thinking, xhigh eligibility — straight from the model catalog, family fallbacks included), which credentials are actually present per provider (presence and source only — OAuth/API key/env — never token contents), which OpenAI ids the ChatGPT-subscription backend rejects (API key only), the active model and reasoning effort, and the exact failover chain an overload would walk (the configured fallback_models, the built-in ladder, or an honest off/none). --json emits the same report for scripting; a provider with no credential points at tomte login instead of pretending its models are one keypress away.
Added a built-in failover ladder, so a rate-limited or overloaded model no longer kills the turn just because fallback_models was never configured (the default for every fresh install). When the chain is empty, reactive failover now walks a conservative same-provider ladder instead of giving up: claude-fable-5 → claude-opus-4-8 → claude-sonnet-4-6 on Anthropic, gpt-5.5 → gpt-5.4 on OpenAI — anchored at the active model's tier and only ever moving sideways or down, so failover never silently moves a session onto a more expensive model. Deliberately conservative everywhere else too: every ladder id is accepted by every auth mode of its provider (the ChatGPT-subscription backend rejects mini/nano/pro ids), the Anthropic ladder stops at Sonnet (auto-dropping a coding session to Haiku would trade a visible error for silently weaker edits), and a model tomte can't place — Haiku, gpt-5.2, a local or third-party endpoint — gets no default chain at all, so an unknown provider is never rerouted to one the user didn't pick. All the existing failover guards still apply (only genuine overload errors, never fatal 4xx/refusals/context overflow; candidates without a usable credential are skipped; bounded attempts; announced via the same fallback card). A configured fallback_models list remains authoritative, and auto_fallback: false in config.json restores the old fail-fast behavior.
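A rough sketch of the anchoring rule for the Anthropic ladder follows. The function name default_chain and the substring-based tier match are assumptions. The sketch anchors at the lowest tier the id mentions, as the hardening fix listed under Fixed describes, and it leaves out the credential, error-class, and attempt-count guards.

```rust
/// Anthropic ladder from the notes, highest tier first: (tier word, model id).
const LADDER: [(&str, &str); 3] = [
    ("fable", "claude-fable-5"),
    ("opus", "claude-opus-4-8"),
    ("sonnet", "claude-sonnet-4-6"),
];

/// Default failover chain for `active`; empty when the model cannot be
/// placed on the ladder, or when nothing sits at or below its tier.
fn default_chain(active: &str) -> Vec<&'static str> {
    // Anchor at the LOWEST tier the id mentions (largest ladder index),
    // so the chain can only move sideways or down, never up.
    let anchor = LADDER.iter().rposition(|(tier, _)| active.contains(*tier));
    match anchor {
        Some(i) => LADDER[i..]
            .iter()
            .map(|(_, id)| *id)
            .filter(|id| *id != active)
            .collect(),
        None => Vec::new(),
    }
}
```

An id such as claude-haiku-4 (a made-up example) matches no tier word and gets no chain, which mirrors the rule that an unplaceable model is never rerouted.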
Added tomte sessions: the saved-session ledger, headless. Run bare, it lists this project's persisted sessions newest-first (id, age, model, message count, and the first-prompt preview), reading the same store as tomte resume and --continue. tomte sessions show [id] prints one session as a readable markdown transcript: user and assistant messages in full, each tool call as a one-line > tool: note with the argument a human would recognize it by (command, path, pattern, …), and tool results deliberately omitted with the count stated (they dominate the bytes; --json carries the full record); --out writes the transcript to a file. tomte sessions prune deletes old sessions using two unionable rules, --keep N (keep the newest N) and --older-than-days N, and is dry-run by default: the plan prints exactly which sessions are selected, and nothing is touched until --yes. At least one rule is required, so a bare prune can never select the whole store; ids are validated with the same rules as save/load, so a crafted id can't escape the sessions dir; and a per-file delete failure is reported and makes the run exit non-zero instead of being swallowed.
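The prune selection can be sketched as a pure planning function. The Session shape and the name prune_plan are hypothetical, and the sketch reads "unionable" as: a session is selected when either rule selects it.

```rust
/// A saved session reduced to what the prune rules need (hypothetical shape).
struct Session {
    id: &'static str,
    age_days: u32,
}

/// Returns the ids selected for deletion, or None when no rule was given.
/// `sessions` is expected newest-first, matching the list order.
fn prune_plan(
    sessions: &[Session],
    keep: Option<usize>,
    older_than_days: Option<u32>,
) -> Option<Vec<&'static str>> {
    if keep.is_none() && older_than_days.is_none() {
        return None; // a bare prune must never select the whole store
    }
    let selected = sessions
        .iter()
        .enumerate()
        .filter(|(i, s)| {
            // --keep N selects everything past the newest N.
            let beyond_keep = keep.map_or(false, |n| *i >= n);
            // --older-than-days N selects anything older than N days.
            let too_old = older_than_days.map_or(false, |d| s.age_days > d);
            beyond_keep || too_old
        })
        .map(|(_, s)| s.id)
        .collect();
    Some(selected)
}
```

Keeping the plan separate from the deletion is what makes a dry-run default cheap: the same list is printed first and acted on only after confirmation.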
Added tomte cost --all — one cost ledger across every saved session for the project, instead of one session at a time. Per-model token counts are merged with saturating sums and sorted by model id (equal stores render byte-identical reports), then priced through the same per-billing-class tables as /cost — cache reads and writes keep their discounted rates, and the cross-provider OpenAI/Anthropic subtotals still appear when the history spans both. --all conflicts with --session (one report, one scope), and a session file that no longer loads is skipped — the same posture as the session list.
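The merge step might look like the following sketch. The Usage struct with two counters and the name merge_usage are simplifications for illustration; the real record presumably also carries cache-read and cache-write counts.

```rust
use std::collections::BTreeMap;

/// Simplified per-model token counters (an assumed shape).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Usage {
    input: u64,
    output: u64,
}

/// Merge per-session usage into one ledger keyed by model id.
/// A BTreeMap iterates in key order, so equal stores yield the same
/// ordering no matter which session is read first.
fn merge_usage(sessions: &[Vec<(&str, Usage)>]) -> BTreeMap<String, Usage> {
    let mut total: BTreeMap<String, Usage> = BTreeMap::new();
    for session in sessions {
        for (model, u) in session {
            let entry = total.entry((*model).to_owned()).or_default();
            // Saturating sums: a huge history pins at u64::MAX instead of
            // wrapping around or panicking.
            entry.input = entry.input.saturating_add(u.input);
            entry.output = entry.output.saturating_add(u.output);
        }
    }
    total
}
```

Pricing would then walk the merged map once, which keeps the rate tables out of the merge entirely.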
Added tomte completions <shell> — a shell completion script for the whole command surface (bash, zsh, fish, powershell, elvish), generated from the very clap definition that parses the CLI, so the script can never drift from the real commands and flags. Pipe it where your shell looks (tomte completions bash > ~/.local/share/bash-completion/completions/tomte, tomte completions zsh > "${fpath[1]}/_tomte", or add tomte completions powershell | Out-String | Invoke-Expression to $PROFILE); an unknown shell is a parse error listing the choices, and an early-closed pipe (| head) exits 0 like every other pipeable command.
Added the Context Manifest — prove the context before the edit lands (the Repo Twin's X-ray, now an automatic pre-edit step). The first time a session edits a file, the glass-box pre-flight card gains a ◈ context manifest for this edit section: the files a maintainer would have in context for that edit, each with the real index edge it came from (import / symbol / test / git) and checked against the session's own read log (✓ read this session vs not read yet — claimed context is verified, not asserted), plus the nearby files deliberately left out with the reason each is unreachable. Cache-only by design: it reads the twin's cached index and never builds one inline (no mid-edit stall; no cache → no card, and a cache the tree has outgrown is labeled stale instead of passed off as current). Shown once per file per session, only when the twin actually connects something — an isolated file stays quiet. Don't stuff context; prove it.

Changed

Made the inline viewport the default renderer again (SOUL Pillar 4 — the custodian does not hijack the terminal): finished turns flow into the terminal's own native scrollback, the mouse is never captured, and native wheel-scroll + click-drag selection/copy keep working. 0.0.2 shipped this default on env-var opt-out alone and 0.0.3 reverted it; this time the full-screen alternate screen stays a first-class, persistent choice — set render_mode: "alt" in config.json (aliases altscreen/alt-screen/alt_screen/fullscreen accepted), or use the TOMTE_INLINE env var to override either way per launch (1/true/yes/on forces inline, 0/false/no/off forces alt-screen). Configs predating the field keep working and default to inline.
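The resolution order can be sketched as: a recognized TOMTE_INLINE value wins, then the render_mode config value, then the inline default. The names below are hypothetical. Case-insensitive matching, and falling back to the config when the env value is unrecognized, are both assumptions the notes do not confirm.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum RenderMode {
    Inline,
    Alt,
}

/// `config` is the render_mode value from config.json, if present;
/// `env_inline` is the TOMTE_INLINE value, if set.
fn resolve_render_mode(config: Option<&str>, env_inline: Option<&str>) -> RenderMode {
    // A recognized env value overrides the config either way.
    if let Some(v) = env_inline {
        match v.trim().to_ascii_lowercase().as_str() {
            "1" | "true" | "yes" | "on" => return RenderMode::Inline,
            "0" | "false" | "no" | "off" => return RenderMode::Alt,
            _ => {} // unrecognized: fall through to the config (assumption)
        }
    }
    // Configs predating the field (None) default to inline.
    match config.map(|c| c.trim().to_ascii_lowercase()).as_deref() {
        Some("alt" | "altscreen" | "alt-screen" | "alt_screen" | "fullscreen") => RenderMode::Alt,
        _ => RenderMode::Inline,
    }
}
```

A caller would pass std::env::var("TOMTE_INLINE").ok().as_deref() for the second argument; taking plain Option values keeps the function easy to test.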

Fixed

Fixed the inline renderer's transcript never reaching native scrollback on Windows Terminal — wheel-scrolling up showed nothing because there was nothing to scroll to. The scrolling-regions ratatui feature made insert_before commit finished turns by scrolling a DECSTBM region, and ratatui's contract assumes a region that includes row 0 pushes the scrolled-out lines into scrollback — true on xterm/iTerm/kitty, but Windows Terminal discards them instead (microsoft/terminal#3673), so the session's history silently vanished as it was written. The feature is now off: insert_before falls back to scrolling the whole screen with real newlines, which lands in scrollback on every terminal. The clear-and-redraw blink that feature was hiding stays hidden by the synchronized-update (DECSET 2026) wrapper that already batches the scrollback commit and the frame diff into one atomic paint. Note for inline mode: the clickable "Jump to bottom" bar is an alt-screen affordance — inline never captures the mouse, so it cannot know you scrolled; press any key (Windows Terminal snaps on input) or Ctrl+End to return to the tail.
Fixed Ctrl+O being a dead key at rest in the default inline renderer. Inline mode pushes finished turns into the terminal's native scrollback via insert_before, which can never be repainted — so the moment a turn settled, toggling the expanded-tools flag had nothing left to redraw, and every "(Ctrl+O for more)" hint pointed at a key that visibly did nothing. Ctrl+O at rest now opens a full-screen transcript viewer instead: the whole session re-rendered with tool detail expanded (the same leaf renderers as the live view), inside the alternate screen so the native scrollback underneath stays untouched and comes back on exit. It opens at the newest lines (where the hints point), scrolls with ↑/↓, PgUp/PgDn, Home/End, and the mouse wheel (capture is scoped to the viewer — inline mode itself still never takes the mouse), and closes with Esc, q, or Ctrl+O again. Mid-turn the plain toggle still works — the live tail repaints every frame — and the flag now ends with the turn, so a mid-turn expand can no longer bake expanded detail into every future turn's scrollback commits with no way to collapse them. The opt-in alt-screen renderer keeps the toggle as-is; it repaints the whole transcript each frame, so it was always correct there.
Hardened the built-in failover ladder's tier anchoring (found in a fan-out review of the 0.0.4 commits): the anchor used the first ladder tier the model id contained, so a hypothetical id carrying two tier words (claude-sonnet-opus-distill) would anchor at the higher one and the ladder could offer a pricier model than the one the user picked — against the feature's own "sideways or down, never up" rule. The anchor now takes the lowest tier the id mentions; every real single-tier id behaves exactly as before.
Synced the version stamp everywhere it appears: README and the website still read 0.0.3 after the crates moved to 0.0.4; both now carry 0.0.4 (the website's handoff excerpt included).
Fixed a destructive-command classifier bypass via a leading double slash (security, found in a full-source bug sweep). The kernel collapses //etc and ///dev/sda to /etc and /dev/sda, so rm -rf //etc, chmod -R 777 //etc, chown -R root //usr, dd of=//dev/sda, echo x > //dev/sda, and shred //dev/sda hit the same catastrophic targets as their single-slash twins — but the classifier's prefix matches (/etc, /dev/sd, the /*-glob test) missed the doubled form and let those slip the refuse-pending-override gate. A new collapse_leading_slashes normalizes a run of leading / to one before the rm/chmod/chown target and raw-block-device checks, so the doubled spellings now flag exactly like the canonical ones; mid-path // (which already resolves and matches) is untouched. Matters most on Windows, where there is no filesystem sandbox behind the classifier.
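The normalization can be illustrated as follows. Only the name collapse_leading_slashes comes from the notes; the body, the targets_protected_root helper, and its prefix list are assumptions made for this sketch, not the classifier's real rules.

```rust
/// Collapse a run of leading '/' to a single '/'.
/// Mid-path "//" is left alone, as are paths with no leading slash.
fn collapse_leading_slashes(path: &str) -> &str {
    let trimmed = path.trim_start_matches('/');
    if trimmed.len() == path.len() {
        path // no leading slash to collapse
    } else {
        // Keep exactly one of the leading slashes.
        &path[path.len() - trimmed.len() - 1..]
    }
}

/// Illustrative prefix check; the real classifier's rules are richer.
fn targets_protected_root(path: &str) -> bool {
    const PROTECTED: [&str; 3] = ["/etc", "/usr", "/dev/sd"];
    let normalized = collapse_leading_slashes(path);
    PROTECTED.iter().any(|prefix| normalized.starts_with(*prefix))
}
```

Normalizing before the prefix test means the match table never has to list doubled spellings, so a future entry is covered automatically.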
Fixed non-determinism in the Repo Twin's file walk (found in the sweep). walk_source_paths returned files in ignore's read_dir order — filesystem- and machine-dependent — so the stored files/imports/symbols order, and the first-N caps that read it (why-context's "examined and excluded" neighbors, the distinctive-symbols trace, and Night Rounds' TODO scan), could differ run-to-run, quietly breaking the "same tree → same card" guarantee the twin/pulse/rounds/race pillars all advertise. The walk now sorts its output, matching every other walker in tomte (glob/grep/list_dir/skill/rules).
Fixed cost mis-billing of cached prompt tokens on the OpenAI-compatible Chat Completions path (found in the sweep). normalize_usage mapped only prompt_tokens/completion_tokens/total_tokens and dropped prompt_tokens_details.cached_tokens, so any Chat provider that reports a cache hit (real OpenAI /v1/chat/completions does) had every cached token billed at the full input rate in /cost instead of the cache-read discount. It now re-emits the cache count under the Responses shape (input_tokens_details.cached_tokens) that the usage classifier already understands. Display-only — context/compaction math was always correct — and it benefits every OpenAI-wire-compatible provider, not just OpenAI.
