0.0.4
Added
- Added a `think` tool — a no-op scratchpad the model calls to reason in the open mid-task (plan the next move after a batch of tool results, weigh a trade-off and state the rejected option, check a plan against a `record_decision` before editing). It writes nothing to disk, fetches nothing, and returns nothing new — the thought just lands in the conversation so the reasoning is in the record and survives into later turns. Read-only (so it never needs approval) and kept for sub-agents (it survives the wildcard filter and is whitelistable by name); the system prompt's tool-discipline list teaches the timing — use it to think, not to narrate, and skip it when the next step is already known. Aliased (`scratchpad`) and tolerant of the field spellings models reach for (`thought`/`text`/`note`/`reasoning`).
- Added MCP resources — `list_mcp_resources` and `read_mcp_resource`: when a connected MCP server advertises the `resources` capability at handshake, tomte registers one shared pair of tools so the model can pull a server's files/docs/configs into context by URI (parity with Claude Code's ListMcpResources/ReadMcpResource). `list_mcp_resources` with no `server` aggregates every resource-capable server under a header; with a `server` it lists just that one. `read_mcp_resource` reads one URI, disambiguating with `server` only when several servers expose resources (a sole server is used automatically; an unknown name errors with the known servers listed). Both are read-only (auto-approvable), and the server output is wrapped in the same `<untrusted-mcp-output>` provenance fence as tool results — a compromised server can't forge the close tag or smuggle a framework marker. The tools are never deferred behind `tool_search` (they aren't `mcp__`-namespaced), and they only appear when a server can actually serve them, so they don't clutter the tool list otherwise.
- Added `TOMTE_MAX_CONTEXT_TOKENS` — an env override for the resolved context-window size, mirroring Claude Code's `CLAUDE_CODE_MAX_CONTEXT_TOKENS`. Pin the window for a gateway/proxy whose real limit tomte can't infer from the model name, or shrink it to force earlier compaction. Accepts a bare integer or a `k`/`m` suffix (`200000`, `200k`, `1m`; case-insensitive, underscores tolerated); a valid value is clamped to `[8k, 20m]` so a typo can't make every turn read as ≥100% full and thrash the compaction path, and an unset or unparseable value leaves the catalog/provider value untouched. It flows through the single `effective_context_limit` choke point, so the status-bar gauge, `/context`, the warn/auto-compact thresholds, and microcompaction all honor it together.
- Made the collapsed "Thought for Xs" line click-to-expand. Once the model's live reasoning collapses into the compact `Thought for Xs` line, you can now left-click it to re-show the full thought (and click again to hide it) — the same click-target affordance as the "Jump to bottom" bar and the fleet rows. The reasoning text is retained instead of discarded on collapse, the line carries a dim `(click to show)`/`(click to hide)` hint, and expanding works even with live thinking display turned off (it's an explicit request). Implemented without disturbing the render cache's fast path: each visible thought line's screen rect is mapped per frame (append-only marks tracked alongside the cached lines), and toggling re-wraps only the affected block. Applies to the alt-screen renderer; the inline viewport keeps the collapsed line without the click target.
- Made the busy spinner narrate what tomte is actually doing. It already borrowed an in-progress todo's `active_form` as the live word (Claude-parity), but most short turns carry no todo, so the line fell straight to a whimsical pool word (`Pottering…`). Now, when there's no in-progress todo, the spinner shows the running tool's plain action verb — `Reading…`, `Running…`, `Searching…`, `Editing…`, `Delegating…` — derived from the most recent tool call whose result hasn't arrived yet. The precedence is todo `active_form` → running-tool verb → the drifting pool word (kept for the gaps between tool calls, so tomte's voice survives). Meta tools (`todo_write`/`goal_update`/`wait`), MCP tools, and unknown names fall through to the pool rather than reading something unhelpful.
- Added a per-read output ceiling to `read_file`, matching how Claude Code keeps a single file read token-bounded. The 2000-line and 2000-char/line caps bounded line count and width but not total size, so a file of long-but-under-2000-char lines could dump hundreds of KB (tens of thousands of tokens or more) in one read, burning context and the user's limit. A read now stops once its rendered output passes ~25k tokens (Claude Code's `CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS` default; tomte estimates ≈4 bytes/token) — even under 2000 lines — and emits the same `continue with offset=N` notice as the line cap, so nothing is lost, just paged. A `TOMTE_READ_MAX_TOKENS` env override mirrors Claude Code's knob (clamped to a sane band). Normal small/medium reads are unchanged; the tool description now nudges toward reading only the slice you need (`offset`/`limit`, or `grep` first), which is cheaper than a capped full read. Applies to both the in-memory and large-file (>5 MB streamed) read paths.
- Added `tomte receipt` — the work receipt: one Markdown / HTML / JSON artifact that proves a stretch of work instead of transcribing it, ready to attach to a PR. It bundles a fresh Proof Capsule (the files git reports changed plus the real exit codes of the project's own test/typecheck/lint/build, run by the CLI when the receipt is generated), whether HEAD carries a verified Commit Seal (checked with the same binding rules as `tomte seal verify`), what the session actually did — the shell commands run and files edited, read from the persisted session log (the CLI's own record of executed tool calls, never a model's recollection) with the per-model token/cost receipt — and the newest recorded decisions with the drift-watch counts. Sections degrade gracefully (outside a repo, with no saved sessions, with an empty trail, the receipt says so), lists are capped with "and N more" pointers at the full stores, and the HTML page is standalone with all interpolated text escaped. `--session <id>` picks an older session (default: the project's newest), `--json`/`--html` pick the format, and `--out` writes a file. It always renders, red or green — the gates remain `tomte prove` and `tomte seal verify`.
- Added the official GitHub Action — "Done means verified" as a PR gate. `uses: ryan-mt/tomte@v0.0.4` installs the released binary (checksum-verified against the published `.sha256`), runs `tomte prove` (the project's own test/typecheck/lint/build, real exit codes) and `tomte rounds` (drift watch, risk risers, hot-and-untested files; `--no-proof` is passed automatically when `tomte prove` has already run the check suite), optionally requires a bound Commit Seal on HEAD (`seal-verify: "true"`), and fails the job when any selected gate is red. The full evidence lands in the job's step summary (the PR check page) with long outputs truncated to a stated tail — never silently — and `comment: "true"` posts it as one self-updating PR comment (needs `pull-requests: write`). Inputs select the gates, the release `version`, and the `working-directory`; the action exposes a `verified` output for downstream steps. It deliberately brings no toolchain — the project's own setup steps run first, and tomte measures that project's checks.
- Added monorepo coverage to the Proof Capsule (`tomte prove`, and everything built on it — `/prove`, `seal`, `rounds`, `receipt`, the race judge, the GitHub Action). The capsule used to verify only the primary ecosystem at the root (a `Cargo.toml` beat everything), so a repo carrying a second stack — a Next.js site beside a Rust workspace, a `src-tauri/` beside a Node app — could ship that stack completely unverified while the card read green. Now an immediate sub-directory holding a different ecosystem's manifest gets its own checks too, named `<dir>:<check>` (e.g. `tomte-website:lint`), run inside that directory, and folded into the same verdict — a red website lint fails `tomte prove` exactly like a red cargo test. Same-kind sub-projects are deliberately left to the root toolchain (a cargo workspace or npm workspaces already run their members), so nothing runs or bills twice; hidden directories and dependency/build dirs (`node_modules`, `target`, `dist`, …) are never treated as project roots, sub-project discovery is sorted so equal trees plan equal capsules, and the reproduce line wraps each sub-check as `(cd <dir> && …)` so the pasted command still runs everything exactly where the capsule did.
- CI now covers the website: a `website` job runs `npm ci`, `npm run lint`, and `npm run build` for `tomte-website/` on every push and PR, so a broken site can no longer ride in under green Rust checks.
- Added `tomte why diff [base]` (and `/why diff [base]` in a session) — review the reasoning, not just the code. A PR review reads the diff; this reads the decision trail against the same range (the merge-base with `base`, defaulting to the first of `origin/main`/`main`/`origin/master`/`master` that resolves) and answers what the diff can't: which decisions are new in this range (each flagged when it points outside the changed files), which earlier decisions were superseded here — promises deliberately broken, shown as was → now — and which changed files carry no recorded why at all (the reviewer's gap list; committed, uncommitted, and untracked files all count). Everything is computed from real state — git for the range, the project's own trail for the decisions — the analysis core is pure and fully unit-tested, and `--json` emits the report for scripting. This is the view a clone can't fake quickly: it needs the store, the anchors, and the supersede links the trail has been accumulating all along.
- Added `tomte models` (and `/models` in a session) — the model lineup from real state. One card answers what was previously scattered across the docs, the auth store, and `config.json`: every model tomte can drive with its context window and thinking capabilities (adaptive/extended thinking, `xhigh` eligibility — straight from the model catalog, family fallbacks included), which credentials are actually present per provider (presence and source only — OAuth/API key/env — never token contents), which OpenAI ids the ChatGPT-subscription backend rejects (API key only), the active model and reasoning effort, and the exact failover chain an overload would walk (the configured `fallback_models`, the built-in ladder, or an honest `off`/`none`). `--json` emits the same report for scripting; a provider with no credential points at `tomte login` instead of pretending its models are one keypress away.
- Added a built-in failover ladder, so a rate-limited or overloaded model no longer kills the turn just because `fallback_models` was never configured (the default for every fresh install). When the chain is empty, reactive failover now walks a conservative same-provider ladder instead of giving up: `claude-fable-5 → claude-opus-4-8 → claude-sonnet-4-6` on Anthropic, `gpt-5.5 → gpt-5.4` on OpenAI — anchored at the active model's tier and only ever moving sideways or down, so failover never silently moves a session onto a more expensive model. Deliberately conservative everywhere else too: every ladder id is accepted by every auth mode of its provider (the ChatGPT-subscription backend rejects mini/nano/pro ids), the Anthropic ladder stops at Sonnet (auto-dropping a coding session to Haiku would trade a visible error for silently weaker edits), and a model tomte can't place — Haiku, `gpt-5.2`, a local or third-party endpoint — gets no default chain at all, so an unknown provider is never rerouted to one the user didn't pick. All the existing failover guards still apply (only genuine overload errors, never fatal 4xx/refusals/context overflow; candidates without a usable credential are skipped; bounded attempts; announced via the same fallback card). A configured `fallback_models` list remains authoritative, and `auto_fallback: false` in `config.json` restores the old fail-fast behavior.
- Added `tomte sessions` — the saved-session ledger, headless. Run bare, it lists this project's persisted sessions newest-first (id, age, model, message count, and the first-prompt preview — the same store `tomte resume` and `--continue` read). `tomte sessions show [id]` prints one as a readable Markdown transcript — user and assistant messages in full, each tool call as a one-line `> tool:` note with the argument a human would recognize it by (command, path, pattern, …), and tool results deliberately omitted with the count stated (they dominate the bytes; `--json` carries the full record) — and `--out` writes it to a file. `tomte sessions prune` deletes old sessions with two unionable rules — `--keep N` (keep the newest N) and `--older-than-days N` — dry-run by default: the plan prints exactly which sessions are selected and nothing is touched until `--yes`. At least one rule is required, so a bare prune can never select the whole store; ids are validated with the same rules as save/load, so a crafted id can't escape the sessions dir; a per-file delete failure is reported and makes the run non-zero instead of being swallowed.
- Added `tomte cost --all` — one cost ledger across every saved session for the project, instead of one session at a time. Per-model token counts are merged with saturating sums and sorted by model id (equal stores render byte-identical reports), then priced through the same per-billing-class tables as `/cost` — cache reads and writes keep their own cache rates, and the cross-provider OpenAI/Anthropic subtotals still appear when the history spans both. `--all` conflicts with `--session` (one report, one scope), and a session file that no longer loads is skipped — the same posture as the session list.
- Added `tomte completions <shell>` — a shell completion script for the whole command surface (bash, zsh, fish, powershell, elvish), generated from the very clap definition that parses the CLI, so the script can never drift from the real commands and flags. Pipe it where your shell looks (`tomte completions bash > ~/.local/share/bash-completion/completions/tomte`, `tomte completions zsh > "${fpath[1]}/_tomte"`, or add `tomte completions powershell | Out-String | Invoke-Expression` to `$PROFILE`); an unknown shell is a parse error listing the choices, and an early-closed pipe (`| head`) exits 0 like every other pipeable command.
- Added the Context Manifest — prove the context before the edit lands (the Repo Twin's X-ray, now an automatic pre-edit step). The first time a session edits a file, the glass-box pre-flight card gains a `◈ context manifest for this edit` section: the files a maintainer would have in context for that edit, each with the real index edge it came from (import / symbol / test / git) and checked against the session's own read log (`✓ read this session` vs `not read yet` — claimed context is verified, not asserted), plus the nearby files deliberately left out with the reason each is unreachable. Cache-only by design: it reads the twin's cached index and never builds one inline (no mid-edit stall; no cache → no card, and a cache the tree has outgrown is labeled stale instead of passed off as current). Shown once per file per session, only when the twin actually connects something — an isolated file stays quiet. Don't stuff context; prove it.
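
For the `think` tool, here is a minimal Rust sketch of the alias handling and the tolerance for different field spellings. It is not tomte's implementation. The names `canonical_tool` and `extract_thought`, the flat string map used for arguments, and the order in which the spellings are tried are all assumptions.

```rust
use std::collections::HashMap;

/// Field spellings accepted for the thought text, tried in this (assumed) order.
const THOUGHT_FIELDS: &[&str] = &["thought", "text", "note", "reasoning"];

/// Map a requested tool name onto the canonical `think` tool, if it is one.
fn canonical_tool(name: &str) -> Option<&'static str> {
    match name {
        "think" | "scratchpad" => Some("think"),
        _ => None,
    }
}

/// Return the first non-blank thought among the accepted field spellings.
/// Nothing is written or fetched: the thought simply lands in the conversation.
fn extract_thought(args: &HashMap<String, String>) -> Option<&str> {
    THOUGHT_FIELDS
        .iter()
        .find_map(|key| args.get(*key).filter(|value| !value.trim().is_empty()))
        .map(String::as_str)
}
```

The sketch skips a blank field and keeps looking, so a model that fills `note` but sends an empty `thought` still gets its text recorded.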
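
The rule that decides which server `read_mcp_resource` targets can be written as a pure function. This is a hypothetical illustration: `pick_resource_server` and its error strings are invented. It also assumes that an explicitly named server must match a resource-capable server, even when only one server exists.

```rust
/// Choose the server a `read_mcp_resource` call targets.
/// `servers` holds the connected servers that advertised the `resources` capability.
fn pick_resource_server<'a>(servers: &[&'a str], requested: Option<&str>) -> Result<&'a str, String> {
    match requested {
        // An explicit name must match a resource-capable server.
        Some(name) => servers
            .iter()
            .copied()
            .find(|s| *s == name)
            .ok_or_else(|| format!("unknown MCP server '{}'; known servers: {}", name, servers.join(", "))),
        // No name given: a sole server is used automatically, several are ambiguous.
        None => match servers {
            [only] => Ok(*only),
            [] => Err("no connected MCP server exposes resources".to_string()),
            _ => Err(format!(
                "several servers expose resources; pass `server` (one of: {})",
                servers.join(", ")
            )),
        },
    }
}
```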
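
The parsing and clamping rules for `TOMTE_MAX_CONTEXT_TOKENS` fit in a small pure function. This is a sketch only. `parse_context_override` and `resolve_context_limit` are hypothetical names. The sketch assumes `k` means 1,000 and `m` means 1,000,000, so the `[8k, 20m]` clamp becomes `[8_000, 20_000_000]`.

```rust
/// Assumed clamp bounds (k = 1_000, m = 1_000_000).
const MIN_CONTEXT: u64 = 8_000;
const MAX_CONTEXT: u64 = 20_000_000;

/// Parse "200000", "200k", "1M" or "1_000_000". Returns `None` for empty or
/// unparseable input so the caller keeps the catalog/provider value.
fn parse_context_override(raw: &str) -> Option<u64> {
    // Case-insensitive, underscores tolerated, surrounding whitespace ignored.
    let cleaned: String = raw
        .trim()
        .chars()
        .filter(|c| *c != '_')
        .collect::<String>()
        .to_ascii_lowercase();
    if cleaned.is_empty() {
        return None;
    }
    let (digits, multiplier) = match cleaned.strip_suffix('k') {
        Some(rest) => (rest, 1_000u64),
        None => match cleaned.strip_suffix('m') {
            Some(rest) => (rest, 1_000_000u64),
            None => (cleaned.as_str(), 1u64),
        },
    };
    let base: u64 = digits.parse().ok()?;
    let value = base.checked_mul(multiplier)?;
    // Only a valid value is clamped; invalid input never reaches this point.
    Some(value.clamp(MIN_CONTEXT, MAX_CONTEXT))
}

/// A parseable override wins; otherwise the catalog value is left untouched.
fn resolve_context_limit(catalog_limit: u64, env_value: Option<&str>) -> u64 {
    env_value
        .and_then(parse_context_override)
        .unwrap_or(catalog_limit)
}
```

Because the parser is pure, the rule that an unset or unparseable value leaves the catalog value untouched reduces to a plain `Option` fallback, which is easy to unit-test.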
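
The spinner's precedence chain takes only a few lines of Rust. This is a sketch only. `spinner_word`, `pending_tool`, and the table that maps tool names to verbs (for example `bash` to `Running…`) are hypothetical. Only the verbs themselves, the `mcp__` prefix, and the meta tool names come from the entries above.

```rust
/// Plain action verb for a running tool. Meta tools, MCP tools and unknown
/// names return `None`, so the spinner falls through to the pool word.
fn tool_verb(tool: &str) -> Option<&'static str> {
    if tool.starts_with("mcp__") {
        return None;
    }
    match tool {
        "read_file" | "list_dir" => Some("Reading…"),
        "bash" => Some("Running…"),
        "grep" | "glob" => Some("Searching…"),
        "edit_file" | "write_file" => Some("Editing…"),
        "task" => Some("Delegating…"),
        // todo_write, goal_update, wait and anything unrecognized
        _ => None,
    }
}

/// The most recent tool call whose result has not arrived yet.
fn pending_tool<'a>(calls: &[(&'a str, bool)]) -> Option<&'a str> {
    calls
        .iter()
        .rev()
        .find(|(_, has_result)| !*has_result)
        .map(|(name, _)| *name)
}

/// Precedence: todo `active_form`, then running-tool verb, then the pool word.
fn spinner_word(active_form: Option<&str>, running_tool: Option<&str>, pool_word: &str) -> String {
    if let Some(form) = active_form {
        return form.to_string();
    }
    if let Some(verb) = running_tool.and_then(tool_verb) {
        return verb.to_string();
    }
    pool_word.to_string()
}
```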
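
Here is a minimal sketch of paging a read under both the 2000-line cap and a token ceiling, using the ≈4 bytes/token estimate. The assumptions are as follows. `read_page` and `Page` are hypothetical. `offset` is a 0-based count of lines to skip. The wording of the notice and the line-number gutter are illustrative, not tomte's exact output. The budget is checked after each line is appended, so a page can overshoot by at most one line but always advances.

```rust
/// Per-read caps: 2000 lines, 2000 chars per line, and a token ceiling
/// estimated at roughly 4 bytes per token.
const MAX_LINES: usize = 2000;
const MAX_LINE_CHARS: usize = 2000;
const BYTES_PER_TOKEN: usize = 4;

/// One bounded read: the rendered text, plus the offset to continue from
/// when the read stopped early.
struct Page {
    text: String,
    next_offset: Option<usize>,
}

/// Render `lines` starting at 0-based `offset`, stopping at the line cap or as
/// soon as the rendered output passes `max_tokens` (estimated).
fn read_page(lines: &[&str], offset: usize, max_tokens: usize) -> Page {
    let byte_budget = max_tokens.saturating_mul(BYTES_PER_TOKEN);
    let mut text = String::new();
    let mut idx = offset;
    while idx < lines.len() && idx - offset < MAX_LINES {
        // Truncate by characters so a multi-byte UTF-8 char is never split.
        let line: String = lines[idx].chars().take(MAX_LINE_CHARS).collect();
        text.push_str(&format!("{:>6}\t{}\n", idx + 1, line));
        idx += 1;
        // Checked after appending, so every page advances by at least one line.
        if text.len() > byte_budget {
            break;
        }
    }
    let next_offset = if idx < lines.len() { Some(idx) } else { None };
    if let Some(n) = next_offset {
        text.push_str(&format!("[output capped: continue with offset={}]\n", n));
    }
    Page { text, next_offset }
}
```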
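
The receipt's HTML escaping and its "and N more" capping could look like the sketch below. The helper names `escape_html` and `capped_list` are hypothetical. The set of escaped characters is the conventional one for HTML text and attribute contexts, not necessarily tomte's exact table.

```rust
/// Escape text before interpolating it into the standalone HTML page.
fn escape_html(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Keep the first `cap` items (escaped) and point at the rest with a count.
fn capped_list(items: &[&str], cap: usize) -> Vec<String> {
    let mut lines: Vec<String> = items.iter().take(cap).map(|s| escape_html(s)).collect();
    if items.len() > cap {
        lines.push(format!("and {} more", items.len() - cap));
    }
    lines
}
```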
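
Sub-project discovery for the Proof Capsule, reduced to a pure function over the immediate sub-directories. The sketch assumes each child has already been classified by its manifest (`None` when it has no manifest). `Ecosystem`, `plan_subprojects`, `check_name`, and `reproduce` are hypothetical names. `SKIP_DIRS` holds only the three examples the entry lists.

```rust
/// Ecosystems identified by their manifest (an illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ecosystem {
    Cargo,
    Node,
}

/// Dependency/build directories never treated as project roots (subset).
const SKIP_DIRS: &[&str] = &["node_modules", "target", "dist"];

/// Immediate sub-directories that get their own checks: visible, not a
/// dependency/build dir, and holding a different ecosystem's manifest than
/// the root. Sorted, so equal trees plan equal capsules.
fn plan_subprojects<'a>(root: Ecosystem, children: &[(&'a str, Option<Ecosystem>)]) -> Vec<&'a str> {
    let mut dirs: Vec<&'a str> = children
        .iter()
        .filter(|(name, eco)| {
            !name.starts_with('.')
                && !SKIP_DIRS.iter().any(|skip| skip == name)
                && matches!(eco, Some(e) if *e != root)
        })
        .map(|(name, _)| *name)
        .collect();
    dirs.sort_unstable();
    dirs
}

/// Check names are namespaced by directory, e.g. `tomte-website:lint`.
fn check_name(dir: &str, check: &str) -> String {
    format!("{}:{}", dir, check)
}

/// The reproduce line runs a sub-check where the capsule ran it.
fn reproduce(dir: &str, command: &str) -> String {
    format!("(cd {} && {})", dir, command)
}
```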
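
Two parts of `tomte why diff` are easy to state as code: the fallback for the default base and the reviewer's gap list. In this sketch the `resolves` closure stands in for asking git whether a ref exists, for instance with `git rev-parse --verify --quiet <ref>`. `pick_base`, `gap_list`, and the representation of decision anchors as plain file paths are assumptions.

```rust
use std::collections::BTreeSet;

/// Candidate default bases, tried in order.
const DEFAULT_BASES: &[&str] = &["origin/main", "main", "origin/master", "master"];

/// An explicit base wins; otherwise take the first default that resolves.
fn pick_base<F>(explicit: Option<&str>, resolves: F) -> Option<String>
where
    F: Fn(&str) -> bool,
{
    match explicit {
        Some(base) => Some(base.to_string()),
        None => DEFAULT_BASES
            .iter()
            .copied()
            .find(|candidate| resolves(*candidate))
            .map(|candidate| candidate.to_string()),
    }
}

/// Changed files with no recorded decision anchored to them, sorted and deduplicated.
fn gap_list<'a>(changed: &[&'a str], anchored: &[&str]) -> Vec<&'a str> {
    let anchored_set: BTreeSet<&str> = anchored.iter().copied().collect();
    let mut gaps: Vec<&'a str> = changed
        .iter()
        .copied()
        .filter(|file| !anchored_set.contains(*file))
        .collect();
    gaps.sort_unstable();
    gaps.dedup();
    gaps
}
```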
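
The prune selection rules can be sketched over in-memory metadata: the two rules combine as a union, and at least one is required. Everything here is hypothetical. `SessionMeta` and `select_for_prune` are invented, and breaking ties by id for equal timestamps is an assumption. The sketch only selects ids; the dry-run plan, id validation, and actual deletion are left out.

```rust
use std::time::{Duration, SystemTime};

/// Minimal session metadata for the sketch.
struct SessionMeta {
    id: String,
    modified: SystemTime,
}

/// Ids selected by `--keep N` and/or `--older-than-days N`; a session matched
/// by either rule is selected. With neither rule the call is refused, so a
/// bare prune can never select the whole store.
fn select_for_prune(
    sessions: &[SessionMeta],
    keep: Option<usize>,
    older_than_days: Option<u64>,
    now: SystemTime,
) -> Result<Vec<String>, String> {
    if keep.is_none() && older_than_days.is_none() {
        return Err("prune needs --keep N and/or --older-than-days N".to_string());
    }
    // Newest first; ties broken by id so equal stores select equal sets.
    let mut ordered: Vec<&SessionMeta> = sessions.iter().collect();
    ordered.sort_by(|a, b| b.modified.cmp(&a.modified).then_with(|| a.id.cmp(&b.id)));
    let cutoff = older_than_days.map(|days| {
        let age = Duration::from_secs(days.saturating_mul(86_400));
        now.checked_sub(age).unwrap_or(SystemTime::UNIX_EPOCH)
    });
    let selected: Vec<String> = ordered
        .iter()
        .enumerate()
        .filter(|(rank, session)| {
            let beyond_keep = keep.map_or(false, |n| *rank >= n);
            let too_old = cutoff.map_or(false, |c| session.modified < c);
            beyond_keep || too_old
        })
        .map(|(_, session)| session.id.clone())
        .collect();
    Ok(selected)
}
```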
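
The sketch below merges per-model counts across sessions with saturating sums, keyed in a `BTreeMap` so that iteration follows sorted model-id order. The `TokenCounts` fields and `merge_sessions` are hypothetical, and pricing is omitted. The point is that the same stores always produce the same ordered ledger.

```rust
use std::collections::BTreeMap;

/// Per-model token counts for one billing scope.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct TokenCounts {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
}

impl TokenCounts {
    /// Field-wise saturating sum: a huge count pins at u64::MAX instead of
    /// wrapping (release) or panicking (debug).
    fn saturating_add(self, other: TokenCounts) -> TokenCounts {
        TokenCounts {
            input: self.input.saturating_add(other.input),
            output: self.output.saturating_add(other.output),
            cache_read: self.cache_read.saturating_add(other.cache_read),
            cache_write: self.cache_write.saturating_add(other.cache_write),
        }
    }
}

/// Fold every session's per-model counts into one ledger. A BTreeMap iterates
/// in model-id order, so equal inputs render identical reports.
fn merge_sessions(sessions: &[Vec<(&str, TokenCounts)>]) -> BTreeMap<String, TokenCounts> {
    let mut ledger: BTreeMap<String, TokenCounts> = BTreeMap::new();
    for session in sessions {
        for (model, counts) in session {
            let entry = ledger.entry(model.to_string()).or_default();
            *entry = entry.saturating_add(*counts);
        }
    }
    ledger
}
```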
Changed
- Made the inline viewport the default renderer again (SOUL Pillar 4 — the custodian does not hijack the terminal): finished turns flow into the terminal's own native scrollback, the mouse is never captured, and native wheel-scroll + click-drag selection/copy keep working. 0.0.2 shipped this default with only an env-var opt-out, and 0.0.3 reverted it; this time the full-screen alternate screen stays a first-class, persistent choice — set `render_mode: "alt"` in `config.json` (aliases `altscreen`/`alt-screen`/`alt_screen`/`fullscreen` accepted), or use the `TOMTE_INLINE` env var to override either way per launch (`1`/`true`/`yes`/`on` forces inline, `0`/`false`/`no`/`off` forces alt-screen). Configs predating the field keep working and default to inline.
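
The precedence between `render_mode` in `config.json` and the `TOMTE_INLINE` env var can be sketched as below. The assumptions are as follows. `RenderMode`, `resolve_render_mode`, and both helpers are hypothetical. Values are matched case-insensitively after trimming. An unrecognized value in either place is ignored rather than rejected.

```rust
/// The two renderers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RenderMode {
    Inline,
    Alt,
}

/// `render_mode` from config.json: the alt-screen spellings select Alt;
/// a missing field (or anything else) keeps the inline default.
fn parse_config_mode(value: Option<&str>) -> RenderMode {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("alt" | "altscreen" | "alt-screen" | "alt_screen" | "fullscreen") => RenderMode::Alt,
        _ => RenderMode::Inline,
    }
}

/// `TOMTE_INLINE`: truthy forces inline, falsy forces alt-screen, else no override.
fn parse_env_override(value: Option<&str>) -> Option<RenderMode> {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("1" | "true" | "yes" | "on") => Some(RenderMode::Inline),
        Some("0" | "false" | "no" | "off") => Some(RenderMode::Alt),
        _ => None,
    }
}

/// The per-launch env override beats the persistent config choice.
fn resolve_render_mode(config_value: Option<&str>, env_value: Option<&str>) -> RenderMode {
    parse_env_override(env_value).unwrap_or_else(|| parse_config_mode(config_value))
}
```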
Fixed
- Fixed the inline renderer's transcript never reaching native scrollback on Windows Terminal — wheel-scrolling up showed nothing because there was nothing to scroll to. The `scrolling-regions` ratatui feature made `insert_before` commit finished turns by scrolling a DECSTBM region, and ratatui's contract assumes a region that includes row 0 pushes the scrolled-out lines into scrollback — true on xterm/iTerm/kitty, but Windows Terminal discards them instead (microsoft/terminal#3673), so the session's history silently vanished as it was written. The feature is now off: `insert_before` falls back to scrolling the whole screen with real newlines, which lands in scrollback on every terminal. The clear-and-redraw blink that feature was hiding stays hidden by the synchronized-update (DECSET 2026) wrapper that already batches the scrollback commit and the frame diff into one atomic paint. Note for inline mode: the clickable "Jump to bottom" bar is an alt-screen affordance — inline never captures the mouse, so it cannot know you scrolled; press any key (Windows Terminal snaps back to the bottom on input) or Ctrl+End to return to the tail.
- Fixed Ctrl+O being a dead key at rest in the default inline renderer. Inline mode pushes finished turns into the terminal's native scrollback via `insert_before`, and scrollback can never be repainted — so the moment a turn settled, toggling the expanded-tools flag had nothing left to redraw, and every "(Ctrl+O for more)" hint pointed at a key that visibly did nothing. Ctrl+O at rest now opens a full-screen transcript viewer instead: the whole session re-rendered with tool detail expanded (the same leaf renderers as the live view), inside the alternate screen so the native scrollback underneath stays untouched and comes back on exit. It opens at the newest lines (where the hints point), scrolls with ↑/↓, PgUp/PgDn, Home/End, and the mouse wheel (capture is scoped to the viewer — inline mode itself still never takes the mouse), and closes with Esc, `q`, or Ctrl+O again. Mid-turn the plain toggle still works — the live tail repaints every frame — and the flag now resets when the turn ends, so a mid-turn expand can no longer bake expanded detail into every future turn's scrollback commits with no way to collapse them. The opt-in alt-screen renderer keeps the toggle as-is; it repaints the whole transcript each frame, so it was always correct there.
- Hardened the built-in failover ladder's tier anchoring (found in a fan-out review of the 0.0.4 commits): the anchor used the first ladder tier the model id contained, so a hypothetical id carrying two tier words (`claude-sonnet-opus-distill`) would anchor at the higher one and the ladder could offer a pricier model than the one the user picked — against the feature's own "sideways or down, never up" rule. The anchor now takes the lowest tier the id mentions; every real single-tier id behaves exactly as before.
- Synced the version stamp everywhere it appears: README and the website still read `0.0.3` after the crates moved to `0.0.4`; both now carry `0.0.4` (the website's handoff excerpt included).
- Fixed a destructive-command classifier bypass via a leading double slash (security, found in a full-source bug sweep). The kernel collapses `//etc` and `///dev/sda` to `/etc` and `/dev/sda`, so `rm -rf //etc`, `chmod -R 777 //etc`, `chown -R root //usr`, `dd of=//dev/sda`, `echo x > //dev/sda`, and `shred //dev/sda` hit the same catastrophic targets as their single-slash twins — but the classifier's prefix matches (`/etc`, `/dev/sd`, the `/*`-glob test) missed the doubled form and let those slip past the refuse-pending-override gate. A new `collapse_leading_slashes` normalizes a run of leading `/` to one before the `rm`/`chmod`/`chown` target and raw-block-device checks, so the doubled spellings now flag exactly like the canonical ones; mid-path `//` (which already resolves and matches) is untouched. Matters most on Windows, where there is no filesystem sandbox behind the classifier.
- Fixed non-determinism in the Repo Twin's file walk (found in the sweep). `walk_source_paths` returned files in `ignore`'s `read_dir` order — filesystem- and machine-dependent — so the stored `files`/`imports`/`symbols` order, and the first-N caps that read it (why-context's "examined and excluded" neighbors, the distinctive-symbols trace, and Night Rounds' TODO scan), could differ run-to-run, quietly breaking the "same tree → same card" guarantee the twin/pulse/rounds/race pillars all advertise. The walk now sorts its output, matching every other walker in tomte (glob/grep/list_dir/skill/rules).
- Fixed cost mis-billing of cached prompt tokens on the OpenAI-compatible Chat Completions path (found in the sweep). `normalize_usage` mapped only `prompt_tokens`/`completion_tokens`/`total_tokens` and dropped `prompt_tokens_details.cached_tokens`, so any Chat provider that reports a cache hit (real OpenAI `/v1/chat/completions` does) had every cached token billed at the full input rate in `/cost` instead of the cache-read discount. It now re-emits the cache count under the Responses shape (`input_tokens_details.cached_tokens`) that the usage classifier already understands. Display-only — context/compaction math was always correct — and it benefits every OpenAI-wire-compatible provider, not just OpenAI.
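
Here is a minimal sketch of the leading-slash normalization. The function borrows the name `collapse_leading_slashes` from the entry, but its signature here is an assumption. `hits_protected_target` and its three prefixes are a hypothetical, deliberately naive stand-in for the classifier (it would also match `/etcetera`). Its only purpose is to show where the normalization sits.

```rust
/// Collapse a run of leading '/' to a single one: "//etc" -> "/etc",
/// "///dev/sda" -> "/dev/sda". Mid-path "//" is left untouched.
fn collapse_leading_slashes(path: &str) -> &str {
    let leading = path.len() - path.trim_start_matches('/').len();
    if leading > 1 {
        // Keep exactly the last leading slash; the byte index is a '/' boundary.
        &path[leading - 1..]
    } else {
        path
    }
}

/// Hypothetical, naive prefix list standing in for the real checks.
const PROTECTED_PREFIXES: &[&str] = &["/etc", "/usr", "/dev/sd"];

/// Normalize first, then run the prefix test, so doubled spellings flag too.
fn hits_protected_target(arg: &str) -> bool {
    let normalized = collapse_leading_slashes(arg);
    PROTECTED_PREFIXES.iter().any(|prefix| normalized.starts_with(*prefix))
}
```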
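
Here is the corrected anchoring rule, sketched for the Anthropic ladder only. `LADDER`, `anchor_index`, and `failover_chain` are hypothetical, and tier detection is assumed to be a plain substring match on the model id. Scanning with `rposition` picks the lowest tier the id mentions, so an id that names two tiers can only anchor downward.

```rust
/// The Anthropic ladder from the entry above, highest tier first: (tier word, model id).
const LADDER: &[(&str, &str)] = &[
    ("fable", "claude-fable-5"),
    ("opus", "claude-opus-4-8"),
    ("sonnet", "claude-sonnet-4-6"),
];

/// Index of the lowest ladder tier the id mentions (scan from the bottom up).
fn anchor_index(model_id: &str) -> Option<usize> {
    LADDER.iter().rposition(|(tier, _)| model_id.contains(*tier))
}

/// Candidates at or below the anchor, skipping the active model itself.
/// An id with no recognizable tier (e.g. Haiku) gets no default chain.
fn failover_chain(model_id: &str) -> Vec<&'static str> {
    match anchor_index(model_id) {
        Some(i) => LADDER[i..]
            .iter()
            .map(|(_, id)| *id)
            .filter(|id| *id != model_id)
            .collect(),
        None => Vec::new(),
    }
}
```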
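
Here is the mapping from Chat usage to Responses usage, reduced to plain structs. The struct shapes, `normalize_chat_usage`, and `input_cost` with its integer per-token rates are hypothetical. The sketch assumes, as with OpenAI's API, that `prompt_tokens` already includes the cached tokens. Only the uncached remainder is therefore billed at the full input rate.

```rust
/// Subset of a Chat Completions `usage` object.
struct ChatUsage {
    prompt_tokens: u64,
    completion_tokens: u64,
    total_tokens: u64,
    /// `prompt_tokens_details.cached_tokens`, when the provider reports it.
    cached_tokens: Option<u64>,
}

/// Subset of the Responses-shaped usage the classifier understands.
#[derive(Debug, PartialEq)]
struct ResponsesUsage {
    input_tokens: u64,
    output_tokens: u64,
    total_tokens: u64,
    /// `input_tokens_details.cached_tokens`.
    cached_tokens: u64,
}

/// Carry the cache count across instead of dropping it.
fn normalize_chat_usage(chat: &ChatUsage) -> ResponsesUsage {
    ResponsesUsage {
        input_tokens: chat.prompt_tokens,
        output_tokens: chat.completion_tokens,
        total_tokens: chat.total_tokens,
        cached_tokens: chat.cached_tokens.unwrap_or(0),
    }
}

/// Input cost with cached tokens at the cache-read rate and the rest at full rate.
fn input_cost(usage: &ResponsesUsage, input_rate: u64, cached_rate: u64) -> u64 {
    let cached = usage.cached_tokens.min(usage.input_tokens);
    (usage.input_tokens - cached) * input_rate + cached * cached_rate
}
```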