Skip to content

Releases: noah-thing/receipt

AI token receipt

Choose a tag to compare

@noah-thing noah-thing released this 30 Jun 10:19

Receipt posts an itemized AI-cost comment, tokens and dollars, broken down by model, on every pull request, scoped to that branch's commits. It puts the cost of an agent's work right next to the diff, instead of waiting for the monthly bill.

Works with Claude Code, Cursor, Copilot, Aider, Codex, and any OpenAI or Anthropic-compatible API. Every figure comes from the provider's own token counts times a price book you can read and override. It never stores prompts or completions.

Receipt 0.5.0 — session-health automation

Choose a tag to compare

@noah-thing noah-thing released this 30 Jun 20:24

Builds on 0.4.0's session health with automation, history, and a reviewer-facing note.

  • receipt guard — a Claude Code hook entrypoint. Stays silent until a session crosses your gate, then prints the single most important move where the agent sees it. With --notify it exits 2 so Claude Code feeds the nudge back to the model — strongest on the PreCompact hook, right before lossy auto-compaction.
  • receipt health --json / --quiet --gate — machine-readable output and severity exit codes (0 / 10 watch / 20 degrading / 30 critical) for hooks and CI.
  • receipt health --all — scores every past session and learns your personal pattern ("you tend to drift around turn ~12; X% of sessions compacted too late").
  • Context tax — shows how much of a session is just re-sending itself (the quadratic cost behind both rising spend and fading quality).
  • PR-comment health note — a collapsed, reviewer-facing <details> block when the work ran under degrading conditions; silent otherwise; opt out with "health": false. It never claims the code is wrong — only points to where to look.

Honest constraint: the token-only ledger cannot detect redundant file reads, identical-command loops, or semantic issues (hallucinations, drift) — those need data Receipt deliberately never stores. Features only ever flag "conditions correlated with drift," documented in docs/SESSION-HEALTH.md.

Still deterministic and local. 104 tests, typecheck clean. MIT.

Receipt 0.4.0 — session health

Choose a tag to compare

@noah-thing noah-thing released this 30 Jun 20:05

Receipt now keeps your agent sharp, not just cheap. 🧠

receipt health     # is the current session still sharp?

Long AI-coding sessions get worse before they hit any limit — context rot, lost-in-the-middle, instruction drift, looping, and lossy auto-compaction set in while the window still has room. Built from a six-agent research sweep (RULER, lost-in-the-middle, Chroma's context-rot study, the multi-turn "lost in conversation" finding, and Anthropic's context-engineering guidance), receipt health reads the same local ledger and scores the latest session for:

  • context fill — by % and absolute tokens (big windows degrade by absolute size)
  • session length & age — multi-turn drift past ~12 turns; accumulation slowdown past ~2h
  • cache reuse — low reuse means the context is churning
  • auto-compaction cascades — each summary loses precise detail
  • looping — retries that re-send everything for nothing

…then tells you the move to make before quality drops: /compact (proactively at 50–60%), /clear between tasks, a fresh session with a handoff, or /rewind out of a loop.

It never tells the agent to think or write less — every suggestion refreshes context to preserve full capability. The live gauge also rides in receipt statusline and receipt fuel. Research, signals, and thresholds: docs/SESSION-HEALTH.md.

Deterministic and local — no API key, nothing leaves your machine. 89 tests, typecheck clean. MIT.

Receipt 0.3.0 — advice

Choose a tag to compare

@noah-thing noah-thing released this 30 Jun 13:26

Receipt now tells you how to spend fewer tokens — without making the work worse. 💡

receipt advice          # this branch
receipt advice --all    # the whole ledger

It reads your usage, names what's driving the cost, and recommends fixes that only target waste — never "think less" or "explain less." Output and reasoning are the value; the advisor leaves them alone. What it catches:

  • context re-sent at full price instead of cached,
  • the cache rebuilt faster than it's reused,
  • retries that re-send everything for no new output,
  • premium prices paid for mechanical reads a cheaper model handles at the same quality.

Each tip is ranked by impact with the dollar saving where it's computable, and the single biggest win rides along on every PR receipt and receipt show.

Deterministic and local — no API key, no network. 75 tests, typecheck clean. MIT.

Receipt 0.2.1 — hardening

Choose a tag to compare

@noah-thing noah-thing released this 30 Jun 11:53

Hardening and polish for the usage-awareness feature shipped in 0.2.0. No breaking changes.

  • Provider-aware lever. The what-if suggestion now picks the cheapest model from the same provider — an OpenAI-heavy branch is told about gpt-4.1-nano, not a Claude model.
  • Foolproof plan lookup. A malformed or hand-edited plan id (e.g. toString) can no longer resolve to an inherited object member and poison the budget. budget plan validates strictly.
  • Cleaner distribution. Task sizes and records ignore unscoped proxy-only usage, so "≈ 2 PRs" reflects your real branches.
  • Quieter proxy. Rate-limit capture skips no-op disk writes.
  • Opt-out. Set "usage": false in .receipt/config.json to drop the block from the PR comment and receipt show.

67 tests, typecheck clean. Verified end-to-end across empty / no-plan / full scenarios. MIT.

Receipt 0.2.0 — usage awareness

Choose a tag to compare

@noah-thing noah-thing released this 30 Jun 10:49

See what a task costs you, not just dollars. 🔋

Receipt now measures every task as a share of your Claude plan's windows — the 5-hour and weekly limits — and turns that into things that change how you work.

New commands

  • receipt fuel — how much of your plan you're using right now, what's left, and your pace
  • receipt records — your heaviest, leanest, and most recent tasks, ranked
  • receipt forecast — a typical task's window impact and your weekly runway
  • receipt statusline — a one-line gauge for the Claude Code statusline
  • receipt calibrate — set your real budget from a limit you actually hit
  • receipt budget plan <pro|max5x|max20x> — pick your tier

On the PR comment and receipt show: a usage block with the % of your 5h and weekly windows the branch ate, the same number in your own work-units, an efficiency grade, and the one lever worth pulling (what the heavy model's reads would cost on a cheaper one). Add --fun for honest playful comparisons.

Real numbers, honestly. The proxy now reads your true ceiling from the provider's anthropic-ratelimit-* headers; or receipt calibrate learns it from a wall you hit. Presets are clearly labeled estimates. How it all works: docs/LIMITS.md.

Everything reads the same local, append-only ledger — no prompts, nothing leaves your machine. 60 tests, typecheck clean. MIT.

Receipt 0.1.0

Choose a tag to compare

@noah-thing noah-thing released this 30 Jun 09:58

First public version. Itemized AI cost on every pull request.

  • Logging proxy for streaming and non-streaming usage
  • Importers for agent session logs and any JSON/JSONL export
  • Append-only token and cost ledger (metadata only, never content)
  • Markdown PR receipt: model table, spend sparkline, budget bar, median-PR compare
  • receipt post and a composite GitHub Action that upsert one sticky comment
  • Local dashboard: spend over time, by model, by branch, top calls

Use it on a PR with uses: noah-thing/receipt@v1. 40 tests, no production dependencies.