Skip to content

MaxForAI/Tokenless

Repository files navigation

Tokenless faucet logo

Tokenless

One command to cut token usage by up to 50%+.

vibe coding request reduction chat response reduction license MIT

English · 简体中文 · 日本語 · Français · Español

Before / After · Quick start · Install · Profiles · Benchmarks & Evidence · Roadmap · Full benchmark guide


Quick start

npm install -g github:MaxForAI/Tokenless
tokenless repair-hooks --user
tokenless install-commands --user
tokenless launch

Then use Claude Code normally. Switch profiles anytime:

tokenless style chat     # readable, shorter replies
tokenless style coding   # dense coding output
tokenless style off      # full hard-off

Claude Code gets expensive when every log, file read, diff, and long reply keeps getting carried into the next request.

Tokenless fixes that.

It keeps the raw evidence on your machine, sends Claude a compact version, and lets you expand the original output only when you need it.

Before / After

Normal Claude Code Claude Code with Tokenless
Reads a large file or log into future context repeatedly. Stores the raw output locally and sends a compact packet.
Verbose final replies become part of the next request history. chat and coding profiles keep replies short.
Agent trajectory can grow through repeated exploration and task-plan history. Launcher trims Task/Plan tools by default; packets reduce large read context.

Example large-read replacement:

Raw context Tokenless context
Full file/log output is carried through API requests. TOKENLESS-READ-PACKET/0.1 with artifact id, imports, symbols, snippets, nearby files, and exact expansion commands.

Why Tokenless

Claude Code sessions can become expensive because tool outputs, file reads, task-plan history, and verbose assistant replies are repeatedly carried through future API requests. Tokenless targets three sources of growth:

  • Large tool output: test logs, build logs, search results, tree output, diffs, large reads, and large successful edit/write results.
  • Agent trajectory overhead: repeated request context, high-overhead Task/Plan tools, and large raw file payloads.
  • Response verbosity: optional chat and coding profiles reduce assistant output tokens.

Benchmarks & Evidence

Tokenless has two evidence layers: real Claude Code API-body measurements, and external research showing why shorter, denser context can reduce cost without automatically reducing quality.

Real Claude Code benchmark runs

These are API-body measurements from actual Claude Code sessions. The main metric is estimated request-body or response-body tokens from raw API logs, not local hook-side savings estimates.

Scenario Baseline Tokenless Reduction
5-turn CRM vibe coding, off vs coding 4,697,867 request tokens 2,476,391 47.3%
6-turn natural conversation, off vs chat 7,223 response tokens 1,442 80.0%
Large CSS visual edit 1,017,642 request tokens 403,995-473,354 ~54-60%
10k-line React/TSX edit 917,137 request tokens 545,456 40.5%
Multifile React dashboard 628,261 request tokens 512,521 18.4%
Task/Plan tools enabled vs default launcher 1,524,894 request tokens 1,087,753 28.7%

The strongest current product benchmark is the 5-turn CRM vibe-coding run: a non-specialist user gave vague iterative product-polish prompts. The public coding profile reduced request tokens by 47.3%, response tokens by 44.4%, and request count by 39.3% versus clean off.

The clean natural-conversation run isolates chat: no file tools or packet reducers were involved, and response tokens dropped by 80.0%.

Detailed methodology and raw run notes are in docs/benchmarking.md and docs/style-benchmark.md.

Research backing

The research does not prove Tokenless automatically helps every session. It supports the benchmark premise: context and response length are controllable engineering variables, and less text can sometimes be cheaper, faster, and more accurate.

Paper Why it matters for Tokenless
Brevity Constraints Reverse Performance Hierarchies in Language Models Brevity constraints improved large-model accuracy by 26.3 percentage points on inverse-scaling problems. Verbose is not always better.
Prompt Compression in the Wild Prompt compression can deliver real end-to-end speedups when workload, compression ratio, and hardware match; quality can remain statistically unchanged.
LLMLingua Prompt compression can reduce inference cost while preserving semantic integrity under high compression ratios.
LongLLMLingua Long-context compression can improve key-information perception while reducing cost and latency.
Selective Context Pruning redundant context reported 50% context-cost reduction, 36% memory reduction, and 32% inference-time reduction with minor quality loss.
Gist Tokens Learned prompt compression reached up to 26x prompt compression and up to 40% FLOPs reduction.

Roadmap

Tokenless currently focuses on Claude Code context growth from tool output, file reads, and response verbosity. Next areas:

  • User prompt compression: identify repeated prompt patterns, compress user intent without losing constraints, and keep the original prompt recoverable.
  • Router-side optimization: reduce duplicated context and style overhead before requests hit the model backend.
  • Broader workflow support: keep Claude Code as the primary target, then evaluate adapters for other agentic coding tools where the same context-growth problem appears.

Installation

Install from GitHub:

npm install -g github:MaxForAI/Tokenless
tokenless repair-hooks --user
tokenless launch

Tokenless is currently distributed through GitHub. It has not been published to the public npm registry yet.

For local development from a checkout:

git clone https://github.com/MaxForAI/Tokenless.git
cd Tokenless
npm install
npm link
tokenless repair-hooks --user
tokenless launch

If Claude Code is not available as claude on your PATH, set CLAUDE_BIN:

CLAUDE_BIN=/path/to/claude tokenless launch

Check installation status:

tokenless status --user

Output profiles

Tokenless has three public profiles:

Profile Behavior
chat Default. Short, readable natural-language responses. Only changes output style.
coding Dense structured responses for coding workflows. Only changes output style.
off Full Tokenless hard-off. Disables style injection and compression hooks.

Set a profile:

tokenless style chat
tokenless style coding
tokenless style off

Claude Code slash command shortcuts:

/tokenless-style-chat
/tokenless-style-coding
/tokenless-style-off

The selected profile is stored at ~/.tokenless/style.json by default and persists across Claude Code restarts.

TOKENLESS_MODE=off remains available as an environment-level hard-off switch for benchmark runs.

How it works

For noisy commands and large outputs:

Claude requests a noisy tool call
  -> Tokenless intercepts it with Claude Code hooks
  -> The original command or output is processed locally
  -> Raw stdout/stderr or file content is saved as an artifact
  -> Claude receives a compact TOKENLESS-* packet
  -> Claude expands only the relevant artifact slice if needed

Example read packet:

TOKENLESS-READ-PACKET/0.1
file: /path/to/src/App.tsx
artifact_id: ctx_20260518_abc123
summary: large TSX source packet with imports, declarations, snippets, and nearby files

Expand raw evidence when needed:

tokenless latest --data-dir ~/.tokenless
tokenless expand latest --around "DashboardShell" --data-dir ~/.tokenless
tokenless expand latest --lines 120:170 --data-dir ~/.tokenless

The shorter acc command remains available as a compatibility alias. tokenless is the preferred public command.

What Tokenless handles

Tokenless currently handles:

  • Test logs: npm test, pytest, go test, cargo test.
  • Build and CI logs: npm run build, docker build, kubectl logs.
  • Diffs and history: git diff, git log.
  • Search and tree output: rg, grep -R, find, tree, ls -R.
  • Large low-risk reads: CSS, HTML, JSON, logs, docs, generated files, large JS/TS/Python source files, and large Vue/Svelte components.
  • Large successful edit/write tool results: conservative TOKENLESS-EDIT-PACKET and TOKENLESS-WRITE-PACKET.
  • Unexpectedly large Bash output through fallback compression.

Small bounded commands, such as rg -m 20, find ... | head, cat file | grep, and tree | head, pass through directly. Small reads are not compressed by default.

Read packets

Read packets cap large low-risk file reads as TOKENLESS-READ-PACKET/0.1.

Default policy:

  • Compress low-risk reads over roughly 4,000 estimated tokens: CSS, HTML, JSON, logs, docs, lockfiles, generated files.
  • Compress large JS/TS/React/Python source reads over roughly 30,000 estimated tokens with source-oriented packets.
  • Compress large Vue/Svelte single-file components over roughly 12,000 estimated tokens with component-oriented packets.
  • Do not compress small files.
  • Do not compress source families such as .go, .rs, .java, .swift, or .cpp by default.

Read packets are indexes, not edit proof. If Claude needs exact code or style, it should expand the relevant lines first:

tokenless expand ctx_abc --around ".target-card" --data-dir ~/.tokenless
tokenless expand ctx_abc --lines 520:535 --data-dir ~/.tokenless

For CSS, SCSS, Less, HTML, HTM, and SVG files, read packets include deterministic editable summaries: variables, color palettes, likely editable selectors, media and animation rules, sections, ids/classes, interactive elements, assets, and headings.

Large-file gate behavior:

  • Tokenless blocks raw access to large low-risk files until a read packet exists.
  • The hook prints the required next command, usually tokenless read --agent --data-dir ~/.tokenless <file>.
  • Read packets record file size and modified time.
  • If the file changes after packet creation, the packet is stale and Tokenless asks for a fresh packet.
  • For several related changes, expand the relevant lines and use one bounded edit while the packet is current.

Edit and write packets

Tokenless can cap large successful Edit, MultiEdit, and low-risk Write tool results.

The policy is intentionally conservative:

  • Successful Edit / MultiEdit output over roughly 3,000 estimated tokens can become TOKENLESS-EDIT-PACKET/0.1.
  • Successful low-risk Write output over roughly 5,000 estimated tokens can become TOKENLESS-WRITE-PACKET/0.1.
  • Failed or risky outputs are never compressed.
  • Source-code Write outputs are not compressed by default.
  • Raw tool output is stored locally before replacement.

Risk signals that pass through unchanged include:

Error
Failed
old_string
not found
multiple matches
ambiguous
permission denied
conflict
No changes
partial

Edit/write packets do not claim semantic correctness. They only confirm the tool completed successfully, preserve the raw artifact, and mark previous read packets for that file as stale.

Launcher behavior

tokenless launch starts Claude Code with normal read, edit, write, and bash tools available, but disables high-overhead Task/Plan tools by default:

TaskCreate, TaskUpdate, TaskList, TaskGet, EnterPlanMode, ExitPlanMode

This reduces fixed tool-schema overhead and prevents task-list history from being repeatedly carried through API request context.

Opt back into Task/Plan tools for a session:

TOKENLESS_ALLOW_TASK_TOOLS=1 tokenless launch

Slash commands

Install user-level Claude Code slash commands:

tokenless install-commands --user

Installed commands:

/tokenless
/tokenless-style-chat
/tokenless-style-coding
/tokenless-style-off

/tokenless shows hook status, active mode, profile, savings, packet counts, pending gates, and latest artifact.

Restart Claude Code after installing slash commands. If you previously installed older Tokenless commands, clean them up with:

tokenless uninstall-commands --user
tokenless install-commands --user

Common CLI commands

tokenless --help
tokenless status --user
tokenless latest --data-dir ~/.tokenless
tokenless list --data-dir ~/.tokenless
tokenless stats --data-dir ~/.tokenless
tokenless show latest --data-dir ~/.tokenless
tokenless expand latest --around "Cannot find module" --data-dir ~/.tokenless
tokenless clean --data-dir ~/.tokenless --keep 100 --dry-run
tokenless style status
tokenless style coding
tokenless api-usage --since 24h

tokenless stats separates local savings by source:

  • hook: real Claude Code hook-path compression.
  • eval: local evaluation fixtures.
  • smoke: manual probes, doctor checks, and ad hoc compression runs.
  • legacy: records created before source tagging.

Benchmarking

For API-body verification, enable raw Claude Code API body capture:

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_LOG_RAW_API_BODIES="file:<api-body-dir>"

Inspect captured request/response bodies:

node plugins/claude-code/bin/tokenless api-probe stats \
  --dir "<api-body-dir>" \
  --data-dir ~/.tokenless

A valid clean off run must show:

TOKENLESS-READ-PACKET: request=0
TOKENLESS-EDIT-PACKET: request=0
TOKENLESS-WRITE-PACKET: request=0
request_saved_estimate: 0

Raw API bodies can contain full prompts, tool outputs, and sensitive local context. Keep capture disabled unless you are verifying what enters model context.

Development and validation

Run the main smoke checks:

npm run eval:complex
npm run eval:read
npm run eval:edit
npm run eval:cli-smoke
npm run doctor

Run all synthetic and captured cases:

npm run eval:all

Check hook install state:

npm run tokenless:status
npm run tokenless:install:dry-run
npm run tokenless:repair-hooks:dry-run
npm run tokenless:uninstall:dry-run

Test the actual Claude Code hook path inside Claude Code:

npm run test:complex

Expected behavior:

PreToolUse caps the noisy command
Claude reruns node .../bin/acc run --agent ...
Claude receives TOKENLESS-PACKET/0.1
Raw artifact can be expanded with tokenless expand latest

Cleanup

Artifacts are stored under the selected --data-dir, usually ~/.tokenless/artifacts.

Preview cleanup:

npm run tokenless:clean:dry-run

Delete old artifacts:

tokenless clean --data-dir ~/.tokenless --older-than 7d

Keep only the newest 100 artifacts:

tokenless clean --data-dir ~/.tokenless --keep 100

Privacy and safety model

  • Tokenless runs locally.
  • Raw artifacts stay on local disk under the configured data directory.
  • Tokenless does not call a separate LLM or cloud summarization service.
  • Reducers are deterministic and intentionally conservative.
  • Risky failed outputs pass through unchanged.
  • Exact legal, financial, medical, security, and code-review work may require explicit artifact expansion.

Limitations

  • Claude Code hooks are the primary integration target.
  • Reducers are policy-based and may miss some noisy outputs.
  • Small outputs can expand slightly if forced through Tokenless; classifiers avoid common bounded commands, but the policy is not perfect.
  • Read packets are useful evidence, not a substitute for exact line expansion before high-risk edits.
  • API-body token counts are estimates, not exact billed-token accounting.

Star this repo

Tokenless saves tokens and keeps the raw evidence. Star costs zero. Fair trade.

Star History Chart

Acknowledgements

Tokenless is an independent implementation and does not use code from caveman. We still want to acknowledge caveman as part of the broader open-source conversation around shorter, lower-token agent output.

License

MIT. Free to use, modify, and ship.

Contributors

  • Max Liu
  • Codex, AI coding assistant