tokenlean

tokenlean chains aip-proxy and LiteLLM in front of the GitHub Copilot API so that Claude Code (or any OpenAI-compatible tool) can use your Copilot subscription as a backend. It also integrates rtk as a Claude Code hook to compress command outputs, delivering two independent layers of token reduction.

Double savings

Layer What it compresses Reduction

rtk Shell command outputs (git, ls, grep…) before they reach the model context 60–90%

aip-proxy Input prompts (whitespace, comments, deduplication) 15–40%

How it works

Your app / tool
      │
      ▼ HTTP :4444
  aip-proxy          ← compresses prompts 15-40% (whitespace, comments, deduplication)
      │
      ▼ HTTP :4445
   LiteLLM            ← translates OpenAI API calls to GitHub Copilot API
      │
      ▼ HTTPS
 GitHub Copilot API

Available models

All models configured in copilot-config.yaml. Use the model_name value as the model field in your API calls.

OpenAI

Model name	Underlying model	Reasoning effort
`gpt-4-1`	GPT-4.1	—
`gpt-4o`	GPT-4o	—
`gpt-5-mini`	GPT-5 mini	high
`gpt-5-1`	GPT-5.1	high
`gpt-5-1-codex`	GPT-5.1 Codex	high
`gpt-5-1-codex-max`	GPT-5.1 Codex Max	xhigh
`gpt-5-1-codex-mini`	GPT-5.1 Codex mini	high
`gpt-5-2`	GPT-5.2	xhigh
`gpt-5-2-codex`	GPT-5.2 Codex	xhigh
`gpt-5-3-codex`	GPT-5.3 Codex	xhigh
`gpt-5-4`	GPT-5.4	xhigh
`gpt-5-4-mini`	GPT-5.4 mini	high

Anthropic

Model name	Underlying model
`claude-haiku-4-5`	Claude Haiku 4.5
`claude-sonnet-4`	Claude Sonnet 4
`claude-sonnet-4-5`	Claude Sonnet 4.5
`claude-sonnet-4-6`	Claude Sonnet 4.6
`claude-opus-4-5`	Claude Opus 4.5
`claude-opus-4-6`	Claude Opus 4.6

Google

Model name	Underlying model
`gemini-2-5-pro`	Gemini 2.5 Pro
`gemini-3-flash`	Gemini 3 Flash (preview)
`gemini-3-1-pro`	Gemini 3.1 Pro (preview)

xAI

Model name	Underlying model
`grok-code-fast-1`	Grok Code Fast 1

Requirements

Option	Requirements
Dev Container / Docker	Docker with Compose plugin
Bare metal	macOS or Linux, `curl`, `sudo` access for system packages

A valid GitHub Copilot subscription with an authenticated VS Code session (or GitHub CLI) is required for all options.

Setup & Usage

Option A — Dev Container (VS Code / Codespaces)

The repo includes a dev container that reuses docker-compose.yml — gives you a full Python 3.11 environment with all dependencies pre-installed, Claude Code CLI, and ports 4444/4445 forwarded automatically.

Open in VS Code:

Install the Dev Containers extension
Ctrl+Shift+P → Dev Containers: Reopen in Container

Open in GitHub Codespaces:

Once inside the container, start the proxies manually:

/app/entrypoint.sh

Option B — Docker

Run the full proxy stack in a container without installing Python, Poetry, or any dependencies locally.

# Build and start (detached)
docker compose up -d --build

# Check status / health
docker compose ps

# Tail logs
docker compose logs -f

# Stop
docker compose down

# Reload models without rebuilding
docker compose restart tokenlean

The container exposes:

:4444 — aip-proxy (connect your OpenAI-compatible client here)
:4445 — LiteLLM (internal, exposed for debugging)

copilot-config.yaml is mounted as a read-only volume — edit models and restart without rebuilding the image. Logs are persisted to logs/litellm.log and logs/aip-proxy.log in the project directory.

Note

The container uses a HEALTHCHECK on http://localhost:4444/health. Wait for status healthy before sending requests.

Option C — Bare metal (macOS + Linux)

1. Bootstrap dependencies

make install / make venv auto-installs every missing dependency in order — no manual steps required on most systems:

Dependency	Auto-install method
`python3`	`apt-get` / `dnf` / `pacman` (requires `sudo`). Fails with a clear message if no known package manager is found.
`poetry`	1st: `pipx install poetry` · 2nd: `pip install --user poetry` (if pip exists) · 3rd: official installer via `curl https://install.python-poetry.org` — handles PEP 668 / `externally-managed-environment` on Ubuntu 24.04+
Claude Code	`npm install -g @anthropic-ai/claude-code`. If the global npm prefix requires root (`/usr` or `/usr/local`), automatically redirects to `~/.npm-global` — handles `EACCES` on system npm installs
`rtk`	`brew install rtk` · fallback: `curl` official install script (Linux without Homebrew)

2. Full one-shot setup

make install

Runs the complete setup in sequence:

venv — installs python3 + Poetry + project dependencies
install-claude — installs Claude Code CLI via npm
install-rtk — installs rtk and configures the Claude Code hook
configure-claude — patches ~/.claude/settings.json to point at the local proxy (timestamped backup saved)
start — starts LiteLLM and aip-proxy in background

After running make install, restart Claude Code to activate the rtk hook.

3. Install dependencies only

make venv

Ensures python3, Poetry and project dependencies are installed. Idempotent — only re-runs when pyproject.toml changes.

4. Start the proxies

make start

Starts LiteLLM (:4445) first, waits for readiness (up to 120s), then starts aip-proxy (:4444). Both run in the background with logs written to logs/litellm.log and logs/aip-proxy.log.

Warning

make start automatically stops any processes already running on ports 4444 and 4445.

5. Use the API

Point any OpenAI-compatible client to http://localhost:4444:

curl http://localhost:4444/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4-1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

5. All make commands

Command	Description
`make`	Show all available targets (default)
`make install`	Full setup: venv + Claude Code + rtk + configure + start
`make install-claude`	Install Claude Code CLI via npm
`make install-rtk`	Install rtk and configure its Claude Code hook
`make configure-claude`	Patch `~/.claude/settings.json` to use the local proxy (timestamped backup created)
`make unconfigure-claude`	Restore the most recent settings backup
`make venv`	Install Poetry and dependencies (idempotent)
`make start`	Start LiteLLM + aip-proxy in background (waits for readiness)
`make stop`	Graceful stop (SIGTERM → 3s grace → kill -9 fallback)
`make restart`	Stop then start
`make status`	Show RUNNING / STOPPED / DEAD state for each service
`make log-aip`	Tail the aip-proxy log
`make log-litellm`	Tail the LiteLLM log
`make savings`	Live token savings dashboard — aip-proxy + rtk (Ctrl+C to exit)
`make clean-logs`	Delete log files
`make clean`	Stop services + delete logs, PIDs, and virtualenv

Claude Code integration

make configure-claude (or make install) patches ~/.claude/settings.json with:

"env": {
  "ANTHROPIC_AUTH_TOKEN": "litellm",
  "ANTHROPIC_BASE_URL": "http://localhost:4444",
  "ANTHROPIC_MODEL": "claude-sonnet-4-6",
  "ANTHROPIC_SMALL_FAST_MODEL": "gpt-4-1"
}

All other settings (hooks, model, etc.) are preserved. A timestamped backup is saved before every modification (e.g. settings.json.20260327_143012.bak). Run make unconfigure-claude to roll back.

python3 configure_claude.py apply    # point Claude at the proxy
python3 configure_claude.py restore  # roll back to last backup

rtk integration — the second layer of savings

rtk is a Rust CLI proxy that reduces LLM token consumption by 60–90% by filtering and compressing command outputs before they reach the model context.

Saving layer	What it compresses	Reduction
rtk	Shell command outputs (git, cargo, ls, grep…)	60–90%
aip-proxy	Input prompts (whitespace, comments, duplicate blocks)	15–40%

make install-rtk installs rtk and runs rtk init -g --auto-patch, which installs a PreToolUse hook into Claude Code that transparently rewrites common shell commands to their rtk-filtered equivalents — zero token overhead, no workflow changes.

Tip: run rtk gain at any time to see how many tokens rtk has saved in your sessions.

CI

GitHub Actions (.github/workflows/docker-tests.yml) runs on every push and pull request to main or develop when Docker-related files change. The workflow:

Builds the OCI image with BuildKit + layer caching
Runs test_docker.sh --no-build (10 integration tests)
On failure: prints the last 100 lines of container logs

Tests cover: image build, OCI labels, exposed ports, container startup, healthcheck, aip-proxy /health, LiteLLM /health, /v1/models API, graceful restart, and volume mount mode.

Cross-platform compatibility

The Makefile is fully compatible with macOS and Linux:

python3 auto-installed via apt-get / dnf / pacman if missing (requires sudo)
Poetry installed via pipx → pip --user → curl installer cascade (handles PEP 668 / externally-managed-environment on Ubuntu 24.04+)
npm global installs redirected to ~/.npm-global automatically when system prefix requires root (/usr or /usr/local)
Port killing: lsof with fuser fallback (for minimal Linux distros)
Port readiness: nc -z with python3 socket fallback (for distros using ncat)
All shell commands use POSIX-compatible syntax
make stop does not depend on venv
Claude Code installed via npm (universal cross-platform method)
rtk installed via Homebrew with curl fallback for Linux

Project structure

tokenlean/
├── .devcontainer/
│   └── devcontainer.json      # VS Code / GitHub Codespaces dev container
├── .github/
│   └── workflows/
│       └── docker-tests.yml   # CI: build + integration tests (main, develop)
├── Dockerfile                 # Multi-stage OCI image (builder + runtime)
├── docker-compose.yml         # One-command container deployment
├── .dockerignore              # Excludes .venv, logs, .git, .devcontainer, etc.
├── entrypoint.sh              # Container entrypoint (graceful SIGTERM handling)
├── test_docker.sh             # Docker integration test suite (10 tests)
├── copilot-config.yaml        # LiteLLM model definitions + reasoning_effort
├── configure_claude.py        # Patches ~/.claude/settings.json
├── savings.py                 # Live token savings dashboard
├── Makefile                   # Cross-platform automation (macOS + Linux)
├── pyproject.toml             # Poetry configuration and dependencies
├── .claude/CLAUDE.md          # Claude Code project instructions
├── logs/                      # Service logs (generated)
│   ├── litellm.log            # LiteLLM runtime log
│   └── aip-proxy.log          # aip-proxy runtime log
├── litellm.pid                # LiteLLM PID (generated, bare-metal only)
├── aip-proxy.pid              # aip-proxy PID (generated, bare-metal only)
└── .venv/                     # Python virtualenv (generated, bare-metal only)

Acknowledgements

LiteLLM — Unified OpenAI-compatible proxy that translates API calls across dozens of LLM providers.
aip-proxy — Token compression proxy; reduces input prompts 15–40% via whitespace normalization, comment removal, and block deduplication.
FastAPI & Uvicorn — Async web framework and ASGI server powering both proxy layers.
httpx — Modern async HTTP client used internally for proxying requests.
rtk — Rust Token Killer. Reduces LLM token consumption 60–90% by filtering and compressing dev command outputs. Single binary, zero dependencies.

Maintainers

Maintained by @gdilo and @futesat.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tokenlean

How it works

Available models

OpenAI

Anthropic

Google

xAI

Requirements

Setup & Usage

Option A — Dev Container (VS Code / Codespaces)

Option B — Docker

Option C — Bare metal (macOS + Linux)

1. Bootstrap dependencies

2. Full one-shot setup

3. Install dependencies only

4. Start the proxies

5. Use the API

5. All make commands

Claude Code integration

rtk integration — the second layer of savings

CI

Cross-platform compatibility

Project structure

Acknowledgements

Maintainers

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.claude		.claude
.devcontainer		.devcontainer
.github/workflows		.github/workflows
.vscode		.vscode
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
configure_claude.py		configure_claude.py
copilot-config.yaml		copilot-config.yaml
docker-compose.yml		docker-compose.yml
entrypoint.sh		entrypoint.sh
pyproject.toml		pyproject.toml
savings.py		savings.py
test_docker.sh		test_docker.sh

Layer	What it compresses	Reduction
rtk	Shell command outputs (git, ls, grep…) before they reach the model context	60–90%
aip-proxy	Input prompts (whitespace, comments, deduplication)	15–40%

Folders and files

Latest commit

History

Repository files navigation

tokenlean

How it works

Available models

OpenAI

Anthropic

Google

xAI

Requirements

Setup & Usage

Option A — Dev Container (VS Code / Codespaces)

Option B — Docker

Option C — Bare metal (macOS + Linux)

1. Bootstrap dependencies

2. Full one-shot setup

3. Install dependencies only

4. Start the proxies

5. Use the API

5. All make commands

Claude Code integration

rtk integration — the second layer of savings

CI

Cross-platform compatibility

Project structure

Acknowledgements

Maintainers

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages