tokenlean chains aip-proxy and LiteLLM in front of the GitHub Copilot API so that Claude Code (or any OpenAI-compatible tool) can use your Copilot subscription as a backend. It also integrates rtk as a Claude Code hook to compress command outputs, delivering two independent layers of token reduction.
Double savings
| Layer | What it compresses | Reduction |
|---|---|---|
| rtk | Shell command outputs (git, ls, grep…) before they reach the model context | 60–90% |
| aip-proxy | Input prompts (whitespace, comments, deduplication) | 15–40% |
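rtk shrinks command output before it enters the context, and aip-proxy then compresses the whole prompt that carries that context, so on tool-output tokens the two reductions can compound. Below is a back-of-the-envelope sketch. It optimistically assumes that aip-proxy's ratio still applies to text rtk has already filtered. In practice the second layer may find less whitespace and duplication left to remove, so treat the result as an upper bound.

```python
def combined_reduction(first: float, second: float) -> float:
    """Fraction of tokens removed when two reductions are applied in sequence.

    Each argument is a fraction in [0, 1], e.g. 0.6 for 60%.
    Assumption: the second layer's ratio applies to whatever the first layer left.
    """
    if not (0.0 <= first <= 1.0 and 0.0 <= second <= 1.0):
        raise ValueError("reductions must be fractions between 0 and 1")
    remaining = (1.0 - first) * (1.0 - second)  # share of tokens surviving both layers
    return 1.0 - remaining


if __name__ == "__main__":
    # Low and high ends of the ranges quoted in the table above.
    low = combined_reduction(0.60, 0.15)
    high = combined_reduction(0.90, 0.40)
    print(f"combined reduction on command output: {low:.0%} to {high:.0%}")
```

Prompt text that never passed through rtk (your own instructions, source files read directly) only gets the aip-proxy layer.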
Your app / tool
│
▼ HTTP :4444
aip-proxy ← compresses prompts 15-40% (whitespace, comments, deduplication)
│
▼ HTTP :4445
LiteLLM ← translates OpenAI API calls to GitHub Copilot API
│
▼ HTTPS
GitHub Copilot API
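An OpenAI-compatible client only needs to talk to the top of this chain. The following is a minimal standard-library sketch. It assumes the stack is running locally and that :4444 does not enforce an API key, matching the curl example in this README, which sends none. The function names are illustrative.

```python
import json
import urllib.request

# Front of the chain: aip-proxy listens on :4444 (per the diagram above).
BASE_URL = "http://localhost:4444"


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at aip-proxy."""
    body = {
        "model": model,  # a model_name value from copilot-config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """Send one prompt through aip-proxy -> LiteLLM -> Copilot and return the reply text."""
    req = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-compatible responses carry the text in choices[0].message.content.
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("gpt-4-1", "Hello!"))
```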
All models are configured in copilot-config.yaml. Use the model_name value as the model field in your API calls.
| Model name | Underlying model | Reasoning effort |
|---|---|---|
| gpt-4-1 | GPT-4.1 | — |
| gpt-4o | GPT-4o | — |
| gpt-5-mini | GPT-5 mini | high |
| gpt-5-1 | GPT-5.1 | high |
| gpt-5-1-codex | GPT-5.1 Codex | high |
| gpt-5-1-codex-max | GPT-5.1 Codex Max | xhigh |
| gpt-5-1-codex-mini | GPT-5.1 Codex mini | high |
| gpt-5-2 | GPT-5.2 | xhigh |
| gpt-5-2-codex | GPT-5.2 Codex | xhigh |
| gpt-5-3-codex | GPT-5.3 Codex | xhigh |
| gpt-5-4 | GPT-5.4 | xhigh |
| gpt-5-4-mini | GPT-5.4 mini | high |

| Model name | Underlying model |
|---|---|
| claude-haiku-4-5 | Claude Haiku 4.5 |
| claude-sonnet-4 | Claude Sonnet 4 |
| claude-sonnet-4-5 | Claude Sonnet 4.5 |
| claude-sonnet-4-6 | Claude Sonnet 4.6 |
| claude-opus-4-5 | Claude Opus 4.5 |
| claude-opus-4-6 | Claude Opus 4.6 |

| Model name | Underlying model |
|---|---|
| gemini-2-5-pro | Gemini 2.5 Pro |
| gemini-3-flash | Gemini 3 Flash (preview) |
| gemini-3-1-pro | Gemini 3.1 Pro (preview) |

| Model name | Underlying model |
|---|---|
| grok-code-fast-1 | Grok Code Fast 1 |

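The tables above mirror copilot-config.yaml. To check which model names a running stack actually serves, you can query the OpenAI-compatible /v1/models endpoint, which the CI suite also exercises. This sketch assumes aip-proxy forwards that path to LiteLLM. If it does not in your setup, point base_url at LiteLLM on :4445 instead. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List


def model_ids(payload: dict) -> List[str]:
    """Extract model names from an OpenAI-style /v1/models response body."""
    return sorted(entry["id"] for entry in payload.get("data", []))


def fetch_models(base_url: str = "http://localhost:4444", timeout: float = 10.0) -> List[str]:
    """Ask the running stack which model names it exposes."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    for name in fetch_models():
        print(name)
```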
| Option | Requirements |
|---|---|
| Dev Container / Docker | Docker with Compose plugin |
| Bare metal | macOS or Linux, curl, sudo access for system packages |
A valid GitHub Copilot subscription with an authenticated VS Code session (or GitHub CLI) is required for all options.
The repo includes a dev container that reuses docker-compose.yml. It provides a full Python 3.11 environment with all dependencies pre-installed, the Claude Code CLI, and ports 4444/4445 forwarded automatically.
Open in VS Code:
- Install the Dev Containers extension
- Press Ctrl+Shift+P → Dev Containers: Reopen in Container
Open in GitHub Codespaces:
Once inside the container, start the proxies manually:
/app/entrypoint.sh

Run the full proxy stack in a container without installing Python, Poetry, or any dependencies locally.
# Build and start (detached)
docker compose up -d --build
# Check status / health
docker compose ps
# Tail logs
docker compose logs -f
# Stop
docker compose down
# Reload models without rebuilding
docker compose restart tokenlean

The container exposes:
- :4444 → aip-proxy (connect your OpenAI-compatible client here)
- :4445 → LiteLLM (internal, exposed for debugging)
copilot-config.yaml is mounted as a read-only volume — edit models and restart without rebuilding the image. Logs are persisted to logs/litellm.log and logs/aip-proxy.log in the project directory.
Note
The container uses a HEALTHCHECK on http://localhost:4444/health. Wait until docker compose ps reports the container as healthy before sending requests.
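If you script against the container, you can poll the same endpoint yourself instead of watching docker compose ps. The following is a standard-library sketch. The 120-second timeout and 2-second interval are arbitrary choices, not values taken from the Dockerfile. The check and sleep parameters exist only so the loop can be tested without a live server.

```python
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:4444/health"


def http_ok(url: str) -> bool:
    """True if GET url answers 200; errors and non-2xx responses count as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts all subclass OSError
        return False


def wait_for_health(
    url: str = HEALTH_URL,
    timeout: float = 120.0,
    interval: float = 2.0,
    check: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it reports OK or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        if check(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    print("healthy" if wait_for_health() else "gave up waiting")
```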
make install / make venv auto-installs every missing dependency in order — no manual steps required on most systems:
| Dependency | Auto-install method |
|---|---|
| python3 | apt-get / dnf / pacman (requires sudo). Fails with a clear message if no known package manager is found. |
| poetry | 1st: pipx install poetry · 2nd: pip install --user poetry (if pip exists) · 3rd: official installer via curl https://install.python-poetry.org (handles PEP 668 / externally-managed-environment on Ubuntu 24.04+) |
| Claude Code | npm install -g @anthropic-ai/claude-code. If the global npm prefix requires root (/usr or /usr/local), automatically redirects to ~/.npm-global; handles EACCES on system npm installs |
| rtk | brew install rtk · fallback: curl official install script (Linux without Homebrew) |
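The Poetry row describes a fallback cascade, and the Claude Code row describes a prefix redirect. Below is a hypothetical Python rendering of both decisions. The function names are illustrative, and this is not how the Makefile is written. It assumes "fallback" means trying the next method when the previous one is missing or fails, which is how a PEP 668 refusal from pip install --user would be absorbed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Optional

# Official Poetry installer, piped to python3 (the pipe needs a shell).
POETRY_INSTALLER = "curl -sSL https://install.python-poetry.org | python3 -"


def run_shell(cmd: str) -> bool:
    """Run a shell command and report whether it exited with status 0."""
    return subprocess.run(cmd, shell=True).returncode == 0


def poetry_candidates(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Install methods in cascade order; pipx and pip are only offered if present."""
    candidates = []
    if which("pipx"):
        candidates.append("pipx install poetry")
    if which("pip"):
        candidates.append("pip install --user poetry")
    candidates.append(POETRY_INSTALLER)  # always available as the last resort
    return candidates


def install_poetry(
    run: Callable[[str], bool] = run_shell,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Try each method until one succeeds; a failing step just moves to the next."""
    for cmd in poetry_candidates(which):
        if run(cmd):
            return cmd
    raise RuntimeError("no Poetry install method succeeded")


def npm_global_prefix(current_prefix: str, home: Optional[Path] = None) -> str:
    """Redirect global npm installs away from root-owned system prefixes."""
    if current_prefix.rstrip("/") in ("/usr", "/usr/local"):
        return str((home if home is not None else Path.home()) / ".npm-global")
    return current_prefix
```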
make install

Runs the complete setup in sequence:
- venv: installs python3, Poetry, and the project dependencies
- install-claude: installs the Claude Code CLI via npm
- install-rtk: installs rtk and configures the Claude Code hook
- configure-claude: patches ~/.claude/settings.json to point at the local proxy (timestamped backup saved)
- start: starts LiteLLM and aip-proxy in the background
After running make install, restart Claude Code to activate the rtk hook.
make venv

Ensures python3, Poetry, and the project dependencies are installed. Idempotent: only re-runs when pyproject.toml changes.
make start

Starts LiteLLM (:4445) first, waits for readiness (up to 120s), then starts aip-proxy (:4444). Both run in the background with logs written to logs/litellm.log and logs/aip-proxy.log.
Warning
make start automatically stops any processes already running on ports 4444 and 4445.
Point any OpenAI-compatible client to http://localhost:4444:
curl http://localhost:4444/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4-1",
"messages": [{"role": "user", "content": "Hello!"}]
}'

| Command | Description |
|---|---|
| make | Show all available targets (default) |
| make install | Full setup: venv + Claude Code + rtk + configure + start |
| make install-claude | Install Claude Code CLI via npm |
| make install-rtk | Install rtk and configure its Claude Code hook |
| make configure-claude | Patch ~/.claude/settings.json to use the local proxy (timestamped backup created) |
| make unconfigure-claude | Restore the most recent settings backup |
| make venv | Install Poetry and dependencies (idempotent) |
| make start | Start LiteLLM + aip-proxy in background (waits for readiness) |
| make stop | Graceful stop (SIGTERM → 3s grace → kill -9 fallback) |
| make restart | Stop then start |
| make status | Show RUNNING / STOPPED / DEAD state for each service |
| make log-aip | Tail the aip-proxy log |
| make log-litellm | Tail the LiteLLM log |
| make savings | Live token savings dashboard: aip-proxy + rtk (Ctrl+C to exit) |
| make clean-logs | Delete log files |
| make clean | Stop services + delete logs, PIDs, and virtualenv |
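The make stop row describes a SIGTERM → 3 s grace → kill -9 escalation. The following Python sketch reproduces that sequence for readers who want the same behavior in their own scripts. It is illustrative only, since the Makefile does this in shell, and the function names are hypothetical.

```python
import os
import signal
import time
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Check whether a process exists; signal 0 performs the check without signalling.

    Note: an unreaped zombie child still counts as alive here, so pass a custom
    is_alive (e.g. based on Popen.poll) when stopping your own child processes.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def graceful_stop(
    pid: int,
    grace: float = 3.0,
    poll: float = 0.1,
    is_alive: Callable[[int], bool] = pid_alive,
) -> str:
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL."""
    if not is_alive(pid):
        return "not running"
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "terminated"  # exited between the check and the signal
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "terminated"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)  # the `kill -9` fallback
    except ProcessLookupError:
        return "terminated"
    return "killed"
```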
make configure-claude (or make install) patches ~/.claude/settings.json with:
"env": {
"ANTHROPIC_AUTH_TOKEN": "litellm",
"ANTHROPIC_BASE_URL": "http://localhost:4444",
"ANTHROPIC_MODEL": "claude-sonnet-4-6",
"ANTHROPIC_SMALL_FAST_MODEL": "gpt-4-1"
}

All other settings (hooks, model, etc.) are preserved. A timestamped backup is saved before every modification (e.g. settings.json.20260327_143012.bak). Run make unconfigure-claude to roll back.
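The patch-and-backup behavior described above can be sketched as follows. This is a hypothetical sketch, not the contents of configure_claude.py, and the function names are illustrative. It assumes settings.json holds a single JSON object and that "preserved" means every key outside env, plus any unrelated env entries, is left untouched.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path
from typing import Mapping, Optional

# Values copied from the "env" block documented above.
PROXY_ENV = {
    "ANTHROPIC_AUTH_TOKEN": "litellm",
    "ANTHROPIC_BASE_URL": "http://localhost:4444",
    "ANTHROPIC_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_SMALL_FAST_MODEL": "gpt-4-1",
}


def apply_proxy_settings(settings: Path, env: Mapping[str, str] = PROXY_ENV) -> Optional[Path]:
    """Back up settings.json (if present), then merge `env` into its "env" object."""
    data: dict = {}
    backup = None
    if settings.exists():
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup = settings.with_name(f"{settings.name}.{stamp}.bak")
        shutil.copy2(settings, backup)
        data = json.loads(settings.read_text())
    # Only the listed env keys change; hooks, model, and other env entries are kept.
    data.setdefault("env", {}).update(env)
    settings.parent.mkdir(parents=True, exist_ok=True)
    settings.write_text(json.dumps(data, indent=2) + "\n")
    return backup


def restore_latest_backup(settings: Path) -> Path:
    """Copy the most recent timestamped backup back over settings.json."""
    backups = sorted(settings.parent.glob(f"{settings.name}.*.bak"))
    if not backups:
        raise FileNotFoundError(f"no backups found for {settings}")
    latest = backups[-1]  # YYYYMMDD_HHMMSS stamps sort chronologically as text
    shutil.copy2(latest, settings)
    return latest
```

Because the stamp has one-second resolution, two patches within the same second would share a backup name. The later copy would then overwrite the earlier one.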
You can also run the script directly:
python3 configure_claude.py apply    # point Claude at the proxy
python3 configure_claude.py restore  # roll back to last backup

rtk is a Rust CLI proxy that reduces LLM token consumption by 60–90% by filtering and compressing command outputs before they reach the model context.
| Saving layer | What it compresses | Reduction |
|---|---|---|
| rtk | Shell command outputs (git, cargo, ls, grep…) | 60–90% |
| aip-proxy | Input prompts (whitespace, comments, duplicate blocks) | 15–40% |
make install-rtk installs rtk and runs rtk init -g --auto-patch, which installs a PreToolUse hook into Claude Code that transparently rewrites common shell commands to their rtk-filtered equivalents — zero token overhead, no workflow changes.
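The rewriting step can be pictured as a pure function from the command Claude Code wants to run to the one actually executed. This is an illustration of the idea only, not rtk's hook code or the Claude Code hook protocol. The allowlist and the `rtk <command> ...` form are assumptions. Check rtk --help for the real set of filtered commands.

```python
import shlex

# Hypothetical allowlist: commands assumed to have an rtk-filtered equivalent
# invoked as "rtk <command> ...".
FILTERED = {"git", "ls", "grep", "cargo"}


def rewrite(command: str) -> str:
    """Prefix supported commands with rtk; leave everything else unchanged."""
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes: pass the command through untouched
        return command
    if not argv or argv[0] not in FILTERED:
        return command  # unknown tools, and commands already starting with rtk
    return "rtk " + command
```

Only the leading command is inspected, so a compound line such as git status && ls is rewritten as a whole to rtk git status && ls. A real hook has to decide how to treat pipelines and chained commands.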
Tip: run rtk gain at any time to see how many tokens rtk has saved in your sessions.
GitHub Actions (.github/workflows/docker-tests.yml) runs on every push and pull request to main or develop when Docker-related files change. The workflow:
- Builds the OCI image with BuildKit + layer caching
- Runs test_docker.sh --no-build (10 integration tests)
- On failure: prints the last 100 lines of container logs
Tests cover: image build, OCI labels, exposed ports, container startup, healthcheck, aip-proxy /health, LiteLLM /health, /v1/models API, graceful restart, and volume mount mode.
The Makefile is fully compatible with macOS and Linux:
- python3 auto-installed via apt-get/dnf/pacman if missing (requires sudo)
- Poetry installed via a pipx → pip --user → curl installer cascade (handles PEP 668 / externally-managed-environment on Ubuntu 24.04+)
- npm global installs redirected to ~/.npm-global automatically when the system prefix requires root (/usr or /usr/local)
- Port killing: lsof with fuser fallback (for minimal Linux distros)
- Port readiness: nc -z with python3 socket fallback (for distros using ncat)
- All shell commands use POSIX-compatible syntax
- make stop does not depend on venv
- Claude Code installed via npm (universal cross-platform method)
- rtk installed via Homebrew with curl fallback for Linux
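The port-readiness fallback boils down to attempting a TCP connection. Below is a sketch of what a python3 socket replacement for nc -z could look like. It is illustrative, not the Makefile's actual one-liner. The 120-second default mirrors the readiness limit of make start. An open port only shows that something is listening, not that the service behind it is ready to answer requests.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Equivalent of `nc -z host port`: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll until the port accepts connections or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while not port_open(host, port):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    for port in (4445, 4444):  # LiteLLM first, then aip-proxy
        print(port, "ready" if wait_for_port("127.0.0.1", port) else "not ready")
```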
tokenlean/
├── .devcontainer/
│ └── devcontainer.json # VS Code / GitHub Codespaces dev container
├── .github/
│ └── workflows/
│ └── docker-tests.yml # CI: build + integration tests (main, develop)
├── Dockerfile # Multi-stage OCI image (builder + runtime)
├── docker-compose.yml # One-command container deployment
├── .dockerignore # Excludes .venv, logs, .git, .devcontainer, etc.
├── entrypoint.sh # Container entrypoint (graceful SIGTERM handling)
├── test_docker.sh # Docker integration test suite (10 tests)
├── copilot-config.yaml # LiteLLM model definitions + reasoning_effort
├── configure_claude.py # Patches ~/.claude/settings.json
├── savings.py # Live token savings dashboard
├── Makefile # Cross-platform automation (macOS + Linux)
├── pyproject.toml # Poetry configuration and dependencies
├── .claude/CLAUDE.md # Claude Code project instructions
├── logs/ # Service logs (generated)
│ ├── litellm.log # LiteLLM runtime log
│ └── aip-proxy.log # aip-proxy runtime log
├── litellm.pid # LiteLLM PID (generated, bare-metal only)
├── aip-proxy.pid # aip-proxy PID (generated, bare-metal only)
└── .venv/ # Python virtualenv (generated, bare-metal only)
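On bare metal, the PID files above are what let a status check tell the three states reported by make status apart. Below is a hypothetical sketch of that classification. It assumes STOPPED means there is no PID file, RUNNING means the recorded process is alive, and DEAD means a PID file was left behind by a process that no longer exists. The real Makefile logic may differ.

```python
import os
from pathlib import Path


def service_state(pid_file: Path) -> str:
    """Classify a service as RUNNING, DEAD, or STOPPED from its PID file."""
    if not pid_file.exists():
        return "STOPPED"
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        return "DEAD"  # unreadable PID file treated as stale
    if pid <= 0:
        return "DEAD"  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence check only
    except ProcessLookupError:
        return "DEAD"
    except PermissionError:
        return "RUNNING"  # process exists but belongs to another user
    return "RUNNING"


if __name__ == "__main__":
    for name in ("litellm", "aip-proxy"):
        print(f"{name:10} {service_state(Path(f'{name}.pid'))}")
```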
- LiteLLM — Unified OpenAI-compatible proxy that translates API calls across dozens of LLM providers.
- aip-proxy — Token compression proxy; reduces input prompts 15–40% via whitespace normalization, comment removal, and block deduplication.
- FastAPI & Uvicorn — Async web framework and ASGI server powering both proxy layers.
- httpx — Modern async HTTP client used internally for proxying requests.
- rtk — Rust Token Killer. Reduces LLM token consumption 60–90% by filtering and compressing dev command outputs. Single binary, zero dependencies.