
claudely


pron. "CLAW-dlee" — Claude, locally.

Launch Claude Code against a local LLM (LM Studio, Ollama, llama.cpp, or any Anthropic-compatible endpoint), without affecting the regular claude command — that one keeps talking to the official Anthropic API.

Disclaimer. This is an unofficial, community-maintained helper. It is not affiliated with, endorsed by, or sponsored by Anthropic. Claude and Claude Code are trademarks of Anthropic, used here only descriptively to identify the upstream tool this CLI wraps. claudely does not modify the claude binary; it only sets documented environment variables and spawns claude unchanged.


Why this exists

Plenty of CLI coding agents will happily talk to a local LLM. The catch is the ecosystem: skills, slash commands, MCP servers, plugins, hooks — the interesting tooling has been built specifically for Claude Code, and the parity story on every other agent is patchy at best. Trying to reuse a Claude-shaped workflow on a different agent quickly turns into "rewrite all the plugins" or "do without."

claudely skips that fight. Keep Claude Code as the client (and its entire plugin / skill / MCP ecosystem with it), and just point it at a model running on your own hardware. The official claude command still talks to Anthropic; claudely is a separate entrypoint that sets the right environment variables for the local path and spawns claude unchanged.
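
In practice the split looks like this (a sketch; the first command assumes LM Studio's default port from the table below):

claudely -p lmstudio   # this session's claude talks to http://localhost:1234
claude                 # this one still talks to the official Anthropic API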

Install

Published on npm as claudely.

# global (recommended)
npm i -g claudely

# or one-shot, no install
npx claudely

Requires Node.js ≥ 20 and the claude CLI on your PATH. (npm 7+ will install Claude Code automatically as a peer dependency; users who got claude via the native installer or Homebrew are unaffected.)
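
A quick sanity check for both requirements:

node --version     # wants v20 or newer
command -v claude  # should resolve to the Claude Code CLI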

Quickstart

# LM Studio (default), interactive picker over your downloaded models
claudely

# Ollama
claudely -p ollama

# llama.cpp (whichever GGUF llama-server is currently serving)
claudely -p llamacpp

# Skip the picker by naming a model
claudely -p ollama -m gpt-oss:20b
claudely -p lmstudio -m openai/gpt-oss-20b

# Just print what's available, don't launch claude
claudely -p ollama --list

# Custom Anthropic-compatible endpoint (e.g. a litellm proxy)
claudely -p custom -u http://localhost:4000 -t sk-anything -m my-model

# Any flag claudely doesn't recognize is forwarded verbatim to claude
claudely -p ollama -m gpt-oss:20b --print "explain this repo"

# `--` is an escape hatch to force a token through, e.g. if claude grows
# a flag whose name collides with one of claudely's own
claudely -p ollama -- --provider force-this-to-claude

Supported providers

Provider             Default base URL         Native?  Docs
lmstudio (default)   http://localhost:1234    yes      https://lmstudio.ai/blog/claudecode
ollama               http://localhost:11434   yes      https://docs.ollama.com/integrations/claude-code
llamacpp             http://localhost:8080    yes      https://unsloth.ai/docs/basics/claude-code
custom               (you supply it)          depends  point at any Anthropic-compatible endpoint or proxy

For backends that only speak the OpenAI protocol (vLLM, text-generation-webui, TabbyAPI, …), front them with a translation proxy such as litellm or claude-code-router and point claudely at the proxy via -p custom.
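
A rough sketch of that setup (litellm flags per its own docs; the backend URL, model name, and ports here are placeholders, not defaults):

# front an OpenAI-only backend (e.g. vLLM on :8000) with litellm
litellm --model openai/my-model --api_base http://localhost:8000/v1 --port 4000

# then aim claudely at the proxy
claudely -p custom -u http://localhost:4000 -t sk-anything -m my-model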

Prerequisites

  • Node.js ≥ 20 and the claude CLI on your PATH
  • A running local server for the provider you want (quick liveness checks below):
    • LM Studio: lms server start --port 1234 plus at least one downloaded model (lms ls --llm)
    • Ollama: ollama serve plus at least one pulled model (ollama list)
    • llama.cpp: llama-server --port 8080 -m /path/to/model.gguf (single model per server instance)
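
If claudely can't connect, a quick curl confirms the server is actually up (endpoints per each project's docs; exact responses vary by version):

curl -s http://localhost:1234/v1/models   # LM Studio: lists available models
curl -s http://localhost:11434/api/tags   # Ollama: lists pulled models
curl -s http://localhost:8080/health      # llama.cpp: ok once the model is loaded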

Selection precedence

Setting    Sources, first match wins
Provider   -p flag → $CLAUDELY_PROVIDER → lmstudio
Model      -m flag → $CLAUDELY_MODEL → $LMSTUDIO_MODEL / $OLLAMA_MODEL / $LLAMACPP_MODEL → interactive picker
Base URL   -u flag → $CLAUDELY_BASE_URL → provider default
Token      -t flag → $CLAUDELY_TOKEN → provider default
Port       $LMSTUDIO_PORT / $OLLAMA_PORT / $LLAMACPP_PORT (only affect provider defaults)
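
For example, flags always outrank exported defaults:

# make ollama the default provider and pin a model
export CLAUDELY_PROVIDER=ollama
export CLAUDELY_MODEL=gpt-oss:20b

claudely               # ollama + gpt-oss:20b
claudely -p lmstudio   # -p outranks $CLAUDELY_PROVIDER for this run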

What claudely exports to claude

Every variable is set in the spawned process only — your shell (and the regular claude command) is untouched.

ANTHROPIC_BASE_URL=<provider base URL>

# auth_token style (lmstudio, ollama, custom):
ANTHROPIC_AUTH_TOKEN=<provider token>
ANTHROPIC_API_KEY=""           # blanks any inherited real Anthropic key

# api_key style (llamacpp, per unsloth's docs):
ANTHROPIC_API_KEY=<provider token>
# ANTHROPIC_AUTH_TOKEN unset

# KV-cache fix (only set if not already in your env):
CLAUDE_CODE_ATTRIBUTION_HEADER=0

KV-cache speedup (handled automatically)

Claude Code prepends an attribution string to the system prompt that contains a per-request hash (x-anthropic-billing-header: cc_version=…; cch=…;). On a local server every turn hashes differently, so the prompt cache misses every single time — unsloth measured ~90% slowdown. The fix is a single env var: CLAUDE_CODE_ATTRIBUTION_HEADER=0. claudely sets it for you in the spawned process, so the regular claude command is unaffected. Override per-invocation by exporting your own value first.
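
For example, to keep the header for a single run (claudely only sets the variable when your environment doesn't already define it):

CLAUDE_CODE_ATTRIBUTION_HEADER=1 claudely -p ollama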

References: official env-vars docs, claude-code#50085.

Optional, not set by default

# Skip Claude Code's telemetry / feedback traffic. Useful when the model is
# local, but it's left to your judgment: claudely never disables the analytics
# Anthropic uses to improve Claude Code unless you explicitly opt in.
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

Notes and limitations

  • LM Studio and Ollama JIT-load models on first request. llama.cpp serves whichever GGUF was passed at startup; switch models by restarting llama-server with a different -m path.
  • Claude Code's in-session /model command does not auto-discover backend models; it accepts an arbitrary id string. To switch mid-session, type /model <id> with one of the ids shown by claudely --list.
  • Effort levels: effortLevel: "xhigh" (Anthropic Opus 4.7 only) is rejected by local Anthropic-compatible servers (LM Studio, Ollama, etc.) with HTTP 400. When claudely detects xhigh in ~/.claude/settings.json and the target is not api.anthropic.com, it prints a one-line stderr warning and injects --effort high into the spawned claude argv for that session; your settings file is left untouched. To make the change permanent, run /effort high inside Claude Code or edit settings.json. An explicit --effort passed after -- always wins over the override (examples below).
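
Putting the last two notes together (the model ids are illustrative, not defaults):

# discover ids, then switch inside the session with /model <id>
claudely -p ollama --list
# in-session: /model gpt-oss:20b

# pass an explicit effort level through to claude, overriding the downgrade
claudely -p lmstudio -- --effort high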

License

MIT
