A terminal coding agent powered by Kimi K2.6 on Cloudflare Workers AI — with optional routing through your own AI Gateway for first-class observability, caching, and authoritative cost.
All on your Cloudflare account.
You bring your own Cloudflare Account ID + API Token. KimiFlare calls Workers AI directly by default — fastest path, fewest moving parts. You can optionally turn on routing through an AI Gateway in your account (provisioned or reused on first run) for observability, caching, and cost reporting. Either way, nothing leaves your Cloudflare tenancy.
With AI Gateway enabled you get this for free:
- Per-request logs with full payload, latency, and status, visible in the Cloudflare dashboard
- Response caching with configurable TTL (`/gateway cache-ttl <seconds>`)
- Authoritative per-turn cost pulled from the Gateway logs API, no estimates
- Cache-hit ratio and per-feature cost breakdown in `/cost`
- Auto-tagging of every request with `feature` / `sessionId` / `turnIdx` metadata for downstream attribution
- 262k context window: read entire modules, large configs, and full stack traces without the model losing track.
- Image understanding: drop image paths (PNG, JPG, WebP, GIF, BMP up to 5 MB) into any prompt. Great for UI reviews, diagrams, and screenshots.
- Plan / Edit / Auto modes: `plan` is a whitelist-only research mode that allows only read-only tools (read, glob, grep, web search, GitHub read-only, browser fetch); writes, edits, mutating bash, MCP tools, and LSP renames are all blocked. `edit` (default) prompts per mutating call. `auto` approves everything for trusted tasks.
- Windows support: the OS-aware shell auto-detects `cmd.exe` / PowerShell on Windows and `bash` on Unix. The `bash` tool works out of the box on all platforms.
- Message queuing: submit multiple messages while the agent is busy; they queue and auto-drain. Escape interrupts the current turn but preserves the queue.
- Smart permission modal: denying a tool opens inline feedback so you can tell the agent what to do instead. Keyboard-native navigation (`↑/↓`, `j/k`, `Alt+1/2/3`).
- Loop guardrails: the agent hard-stops when all tools in a turn are blocked, preventing infinite token-burning cycles.
- Persistent all-time cost history: the append-only `history.jsonl` tracks daily usage forever, so `/cost` shows true all-time and monthly totals that survive across sessions and version updates.
- Live, gateway-confirmed cost tracking: the status bar shows a fast local estimate (`≈$0.12`) that flips to the real, Cloudflare-billed number once the AI Gateway log reconciles. Per-turn latency renders next to cost.
- LSP + MCP: semantic code intelligence (hover, go-to-definition, references, diagnostics) via Language Server Protocol. Extend with external tools via Model Context Protocol.
- Local structured memory: SQLite + embeddings cross-session memory. The agent recalls facts, instructions, and preferences across sessions via the `remember`, `recall`, and `forget` tools.
- Web search, GitHub, and headless browser: research the web, read GitHub repos, and fetch JavaScript-rendered pages without leaving your terminal.
- OS-aware shell with Windows support: auto-detects `cmd.exe`, PowerShell, or bash based on platform. Override with `KIMIFLARE_SHELL` or `/shell`.
- Smart permission modal with inline feedback: deny a tool and immediately tell the agent what to do instead. Keyboard-native navigation with `↑/↓`, `j/k`, `Alt+1/2/3`.
- True message queuing: Enter queues messages while the agent is busy; Escape interrupts and auto-drains the queue.
- Hard-stop loop guardrail: stops token-burning cycles when all tools in a turn are blocked.
- Persistent all-time usage history: `history.jsonl` tracks daily usage forever; `/cost` shows true all-time and monthly totals.
- Humanized Cloudflare API errors: actionable error codes and structured error display instead of raw JSON dumps.
- 429 rate limit retry: automatic backoff and retry when Cloudflare rate-limits requests.
- Tool state visualization: queued, rejected, and cancelled tools are clearly labeled in the TUI.
- Paste preview placeholders: pasted content shows a snippet preview with sequential IDs instead of random hashes.
- Headless SDK: programmatic `createAgentSession` API and JSONL-over-stdio RPC mode for building on top of KimiFlare.
See the full changelog at github.com/sinameraji/kimiflare/releases.
npm install -g kimiflare
kimiflare

On first run, an interactive onboarding wizard collects your Cloudflare credentials and provisions (or picks) an AI Gateway. That's it.

Or run without installing:

npx kimiflare

Requires Node.js ≥ 20.
The onboarding wizard provisions or picks an AI Gateway in your account. Your Cloudflare API token needs:
- `Workers AI:Read`
- `AI Gateway:Read` (to list gateways)
- `AI Gateway:Edit` (to create gateways)
Edit your token at: https://dash.cloudflare.com/profile/api-tokens
Once configured, /cost shows the Gateway-confirmed totals, cache hit ratio, per-feature breakdown, and direct dashboard links to each request log. /gateway status shows the current TTL, skip-cache flag, metadata tags, and live cache-hit ratio.
KimiFlare runs on Kimi K2.6 via Cloudflare Workers AI — no API key needed beyond your Cloudflare token:
- `@cf/moonshotai/kimi-k2.6`: 262k context, reasoning, tools
@cf/moonshotai/kimi-k2.5 is also available for older sessions.
kimiflare -p "summarize PLAN.md"              # stream answer to stdout
kimiflare -p "..." --dangerously-allow-all    # auto-approve mutating tools (for scripts)
kimiflare -p "..." --reasoning                # include chain-of-thought in stderr

Use KimiFlare programmatically from your own application, with no TUI required.
import { createAgentSession } from "kimiflare/sdk";
const { session } = await createAgentSession({
cwd: "/path/to/project",
config: {
accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
apiToken: process.env.CLOUDFLARE_API_TOKEN,
aiGatewayId: process.env.CLOUDFLARE_AI_GATEWAY_ID,
model: "@cf/moonshotai/kimi-k2.6",
},
});
// Stream every event: text deltas, tool calls, tasks, usage
session.subscribe((event) => {
console.log(event.type, event);
});
// Send a prompt
await session.prompt("Refactor auth to JWT + Redis");
// Mid-flight correction while the agent is still running
await session.steer("Use Redis instead of in-memory store");
// After the turn finishes
await session.followUp("Also add unit tests");
// Clean up
session.dispose();

Key features:

- `subscribe()`: receive typed events (`text_delta`, `tool_call`, `tool_result`, `task_update`, `usage`, `warning`, `error`, `done`, etc.)
- `prompt()` / `steer()` / `followUp()`: full conversation lifecycle
- `pause()` / `resume()`: graceful preemption
- `getStatus()` / `getUsage()`: inspect session state
- Custom `permissionHandler`: decide programmatically whether to allow mutating tools
- Optional `memoryEnabled`, `lspEnabled`, `costAttribution` flags
The SDK needs a Cloudflare Account ID, API Token, and AI Gateway ID. Credentials are resolved in this priority order:
- Explicit `config` object (recommended for apps)
- Environment variables: `CLOUDFLARE_ACCOUNT_ID` / `CF_ACCOUNT_ID`, `CLOUDFLARE_API_TOKEN` / `CF_API_TOKEN`
- Config file: `~/.config/kimiflare/config.json`
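The priority order above can be illustrated with a small, self-contained sketch. This is not KimiFlare's actual resolver: the function names are hypothetical, and two details are assumptions rather than documented behavior, namely that each field is resolved independently and that the `CLOUDFLARE_*` name is checked before its `CF_*` alias.

```typescript
// Illustrative only: not KimiFlare's real resolver.
type Credentials = { accountId?: string; apiToken?: string };
type Env = Record<string, string | undefined>;

// Return the first value that is set and non-empty, or undefined.
function firstDefined(...values: Array<string | undefined>): string | undefined {
  return values.find((v) => v !== undefined && v !== "");
}

// Assumption: fields are resolved one by one, so an explicit accountId can be
// combined with an apiToken that comes from the environment or the config file.
function resolveCredentials(explicit: Credentials, env: Env, fileConfig: Credentials): Credentials {
  return {
    accountId: firstDefined(
      explicit.accountId,
      env.CLOUDFLARE_ACCOUNT_ID,
      env.CF_ACCOUNT_ID,
      fileConfig.accountId,
    ),
    apiToken: firstDefined(
      explicit.apiToken,
      env.CLOUDFLARE_API_TOKEN,
      env.CF_API_TOKEN,
      fileConfig.apiToken,
    ),
  };
}

const resolved = resolveCredentials(
  { accountId: "explicit-account" },
  { CLOUDFLARE_ACCOUNT_ID: "env-account", CF_API_TOKEN: "env-token" },
  { accountId: "file-account", apiToken: "file-token" },
);
console.log(resolved); // accountId from the explicit config, apiToken from the environment
```

Passing the environment in as a plain object (instead of reading `process.env` inside the function) keeps the sketch easy to test.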
For Electron / desktop apps, we recommend storing credentials in the OS keychain (e.g. Electron safeStorage or keytar) and passing them explicitly:
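import keytar from "keytar";
import { createAgentSession } from "kimiflare/sdk";

// keytar.getPassword resolves to null when no entry exists.
const accountId = await keytar.getPassword("kimiflare", "accountId");
const apiToken = await keytar.getPassword("kimiflare", "apiToken");
if (!accountId || !apiToken) {
  throw new Error("KimiFlare credentials not found in the OS keychain");
}

const { session } = await createAgentSession({
  cwd: projectPath,
  config: { accountId, apiToken },
});

If you need process isolation or a non-Node consumer, run KimiFlare in JSONL-over-stdio RPC mode: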
node bin/kimiflare.mjs --mode rpc

import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

const proc = spawn("npx", ["kimiflare", "--mode", "rpc"], {
  cwd: projectPath,
  stdio: ["pipe", "pipe", "pipe"],
});

// Read events line by line. readline buffers partial chunks, so a JSON
// object split across two "data" events is still parsed as a single line.
const events = createInterface({ input: proc.stdout });
events.on("line", (line) => {
  if (!line.trim()) return;
  const event = JSON.parse(line);
  console.log(event.type, event);
});

// Send commands
proc.stdin.write(JSON.stringify({ type: "new_session" }) + "\n");
proc.stdin.write(JSON.stringify({ type: "prompt", message: "Hello" }) + "\n");

// Resolve a permission request
proc.stdin.write(
  JSON.stringify({ type: "resolve_permission", requestId: "req_0", decision: "allow" }) + "\n"
);

kimiflare
› fix the layout bug in this screenshot docs/bug.png
› convert this mockup design.png to Tailwind HTML

| Command | Effect |
|---|---|
| `/mode edit\|plan\|auto` | Switch permission mode |
| `/shell auto\|bash\|cmd\|powershell` | Show or set the shell for the bash tool |
| `/thinking low\|medium\|high` | Reasoning effort (persists) |
| `/theme` | Interactive theme picker (Ctrl+T) |
| `/resume` | Pick a past conversation to restore |
| `/compact` | Summarize older turns to free context |
| `/init` | Scan repo and write KIMI.md project context |
| `/memory` | Show memory stats and search |
| `/mcp list` / `/mcp reload` | Manage MCP servers |
| `/reasoning` | Toggle chain-of-thought display |
| `/cost` | Show Gateway-confirmed cost, cache hit ratio, and per-feature breakdown |
| `/gateway status` | Show AI Gateway config and live cache-hit ratio |
| `/update` | Check for updates |
| `/help` | List all commands |


| Shortcut | Action |
|---|---|
| `Ctrl+C` / `Esc` | Interrupt current turn when busy; exit when idle |
| `Ctrl+R` | Toggle reasoning display |
| `Ctrl+O` | Toggle verbose tool output |
| `Ctrl+T` | Open theme picker |
| `Shift+Tab` | Cycle mode (edit → plan → auto) |
| `↑` / `↓` | Walk prompt history |

KimiFlare writes structured JSON logs of agent-side activity (tool calls,
permission decisions, MCP/LSP lifecycle, session events, errors) to
~/.config/kimiflare/logs/<date>.jsonl, one file per day, with 7-day
retention pruned automatically at startup.
The logs deliberately exclude prompts and completions. Those already live in
Cloudflare AI Gateway, and each log entry includes the Gateway request_id so
you can join them when you need the network side.
kimiflare logs path    # today's file
kimiflare logs dir     # log directory
kimiflare logs prune   # delete files older than 7 days

# Tail this session's activity, formatted:
tail -f "$(kimiflare logs path)" | jq

# Find the slowest tool calls in the last day:
jq -r 'select(.event == "tool:end") | "\(.data.duration_ms)\t\(.data.tool)"' \
  "$(kimiflare logs path)" | sort -rn | head

Disable the file sink entirely with KIMIFLARE_LOG_SINK=off. The
separate KIMIFLARE_LOG_LEVEL env var (default off) controls stderr
output, independent of the file sink.
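If you would rather analyze the log files from a script than from jq, the "slowest tool calls" recipe can be sketched in TypeScript. The only fields it relies on are the ones the jq example already uses (`event`, `data.tool`, `data.duration_ms`); the function names and the `tool:start` event in the sample data are hypothetical.

```typescript
// Only the fields used by the jq recipe are modelled; everything else is ignored.
type LogEntry = { event?: string; data?: { tool?: string; duration_ms?: number } };
type ToolCall = { tool: string; durationMs: number };

// Parse one JSONL line, returning undefined for blank or malformed lines
// (for example, a last line that is still being written).
function parseLine(line: string): LogEntry | undefined {
  if (!line.trim()) return undefined;
  try {
    return JSON.parse(line) as LogEntry;
  } catch {
    return undefined;
  }
}

// Return the slowest completed tool calls, slowest first.
function slowestToolCalls(jsonl: string, limit = 10): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const line of jsonl.split("\n")) {
    const entry = parseLine(line);
    if (!entry || entry.event !== "tool:end") continue;
    const tool = entry.data?.tool;
    const durationMs = entry.data?.duration_ms;
    if (typeof tool === "string" && typeof durationMs === "number") {
      calls.push({ tool, durationMs });
    }
  }
  return calls.sort((a, b) => b.durationMs - a.durationMs).slice(0, limit);
}

// Sample data; "tool:start" is a made-up event name used only to show filtering.
const sample = [
  JSON.stringify({ event: "tool:end", data: { tool: "grep", duration_ms: 42 } }),
  JSON.stringify({ event: "tool:start", data: { tool: "bash" } }),
  JSON.stringify({ event: "tool:end", data: { tool: "bash", duration_ms: 1300 } }),
  "{ truncated",
].join("\n");

const slowest = slowestToolCalls(sample, 5);
console.log(slowest); // bash (1300 ms) first, then grep (42 ms)
```

In practice you would feed it the contents of the file printed by `kimiflare logs path`.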
If you set KIMIFLARE_OTEL_ENDPOINT, KimiFlare also ships each log
entry to that endpoint over OTLP/HTTP so it lands in Datadog, Honeycomb,
Grafana Loki, an internal collector, or any other backend that speaks OTel.
Entries are sent in batches every 5 s or every 100 entries, whichever comes
first. Export is best-effort and never blocks the agent loop.
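The "5 s or 100 entries, whichever first" policy can be sketched as a small batcher. This is an illustration of the policy, not KimiFlare's exporter: the class name is hypothetical, and measuring the 5 s window from the first buffered entry is an assumption. Timestamps are passed in explicitly so the logic can be exercised without real timers.

```typescript
// Illustrative batcher: flush when maxEntries are buffered or when maxAgeMs has
// passed since the first buffered entry, whichever happens first.
class LogBatcher<T> {
  private buffer: T[] = [];
  private firstAt = 0;
  private readonly flush: (batch: T[]) => void;
  private readonly maxEntries: number;
  private readonly maxAgeMs: number;

  constructor(flush: (batch: T[]) => void, maxEntries = 100, maxAgeMs = 5000) {
    this.flush = flush;
    this.maxEntries = maxEntries;
    this.maxAgeMs = maxAgeMs;
  }

  // Buffer an entry and flush immediately if the size limit is reached.
  add(entry: T, nowMs: number): void {
    if (this.buffer.length === 0) this.firstAt = nowMs;
    this.buffer.push(entry);
    if (this.buffer.length >= this.maxEntries) this.drain();
  }

  // Call periodically (for example from a timer) to enforce the age limit.
  tick(nowMs: number): void {
    if (this.buffer.length > 0 && nowMs - this.firstAt >= this.maxAgeMs) this.drain();
  }

  private drain(): void {
    const batch = this.buffer;
    this.buffer = [];
    try {
      this.flush(batch);
    } catch {
      // Best-effort: a failed export drops the batch instead of disturbing the caller.
    }
  }
}

// Small limits so the behaviour is easy to follow.
const sent: number[][] = [];
const batcher = new LogBatcher<number>((batch) => { sent.push(batch); }, 3, 5000);
batcher.add(1, 0);
batcher.add(2, 10);
batcher.tick(1000); // too early, nothing is sent
batcher.tick(5000); // age limit reached, [1, 2] is sent
batcher.add(3, 6000);
batcher.add(4, 6001);
batcher.add(5, 6002); // size limit reached, [3, 4, 5] is sent
console.log(JSON.stringify(sent)); // [[1,2],[3,4,5]]
```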
# Full path:
export KIMIFLARE_OTEL_ENDPOINT="https://otel.example.com/v1/logs"
# Or just the base URL (we auto-append /v1/logs):
export KIMIFLARE_OTEL_ENDPOINT="https://otel.example.com"
# Optional headers (comma-separated key=value pairs), e.g. for auth:
export KIMIFLARE_OTEL_HEADERS="Authorization=Bearer xyz,X-Tenant=acme"

Each log entry maps to one OTel LogRecord. Correlation IDs
(`session_id`, `turn_id`, `request_id`) become record attributes,
`data.*` fields are flattened to attributes with type-preserving
encoding, and a `service.name=kimiflare` + `service.version` pair sits
on the resource. The same `request_id` joins to Cloudflare AI Gateway's
per-request log without any extra work.
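To make "flattened to attributes with type-preserving encoding" concrete, here is a rough sketch of how a nested `data` object could be turned into OTLP-style key/value attributes. The dot-joined key naming and the handling of arrays and nulls are assumptions for illustration; KimiFlare's actual mapping may differ.

```typescript
// OTLP/JSON-style attribute values; 64-bit integers are carried as strings.
type AnyValue =
  | { stringValue: string }
  | { boolValue: boolean }
  | { intValue: string }
  | { doubleValue: number };
type KeyValue = { key: string; value: AnyValue };

// Keep the original type where OTLP has a matching scalar; stringify the rest.
function toAnyValue(v: unknown): AnyValue {
  if (typeof v === "boolean") return { boolValue: v };
  if (typeof v === "number") {
    return Number.isInteger(v) ? { intValue: String(v) } : { doubleValue: v };
  }
  if (typeof v === "string") return { stringValue: v };
  if (v === null || v === undefined) return { stringValue: String(v) };
  return { stringValue: JSON.stringify(v) }; // arrays and other values (assumption)
}

// Recursively flatten nested objects into dot-separated keys (assumed naming).
function flatten(obj: Record<string, unknown>, prefix = ""): KeyValue[] {
  const out: KeyValue[] = [];
  for (const [k, v] of Object.entries(obj)) {
    const key = prefix ? `${prefix}.${k}` : k;
    if (v !== null && typeof v === "object" && !Array.isArray(v)) {
      out.push(...flatten(v as Record<string, unknown>, key));
    } else {
      out.push({ key, value: toAnyValue(v) });
    }
  }
  return out;
}

const attributes = flatten(
  { tool: "bash", duration_ms: 1300, ok: true, ratio: 0.5, meta: { exit: 0 } },
  "data",
);
console.log(attributes.map((a) => a.key).join(", "));
// data.tool, data.duration_ms, data.ok, data.ratio, data.meta.exit
```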
KimiFlare can fire shell commands at five points in an agent turn,
configured per-project (.kimiflare/settings.json) or globally
(~/.config/kimiflare/settings.json):
| Event | Fires when | Veto? |
|---|---|---|
| `PreToolUse` | A tool call is about to run | Yes |
| `PostToolUse` | A tool call just finished | No |
| `UserPromptSubmit` | You hit Enter on a prompt | Yes |
| `Stop` | A turn ended cleanly | No |
| `PreCompact` | Auto-compaction is about to run | No |

Hooks receive the event payload as JSON on stdin and as
KIMIFLARE_HOOK_* env vars (for shell-one-liner ergonomics).
Non-zero exit on a veto event cancels the underlying action and
surfaces the hook's stdout as the rejection reason.
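As an illustration of the veto contract (exit code plus stdout as the reason), the decision logic of a `PreToolUse` hook written as a script might look like the sketch below. The function name and the deny-list are hypothetical; the only KimiFlare-specific name used is the `KIMIFLARE_HOOK_PATH` environment variable. The environment is passed in as an object so the logic can be tested without spawning a process.

```typescript
type HookEnv = Record<string, string | undefined>;
type HookResult = { exitCode: number; stdout: string };

// Hypothetical deny-list for this sketch.
const BLOCKED_SUFFIXES = [".env", ".pem"];

// Decide whether to veto: a non-zero exit code cancels the tool call and the
// stdout text is what gets surfaced as the rejection reason.
function guardSecrets(env: HookEnv): HookResult {
  const path = env.KIMIFLARE_HOOK_PATH ?? "";
  const hit = BLOCKED_SUFFIXES.find((suffix) => path.endsWith(suffix));
  if (hit === undefined) return { exitCode: 0, stdout: "" };
  return { exitCode: 1, stdout: `blocked: ${path} ends with ${hit}` };
}

const allowed = guardSecrets({ KIMIFLARE_HOOK_PATH: "src/index.ts" });
const vetoed = guardSecrets({ KIMIFLARE_HOOK_PATH: "config/prod.env" });
console.log(allowed.exitCode, vetoed.exitCode, vetoed.stdout);

// In a real hook script you would finish with something like:
//   const result = guardSecrets(process.env);
//   if (result.stdout) console.log(result.stdout);
//   process.exit(result.exitCode);
```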
/hooks # list configured hooks
/hooks recommended # list starter hooks shipped with kimiflare
/hooks enable stop-bell # enable one (writes to .kimiflare/settings.json)
/hooks enable stop-bell global # ...or the global file
/hooks disable stop-bell
/hooks path # print settings.json paths
/hooks reload # re-read settings.json after a manual edit
The recommended catalog includes terminal bells / macOS notifications
on Stop, secret-file guards on PreToolUse (e.g. block edits to
*.env), auto-format-with-prettier on PostToolUse, and a tool-call
audit log. All ship disabled — /hooks recommended lists them.
{
"hooks": {
"PreToolUse": [
{
"id": "no-secrets",
"matcher": "^(edit|write)$",
"command": "case \"$KIMIFLARE_HOOK_PATH\" in *.env|*.pem) echo 'blocked'; exit 1;; esac"
}
],
"PostToolUse": [
{
"id": "format-ts",
"matcher": "^(edit|write)$",
"command": "npx --no-install prettier --write \"$KIMIFLARE_HOOK_PATH\" >/dev/null 2>&1 || true"
}
],
"Stop": [
{ "id": "bell", "command": "printf '\\a'" }
]
}
}

Per-hook fields:

- `command` (required): the shell command.
- `matcher` (optional): anchored regex matched against the tool name for `PreToolUse` / `PostToolUse`. Ignored for other events.
- `id` (optional): stable handle for `/hooks enable|disable`. Auto-derived from `event + command` when omitted.
- `enabled` (default `true`): set `false` to keep a hook in config but skip it.
- `timeoutMs` (default `30000`): hard kill if the hook hangs.
- `description` (optional): shown by `/hooks list`.
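How these fields combine to decide whether a hook runs can be sketched as follows. This mirrors the field descriptions above but is not KimiFlare's implementation: the type and function names are hypothetical, and the sketch assumes the `matcher` pattern is used exactly as written, with the anchors supplied by the pattern itself.

```typescript
type HookEvent = "PreToolUse" | "PostToolUse" | "UserPromptSubmit" | "Stop" | "PreCompact";

interface HookConfig {
  command: string;
  matcher?: string;
  id?: string;
  enabled?: boolean; // defaults to true
  timeoutMs?: number; // defaults to 30000
  description?: string;
}

// Decide whether a configured hook should fire for a given event and tool.
function hookApplies(hook: HookConfig, event: HookEvent, toolName?: string): boolean {
  if (hook.enabled === false) return false; // kept in config but skipped
  const isToolEvent = event === "PreToolUse" || event === "PostToolUse";
  // The matcher only matters for tool events; elsewhere it is ignored.
  if (!isToolEvent || hook.matcher === undefined) return true;
  return toolName !== undefined && new RegExp(hook.matcher).test(toolName);
}

const noSecrets: HookConfig = { id: "no-secrets", matcher: "^(edit|write)$", command: "exit 1" };
console.log(hookApplies(noSecrets, "PreToolUse", "edit")); // true
console.log(hookApplies(noSecrets, "PreToolUse", "read")); // false
console.log(hookApplies(noSecrets, "Stop")); // true: matcher ignored outside tool events
console.log(hookApplies({ ...noSecrets, enabled: false }, "PreToolUse", "edit")); // false
```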
Hooks are always-on infrastructure: they fire whether the TUI is open
or kimiflare is running in --print mode. They also fire for tool
calls generated from inside the Code Mode sandbox (heavy-tier turns),
because hook firing lives on the ToolExecutor itself — every call
path uses the same plumbing.
When intent classification has assigned a tier, hook payloads include
it as tier: "light" | "medium" | "heavy" (on UserPromptSubmit,
PreToolUse, PostToolUse) and as $KIMIFLARE_HOOK_TIER. Useful for
"skip auto-format on light turns" or "audit every heavy-turn write."
SDK consumers opt in to hooks with enableHooks: true on
createAgentSession. Default is off because the SDK is a primitive,
not the TUI.
git clone https://github.com/sinameraji/kimiflare
cd kimiflare
npm install
npm run build
npm link

Scripts:

- `npm run build`: bundle with tsup
- `npm run dev`: run via tsx
- `npm run typecheck`: `tsc --noEmit`
- `npm test`: run tests
- Fork the repository
- Create a branch: `git checkout -b feat/your-feature`
- Make your changes
- Run `npm run typecheck` and `npm run build`
- Commit with Conventional Commits
- Open a Pull Request
Built by Sina Meraji and contributors · MIT License

