
feat(llm): add --backend claude-cli (routes through Claude Code CLI, no ANTHROPIC_API_KEY needed)#856

Closed
spindle79 wants to merge 1 commit into safishamsi:v7 from spindle79:feat/claude-cli-backend

Conversation

@spindle79
Contributor

Closes #855.

Summary

Adds a claude-cli backend that shells out to the locally-installed claude CLI (Claude Code) via claude -p --output-format json instead of calling the Anthropic API directly. This lets Pro/Max subscribers bill graphify's semantic pass against their plan instead of provisioning a separate ANTHROPIC_API_KEY.

Verified end-to-end on a real codebase: full semantic extract (167k in / 8k out tokens, Opus 4.7 1M context), billed against the user's Max plan, $0 marginal cost.

Changes

  • graphify/llm.py:

    • New BACKENDS["claude-cli"] entry — env_key: None, zero pricing (calls bill against the subscription, not API credit, so estimator should treat this as $0).
    • New _call_claude_cli(user_message, max_tokens):
      • Verifies shutil.which("claude") finds the CLI.
      • Subprocess-calls claude -p --output-format json --append-system-prompt <_EXTRACTION_SYSTEM> with the user message piped on stdin.
      • Parses the JSON envelope (result → model output; usage → token counts; stop_reason → finish_reason; modelUsage → model id).
      • Reports total input tokens as usage.input_tokens + cache_read_input_tokens + cache_creation_input_tokens so accounting reflects what the model actually processed (the CLI's default system prompt is cached aggressively, so subsequent calls within a session see high cache_read counts).
    • New dispatch arm in extract_files_direct plus the lower _call_llm helper used by the dedup tiebreaker.
    • New env-key skip for claude-cli (mirrors the existing bedrock skip in the validation).
  • graphify/__main__.py:

    • New env-key carve-out in the extract command's backend validation: when backend == "claude-cli", check shutil.which("claude") instead of an env var. Same shape as the existing ollama local-loopback carve-out, with a helpful error message if the CLI isn't installed.
  • tests/test_claude_cli_backend.py:

    • 7 tests, all mocking subprocess.run + shutil.which so the suite runs on CI without the claude binary or network.
    • Coverage: success path (parsed result + usage sums + model id), stop_reason == "max_tokens" mapping to finish_reason == "length", missing CLI raises, non-zero exit raises, unparseable JSON envelope raises, end-to-end through extract_files_direct, and pricing-is-zero contract.
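The call path described above can be sketched roughly as follows. This is a hedged illustration, not graphify's actual code: the function name `call_claude_cli`, the returned dict shape, and the error type are assumptions; the CLI flags and envelope fields (`result`, `usage`, `stop_reason`) are the ones the PR describes.

```python
# Illustrative sketch of a claude-cli backend call; names are
# assumptions, not graphify's actual identifiers.
import json
import shutil
import subprocess


def call_claude_cli(user_message: str, system_prompt: str) -> dict:
    """Shell out to the Claude Code CLI and normalize its JSON envelope."""
    if shutil.which("claude") is None:
        raise RuntimeError("claude CLI not found on PATH")
    proc = subprocess.run(
        ["claude", "-p", "--output-format", "json",
         "--append-system-prompt", system_prompt],
        input=user_message, capture_output=True, text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"claude exited {proc.returncode}: {proc.stderr}")
    envelope = json.loads(proc.stdout)
    usage = envelope.get("usage", {})
    # Total input = fresh tokens + cached-prefix reads + cache writes,
    # so accounting reflects what the model actually processed.
    input_tokens = (
        usage.get("input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )
    return {
        "text": envelope["result"],
        "input_tokens": input_tokens,
        "output_tokens": usage.get("output_tokens", 0),
        "finish_reason": "length"
        if envelope.get("stop_reason") == "max_tokens" else "stop",
    }
```

The tests mock `shutil.which` and `subprocess.run` around exactly this seam, so the suite never needs the binary or the network.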

Why not --bare mode?

The CLI's --bare flag strips the default Claude Code system prompt but requires ANTHROPIC_API_KEY to authenticate — defeating the purpose. So this backend uses --append-system-prompt instead. The default Claude Code prompt gets appended to _EXTRACTION_SYSTEM, but the extraction prompt is strict enough that Claude produces valid JSON anyway. Empirically verified: ran the full graphify extract pipeline through this backend on a 1,083-node graph and got valid extraction output.

One side effect: usage.input_tokens for the first call in a session is large (the CLI's system prompt is part of the model input). Subsequent calls hit cache_read_input_tokens for that prefix and the fresh-input count drops to just the user message. This is normal Anthropic prompt caching behavior; the test fixture uses a realistic envelope showing this pattern.
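The caching pattern above can be made concrete with made-up numbers (the field names are Anthropic's usage fields; the figures are purely illustrative): the system-prompt prefix shows up as a cache write on the first call and as a cache read afterwards, so summing all three fields keeps the per-call total stable.

```python
# Illustrative usage envelopes (numbers are made up) showing how the
# cached system-prompt prefix moves between fields across a session.
first_call = {"input_tokens": 400,            # just the user message
              "cache_creation_input_tokens": 12_000,  # prefix written to cache
              "cache_read_input_tokens": 0}
second_call = {"input_tokens": 400,
               "cache_creation_input_tokens": 0,
               "cache_read_input_tokens": 12_000}     # prefix served from cache


def total_input(usage: dict) -> int:
    """Tokens the model actually processed, regardless of cache state."""
    return (usage["input_tokens"]
            + usage["cache_creation_input_tokens"]
            + usage["cache_read_input_tokens"])
```

Both calls total 12,400 input tokens even though the fresh-input count collapses after the first call.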

Test plan

  • Seven new tests pass locally.
  • Full suite: 742 passed, same 11 baseline failures (SQL needs optional tree-sitter-sql; one Fortran preprocessed test; two Ollama env-var tests) — zero new failures.
  • Manual end-to-end: graphify extract . --backend claude-cli on a real 336-code-file / 32-doc corpus completed in a single chunk, produced 1,083 graph nodes and 2,140 edges, $0 marginal cost on the user's Max plan.

Notes for the reviewer

  • I considered adding a detect_backend() arm that auto-selects claude-cli when the binary is present and no API keys are set. Skipped for this PR to keep behavior change minimal — --backend claude-cli must be explicit. Happy to add auto-detection in a follow-up if you'd prefer.
  • The pricing dict is {"input": 0.0, "output": 0.0} because cost is plan-based, not per-token. If you'd rather track plan token usage (without dollarizing it) the estimator could grow a plan_tokens field — also a follow-up.
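For reference, the registry entry and its effect on the estimator look roughly like this. The `BACKENDS` dict shape and the `estimated_cost` helper are illustrative assumptions; only the `env_key: None` and zero-pricing values come from the PR.

```python
# Illustrative shape of the backend registry entry; the real graphify
# BACKENDS dict may carry more fields.
BACKENDS = {
    "claude-cli": {
        "env_key": None,  # no API key: the CLI's own OAuth credentials are used
        "pricing": {"input": 0.0, "output": 0.0},  # plan-based, not per-token
    },
}


def estimated_cost(backend: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical estimator arm: dollars per million tokens."""
    p = BACKENDS[backend]["pricing"]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```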

Adds a new `--backend claude-cli` that shells out to the locally-installed
Claude Code CLI (`claude -p --output-format json`) instead of calling the
Anthropic API directly. Lets Pro/Max subscribers run graphify's semantic
pass without provisioning a separate ANTHROPIC_API_KEY — costs bill against
the plan instead of pay-as-you-go API credit.

Implementation:
- New BACKENDS entry "claude-cli" with `env_key: None` and zero pricing.
- New `_call_claude_cli()` runs `claude -p` via subprocess with the
  graphify _EXTRACTION_SYSTEM prompt passed through `--append-system-prompt`.
  Parses the JSON envelope, extracts `result` as the model output, and
  reports total tokens including cache_read/cache_creation so usage
  accounting reflects what the model actually processed.
- New dispatch arm in `extract_files_direct` plus the lower `_call_llm`
  helper (for dedup tiebreaker etc).
- New env-key carve-out in `__main__.py`'s `extract` validation that
  checks `shutil.which("claude")` instead of an env variable, mirroring
  ollama's local-loopback carve-out.

The CLI must be on $PATH and the user must have authenticated `claude`
interactively at least once (OAuth flow stores credentials locally).
Functions that require the SDK-shaped path (anthropic.Anthropic client)
remain unchanged for `--backend claude`.

Tests in `tests/test_claude_cli_backend.py` mock subprocess.run so the
suite runs anywhere — no live CLI or network needed for CI.
@safishamsi
Owner

safishamsi commented May 13, 2026

Implemented in commit 258d260 — great PR, @spindle79.

Two small additions on top of your implementation before landing:

  1. --no-session-persistence added to the subprocess args — prevents session state accumulating across sequential extraction chunks, no auth impact.
  2. max_concurrency=1 guard added to extract_corpus_parallel (mirrors the ollama guard) — parallel claude -p processes conflict over Claude Code session state. Can be opted out with GRAPHIFY_CLAUDE_CLI_PARALLEL=1 for users who know what they're doing.
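The guard in point 2 can be sketched as follows; the function name and call site are assumptions, but the backend name and the GRAPHIFY_CLAUDE_CLI_PARALLEL escape hatch are as described above.

```python
# Sketch of the serialization guard; names are illustrative, not
# graphify's actual code.
import os


def effective_concurrency(backend: str, requested: int) -> int:
    """claude-cli runs serially unless the user explicitly opts out."""
    opted_out = os.environ.get("GRAPHIFY_CLAUDE_CLI_PARALLEL") == "1"
    if backend == "claude-cli" and not opted_out:
        # Parallel `claude -p` processes conflict over Claude Code
        # session state, so clamp to one worker by default.
        return 1
    return requested
```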

9/9 tests pass. End-to-end verified.

@safishamsi safishamsi closed this May 13, 2026

Development

Successfully merging this pull request may close these issues.

Feature request: --backend claude-cli (route through Claude Code CLI, no ANTHROPIC_API_KEY needed)

2 participants