
feat(llm): add --backend claude-cli (routes through Claude Code CLI, no ANTHROPIC_API_KEY needed)#856

Closed
spindle79 wants to merge 1 commit into safishamsi:v7 from spindle79:feat/claude-cli-backend

Conversation

@spindle79
Contributor

Closes #855.

Summary

Adds a claude-cli backend that shells out to the locally-installed claude CLI (Claude Code) via claude -p --output-format json instead of calling the Anthropic API directly. This lets Pro/Max subscribers bill graphify's semantic pass against their plan instead of provisioning a separate ANTHROPIC_API_KEY.

Verified end-to-end on a real codebase: full semantic extract (167k in / 8k out tokens, Opus 4.7 1M context), billed against the user's Max plan, $0 marginal cost.

Changes

  • graphify/llm.py:

    • New BACKENDS["claude-cli"] entry — env_key: None, zero pricing (calls bill against the subscription, not API credit, so estimator should treat this as $0).
    • New _call_claude_cli(user_message, max_tokens):
      • Verifies shutil.which("claude") finds the CLI.
      • Subprocess-calls claude -p --output-format json --append-system-prompt <_EXTRACTION_SYSTEM> with the user message piped on stdin.
      • Parses the JSON envelope (result → model output; usage → token counts; stop_reason → finish_reason; modelUsage → model id).
      • Reports total input tokens as usage.input_tokens + cache_read_input_tokens + cache_creation_input_tokens so accounting reflects what the model actually processed (the CLI's default system prompt is cached aggressively, so subsequent calls within a session see high cache_read counts).
    • New dispatch arm in extract_files_direct plus the lower _call_llm helper used by the dedup tiebreaker.
    • New env-key skip for claude-cli (mirrors the existing bedrock skip in the validation).
  • graphify/__main__.py:

    • New env-key carve-out in the extract command's backend validation: when backend == "claude-cli", check shutil.which("claude") instead of an env var. Same shape as the existing ollama local-loopback carve-out, with a helpful error message if the CLI isn't installed.
  • tests/test_claude_cli_backend.py:

    • 7 tests, all mocking subprocess.run + shutil.which so the suite runs on CI without the claude binary or network.
    • Coverage: success path (parsed result + usage sums + model id), stop_reason == "max_tokens" mapping to finish_reason == "length", missing CLI raises, non-zero exit raises, unparseable JSON envelope raises, end-to-end through extract_files_direct, and pricing-is-zero contract.
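The call path described above can be sketched roughly as follows. This is a hedged illustration, not graphify's actual code: the function name `call_claude_cli`, the returned dict shape, and the error type are assumptions; the CLI flags and envelope fields (`result`, `usage`, `stop_reason`) are the ones the PR describes.

```python
# Illustrative sketch of a claude-cli backend call; names are
# assumptions, not graphify's actual identifiers.
import json
import shutil
import subprocess


def call_claude_cli(user_message: str, system_prompt: str) -> dict:
    """Shell out to the Claude Code CLI and normalize its JSON envelope."""
    if shutil.which("claude") is None:
        raise RuntimeError("claude CLI not found on PATH")
    proc = subprocess.run(
        ["claude", "-p", "--output-format", "json",
         "--append-system-prompt", system_prompt],
        input=user_message, capture_output=True, text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"claude exited {proc.returncode}: {proc.stderr}")
    envelope = json.loads(proc.stdout)
    usage = envelope.get("usage", {})
    # Total input = fresh tokens + cached-prefix reads + cache writes,
    # so accounting reflects what the model actually processed.
    input_tokens = (
        usage.get("input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )
    return {
        "text": envelope["result"],
        "input_tokens": input_tokens,
        "output_tokens": usage.get("output_tokens", 0),
        "finish_reason": "length"
        if envelope.get("stop_reason") == "max_tokens" else "stop",
    }
```

The tests mock `shutil.which` and `subprocess.run` around exactly this seam, so the suite never needs the binary or the network.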

Why not --bare mode?

The CLI's --bare flag strips the default Claude Code system prompt but requires ANTHROPIC_API_KEY to authenticate — defeating the purpose. So this backend uses --append-system-prompt instead. The default Claude Code prompt gets appended to _EXTRACTION_SYSTEM, but the extraction prompt is strict enough that Claude produces valid JSON anyway. Empirically verified: ran the full graphify extract pipeline through this backend on a 1,083-node graph and got valid extraction output.

One side effect: usage.input_tokens for the first call in a session is large (the CLI's system prompt is part of the model input). Subsequent calls hit cache_read_input_tokens for that prefix and the fresh-input count drops to just the user message. This is normal Anthropic prompt caching behavior; the test fixture uses a realistic envelope showing this pattern.
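The caching pattern above can be made concrete with made-up numbers (the field names are Anthropic's usage fields; the figures are purely illustrative): the system-prompt prefix shows up as a cache write on the first call and as a cache read afterwards, so summing all three fields keeps the per-call total stable.

```python
# Illustrative usage envelopes (numbers are made up) showing how the
# cached system-prompt prefix moves between fields across a session.
first_call = {"input_tokens": 400,            # just the user message
              "cache_creation_input_tokens": 12_000,  # prefix written to cache
              "cache_read_input_tokens": 0}
second_call = {"input_tokens": 400,
               "cache_creation_input_tokens": 0,
               "cache_read_input_tokens": 12_000}     # prefix served from cache


def total_input(usage: dict) -> int:
    """Tokens the model actually processed, regardless of cache state."""
    return (usage["input_tokens"]
            + usage["cache_creation_input_tokens"]
            + usage["cache_read_input_tokens"])
```

Both calls total 12,400 input tokens even though the fresh-input count collapses after the first call.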

Test plan

  • Seven new tests pass locally.
  • Full suite: 742 passed, same 11 baseline failures (SQL needs optional tree-sitter-sql; one Fortran preprocessed test; two Ollama env-var tests) — zero new failures.
  • Manual end-to-end: graphify extract . --backend claude-cli on a real 336-code-file / 32-doc corpus completed in a single chunk, produced 1,083 graph nodes and 2,140 edges, $0 marginal cost on the user's Max plan.

Notes for the reviewer

  • I considered adding a detect_backend() arm that auto-selects claude-cli when the binary is present and no API keys are set. Skipped for this PR to keep behavior change minimal — --backend claude-cli must be explicit. Happy to add auto-detection in a follow-up if you'd prefer.
  • The pricing dict is {"input": 0.0, "output": 0.0} because cost is plan-based, not per-token. If you'd rather track plan token usage (without dollarizing it) the estimator could grow a plan_tokens field — also a follow-up.
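For reference, the registry entry and its effect on the estimator look roughly like this. The `BACKENDS` dict shape and the `estimated_cost` helper are illustrative assumptions; only the `env_key: None` and zero-pricing values come from the PR.

```python
# Illustrative shape of the backend registry entry; the real graphify
# BACKENDS dict may carry more fields.
BACKENDS = {
    "claude-cli": {
        "env_key": None,  # no API key: the CLI's own OAuth credentials are used
        "pricing": {"input": 0.0, "output": 0.0},  # plan-based, not per-token
    },
}


def estimated_cost(backend: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical estimator arm: dollars per million tokens."""
    p = BACKENDS[backend]["pricing"]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```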

Adds a new `--backend claude-cli` that shells out to the locally-installed
Claude Code CLI (`claude -p --output-format json`) instead of calling the
Anthropic API directly. Lets Pro/Max subscribers run graphify's semantic
pass without provisioning a separate ANTHROPIC_API_KEY — costs bill against
the plan instead of pay-as-you-go API credit.

Implementation:
- New BACKENDS entry "claude-cli" with `env_key: None` and zero pricing.
- New `_call_claude_cli()` runs `claude -p` via subprocess with the
  graphify _EXTRACTION_SYSTEM prompt passed through `--append-system-prompt`.
  Parses the JSON envelope, extracts `result` as the model output, and
  reports total tokens including cache_read/cache_creation so usage
  accounting reflects what the model actually processed.
- New dispatch arm in `extract_files_direct` plus the lower `_call_llm`
  helper (for dedup tiebreaker etc).
- New env-key carve-out in `__main__.py`'s `extract` validation that
  checks `shutil.which("claude")` instead of an env variable, mirroring
  ollama's local-loopback carve-out.

The CLI must be on $PATH and the user must have authenticated `claude`
interactively at least once (OAuth flow stores credentials locally).
Functions that require the SDK-shaped path (anthropic.Anthropic client)
remain unchanged for `--backend claude`.

Tests in `tests/test_claude_cli_backend.py` mock subprocess.run so the
suite runs anywhere — no live CLI or network needed for CI.
@safishamsi
Owner

safishamsi commented May 13, 2026

Implemented in commit 258d260 — great PR, @spindle79.

Two small additions on top of your implementation before landing:

  1. --no-session-persistence added to the subprocess args — prevents session state accumulating across sequential extraction chunks, no auth impact.
  2. max_concurrency=1 guard added to extract_corpus_parallel (mirrors the ollama guard) — parallel claude -p processes conflict over Claude Code session state. Can be opted out with GRAPHIFY_CLAUDE_CLI_PARALLEL=1 for users who know what they're doing.
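The guard in point 2 can be sketched as follows; the function name and call site are assumptions, but the backend name and the GRAPHIFY_CLAUDE_CLI_PARALLEL escape hatch are as described above.

```python
# Sketch of the serialization guard; names are illustrative, not
# graphify's actual code.
import os


def effective_concurrency(backend: str, requested: int) -> int:
    """claude-cli runs serially unless the user explicitly opts out."""
    opted_out = os.environ.get("GRAPHIFY_CLAUDE_CLI_PARALLEL") == "1"
    if backend == "claude-cli" and not opted_out:
        # Parallel `claude -p` processes conflict over Claude Code
        # session state, so clamp to one worker by default.
        return 1
    return requested
```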

9/9 tests pass. End-to-end verified.

@safishamsi safishamsi closed this May 13, 2026

Development

Successfully merging this pull request may close these issues.

Feature request: --backend claude-cli (route through Claude Code CLI, no ANTHROPIC_API_KEY needed)

2 participants