feat(llm): add --backend claude-cli (routes through Claude Code CLI, no ANTHROPIC_API_KEY needed) #856
Closed
spindle79 wants to merge 1 commit into
Conversation
Adds a new `--backend claude-cli` that shells out to the locally-installed
Claude Code CLI (`claude -p --output-format json`) instead of calling the
Anthropic API directly. Lets Pro/Max subscribers run graphify's semantic
pass without provisioning a separate ANTHROPIC_API_KEY — costs bill against
the plan instead of pay-as-you-go API credit.
Implementation:
- New BACKENDS entry "claude-cli" with `env_key: None` and zero pricing.
- New `_call_claude_cli()` runs `claude -p` via subprocess with the
graphify _EXTRACTION_SYSTEM prompt passed through `--append-system-prompt`.
Parses the JSON envelope, extracts `result` as the model output, and
reports total tokens including cache_read/cache_creation so usage
accounting reflects what the model actually processed.
- New dispatch arm in `extract_files_direct` plus the lower `_call_llm`
  helper (used by the dedup tiebreaker, among other callers).
- New env-key carve-out in `__main__.py`'s `extract` validation that
checks `shutil.which("claude")` instead of an env variable, mirroring
ollama's local-loopback carve-out.
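A minimal sketch of what the backend entry and subprocess helper might look like. Names, signatures, and the return shape here are illustrative stand-ins for the description above; the actual code in `graphify/llm.py` may differ.

```python
import json
import shutil
import subprocess

# Hypothetical shape of the BACKENDS entry: no env key, zero per-token pricing.
CLAUDE_CLI_BACKEND = {"env_key": None, "pricing": {"input": 0.0, "output": 0.0}}

def call_claude_cli(system_prompt: str, user_message: str, timeout: int = 600) -> dict:
    """Sketch of the claude-cli helper: shell out to `claude -p` and parse its JSON envelope."""
    exe = shutil.which("claude")
    if exe is None:
        raise RuntimeError("claude CLI not found on $PATH; install Claude Code and log in once")
    proc = subprocess.run(
        [exe, "-p", "--output-format", "json", "--append-system-prompt", system_prompt],
        input=user_message, capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"claude exited {proc.returncode}: {proc.stderr.strip()}")
    envelope = json.loads(proc.stdout)
    usage = envelope.get("usage", {})
    # Count cached prefix tokens too, so accounting reflects what the model processed.
    input_tokens = (usage.get("input_tokens", 0)
                    + usage.get("cache_read_input_tokens", 0)
                    + usage.get("cache_creation_input_tokens", 0))
    return {
        "text": envelope["result"],
        "input_tokens": input_tokens,
        "output_tokens": usage.get("output_tokens", 0),
    }
```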
The CLI must be on $PATH and the user must have authenticated `claude`
interactively at least once (OAuth flow stores credentials locally).
Functions that require the SDK-shaped path (anthropic.Anthropic client)
remain unchanged for `--backend claude`.
Tests in `tests/test_claude_cli_backend.py` mock subprocess.run so the
suite runs anywhere — no live CLI or network needed for CI.
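As a sketch of the style, one such mocked test for the `stop_reason == "max_tokens"` to `finish_reason == "length"` mapping. The mapping helper here is a stand-in written for illustration; the real mapping lives in `graphify/llm.py` and the actual test names will differ.

```python
import json
from unittest import mock

def finish_reason_from(stop_reason: str) -> str:
    # Stand-in for the real mapping: Anthropic's "max_tokens" becomes the
    # "length" value graphify uses for finish_reason.
    return {"max_tokens": "length", "end_turn": "stop"}.get(stop_reason, stop_reason)

def test_max_tokens_maps_to_length():
    # Fake `claude -p` JSON envelope; no real CLI or network is touched.
    envelope = {"result": "truncated output", "stop_reason": "max_tokens",
                "usage": {"input_tokens": 10, "output_tokens": 99}}
    fake = mock.Mock(returncode=0, stdout=json.dumps(envelope), stderr="")
    with mock.patch("subprocess.run", return_value=fake):
        import subprocess
        proc = subprocess.run(["claude", "-p", "--output-format", "json"],
                              input="payload", capture_output=True, text=True)
    parsed = json.loads(proc.stdout)
    assert finish_reason_from(parsed["stop_reason"]) == "length"
```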
Owner
Implemented in commit. Two small additions on top of your implementation before landing:
9/9 tests pass. End-to-end verified.
Closes #855.
Summary
Adds a `claude-cli` backend that shells out to the locally-installed `claude` CLI (Claude Code) via `claude -p --output-format json` instead of calling the Anthropic API directly. Lets Pro/Max subscribers run graphify's semantic pass against their plan instead of provisioning a separate `ANTHROPIC_API_KEY`.

Verified end-to-end on a real codebase: full semantic extract (167k in / 8k out tokens, Opus 4.7 1M context), billed against the user's Max plan, $0 marginal cost.
Changes
`graphify/llm.py`:

- `BACKENDS["claude-cli"]` entry: `env_key: None`, zero pricing (calls bill against the subscription, not API credit, so the estimator should treat this as $0).
- `_call_claude_cli(user_message, max_tokens)`:
  - `shutil.which("claude")` finds the CLI.
  - Runs `claude -p --output-format json --append-system-prompt <_EXTRACTION_SYSTEM>` with the user message piped on stdin.
  - Parses the JSON envelope (`result` → model output; `usage` → token counts; `stop_reason` → `finish_reason`; `modelUsage` → model id).
  - Input tokens are `usage.input_tokens + cache_read_input_tokens + cache_creation_input_tokens` so accounting reflects what the model actually processed (the CLI's default system prompt is cached aggressively, so subsequent calls within a session see high cache_read counts).
- New dispatch arm in `extract_files_direct` plus the lower `_call_llm` helper used by the dedup tiebreaker.
- Validation skip for `claude-cli` (mirrors the existing `bedrock` skip in the validation).

`graphify/__main__.py`:

- When `backend == "claude-cli"`, check `shutil.which("claude")` instead of an env var. Same shape as the existing ollama local-loopback carve-out, with a helpful error message if the CLI isn't installed.

`tests/test_claude_cli_backend.py`:

- Mocks `subprocess.run` + `shutil.which` so the suite runs on CI without the `claude` binary or network.
- Covers: `stop_reason == "max_tokens"` → `finish_reason == "length"`; missing CLI raises; non-zero exit raises; unparseable JSON envelope raises; end-to-end through `extract_files_direct`; and the pricing-is-zero contract.

Why not `--bare` mode?

The CLI's `--bare` flag strips the default Claude Code system prompt but requires `ANTHROPIC_API_KEY` to authenticate, defeating the purpose. So this backend uses `--append-system-prompt` instead. The default Claude Code prompt gets appended to `_EXTRACTION_SYSTEM`, but the extraction prompt is strict enough that Claude produces valid JSON anyway. Empirically verified: ran the full graphify extract pipeline through this backend on a 1,083-node graph and got valid extraction output.

One side effect: `usage.input_tokens` for the first call in a session is large (the CLI's system prompt is part of the model input). Subsequent calls hit `cache_read_input_tokens` for that prefix, and the fresh-input count drops to just the user message. This is normal Anthropic prompt caching behavior; the test fixture uses a realistic envelope showing this pattern.

Test plan

- Full test suite run (known pre-existing failures only: `tree-sitter-sql`; one Fortran preprocessed test; two Ollama env-var tests): zero new failures.
- `graphify extract . --backend claude-cli` on a real 336-code-file / 32-doc corpus completed in a single chunk, produced 1,083 graph nodes and 2,140 edges, at $0 marginal cost on the user's Max plan.

Notes for the reviewer

- Considered a `detect_backend()` arm that auto-selects `claude-cli` when the binary is present and no API keys are set. Skipped for this PR to keep the behavior change minimal; `--backend claude-cli` must be explicit. Happy to add auto-detection in a follow-up if you'd prefer.
- Pricing is `{"input": 0.0, "output": 0.0}` because cost is plan-based, not per-token. If you'd rather track plan token usage (without dollarizing it), the estimator could grow a `plan_tokens` field (also a follow-up).