fix(detection): C1 PR-2 — tighten cache detector (closes part 2 of #83)#108
Conversation
Eliminates the cache detector's 100% false-positive rate on corpus v1
by tightening two distinct emission paths:
Python (src/scanner/python-waste-detector.ts):
- Reorder isReadLikeCall(): generative-method denylist + write-verb
check now run BEFORE the cacheCapable shortcut, so chat.completions,
messages, generate, etc. can never sneak through as "read-like" via
fingerprint-default cacheCapable=true.
- Add explicit-write-HTTP-method authority (POST/PUT/PATCH/DELETE):
raw requests.post to URLs whose path contains a read keyword
(e.g. /v1/embed) is no longer misclassified as cacheable.
AST (src/ast/waste/cache-detector.ts):
- Bucket redundancy by methodChain + endpoint URL for universal HTTP
method names (fetch, axios.*, got, ky, superagent.*, requests.*).
Three different fetch() URLs in one file no longer collapse into a
single "fetch" bucket and trip the redundant-call signal. Evidence
string still renders methodChain only.
Measurement (benchmark/baseline.json, docs/accuracy/findings.md):
- cache: 7 FP / 0 TP → 0 FP / 0 TP (row removed from baseline)
- batch: 7 FP / 0 TP → 9 FP / 0 TP (pre-existing corpus drift,
not caused by PR-2; next C1 PR tightens batch)
- finding precision: 6.25% → 9.09% globally
- All other per-detector metrics unchanged
- No detection or recall regressions
Tests (src/test/c1-pr2-cache-tightening.test.ts, fixtures/c1-pr2/):
- 5 new test cases pinning both regression directions: chat-completion,
POST /v1/embed, three-different-fetch-URLs (negative); stripe.retrieve
and same-URL-fetch (positive recall preservation).
- Total: 352 PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR implements the second tightening phase (C1 PR-2) for the cache waste detector by fixing two distinct false-positive modes: Python generative/LLM endpoints incorrectly treated as cacheable reads, and TypeScript AST redundancy bucketing collapsing distinct HTTP endpoints by method chain alone. The changes include detection logic improvements, test fixtures, a comprehensive test suite, documentation of the investigation and approach, and updated benchmark baselines. ChangesC1 PR-2 Cache Detector False Positive Fixes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/ast/waste/cache-detector.ts`:
- Around line 63-68: chainKey currently builds redundancy buckets for
UNIVERSAL_HTTP_CHAIN using only match.methodChain and match.endpoint, which
collapses different HTTP methods (e.g., GET vs POST) for the same endpoint;
update chainKey to include match.method in the key for universal chains so
method-specific calls are separated (use match.method along with
match.methodChain and match.endpoint in the returned string), and apply the same
change to the other similar key constructions referenced by the reviewer (the
same logic used around the blocks identified by the reviewer).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e6a3e024-be7c-47d9-82c1-63511ead551e
📒 Files selected for processing (12)
benchmark/baseline.jsondocs/accuracy/findings.mddocs/superpowers/plans/2026-05-13-c1-pr2-cache-detector-tightening.mdpackage.jsonsrc/ast/waste/cache-detector.tssrc/scanner/python-waste-detector.tssrc/test/c1-pr2-cache-tightening.test.tssrc/test/fixtures/c1-pr2/python_chat_completion.pysrc/test/fixtures/c1-pr2/python_post_embed.pysrc/test/fixtures/c1-pr2/python_real_read.pysrc/test/fixtures/c1-pr2/ts_diff_fetch_urls.tssrc/test/fixtures/c1-pr2/ts_same_fetch_url.ts
…al HTTP chains Addresses CodeRabbit review on #108. chainCount is built from all matches (write-likes are skipped only later, at the per-match emission loop), so a sibling POST /x mutation would inflate the same bucket as a GET /x and falsely mark the GET as redundant. Including match.method in the bucket key keeps GET and POST traffic to the same URL in separate buckets. Adds a regression fixture (`ts_get_post_same_url.ts` — one fetch GET and one fetch POST to the same URL) and the corresponding test (6th c1-pr2 case). 353/353 tests pass; benchmark unchanged (cache still 0/0/0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks @coderabbitai — valid catch. Pushed b9c8b4c:
Tests: 353/353 PASS. Benchmark unchanged (cache still 0/0/0). |
|
Tip For best results, initiate chat on the files or code changes.
( ´◡` ) 🐇 ✨ If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
Summary
Eliminates the cache detector's 100% false-positive rate on corpus v1 (7 FPs / 0 TPs → 0 FPs / 0 TPs) without losing recall, by tightening two distinct emission paths.
Builds on #106 (per-detector measurement infrastructure). The per-type gate this PR exercises is what proves the change is real, not anecdotal.
What changed
Python detector (
src/scanner/python-waste-detector.ts)isReadLikeCall()reorder + add explicit-write-method authority. Three layered fixes for the 6 chat-completion + 1 raw-POST cases that were leaking through:messages.create,responses.create,images.generate, TTS/STT, etc. are conceptually non-cacheable even when the fingerprint registry marks themcacheCapable: true. NewPYTHON_GENERATIVE_METHODregex.cacheCapable—create/insert/update/etc. always win, so an SDK call like_client.chat.completions.create(...)can no longer be admitted as read-like via fingerprint default.requests.post("https://api.cohere.ai/v1/embed", ...)where the URL path contains a read keyword (embed) but the HTTP method is authoritative.AST detector (
src/ast/waste/cache-detector.ts)URL-aware redundancy bucketing for universal HTTP method names. Three different
fetch()URLs in one file (the elevenlabs FP shape) used to collapse into a single"fetch"bucket and trip theredundantsignal. NewchainKey(match)keys those bymethodChain + endpoint URLforfetch,axios.*,got,ky,superagent.*,requests.*. SDK chains likeclient.embeddings.createare unaffected (regex anchored).The rendered evidence string still shows
methodChainonly — the bucketing key is internal.Measurement (corpus v1, 7 fixtures)
Per-detector:
n_plus_onecachebatchrate_limitunbatched_parallelPlan predicted finding precision would land near 11% (1/9). Actual is 9.09% (1/11) only because batch FP count drifted upward 7 → 9 since the original baseline was set; the cache fix itself contributed exactly the predicted -7 emissions.
Tests
5 new TDD-style fixtures + tests in
src/test/c1-pr2-cache-tightening.test.tsandsrc/test/fixtures/c1-pr2/:python_chat_completion.pypython_real_read.pystripe.Customer.retrieve→ ≥1 cache findingts_diff_fetch_urls.tsts_same_fetch_url.tspython_post_embed.pyAll 352 tests pass. Per-type gate self-test verified end-to-end (faked baseline batch precision = 100%, run failed with
findings[batch].precision: 100.0% → 0.0% (Δ -100.00pp)).Acceptance criteria for issue #83 (part 2)
raw-fetch-elevenlabs/src/tts-service.ts:30npm testpasses (352/352)npm run benchmarkexits 0; per-type gate does not fail oncachefindingPrecisionrises (6.25% → 9.09%, exceeds the +2pp threshold even with batch drift)benchmark/baseline.jsonupdated;cacherow collapses to absentOut of scope (next C1 PRs)
batchdetector (now 9 FPs) — same conceptual fix shape (write-method/generative-method authority over chain/URL keyword inference). Separate PR.rate_limitdetector — defer until corpus has rate-limit positive cases.unbatched_parallel↔concurrency_controlcorpus terminology mismatch — corpus follow-up.Notes for reviewers
PYTHON_GENERATIVE_METHODis intentionally broad; it includes bare-word alternatives like\binvoke\b,\brun\b,\bstream\b,\bmessages\b,\bgenerate\b. The plan accepts this trade-off (LLM invocation verbs > the rare false-suppression risk on unrelated SDKs). If a real over-suppression FN shows up in a future corpus expansion, the regex tightens to chain-anchored matches (\.invoke\(, etc.).chainKey()falls back to chain-only whenmatch.endpointis null (e.g. unfolded dynamic URL). A2 const-folding handles most realistic cases; documented as acceptable failure mode.dedupeFindingskeys includetype, so removing cache findings cannot unmask findings of any other type. ThebatchFP count drift between the original baseline and now is independent of this PR (re-verified by stashing the diff and re-running the CLI on main — same 9 batch findings).Test plan
npm test— 352 PASSnpm run benchmark— exit 0, no regressionscacheemits zero findings everywhere🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests
Documentation