fix(detection): C1 PR-2 — tighten cache detector (closes part 2 of #83) by AndresL230 · Pull Request #108 · recost-dev/extension

AndresL230 · 2026-05-13T23:04:00Z

Summary

Eliminates the cache detector's 100% false-positive rate on corpus v1 (7 FPs / 0 TPs → 0 FPs / 0 TPs) without losing recall, by tightening two distinct emission paths.

Builds on #106 (per-detector measurement infrastructure). The per-type gate this PR exercises is what proves the change is real, not anecdotal.

What changed

Python detector (`src/scanner/python-waste-detector.ts`)

isReadLikeCall() reorder + add explicit-write-method authority. Three layered fixes for the 6 chat-completion + 1 raw-POST cases that were leaking through:

- Generative-method denylist runs first: chat completions, messages.create, responses.create, images.generate, TTS/STT, etc. are conceptually non-cacheable even when the fingerprint registry marks them cacheCapable: true. New PYTHON_GENERATIVE_METHOD regex.
- Write-verb check now runs before cacheCapable: create/insert/update/etc. always win, so an SDK call like _client.chat.completions.create(...) can no longer be admitted as read-like via fingerprint default.
- Explicit POST/PUT/PATCH/DELETE → not read-like: covers raw requests.post("https://api.cohere.ai/v1/embed", ...) where the URL path contains a read keyword (embed) but the HTTP method is authoritative.
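The ordering of the three checks can be sketched roughly as follows. This is a minimal illustration, not the detector's actual code: the `PythonCall` shape, the name `isReadLikeCallSketch`, and every pattern below are assumptions made for the example. Only the order of the checks is taken from the description above, and placing the HTTP-method check ahead of the `cacheCapable` shortcut is also an assumption.

```typescript
// Hypothetical call shape; the real scanner's types may differ.
interface PythonCall {
  chain: string;           // e.g. "_client.chat.completions.create"
  httpMethod?: string;     // e.g. "POST" for requests.post(...)
  url?: string;            // endpoint URL when known
  cacheCapable?: boolean;  // fingerprint-registry default
}

// Illustrative patterns only (assumptions, not the shipped regexes).
const GENERATIVE = /\b(chat\.completions|messages\.create|responses\.create|images\.generate)\b/;
const WRITE_VERB = /\.(create|insert|update|upsert|delete)$/i;
const READ_KEYWORD = /\b(get|list|retrieve|search|embed)\b/i;
const WRITE_HTTP = new Set(["POST", "PUT", "PATCH", "DELETE"]);

function isReadLikeCallSketch(call: PythonCall): boolean {
  // 1. Generative calls are never read-like, whatever the registry says.
  if (GENERATIVE.test(call.chain)) return false;
  // 2. Write verbs win over the cacheCapable shortcut.
  if (WRITE_VERB.test(call.chain)) return false;
  // 3. An explicit write HTTP method is authoritative over URL keywords.
  if (call.httpMethod && WRITE_HTTP.has(call.httpMethod.toUpperCase())) return false;
  // 4. Only now may the fingerprint default admit the call.
  if (call.cacheCapable) return true;
  // 5. Fall back to keyword inference on the chain or URL.
  return READ_KEYWORD.test(call.chain) || READ_KEYWORD.test(call.url ?? "");
}
```

With these assumed patterns, a chat-completion call marked `cacheCapable: true` and a `requests.post` to `/v1/embed` are both rejected, while `stripe.Customer.retrieve` is still admitted through the keyword fallback.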

AST detector (`src/ast/waste/cache-detector.ts`)

URL-aware redundancy bucketing for universal HTTP method names. Three different fetch() URLs in one file (the elevenlabs FP shape) used to collapse into a single "fetch" bucket and trip the redundant signal. New chainKey(match) keys those by methodChain + endpoint URL for fetch, axios.*, got, ky, superagent.*, requests.*. SDK chains like client.embeddings.create are unaffected (regex anchored).

The rendered evidence string still shows methodChain only — the bucketing key is internal.
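A rough sketch of the URL-aware bucketing, under stated assumptions: the `ApiMatch` shape, the body of the `UNIVERSAL_HTTP_CHAIN` pattern, the key format, and the `countByKey` helper are all invented for illustration. Only the idea (key universal HTTP chains by chain plus endpoint, leave SDK chains keyed by chain alone, fall back to chain-only when the endpoint is unknown) comes from this PR.

```typescript
// Hypothetical match shape for illustration.
interface ApiMatch {
  methodChain: string;      // e.g. "fetch" or "client.embeddings.create"
  endpoint: string | null;  // resolved URL, or null when it could not be folded
}

// Assumed pattern; anchored so SDK chains such as client.embeddings.create do not match.
const UNIVERSAL_HTTP_CHAIN = /^(fetch|got|ky|axios\.\w+|superagent\.\w+|requests\.\w+)$/;

// Internal bucketing key: chain + URL for universal HTTP names, chain only otherwise.
function chainKeySketch(match: ApiMatch): string {
  if (UNIVERSAL_HTTP_CHAIN.test(match.methodChain) && match.endpoint !== null) {
    return `${match.methodChain} ${match.endpoint}`;
  }
  return match.methodChain;
}

// Count how many matches land in each bucket.
function countByKey(matches: ApiMatch[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const m of matches) {
    const key = chainKeySketch(m);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Under this sketch, three `fetch` calls to three different URLs produce three buckets of size one, so none of them looks redundant, whereas two calls to the same URL still share a bucket.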

Measurement (corpus v1, 7 fixtures)

| Metric | Baseline | After PR-2 | Δ |
|---|---|---|---|
| Detection precision | 36.26% | 36.26% | +0.00pp |
| Detection recall | 48.53% | 48.53% | +0.00pp |
| Provider attribution | 82.14% | 82.14% | +0.00pp |
| Finding precision | 6.25% | 9.09% | +2.84pp |
| Finding recall | 33.33% | 33.33% | +0.00pp |

Per-detector:

| Detector | Before (TP/FP/FN) | After (TP/FP/FN) | Note |
|---|---|---|---|
| `n_plus_one` | 1 / 0 / 0 | 1 / 0 / 0 | unchanged |
| `cache` | 0 / 7 / 0 | 0 / 0 / 0 | row removed from baseline (no emissions) |
| `batch` | 0 / 7 / 1 | 0 / 9 / 1 | pre-existing corpus drift (not caused by this PR; verified by re-running pre-PR build); next C1 PR's target |
| `rate_limit` | 0 / 1 / 0 | 0 / 1 / 0 | unchanged |
| `unbatched_parallel` | 0 / 0 / 1 | 0 / 0 / 1 | unchanged |

Plan predicted finding precision would land near 11% (1/9). Actual is 9.09% (1/11) only because batch FP count drifted upward 7 → 9 since the original baseline was set; the cache fix itself contributed exactly the predicted -7 emissions.
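The arithmetic behind these three percentages can be reproduced from the per-detector table. The sketch below assumes finding precision is simply total TP / (total TP + total FP) across detectors; the `Counts` type and the `findingPrecision` helper are hypothetical names, not the benchmark's code.

```typescript
// TP/FP/FN per detector, copied from the per-detector table.
type Counts = { tp: number; fp: number; fn: number };

// Assumed definition: global finding precision in percent.
function findingPrecision(rows: Record<string, Counts>): number {
  let tp = 0;
  let fp = 0;
  for (const r of Object.values(rows)) {
    tp += r.tp;
    fp += r.fp;
  }
  return tp + fp === 0 ? 0 : (100 * tp) / (tp + fp);
}

const before: Record<string, Counts> = {
  n_plus_one: { tp: 1, fp: 0, fn: 0 },
  cache: { tp: 0, fp: 7, fn: 0 },
  batch: { tp: 0, fp: 7, fn: 1 },
  rate_limit: { tp: 0, fp: 1, fn: 0 },
  unbatched_parallel: { tp: 0, fp: 0, fn: 1 },
};

// After PR-2: cache emits nothing, batch has drifted to 9 FPs.
const after: Record<string, Counts> = {
  ...before,
  cache: { tp: 0, fp: 0, fn: 0 },
  batch: { tp: 0, fp: 9, fn: 1 },
};

// What the plan expected: cache fixed, batch still at 7 FPs.
const planned: Record<string, Counts> = {
  ...after,
  batch: { tp: 0, fp: 7, fn: 1 },
};

console.log(findingPrecision(before).toFixed(2));  // 1/16
console.log(findingPrecision(after).toFixed(2));   // 1/11
console.log(findingPrecision(planned).toFixed(2)); // 1/9
```

The gap between 1/9 and 1/11 is entirely the two extra batch false positives; the cache row contributes seven fewer emissions in both cases.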

Tests

5 new TDD-style fixtures + tests in src/test/c1-pr2-cache-tightening.test.ts and src/test/fixtures/c1-pr2/:

| # | Fixture | Direction | Asserts |
|---|---|---|---|
| 1 | `python_chat_completion.py` | negative | chat.completions.create → 0 cache findings |
| 2 | `python_real_read.py` | positive (recall) | `stripe.Customer.retrieve` → ≥1 cache finding |
| 3 | `ts_diff_fetch_urls.ts` | negative | three different fetch URLs → 0 cache findings |
| 4 | `ts_same_fetch_url.ts` | positive (recall) | two fetches to same URL → ≥1 cache finding |
| 5 | `python_post_embed.py` | negative | POST /v1/embed → 0 cache findings |

All 352 tests pass. The per-type gate self-test was verified end to end: with the baseline batch precision faked to 100%, the benchmark run failed as expected with `findings[batch].precision: 100.0% → 0.0% (Δ -100.00pp)`.
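For readers unfamiliar with the per-type gate from #106, the self-test can be pictured with a small sketch. Everything here is an assumption for illustration: the `gateFailures` name, the zero-tolerance default, and the choice to skip types that no longer appear in the current run. Only the failure message format is copied from the run above.

```typescript
// Hypothetical gate: compares per-type precision (in percent) against a baseline
// and returns one message per regression.
function gateFailures(
  baseline: Record<string, number>,
  current: Record<string, number>,
  tolerancePp = 0,
): string[] {
  const failures: string[] = [];
  for (const [type, base] of Object.entries(baseline)) {
    // Assumption: a type with no current row (no emissions) is not a regression.
    if (!(type in current)) continue;
    const now = current[type];
    const delta = now - base;
    if (delta < -tolerancePp) {
      failures.push(
        `findings[${type}].precision: ${base.toFixed(1)}% → ${now.toFixed(1)}% (Δ ${delta.toFixed(2)}pp)`,
      );
    }
  }
  return failures;
}

// Self-test shape: fake the batch baseline to 100% and expect one failure.
const selfTest = gateFailures({ batch: 100 }, { batch: 0 });
console.log(selfTest);
```

A non-empty result would correspond to a non-zero exit from the benchmark; how the real gate maps failures to an exit code is not shown here.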

Acceptance criteria for issue #83 (part 2)

- Python cache detector emits ZERO findings on the 6 listed FP call sites
- AST cache detector emits ZERO findings on raw-fetch-elevenlabs/src/tts-service.ts:30
- Synthetic positive-test fixtures still produce cache findings (recall preserved)
- npm test passes (352/352)
- npm run benchmark exits 0; per-type gate does not fail on cache
- Global findingPrecision rises (6.25% → 9.09%, exceeds the +2pp threshold even with batch drift)
- benchmark/baseline.json updated; cache row collapses to absent

Out of scope (next C1 PRs)

- batch detector (now 9 FPs): same conceptual fix shape (write-method/generative-method authority over chain/URL keyword inference). Separate PR.
- rate_limit detector: defer until corpus has rate-limit positive cases.
- unbatched_parallel ↔ concurrency_control corpus terminology mismatch: corpus follow-up.

Notes for reviewers

- PYTHON_GENERATIVE_METHOD is intentionally broad; it includes bare-word alternatives like \binvoke\b, \brun\b, \bstream\b, \bmessages\b, \bgenerate\b. The plan accepts this trade-off (LLM invocation verbs > the rare false-suppression risk on unrelated SDKs). If a real over-suppression FN shows up in a future corpus expansion, the regex tightens to chain-anchored matches (\.invoke\(, etc.).
- chainKey() falls back to chain-only when match.endpoint is null (e.g. unfolded dynamic URL). A2 const-folding handles most realistic cases; documented as acceptable failure mode.
- dedupeFindings keys include type, so removing cache findings cannot unmask findings of any other type. The batch FP count drift between the original baseline and now is independent of this PR (re-verified by stashing the diff and re-running the CLI on main: same 9 batch findings).
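The bare-word versus chain-anchored trade-off in the first note can be made concrete with a small comparison. Both patterns below are illustrative assumptions: `BARE_WORD` uses only the alternatives named in the note, and `CHAIN_ANCHORED` is one hypothetical way the tightened form might look. The two call strings are made-up examples, not corpus entries.

```typescript
// Bare-word form, limited to the alternatives listed in the note (assumed subset).
const BARE_WORD = /\b(invoke|run|stream|messages|generate)\b/;

// Hypothetical chain-anchored form: the verb must be the method actually being called.
const CHAIN_ANCHORED = /\.(invoke|run|stream|generate)\(|\.messages\.create\(/;

// A generative call that both forms should suppress.
const generativeCall = "client.messages.create(model=m, messages=msgs)";

// An unrelated read on an SDK that happens to expose a `messages` resource.
const unrelatedRead = "sms_client.messages.list(limit=20)";

console.log(BARE_WORD.test(generativeCall), CHAIN_ANCHORED.test(generativeCall));
console.log(BARE_WORD.test(unrelatedRead), CHAIN_ANCHORED.test(unrelatedRead));
```

The second line is the over-suppression case: the bare-word form treats the list call as generative, while the anchored form leaves it eligible for a cache finding.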

Test plan

- npm test: 352 PASS
- npm run benchmark: exit 0, no regressions
- Per-type gate self-test: fired correctly with synthetic baseline tweak
- Manual CLI scan against each corpus fixture: confirmed cache emits zero findings everywhere

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Fixed cache detector incorrectly treating different API endpoints as redundant when using the same HTTP client library.
- Improved Python detector to correctly classify generative LLM calls as non-cacheable operations.
Tests
- Added comprehensive test coverage for cache detection across Python and TypeScript scenarios.
Documentation
- Updated calibration results with improved detection accuracy metrics.

Eliminates the cache detector's 100% false-positive rate on corpus v1 by tightening two distinct emission paths:

Python (src/scanner/python-waste-detector.ts):
- Reorder isReadLikeCall(): generative-method denylist + write-verb check now run BEFORE the cacheCapable shortcut, so chat.completions, messages, generate, etc. can never sneak through as "read-like" via fingerprint-default cacheCapable=true.
- Add explicit-write-HTTP-method authority (POST/PUT/PATCH/DELETE): raw requests.post to URLs whose path contains a read keyword (e.g. /v1/embed) is no longer misclassified as cacheable.

AST (src/ast/waste/cache-detector.ts):
- Bucket redundancy by methodChain + endpoint URL for universal HTTP method names (fetch, axios.*, got, ky, superagent.*, requests.*). Three different fetch() URLs in one file no longer collapse into a single "fetch" bucket and trip the redundant-call signal. Evidence string still renders methodChain only.

Measurement (benchmark/baseline.json, docs/accuracy/findings.md):
- cache: 7 FP / 0 TP → 0 FP / 0 TP (row removed from baseline)
- batch: 7 FP / 0 TP → 9 FP / 0 TP (pre-existing corpus drift, not caused by PR-2; next C1 PR tightens batch)
- finding precision: 6.25% → 9.09% globally
- All other per-detector metrics unchanged
- No detection or recall regressions

Tests (src/test/c1-pr2-cache-tightening.test.ts, fixtures/c1-pr2/):
- 5 new test cases pinning both regression directions: chat-completion, POST /v1/embed, three-different-fetch-URLs (negative); stripe.retrieve and same-URL-fetch (positive recall preservation).
- Total: 352 PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-13T23:04:11Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 425cf8ad-cb33-4bcd-bbf3-a6794f94c72f

📥 Commits

Reviewing files that changed from the base of the PR and between 4f848e3 and b9c8b4c.

📒 Files selected for processing (3)

src/ast/waste/cache-detector.ts
src/test/c1-pr2-cache-tightening.test.ts
src/test/fixtures/c1-pr2/ts_get_post_same_url.ts

📝 Walkthrough

This PR implements the second tightening phase (C1 PR-2) for the cache waste detector by fixing two distinct false-positive modes: Python generative/LLM endpoints incorrectly treated as cacheable reads, and TypeScript AST redundancy bucketing collapsing distinct HTTP endpoints by method chain alone. The changes include detection logic improvements, test fixtures, a comprehensive test suite, documentation of the investigation and approach, and updated benchmark baselines.

Changes

C1 PR-2 Cache Detector False Positive Fixes

| Layer / File(s) | Summary |
|---|---|
| Python generative method filtering `src/scanner/python-waste-detector.ts`, `src/test/fixtures/c1-pr2/python_chat_completion.py`, `src/test/fixtures/c1-pr2/python_post_embed.py`, `src/test/fixtures/c1-pr2/python_real_read.py` | Added `PYTHON_GENERATIVE_METHOD` regex constant to recognize LLM/chat endpoints (chat, completions, message, generate, tts, stt, streaming), reordered `isReadLikeCall` logic to classify generative calls as non-cacheable before cache-capable inference, and provided three Python fixtures: chat completion (negative), POST to embed endpoint with read-keyword URL (negative), and Stripe customer retrieval (positive). |
| TypeScript HTTP URL-aware redundancy bucketing `src/ast/waste/cache-detector.ts`, `src/test/fixtures/c1-pr2/ts_diff_fetch_urls.ts`, `src/test/fixtures/c1-pr2/ts_same_fetch_url.ts` | Introduced `UNIVERSAL_HTTP_CHAIN` regex and `chainKey()` helper to scope redundancy detection by both method chain and endpoint URL, updated redundancy counting in `detectCacheWaste()` and occurrences lookup to use URL-scoped keys, preventing distinct endpoints that share a universal HTTP client from producing false-positive matches. Added two TypeScript fixtures containing five functions: three fetching distinct URLs (`listProducts`, `listCustomers`, `listOrders`) and two fetching the same URL (`loadOnce`, `loadAgain`). |
| Test infrastructure and wiring `src/test/c1-pr2-cache-tightening.test.ts`, `package.json` | Added new TypeScript test file with WASM directory setup, fixture access helper, and `run()` utility that scans the five fixtures and asserts expected cache finding counts: zero for chat completion, at least one for GET-shaped reads, zero for distinct URLs, at least one for duplicate same-URL fetches, and zero for POST-with-read-keyword. Wired test into `test:scanner` npm script. |
| PR-2 design and acceptance criteria `docs/superpowers/plans/2026-05-13-c1-pr2-cache-detector-tightening.md` | Comprehensive plan document recording investigation of two false-positive modes, enumerating affected code locations and suspected root causes, defining PR-2 acceptance criteria (zero specific findings while preserving synthetic positive cases), and specifying four TDD-oriented task checklists (C1-PR2.1–C1-PR2.4) with implementation details, self-review matrix, and execution handoff. |
| Metrics baseline and findings documentation `benchmark/baseline.json`, `docs/accuracy/findings.md` | Updated benchmark baseline to remove cache metrics entry, adjust batch false positives, and modify overall precision/recall. Revised findings table and acceptance-criteria text to reflect new corpus measurements showing `cache` and `n_plus_one` detectors now pass FPR threshold while `batch` and `rate_limit` remain above threshold; retained `concurrency_control` mismatch note. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(detection): C1 PR-2 — tighten cache detector (closes part 2 of `#83`)' directly and clearly describes the primary change: tightening the cache detector to eliminate false positives, which is the main objective throughout the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ast/waste/cache-detector.ts`:
- Around line 63-68: chainKey currently builds redundancy buckets for
UNIVERSAL_HTTP_CHAIN using only match.methodChain and match.endpoint, which
collapses different HTTP methods (e.g., GET vs POST) for the same endpoint;
update chainKey to include match.method in the key for universal chains so
method-specific calls are separated (use match.method along with
match.methodChain and match.endpoint in the returned string), and apply the same
change to the other similar key constructions referenced by the reviewer (the
same logic used around the blocks identified by the reviewer).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e6a3e024-be7c-47d9-82c1-63511ead551e

📥 Commits

Reviewing files that changed from the base of the PR and between 6647746 and 4f848e3.

📒 Files selected for processing (12)

benchmark/baseline.json
docs/accuracy/findings.md
docs/superpowers/plans/2026-05-13-c1-pr2-cache-detector-tightening.md
package.json
src/ast/waste/cache-detector.ts
src/scanner/python-waste-detector.ts
src/test/c1-pr2-cache-tightening.test.ts
src/test/fixtures/c1-pr2/python_chat_completion.py
src/test/fixtures/c1-pr2/python_post_embed.py
src/test/fixtures/c1-pr2/python_real_read.py
src/test/fixtures/c1-pr2/ts_diff_fetch_urls.ts
src/test/fixtures/c1-pr2/ts_same_fetch_url.ts

…al HTTP chains

Addresses CodeRabbit review on #108. chainCount is built from all matches (write-likes are skipped only later, at the per-match emission loop), so a sibling POST /x mutation would inflate the same bucket as a GET /x and falsely mark the GET as redundant. Including match.method in the bucket key keeps GET and POST traffic to the same URL in separate buckets.

Adds a regression fixture (`ts_get_post_same_url.ts`: one fetch GET and one fetch POST to the same URL) and the corresponding test (6th c1-pr2 case).

353/353 tests pass; benchmark unchanged (cache still 0/0/0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AndresL230 · 2026-05-13T23:17:10Z

Thanks @coderabbitai — valid catch. Pushed b9c8b4c:

chainCount is built from all matches (write-likes are filtered later, only at the per-match emission loop). Without HTTP method in the key, a POST /x mutation inflated the bucket and falsely marked a sibling GET /x as redundant. chainKey() now includes match.method.toUpperCase() for universal HTTP chains, plus a regression fixture (ts_get_post_same_url.ts) and a 6th test case asserting the GET stays clean.
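To illustrate why the method belongs in the key, here is a minimal sketch comparing the two bucketing schemes on a GET and a POST to the same URL. The `HttpMatch` shape, the `UNIVERSAL` pattern, the key layout, and the helper names are assumptions for the example; only the use of `match.method.toUpperCase()` and the chain-only fallback for a null endpoint come from this thread.

```typescript
// Hypothetical match shape for illustration.
interface HttpMatch {
  methodChain: string;      // e.g. "fetch"
  method: string;           // HTTP method as written in source, any case
  endpoint: string | null;  // resolved URL, or null when unknown
}

// Assumed pattern for universal HTTP client names.
const UNIVERSAL = /^(fetch|got|ky|axios\.\w+|superagent\.\w+|requests\.\w+)$/;

// Key before the follow-up commit: chain + URL only.
function keyWithoutMethod(m: HttpMatch): string {
  return UNIVERSAL.test(m.methodChain) && m.endpoint !== null
    ? `${m.methodChain} ${m.endpoint}`
    : m.methodChain;
}

// Key after the follow-up commit: HTTP method is part of the bucket.
function keyWithMethod(m: HttpMatch): string {
  return UNIVERSAL.test(m.methodChain) && m.endpoint !== null
    ? `${m.method.toUpperCase()} ${m.methodChain} ${m.endpoint}`
    : m.methodChain;
}

// Buckets are counted over ALL matches, write-like ones included.
function bucketSizes(
  matches: HttpMatch[],
  keyOf: (m: HttpMatch) => string,
): Map<string, number> {
  const sizes = new Map<string, number>();
  for (const m of matches) {
    const key = keyOf(m);
    sizes.set(key, (sizes.get(key) ?? 0) + 1);
  }
  return sizes;
}

const sameUrl: HttpMatch[] = [
  { methodChain: "fetch", method: "get", endpoint: "https://api.example.com/x" },
  { methodChain: "fetch", method: "post", endpoint: "https://api.example.com/x" },
];

console.log(bucketSizes(sameUrl, keyWithoutMethod)); // one bucket holding both calls
console.log(bucketSizes(sameUrl, keyWithMethod));    // two buckets, one call each
```

With the method-free key the GET shares a bucket of size two with the POST and would look redundant; with the method in the key each bucket has size one.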

Tests: 353/353 PASS. Benchmark unchanged (cache still 0/0/0).

coderabbitai · 2026-05-13T23:17:32Z

@AndresL230 Perfect fix — including match.method.toUpperCase() in the chainKey() for universal HTTP chains is exactly the right approach. The POST /x + GET /x case is a classic bucketing pitfall and the regression fixture ensures it stays caught. Thanks for the clean follow-up!

(　´◡` ) 🐇 ✨

coderabbitai Bot reviewed May 13, 2026

Comment thread src/ast/waste/cache-detector.ts

AndresL230 merged commit 3b13ffe into main May 13, 2026
3 checks passed

This was referenced May 13, 2026

fix(detection): C1 PR-3 — tighten batch detector (closes part 3 of #83) #109

Merged

[Findings] Tighten CACHE_GUARD / BATCH_GUARD bare-word leak #112

Open

AndresL230 deleted the claude/c1-pr2-cache-detector branch May 22, 2026 22:50

AndresL230 merged 2 commits into main from claude/c1-pr2-cache-detector
