
fix(detection): C1 PR-2 — tighten cache detector (closes part 2 of #83)#108

Merged: AndresL230 merged 2 commits into main from claude/c1-pr2-cache-detector on May 13, 2026

AndresL230 (Contributor) commented May 13, 2026

Summary

Eliminates the cache detector's 100% false-positive rate on corpus v1 (7 FPs / 0 TPs → 0 FPs / 0 TPs) without losing recall, by tightening two distinct emission paths.

Builds on #106 (per-detector measurement infrastructure). The per-type gate this PR exercises is what proves the change is real, not anecdotal.

What changed

Python detector (src/scanner/python-waste-detector.ts)

isReadLikeCall() reorder + add explicit-write-method authority. Three layered fixes for the 6 chat-completion + 1 raw-POST cases that were leaking through:

  1. Generative-method denylist runs first — chat completions, messages.create, responses.create, images.generate, TTS/STT, etc. are conceptually non-cacheable even when the fingerprint registry marks them cacheCapable: true. New PYTHON_GENERATIVE_METHOD regex.
  2. Write-verb check now runs before cacheCapable: create/insert/update/etc. always win, so an SDK call like _client.chat.completions.create(...) can no longer be admitted as read-like via the fingerprint default.
  3. Explicit POST/PUT/PATCH/DELETE → not read-like — covers raw requests.post("https://api.cohere.ai/v1/embed", ...) where the URL path contains a read keyword (embed) but the HTTP method is authoritative.
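
The ordering of the three checks above can be sketched as follows. This is a minimal illustration, not the detector's actual code: the `PythonCall` shape, the `READ_KEYWORD` fallback, and all regex bodies are assumptions made for the example (the real `PYTHON_GENERATIVE_METHOD` is broader). Only the order of the checks is taken from the description.

```typescript
// Hypothetical call shape; the real detector's types are not shown in this PR.
interface PythonCall {
  methodChain: string;   // e.g. "_client.chat.completions.create"
  httpMethod?: string;   // e.g. "POST" for requests.post(...)
  url?: string;          // endpoint URL when one is known
  cacheCapable: boolean; // default taken from the fingerprint registry
}

// Illustrative patterns only (assumptions, not the PR's regexes).
const GENERATIVE = /chat\.completions|messages\.create|responses\.create|images\.generate/;
const WRITE_VERB = /\.(create|insert|update|delete)$/;
const WRITE_HTTP = new Set(["POST", "PUT", "PATCH", "DELETE"]);
const READ_KEYWORD = /retrieve|list|get|embed|search/;

function isReadLikeCall(call: PythonCall): boolean {
  // 1. Generative denylist runs first: never cacheable.
  if (GENERATIVE.test(call.methodChain)) return false;
  // 2. Write verbs win before the cacheCapable shortcut is consulted.
  if (WRITE_VERB.test(call.methodChain)) return false;
  // 3. An explicit write HTTP method is authoritative over URL keywords.
  if (call.httpMethod !== undefined && WRITE_HTTP.has(call.httpMethod.toUpperCase())) {
    return false;
  }
  // Only now may the fingerprint default admit the call.
  if (call.cacheCapable) return true;
  // Fallback keyword inference (assumed for this sketch).
  return READ_KEYWORD.test(call.methodChain) || READ_KEYWORD.test(call.url ?? "");
}
```

With this ordering, a `cacheCapable: true` fingerprint cannot admit `_client.chat.completions.create`, and `requests.post` to `/v1/embed` is rejected at step 3 before the URL keyword is ever inspected.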

AST detector (src/ast/waste/cache-detector.ts)

URL-aware redundancy bucketing for universal HTTP method names. Three different fetch() URLs in one file (the elevenlabs FP shape) used to collapse into a single "fetch" bucket and trip the redundant signal. New chainKey(match) keys those by methodChain + endpoint URL for fetch, axios.*, got, ky, superagent.*, requests.*. SDK chains like client.embeddings.create are unaffected (regex anchored).

The rendered evidence string still shows methodChain only — the bucketing key is internal.
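
A rough sketch of the URL-aware bucketing as described at this point in the PR (a later commit in this thread also adds the HTTP method to the key). The `CallMatch` shape, the `countBuckets` helper, and the exact `UNIVERSAL_HTTP_CHAIN` pattern are assumptions for illustration; only the idea of keying universal HTTP chains by `methodChain` plus endpoint, with a chain-only fallback, comes from the description.

```typescript
// Hypothetical match shape; only methodChain and endpoint are named in the PR.
interface CallMatch {
  methodChain: string;     // e.g. "fetch", "axios.get", "client.embeddings.create"
  endpoint: string | null; // resolved URL, or null when it could not be folded
}

// Assumed pattern: anchored so that SDK chains do not match.
const UNIVERSAL_HTTP_CHAIN =
  /^(fetch|got|ky|axios(\.\w+)?|superagent(\.\w+)?|requests\.\w+)$/;

function chainKey(match: CallMatch): string {
  if (UNIVERSAL_HTTP_CHAIN.test(match.methodChain) && match.endpoint !== null) {
    // Universal HTTP helper: scope the bucket by URL as well.
    return `${match.methodChain} ${match.endpoint}`;
  }
  // SDK chains and unresolved URLs keep the chain-only key.
  return match.methodChain;
}

// Count how many matches fall into each redundancy bucket.
function countBuckets(matches: CallMatch[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const m of matches) {
    const key = chainKey(m);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Under this keying, three `fetch()` calls to three different URLs land in three buckets of size one, so none of them can reach a redundancy threshold on its own.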

Measurement (corpus v1, 7 fixtures)

| Metric | Baseline | After PR-2 | Δ |
|---|---|---|---|
| Detection precision | 36.26% | 36.26% | +0.00pp |
| Detection recall | 48.53% | 48.53% | +0.00pp |
| Provider attribution | 82.14% | 82.14% | +0.00pp |
| Finding precision | 6.25% | 9.09% | +2.84pp |
| Finding recall | 33.33% | 33.33% | +0.00pp |

Per-detector:

| Detector | Before (TP/FP/FN) | After (TP/FP/FN) | Note |
|---|---|---|---|
| n_plus_one | 1 / 0 / 0 | 1 / 0 / 0 | unchanged |
| cache | 0 / 7 / 0 | 0 / 0 / 0 | row removed from baseline (no emissions) |
| batch | 0 / 7 / 1 | 0 / 9 / 1 | pre-existing corpus drift (not caused by this PR; verified by re-running pre-PR build); next C1 PR's target |
| rate_limit | 0 / 1 / 0 | 0 / 1 / 0 | unchanged |
| unbatched_parallel | 0 / 0 / 1 | 0 / 0 / 1 | unchanged |

Plan predicted finding precision would land near 11% (1/9). Actual is 9.09% (1/11) only because batch FP count drifted upward 7 → 9 since the original baseline was set; the cache fix itself contributed exactly the predicted -7 emissions.
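
The three precision figures can be reproduced from the per-detector TP/FP counts in the table. The helper below is only a worked-arithmetic sketch; `findingPrecision` and the `Counts` type are hypothetical names, not part of the benchmark code.

```typescript
type Counts = { tp: number; fp: number };

// Global finding precision = total TP / (total TP + total FP).
function findingPrecision(perDetector: Record<string, Counts>): number {
  let tp = 0;
  let fp = 0;
  for (const name of Object.keys(perDetector)) {
    tp += perDetector[name].tp;
    fp += perDetector[name].fp;
  }
  return tp + fp === 0 ? 0 : tp / (tp + fp);
}

// Original baseline: 1 TP, 7 + 7 + 1 = 15 FPs -> 1/16.
const baseline: Record<string, Counts> = {
  n_plus_one: { tp: 1, fp: 0 },
  cache: { tp: 0, fp: 7 },
  batch: { tp: 0, fp: 7 },
  rate_limit: { tp: 0, fp: 1 },
  unbatched_parallel: { tp: 0, fp: 0 },
};

// Plan's prediction: cache FPs removed, batch still at 7 -> 1/9.
const predicted: Record<string, Counts> = { ...baseline, cache: { tp: 0, fp: 0 } };

// Measured: cache FPs removed, batch drifted to 9 -> 1/11.
const actual: Record<string, Counts> = { ...predicted, batch: { tp: 0, fp: 9 } };

const pct = (x: number): string => (x * 100).toFixed(2) + "%";
console.log(pct(findingPrecision(baseline)));  // 6.25%
console.log(pct(findingPrecision(predicted))); // 11.11%
console.log(pct(findingPrecision(actual)));    // 9.09%
```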

Tests

5 new TDD-style fixtures + tests in src/test/c1-pr2-cache-tightening.test.ts and src/test/fixtures/c1-pr2/:

| # | Fixture | Direction | Asserts |
|---|---|---|---|
| 1 | python_chat_completion.py | negative | chat.completions.create → 0 cache findings |
| 2 | python_real_read.py | positive (recall) | stripe.Customer.retrieve → ≥1 cache finding |
| 3 | ts_diff_fetch_urls.ts | negative | three different fetch URLs → 0 cache findings |
| 4 | ts_same_fetch_url.ts | positive (recall) | two fetches to same URL → ≥1 cache finding |
| 5 | python_post_embed.py | negative | POST /v1/embed → 0 cache findings |

All 352 tests pass. Per-type gate self-test verified end-to-end (faked baseline batch precision = 100%, run failed with findings[batch].precision: 100.0% → 0.0% (Δ -100.00pp)).
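
For readers unfamiliar with the per-type gate from #106, the self-test above amounts to something like the following. This is a sketch under assumptions: `checkPerTypeGate`, its parameters, and the choice to skip types absent from the current run are all hypothetical; only the failure-message format is copied from the output quoted above.

```typescript
// Precisions are fractions in [0, 1], keyed by finding type.
function checkPerTypeGate(
  baseline: Record<string, number>,
  current: Record<string, number>,
  tolerancePp: number = 0, // allowed drop in percentage points (assumed knob)
): string[] {
  const failures: string[] = [];
  for (const type of Object.keys(baseline)) {
    const after = current[type];
    if (after === undefined) continue; // type no longer emitted: nothing to compare
    const before = baseline[type] ?? 0;
    const deltaPp = (after - before) * 100;
    if (deltaPp < -tolerancePp) {
      failures.push(
        `findings[${type}].precision: ${(before * 100).toFixed(1)}% → ` +
          `${(after * 100).toFixed(1)}% (Δ ${deltaPp.toFixed(2)}pp)`,
      );
    }
  }
  return failures;
}

// Self-test shape: fake a perfect batch baseline, then compare to the real run.
const failures = checkPerTypeGate(
  { n_plus_one: 1.0, batch: 1.0 },
  { n_plus_one: 1.0, batch: 0.0 },
);
console.log(failures);
```

A non-empty `failures` array would correspond to a failing benchmark run.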

Acceptance criteria for issue #83 (part 2)

  • Python cache detector emits ZERO findings on the 6 listed FP call sites
  • AST cache detector emits ZERO findings on raw-fetch-elevenlabs/src/tts-service.ts:30
  • Synthetic positive-test fixtures still produce cache findings (recall preserved)
  • npm test passes (352/352)
  • npm run benchmark exits 0; per-type gate does not fail on cache
  • Global findingPrecision rises (6.25% → 9.09%, exceeds the +2pp threshold even with batch drift)
  • benchmark/baseline.json updated; cache row collapses to absent

Out of scope (next C1 PRs)

  • batch detector (now 9 FPs) — same conceptual fix shape (write-method/generative-method authority over chain/URL keyword inference). Separate PR.
  • rate_limit detector — defer until corpus has rate-limit positive cases.
  • unbatched_parallel vs. concurrency_control corpus terminology mismatch — corpus follow-up.

Notes for reviewers

  • PYTHON_GENERATIVE_METHOD is intentionally broad; it includes bare-word alternatives like \binvoke\b, \brun\b, \bstream\b, \bmessages\b, \bgenerate\b. The plan accepts this trade-off (LLM invocation verbs > the rare false-suppression risk on unrelated SDKs). If a real over-suppression FN shows up in a future corpus expansion, the regex tightens to chain-anchored matches (\.invoke\(, etc.).
  • chainKey() falls back to chain-only when match.endpoint is null (e.g. unfolded dynamic URL). A2 const-folding handles most realistic cases; documented as acceptable failure mode.
  • dedupeFindings keys include type, so removing cache findings cannot unmask findings of any other type. The batch FP count drift between the original baseline and now is independent of this PR (re-verified by stashing the diff and re-running the CLI on main — same 9 batch findings).
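
To make the first note concrete, here is a small comparison of bare-word matching against a chain-anchored alternative. It assumes, purely for illustration, that the regex is tested against the full call text; the sample strings and the `CHAIN_ANCHORED` pattern are hypothetical and not taken from the PR.

```typescript
// Bare-word alternatives quoted in the note above.
const BROAD = /\binvoke\b|\brun\b|\bstream\b|\bmessages\b|\bgenerate\b/;

// Hypothetical tightened form: the verb must appear as a method in the chain.
const CHAIN_ANCHORED = /\.(invoke|run|stream|generate)\(|\.messages\./;

const samples: string[] = [
  'client.messages.create(model=m, messages=msgs)', // generative call
  'chain.invoke({"q": q})',                         // generative call
  'requests.get(url, stream=True)',                 // plain read with a "stream" kwarg
  'db.table("messages").select("*")',               // plain read of a "messages" table
];

for (const s of samples) {
  console.log(s, "| broad:", BROAD.test(s), "| anchored:", CHAIN_ANCHORED.test(s));
}
```

Both patterns flag the two generative calls. Only the broad pattern also flags the last two reads, which is the false-suppression risk the note accepts for now.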

Test plan

  • npm test — 352 PASS
  • npm run benchmark — exit 0, no regressions
  • Per-type gate self-test — fired correctly with synthetic baseline tweak
  • Manual CLI scan against each corpus fixture — confirmed cache emits zero findings everywhere

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Fixed cache detector incorrectly treating different API endpoints as redundant when using the same HTTP client library.
    • Improved Python detector to correctly classify generative LLM calls as non-cacheable operations.
  • Tests

    • Added comprehensive test coverage for cache detection across Python and TypeScript scenarios.
  • Documentation

    • Updated calibration results with improved detection accuracy metrics.


Eliminates the cache detector's 100% false-positive rate on corpus v1
by tightening two distinct emission paths:

Python (src/scanner/python-waste-detector.ts):
- Reorder isReadLikeCall(): generative-method denylist + write-verb
  check now run BEFORE the cacheCapable shortcut, so chat.completions,
  messages, generate, etc. can never sneak through as "read-like" via
  fingerprint-default cacheCapable=true.
- Add explicit-write-HTTP-method authority (POST/PUT/PATCH/DELETE):
  raw requests.post to URLs whose path contains a read keyword
  (e.g. /v1/embed) is no longer misclassified as cacheable.

AST (src/ast/waste/cache-detector.ts):
- Bucket redundancy by methodChain + endpoint URL for universal HTTP
  method names (fetch, axios.*, got, ky, superagent.*, requests.*).
  Three different fetch() URLs in one file no longer collapse into a
  single "fetch" bucket and trip the redundant-call signal. Evidence
  string still renders methodChain only.

Measurement (benchmark/baseline.json, docs/accuracy/findings.md):
- cache:        7 FP / 0 TP → 0 FP / 0 TP (row removed from baseline)
- batch:        7 FP / 0 TP → 9 FP / 0 TP (pre-existing corpus drift,
                not caused by PR-2; next C1 PR tightens batch)
- finding precision: 6.25% → 9.09% globally
- All other per-detector metrics unchanged
- No detection or recall regressions

Tests (src/test/c1-pr2-cache-tightening.test.ts, fixtures/c1-pr2/):
- 5 new test cases pinning both regression directions: chat-completion,
  POST /v1/embed, three-different-fetch-URLs (negative); stripe.retrieve
  and same-URL-fetch (positive recall preservation).
- Total: 352 PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 13, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 425cf8ad-cb33-4bcd-bbf3-a6794f94c72f

📥 Commits

Reviewing files that changed from the base of the PR and between 4f848e3 and b9c8b4c.

📒 Files selected for processing (3)
  • src/ast/waste/cache-detector.ts
  • src/test/c1-pr2-cache-tightening.test.ts
  • src/test/fixtures/c1-pr2/ts_get_post_same_url.ts
Walkthrough

This PR implements the second tightening phase (C1 PR-2) for the cache waste detector by fixing two distinct false-positive modes: Python generative/LLM endpoints incorrectly treated as cacheable reads, and TypeScript AST redundancy bucketing collapsing distinct HTTP endpoints by method chain alone. The changes include detection logic improvements, test fixtures, a comprehensive test suite, documentation of the investigation and approach, and updated benchmark baselines.

Changes

C1 PR-2 Cache Detector False Positive Fixes

| Layer / File(s) | Summary |
|---|---|
| Python generative method filtering: src/scanner/python-waste-detector.ts, src/test/fixtures/c1-pr2/python_chat_completion.py, src/test/fixtures/c1-pr2/python_post_embed.py, src/test/fixtures/c1-pr2/python_real_read.py | Added PYTHON_GENERATIVE_METHOD regex constant to recognize LLM/chat endpoints (chat, completions, message, generate, tts, stt, streaming), reordered isReadLikeCall logic to classify generative calls as non-cacheable before cache-capable inference, and provided three Python fixtures: chat completion (negative), POST to embed endpoint with read-keyword URL (negative), and Stripe customer retrieval (positive). |
| TypeScript HTTP URL-aware redundancy bucketing: src/ast/waste/cache-detector.ts, src/test/fixtures/c1-pr2/ts_diff_fetch_urls.ts, src/test/fixtures/c1-pr2/ts_same_fetch_url.ts | Introduced UNIVERSAL_HTTP_CHAIN regex and chainKey() helper to scope redundancy detection by both method chain and endpoint URL, and updated redundancy counting in detectCacheWaste() and the occurrences lookup to use URL-scoped keys, so distinct endpoints that share a universal HTTP client no longer produce false-positive matches. Added two TypeScript fixtures containing five functions: three fetching distinct URLs (listProducts, listCustomers, listOrders) and two fetching the same URL (loadOnce, loadAgain). |
| Test infrastructure and wiring: src/test/c1-pr2-cache-tightening.test.ts, package.json | Added new TypeScript test file with WASM directory setup, fixture access helper, and run() utility that scans five fixtures and asserts expected cache finding counts: zero for chat completion, at least one for GET-shaped reads, zero for distinct URLs, at least one for duplicate same-URL fetches, and zero for POST-with-read-keyword. Wired test into test:scanner npm script. |
| PR-2 design and acceptance criteria: docs/superpowers/plans/2026-05-13-c1-pr2-cache-detector-tightening.md | Comprehensive plan document recording investigation of two false-positive modes, enumerating affected code locations and suspected root causes, defining PR-2 acceptance criteria (zero specific findings while preserving synthetic positive cases), and specifying four TDD-oriented task checklists (C1-PR2.1–C1-PR2.4) with implementation details, self-review matrix, and execution handoff. |
| Metrics baseline and findings documentation: benchmark/baseline.json, docs/accuracy/findings.md | Updated benchmark baseline to remove cache metrics entry, adjust batch false positives, and modify overall precision/recall. Revised findings table and acceptance-criteria text to reflect new corpus measurements showing cache and n_plus_one detectors now pass FPR threshold while batch and rate_limit remain above threshold; retained concurrency_control mismatch note. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through detectors so fine,
Filtering chat clouds from reads that align,
URLs now scoped in each bucket's care,
No more false cache claims floating in air!
Tests pass, baselines dance—the fixture files cheer! 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.67%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(detection): C1 PR-2 — tighten cache detector (closes part 2 of #83)' directly and clearly describes the primary change: tightening the cache detector to eliminate false positives, which is the main objective throughout the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ast/waste/cache-detector.ts`:
- Around line 63-68: chainKey currently builds redundancy buckets for
UNIVERSAL_HTTP_CHAIN using only match.methodChain and match.endpoint, which
collapses different HTTP methods (e.g., GET vs POST) for the same endpoint;
update chainKey to include match.method in the key for universal chains so
method-specific calls are separated (use match.method along with
match.methodChain and match.endpoint in the returned string), and apply the same
change to the other similar key constructions referenced by the reviewer (the
same logic used around the blocks identified by the reviewer).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e6a3e024-be7c-47d9-82c1-63511ead551e

📥 Commits

Reviewing files that changed from the base of the PR and between 6647746 and 4f848e3.

📒 Files selected for processing (12)
  • benchmark/baseline.json
  • docs/accuracy/findings.md
  • docs/superpowers/plans/2026-05-13-c1-pr2-cache-detector-tightening.md
  • package.json
  • src/ast/waste/cache-detector.ts
  • src/scanner/python-waste-detector.ts
  • src/test/c1-pr2-cache-tightening.test.ts
  • src/test/fixtures/c1-pr2/python_chat_completion.py
  • src/test/fixtures/c1-pr2/python_post_embed.py
  • src/test/fixtures/c1-pr2/python_real_read.py
  • src/test/fixtures/c1-pr2/ts_diff_fetch_urls.ts
  • src/test/fixtures/c1-pr2/ts_same_fetch_url.ts

Comment thread src/ast/waste/cache-detector.ts
…al HTTP chains

Addresses CodeRabbit review on #108. chainCount is built from all matches
(write-likes are skipped only later, at the per-match emission loop), so a
sibling POST /x mutation would inflate the same bucket as a GET /x and
falsely mark the GET as redundant. Including match.method in the bucket key
keeps GET and POST traffic to the same URL in separate buckets.

Adds a regression fixture (`ts_get_post_same_url.ts` — one fetch GET and one
fetch POST to the same URL) and the corresponding test (6th c1-pr2 case).
353/353 tests pass; benchmark unchanged (cache still 0/0/0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AndresL230 (Contributor, Author) commented

Thanks @coderabbitai — valid catch. Pushed b9c8b4c:

chainCount is built from all matches (write-likes are filtered later, only at the per-match emission loop). Without HTTP method in the key, a POST /x mutation inflated the bucket and falsely marked a sibling GET /x as redundant. chainKey() now includes match.method.toUpperCase() for universal HTTP chains, plus a regression fixture (ts_get_post_same_url.ts) and a 6th test case asserting the GET stays clean.

Tests: 353/353 PASS. Benchmark unchanged (cache still 0/0/0).
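
The effect of adding the HTTP method to the key can be illustrated with a side-by-side sketch. The `CallMatch` shape, `bucketSizes`, and the example URL are hypothetical; the two key functions mirror the before and after behaviour described in this comment for universal HTTP chains.

```typescript
// Hypothetical match shape for a universal HTTP call such as fetch().
interface CallMatch {
  methodChain: string; // e.g. "fetch"
  method: string;      // HTTP method as written in the source, any casing
  endpoint: string;    // resolved URL
}

type KeyFn = (m: CallMatch) => string;

// Before the follow-up commit: chain + URL only.
const urlOnlyKey: KeyFn = (m) => `${m.methodChain} ${m.endpoint}`;

// After: the HTTP method is part of the bucket key.
const methodAwareKey: KeyFn = (m) =>
  `${m.method.toUpperCase()} ${m.methodChain} ${m.endpoint}`;

// Buckets are built from all matches, including write-like ones.
function bucketSizes(matches: CallMatch[], keyOf: KeyFn): Map<string, number> {
  const sizes = new Map<string, number>();
  for (const m of matches) {
    const key = keyOf(m);
    sizes.set(key, (sizes.get(key) ?? 0) + 1);
  }
  return sizes;
}

const matches: CallMatch[] = [
  { methodChain: "fetch", method: "get", endpoint: "https://api.example.com/x" },
  { methodChain: "fetch", method: "POST", endpoint: "https://api.example.com/x" },
];

const before = bucketSizes(matches, urlOnlyKey);    // one bucket holding both calls
const after = bucketSizes(matches, methodAwareKey); // GET and POST kept apart
console.log(before.size, after.size);
```

With the URL-only key the POST inflates the GET's bucket to two entries, which is what made the GET look redundant; with the method-aware key each call sits in its own bucket.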


coderabbitai Bot commented May 13, 2026

@AndresL230 Perfect fix — including match.method.toUpperCase() in the chainKey() for universal HTTP chains is exactly the right approach. The POST /x + GET /x case is a classic bucketing pitfall and the regression fixture ensures it stays caught. Thanks for the clean follow-up!

( ´◡` ) 🐇 ✨


@AndresL230 AndresL230 merged commit 3b13ffe into main May 13, 2026
3 checks passed
@AndresL230 AndresL230 deleted the claude/c1-pr2-cache-detector branch May 22, 2026 22:50