
fix(detection): C1 PR-3 — tighten batch detector (closes part 3 of #83) #109

Merged
AndresL230 merged 1 commit into main from claude/c1-pr3-batch-detector on May 14, 2026
Conversation

@AndresL230
Contributor

Summary

Eliminates 8 of 9 corpus batch false positives (9 → 1) by gating the sequential-batching detectors on enclosing function. Calls in different functions of the same file execute on independent code paths and cannot be batched together.

Builds on #106 (per-detector measurement) and #108 (cache detector). Same investigative shape, same review pattern.

What changed

TypeScript detector (src/ast/waste/batch-detector.ts)

detectSequential() now buckets by (provider, enclosingFunction) instead of provider alone:

  • Module-level calls bucket to "<module>" so they cluster only with other module-level calls.
  • The bucket Map carries provider as a struct field ({provider, matches}) instead of encoding it in the key, which avoids a "::" delimiter footgun if a future provider id contains the separator. This follows a CodeRabbit-style code-quality review.
  • Within each bucket, dedupe by line: cross-file resolver expansion can produce multiple AstCallMatch entries at the same source line for one user-written call (e.g., bedrock-raw-fetch/src/index.ts where one await handleApi(...) resolves through 2 internal wrappers).
  • Evidence string lists unique sorted lines.
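
The bucketing and line dedupe described above can be sketched roughly as follows. This is a minimal illustration, not the code in batch-detector.ts: the `CallMatch` shape is a trimmed, hypothetical stand-in for AstCallMatch, the `JSON.stringify` key is an assumed way to build an opaque key that is never parsed back, and `sequentialFindings` is an invented name. The `>= 2` threshold is the TS value quoted in this PR.

```typescript
// Hypothetical, trimmed-down stand-in for AstCallMatch.
interface CallMatch {
  provider: string;
  enclosingFunction: string | null;
  line: number;
}

// The provider travels with the bucket, so the key never has to be split.
interface Bucket {
  provider: string;
  matches: CallMatch[];
}

const MODULE_SCOPE = "<module>";

function bucketByFunction(matches: CallMatch[]): Map<string, Bucket> {
  const buckets = new Map<string, Bucket>();
  for (const m of matches) {
    const scope = m.enclosingFunction ?? MODULE_SCOPE;
    // Assumption: an opaque composite key; it is only used for lookup.
    const key = JSON.stringify([m.provider, scope]);
    let bucket = buckets.get(key);
    if (!bucket) {
      bucket = { provider: m.provider, matches: [] };
      buckets.set(key, bucket);
    }
    bucket.matches.push(m);
  }
  return buckets;
}

// Collapse resolver-expanded duplicates: one entry per source line, ascending.
function uniqueSortedLines(bucket: Bucket): number[] {
  const lines = bucket.matches.map((m) => m.line);
  return Array.from(new Set(lines)).sort((a, b) => a - b);
}

function sequentialFindings(
  matches: CallMatch[],
  minCalls = 2,
): { provider: string; lines: number[] }[] {
  const findings: { provider: string; lines: number[] }[] = [];
  bucketByFunction(matches).forEach((bucket) => {
    const lines = uniqueSortedLines(bucket);
    if (lines.length >= minCalls) {
      findings.push({ provider: bucket.provider, lines });
    }
  });
  return findings;
}
```

In this sketch, two calls in different functions land in different buckets and produce nothing, while two matches at the same line count as a single call.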

Python detector (src/scanner/python-waste-detector.ts)

detectSequentialBatching() applies the same (providerKey, enclosingFunction) bucketing, with the same struct-field pattern. The cluster.length >= 3 proximity threshold is preserved (Python doesn't have explicit await sequencing as an N+1 indicator, so it warrants a stronger signal than TS's >= 2).
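
As a rough illustration of the `cluster.length >= 3` threshold, the sketch below groups call lines from a single (providerKey, enclosingFunction) bucket into proximity clusters. The PR does not state how proximity is measured, so the `maxGap` parameter, its default of 10 lines, and both function names are assumptions made for this example only.

```typescript
// Assumption: "proximity" means consecutive calls at most maxGap lines apart.
function clusterByProximity(lines: number[], maxGap: number): number[][] {
  const sorted = Array.from(new Set(lines)).sort((a, b) => a - b);
  const clusters: number[][] = [];
  for (const line of sorted) {
    const last = clusters[clusters.length - 1];
    if (last && line - last[last.length - 1] <= maxGap) {
      last.push(line); // close enough to the previous call: same cluster
    } else {
      clusters.push([line]); // too far away (or first call): start a new cluster
    }
  }
  return clusters;
}

// Keep only clusters that meet the Python detector's stronger threshold.
function qualifyingClusters(
  lines: number[],
  maxGap = 10, // hypothetical window size
  minSize = 3, // threshold quoted in this PR
): number[][] {
  return clusterByProximity(lines, maxGap).filter((c) => c.length >= minSize);
}
```

Under these assumptions, three nearby calls in one function qualify, while two nearby calls plus one distant call do not.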

Measurement (corpus v1, 7 fixtures)

| Metric | Baseline | After PR-3 | Δ |
|---|---|---|---|
| Detection precision | 36.26% | 36.26% | +0.00pp |
| Detection recall | 48.53% | 48.53% | +0.00pp |
| Provider attribution | 82.14% | 82.14% | +0.00pp |
| Finding precision | 9.09% | 33.33% | +24.24pp |
| Finding recall | 33.33% | 33.33% | +0.00pp |

Per-detector:

| Detector | Before (TP/FP/FN) | After (TP/FP/FN) | Note |
|---|---|---|---|
| n_plus_one | 1 / 0 / 0 | 1 / 0 / 0 | unchanged |
| cache | 0 / 0 / 0 | 0 / 0 / 0 | unchanged (PR-2 already cleared) |
| batch | 0 / 9 / 1 | 0 / 1 / 1 | TS detector: 7 → 1 FP. Python detector: 2 → 0 FP. |
| rate_limit | 0 / 1 / 0 | 0 / 1 / 0 | unchanged |
| unbatched_parallel | 0 / 0 / 1 | 0 / 0 / 1 | unchanged |
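
The global finding precision and recall figures are consistent with summing the per-detector counts and computing TP / (TP + FP) and TP / (TP + FN). The sketch below reproduces that arithmetic from the numbers in the per-detector table; treating the global metric as this kind of pooled sum is an assumption, and the function names are invented for illustration.

```typescript
interface Counts {
  tp: number;
  fp: number;
  fn: number;
}
type DetectorTable = Record<string, Counts>;

// Counts copied from the per-detector table in this PR.
const before: DetectorTable = {
  n_plus_one: { tp: 1, fp: 0, fn: 0 },
  cache: { tp: 0, fp: 0, fn: 0 },
  batch: { tp: 0, fp: 9, fn: 1 },
  rate_limit: { tp: 0, fp: 1, fn: 0 },
  unbatched_parallel: { tp: 0, fp: 0, fn: 1 },
};
const after: DetectorTable = { ...before, batch: { tp: 0, fp: 1, fn: 1 } };

// Pool TP/FP/FN across all detectors.
function totals(table: DetectorTable): Counts {
  return Object.keys(table).reduce<Counts>(
    (sum, name) => ({
      tp: sum.tp + table[name].tp,
      fp: sum.fp + table[name].fp,
      fn: sum.fn + table[name].fn,
    }),
    { tp: 0, fp: 0, fn: 0 },
  );
}

function precisionPct(table: DetectorTable): number {
  const t = totals(table);
  const emitted = t.tp + t.fp;
  return emitted === 0 ? 0 : (100 * t.tp) / emitted;
}

function recallPct(table: DetectorTable): number {
  const t = totals(table);
  const expected = t.tp + t.fn;
  return expected === 0 ? 0 : (100 * t.tp) / expected;
}

console.log(precisionPct(before).toFixed(2)); // 1 / 11 -> "9.09"
console.log(precisionPct(after).toFixed(2)); // 1 / 3  -> "33.33"
console.log(recallPct(after).toFixed(2)); // 1 / 3  -> "33.33"
```

Removing 8 false positives shrinks the denominator from 11 to 3 while the single true positive (n_plus_one) stays fixed, which accounts for the +24.24pp change with no movement in recall.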

The remaining batch FP is bedrock-raw-fetch/src/index.ts:5 — two sequential await handleApi(...) calls in the same main() function (lines 5 and 11). Arguably a true positive (Promise.all would parallelize them) that the corpus didn't label. Fixing it cleanly requires either control-flow-branch awareness or a corpus annotation update; out of scope for this PR.
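
For context on why this finding is arguably a true positive, the sketch below contrasts two sequential awaits with a Promise.all version. `handleApi` here is a hypothetical stand-in with a timer, not the fixture's real function, and the rewrite only applies when the second call does not depend on the first call's result.

```typescript
// Hypothetical stand-in for the fixture's handleApi; the real one is not shown in this PR.
async function handleApi(prompt: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 20));
  return `response:${prompt}`;
}

// Shape the detector flags: the second call starts only after the first resolves.
async function mainSequential(): Promise<string[]> {
  const first = await handleApi("a");
  const second = await handleApi("b");
  return [first, second];
}

// Both calls start immediately; results keep the order of the input array.
async function mainParallel(): Promise<string[]> {
  return Promise.all([handleApi("a"), handleApi("b")]);
}
```

Both functions return the same values in the same order; the difference is only in how long the caller waits.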

Sample size for batch (TP+FP = 1) is below the per-type gate's ≥3 threshold, so any future regression will need 3+ emissions before tripping the gate. This is an intentional gate design choice (PR-1) to avoid noise on small samples.
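
A minimal sketch of how a sample-size guard like this might behave is shown below. Only the "TP+FP must reach 3 before the gate evaluates" rule comes from this PR; the pass/fail rule (more false positives than the baseline) and every name in the block are assumptions, since the gate's actual comparison is not described here.

```typescript
interface Counts {
  tp: number;
  fp: number;
  fn: number;
}
type GateResult = "skipped" | "pass" | "fail";

function perTypeGate(
  current: Counts,
  baseline: Counts,
  minSample = 3, // threshold quoted in this PR
): GateResult {
  const emitted = current.tp + current.fp;
  // Too few emissions to judge: skip rather than fail on noise.
  if (emitted < minSample) return "skipped";
  // Assumed regression rule: more false positives than the recorded baseline.
  return current.fp > baseline.fp ? "fail" : "pass";
}

const batchBaseline: Counts = { tp: 0, fp: 1, fn: 1 };

console.log(perTypeGate({ tp: 0, fp: 1, fn: 1 }, batchBaseline)); // "skipped"
console.log(perTypeGate({ tp: 0, fp: 2, fn: 1 }, batchBaseline)); // "skipped"
console.log(perTypeGate({ tp: 0, fp: 3, fn: 1 }, batchBaseline)); // "fail"
```

With the batch detector at TP+FP = 1, a regression that adds one more false positive would still be skipped under this rule; it takes three emissions for the gate to evaluate at all.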

Tests

4 new TDD-style fixtures + tests in src/test/c1-pr3-batch-tightening.test.ts and src/test/fixtures/c1-pr3/:

| # | Fixture | Direction | Asserts |
|---|---|---|---|
| 1 | ts_diff_functions.ts | negative | two openai calls in two different functions → 0 batch findings |
| 2 | ts_same_function.ts | positive (recall) | two openai calls in same function → ≥1 batch finding |
| 3 | py_diff_functions.py | negative | three anthropic calls in three different functions → 0 batch findings |
| 4 | py_same_function.py | positive (recall) | three anthropic calls in same function → ≥1 batch finding |

All 357 tests pass.

Acceptance criteria for issue #83 (part 3)

  • TS detectSequential() emits ZERO findings on answer.ts, bedrock-client.ts, raw-fetch-client.ts, summarize.ts, tts-service.ts
  • Python detectSequentialBatching() emits ZERO findings on anthropic_helper.py and chat_completions_basic.py
  • Synthetic positive fixtures still produce batch findings (recall preserved)
  • npm test passes (357/357)
  • npm run benchmark exits 0; per-type gate doesn't fail
  • Global findingPrecision rises (9.09% → 33.33%)
  • benchmark/baseline.json updated; batch collapses to TP=0/FP=1/FN=1

Out of scope (next steps)

Notes for reviewers

  • enclosingFunction is string | null on AstCallMatch. Module-level calls (null) bucket to "<module>" rather than getting their own per-call buckets. This is the desired behavior, since two top-level awaits in a script do live on the same execution path.
  • The line-dedup change reduces the displayed line count in the evidence string; existing ast-batch-detector.test.ts synthetic tests use enclosingFunction: null (module-level default) and don't assert on the exact line list, so they pass unchanged.
  • Bucket Map structure: Map<string, {provider: string; matches: AstCallMatch[]}> instead of Map<string, AstCallMatch[]> + key-split-on-"::". The structural pattern is what PR-2 review settled on for similar code; reused here.
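
To make the delimiter concern concrete, the sketch below shows how a key joined with "::" is misparsed when a provider id itself contains "::", and how keeping the provider on the bucket value sidesteps the parse. The provider id "aws::bedrock" and the function name are hypothetical; no current provider is known to use the separator.

```typescript
// Old pattern: recover the provider by splitting the key.
function parseJoinedKey(key: string): { provider: string; scope: string } {
  const [provider, scope] = key.split("::");
  return { provider, scope };
}

const hypotheticalProvider = "aws::bedrock"; // invented id containing the separator
const joinedKey = [hypotheticalProvider, "main"].join("::"); // "aws::bedrock::main"

// The split cannot tell which "::" is the real boundary.
console.log(parseJoinedKey(joinedKey)); // { provider: "aws", scope: "bedrock" }

// Struct-field pattern: the key is only a lookup handle; the provider is stored as data.
interface Bucket {
  provider: string;
  matches: { line: number }[];
}
const buckets = new Map<string, Bucket>();
buckets.set(joinedKey, { provider: hypotheticalProvider, matches: [{ line: 5 }] });

console.log(buckets.get(joinedKey)?.provider); // "aws::bedrock"
```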

Test plan

  • npm test — 357 PASS
  • npm run benchmark — exit 0, no regressions
  • Manual CLI scan against each corpus fixture — confirmed only bedrock-raw-fetch/src/index.ts:5 still emits

🤖 Generated with Claude Code

Eliminates 8 of 9 corpus batch false positives by gating both the TS
AST sequential-batching detector and the Python detectSequentialBatching
on enclosing function. Calls in different functions of the same file
execute on independent code paths and cannot be batched together.

TypeScript (src/ast/waste/batch-detector.ts):
- detectSequential() now buckets by (provider, enclosingFunction)
  instead of provider alone. Module-level calls bucket to "<module>"
  so they cluster only with other module-level calls.
- Within a bucket, dedupe by line — cross-file resolver expansion can
  produce multiple AstCallMatch entries at the same source line for one
  user-written call (e.g., bedrock-raw-fetch/src/index.ts where one
  await handleApi(...) resolves through 2 internal wrapper functions).
- Bucket Map carries the provider as a struct field rather than encoding
  it in the key string — avoids a "::" delimiter footgun if a future
  provider id contains the separator.
- Evidence string lists unique sorted lines.

Python (src/scanner/python-waste-detector.ts):
- detectSequentialBatching() applies the same (providerKey,
  enclosingFunction) bucketing. The cluster.length >= 3 proximity
  threshold is preserved (Python doesn't have explicit await sequencing
  as a clear N+1 indicator, so it warrants a stronger signal).

Measurement (benchmark/baseline.json, docs/accuracy/findings.md):
- batch:        9 FP / 0 TP / 1 FN → 1 FP / 0 TP / 1 FN
- finding precision: 9.09% → 33.33% globally (+24.24pp)
- All other per-detector metrics unchanged
- No detection or recall regressions
- Sample size for batch falls below the per-type gate's >=3 threshold,
  so the gate skips it (next regression on batch will need 3+ emissions
  before failing the build)

Tests (src/test/c1-pr3-batch-tightening.test.ts, fixtures/c1-pr3/):
- 4 new test cases pinning both regression directions for both
  languages: cross-function calls (-) and same-function calls (+).
- Total: 357 PASS.

Remaining 1 batch FP is bedrock-raw-fetch/src/index.ts:5 — two
sequential `await handleApi(...)` calls in `main()`. Arguably a true
positive (Promise.all would parallelize them) that the corpus didn't
label. Fixing it cleanly requires either control-flow-branch awareness
in the detector or a corpus annotation update; both are scoped to a
follow-up PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 13, 2026

Warning

Rate limit exceeded

@AndresL230 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 34 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 17abab8b-32f6-4def-96e3-5ce161225a30

📥 Commits

Reviewing files that changed from the base of the PR and between 3b13ffe and dd657aa.

📒 Files selected for processing (11)
  • benchmark/baseline.json
  • docs/accuracy/findings.md
  • docs/superpowers/plans/2026-05-13-c1-pr3-batch-detector-tightening.md
  • package.json
  • src/ast/waste/batch-detector.ts
  • src/scanner/python-waste-detector.ts
  • src/test/c1-pr3-batch-tightening.test.ts
  • src/test/fixtures/c1-pr3/py_diff_functions.py
  • src/test/fixtures/c1-pr3/py_same_function.py
  • src/test/fixtures/c1-pr3/ts_diff_functions.ts
  • src/test/fixtures/c1-pr3/ts_same_function.ts

@AndresL230 AndresL230 merged commit 08eb32f into main May 14, 2026
3 checks passed
@AndresL230 AndresL230 deleted the claude/c1-pr3-batch-detector branch May 22, 2026 22:50