test(round22): TD-192 live latency regression — fractal-entry 2 → 1 calls #32
… → 1

Pin TD-192's LLM round-trip reduction with a real Ollama (qwen3:8b) test: a spy LLMGateway counts complete() invocations during fractal-entry classification.

Pre-TD-192 baseline = 2 calls/goal (bypass + output classifiers); post-TD-192 = 1 call/goal. Round 22 measured 2 LLM calls across 2 goals (artifact + text), saving 2 calls vs. the 4-call baseline (2 goals × 2 calls/goal).

TD-191 guard re-verified end-to-end: the artifact goal stays on the fractal path, and plain text Q&A still bypasses.
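The call-counting approach described above can be sketched roughly as follows. This is a minimal illustration, not the test from the PR: `SpyLLMGateway`, `_StubGateway`, `classify_goal`, and `calls_saved` are hypothetical names, the stub stands in for the real Ollama-backed gateway, and the single-call classifier is an assumption about what the post-TD-192 entry point looks like.

```python
import asyncio
from dataclasses import dataclass, field


class _StubGateway:
    """Hypothetical stand-in for the real Ollama-backed gateway."""

    async def complete(self, prompt: str) -> str:
        return '{"bypass": false}'


@dataclass
class SpyLLMGateway:
    """Hypothetical test double: delegates to `inner` and records each call."""

    inner: _StubGateway
    calls: list[str] = field(default_factory=list)

    async def complete(self, prompt: str) -> str:
        self.calls.append(prompt)  # count the round trip before delegating
        return await self.inner.complete(prompt)


async def classify_goal(gateway: SpyLLMGateway, goal: str) -> str:
    # Assumed post-TD-192 shape: one combined classification call per goal.
    return await gateway.complete(f"classify: {goal}")


async def count_calls(goals: list[str]) -> int:
    """Run classification for every goal and return the total complete() calls."""
    spy = SpyLLMGateway(inner=_StubGateway())
    for goal in goals:
        await classify_goal(spy, goal)
    return len(spy.calls)


def calls_saved(n_goals: int, measured: int, baseline_per_goal: int = 2) -> int:
    """Calls saved relative to the pre-TD-192 baseline of 2 calls per goal."""
    return n_goals * baseline_per_goal - measured
```

With two goals, the spy records two calls, and `calls_saved(2, 2)` gives the 2-call saving quoted in the description. Wrapping the gateway rather than patching the classifier keeps the count independent of how the classifier is structured internally.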
📝 Walkthrough

This PR adds a new Ollama-backed integration test suite that measures LLM call counts to validate TD-192 fractal-entry classification latency improvements. The tests confirm FractalBypassClassifier makes exactly one LLM call per goal and verify end-to-end call reduction matches the expected TD-192 target.

Changes

TD-192 Call Count Reduction Test Suite

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
tests/integration/test_round22_td192_latency.py (1)
49-67: ⚡ Quick win

Add type annotations to test double.

The `_InMemoryCostRepo` class is missing type hints on the `record` parameter and the `_records` list. Adding these would improve type safety and make the interface contract clearer.

♻️ Proposed type annotations

```diff
+from typing import Any
+
 class _InMemoryCostRepo:
     def __init__(self) -> None:
-        self._records: list = []
+        self._records: list[Any] = []

-    async def save(self, record) -> None:
+    async def save(self, record: Any) -> None:
         self._records.append(record)
```

If the actual cost record type is available (e.g., from a domain model), use that instead of `Any`.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_round22_td192_latency.py` around lines 49 - 67, The test double _InMemoryCostRepo should declare types for its internal list and the save parameter: annotate self._records as list[CostRecord] (or list[Any] if CostRecord isn't available) and change save(self, record) to save(self, record: CostRecord) -> None (or record: Any); add the appropriate typing import (from typing import Any) or the domain CostRecord import so the signatures on _InMemoryCostRepo, save, and the _records field clearly express the expected record type.
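For the case where a concrete record type is available, the annotated test double might look like the sketch below. `CostRecord` and its fields are hypothetical placeholders for whatever domain model the project actually defines, and the `records` property is an illustrative addition that the reviewed class may not have.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRecord:
    """Hypothetical stand-in for the project's cost record type."""

    model: str
    cost_usd: float


class InMemoryCostRepo:
    """In-memory test double with explicit types on the list and on save()."""

    def __init__(self) -> None:
        self._records: list[CostRecord] = []

    async def save(self, record: CostRecord) -> None:
        self._records.append(record)

    @property
    def records(self) -> tuple[CostRecord, ...]:
        # Return an immutable snapshot so tests cannot mutate internal state.
        return tuple(self._records)
```

Using the concrete type instead of `Any` lets a type checker flag a test that saves the wrong kind of object.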
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 81545cb0-4417-4e18-9605-761722353db7
📒 Files selected for processing (1)
tests/integration/test_round22_td192_latency.py
Covers 33 commits since v0.6.1:
- TD-194 Council Pilot full merge (#20) + 5 post-merge fix-ups (#22-#26)
- TD-189 steps 1-4: per-task cache_hit_rate plumbing (#27-#30)
- TD-192: fold OutputRequirementClassifier into FractalBypassClassifier (#31)
- Round 22 live latency regression — fractal-entry 2 → 1 LLM calls (#32)
- Haiku 4.5 cache threshold pinned at ~4096 tokens via --pad-entries (#33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- `tests/integration/test_round22_td192_latency.py` pins TD-192's LLM round-trip reduction.
- `LLMGateway` counts `complete()` calls; runs against real Ollama (qwen3:8b).

Live measurement (Round 22)
- `氷川神社のスライドを作って` ("Make slides about Hikawa Shrine")
- `What is 2+2?`

TD-191 guard re-verified
- bypass=False).

Test plan
- `pytest tests/integration/test_round22_td192_latency.py -v -s` → 3/3 PASS against real Ollama.
- `ruff check tests/integration/test_round22_td192_latency.py` clean.

🤖 Generated with Claude Code