test(round22): TD-192 live latency regression — fractal-entry 2 → 1 calls by engkimo · Pull Request #32 · engkimo/open-morphic

engkimo · 2026-05-15T09:05:36Z

Summary

New live E2E test `tests/integration/test_round22_td192_latency.py` pins TD-192's LLM round-trip reduction.
A spy LLMGateway counts complete() calls; the test runs against real Ollama (qwen3:8b).
Pre-TD-192 baseline: 2 calls per goal (bypass classifier + output classifier).
Post-TD-192: 1 call per goal, a 50% reduction at the fractal entry point.

Live measurement (Round 22)

Goal	calls	elapsed	bypass	output	complexity
`氷川神社のスライドを作って`	1	7.80s	False	file	medium
`What is 2+2?`	1	1.08s	True	text	simple
2-goal total (test_round22_summary run)	2	2.41s	n/a	n/a	(baseline 4, saved 2)
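The baseline and saved figures follow from simple arithmetic: 2 goals × 2 pre-TD-192 calls = 4, minus the 2 observed calls = 2 saved. A small sketch of that bookkeeping; the constant and function names are hypothetical and are not the test module's actual constants.

```python
PRE_TD192_CALLS_PER_GOAL = 2   # bypass classifier + output classifier
POST_TD192_CALLS_PER_GOAL = 1  # single folded classifier call


def summarize(observed_calls: list[int]) -> dict[str, object]:
    """Compare observed per-goal LLM call counts against the pre-TD-192 baseline."""
    if not observed_calls:
        raise ValueError("need at least one goal")
    baseline = len(observed_calls) * PRE_TD192_CALLS_PER_GOAL
    actual = sum(observed_calls)
    return {
        "baseline": baseline,
        "actual": actual,
        "saved": baseline - actual,
        "reduction_pct": 100.0 * (baseline - actual) / baseline,
        # True only if every goal stayed within the post-TD-192 budget.
        "target_met": all(c <= POST_TD192_CALLS_PER_GOAL for c in observed_calls),
    }


print(summarize([1, 1]))
# {'baseline': 4, 'actual': 2, 'saved': 2, 'reduction_pct': 50.0, 'target_met': True}
```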

TD-191 guard re-verified

The artifact goal still takes the fractal path (an output requirement other than TEXT clamps bypass to False).
Plain-text Q&A still bypasses, so the TD-167 latency win is preserved.
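The TD-191 guard can be pictured as a clamp applied after classification. This sketch assumes an `OutputRequirement` enum with `TEXT` and `FILE` members, mirroring the `text`/`file` values in the measurement table; the names are illustrative, not the project's code.

```python
from enum import Enum


class OutputRequirement(Enum):
    """Hypothetical output kinds; only the two seen in the Round 22 table."""

    TEXT = "text"
    FILE = "file"


def effective_bypass(llm_says_bypass: bool, output: OutputRequirement) -> bool:
    """Only plain-text goals may skip the fractal path."""
    if output is not OutputRequirement.TEXT:
        return False  # artifact goals are always clamped onto the fractal path
    return llm_says_bypass


# effective_bypass(True, OutputRequirement.FILE) -> False (slide deck goal)
# effective_bypass(True, OutputRequirement.TEXT) -> True  ("What is 2+2?")
```

Applying the clamp after the model answers means a misclassified artifact goal can never skip the fractal path, whatever the LLM returns for `bypass`.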

Test plan

`pytest tests/integration/test_round22_td192_latency.py -v -s` → 3/3 PASS against real Ollama.
`ruff check tests/integration/test_round22_td192_latency.py` is clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Added integration tests verifying improved LLM call efficiency for classification workflows, with skip conditions for unavailable dependencies.

… → 1

Pin TD-192's LLM round-trip reduction with a real Ollama (qwen3:8b) test: a spy LLMGateway counts complete() invocations during fractal-entry classification. Pre-TD-192 baseline = 2 calls/goal (bypass + output classifiers); post-TD-192 = 1 call/goal. Round 22 measured 2 LLM calls across 2 goals (artifact + text), saving 2 calls vs. the baseline. TD-191 guard re-verified end-to-end: the artifact goal stays on the fractal path, and plain-text Q&A still bypasses.

coderabbitai · 2026-05-15T09:05:49Z

Walkthrough

This PR adds a new Ollama-backed integration test suite that measures LLM call counts to validate TD-192 fractal-entry classification latency improvements. The tests confirm that FractalBypassClassifier makes exactly one LLM call per goal and verify that the end-to-end call count matches the TD-192 target.
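TD-192 folds the output-requirement question into the bypass classifier so that one LLM round trip answers both. A hypothetical sketch of that shape, assuming the model replies with a JSON object carrying `bypass`, `output`, and `complexity` keys (the columns of the PR's measurement table); the prompt text, `CountingFakeLLM`, and `classify` are invented for illustration.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    bypass: bool
    output: str      # e.g. "text" or "file"
    complexity: str  # e.g. "simple" or "medium"


class CountingFakeLLM:
    """Hypothetical LLM stub that records how many times it is called."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.call_count = 0

    async def complete(self, prompt: str) -> str:
        self.call_count += 1
        return self.reply


PROMPT = (
    "Classify the goal. Reply with JSON containing keys "
    '"bypass" (bool), "output" ("text" or "file"), "complexity".\nGoal: {goal}'
)


async def classify(llm: CountingFakeLLM, goal: str) -> Classification:
    # One round trip answers what two separate classifiers used to ask.
    raw = await llm.complete(PROMPT.format(goal=goal))
    data = json.loads(raw)
    return Classification(bool(data["bypass"]), data["output"], data["complexity"])


if __name__ == "__main__":
    llm = CountingFakeLLM('{"bypass": true, "output": "text", "complexity": "simple"}')
    result = asyncio.run(classify(llm, "What is 2+2?"))
    print(result, llm.call_count)
```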

Changes

TD-192 Call Count Reduction Test Suite

Layer / File(s)	Summary
Test context and setup `tests/integration/test_round22_td192_latency.py`	Module docstring documents TD-192/TD-191 intent and expected call reduction target. Imports, Ollama CLI detection, pytest skip markers, and constants define pre/post TD-192 call counts for artifact and text goals.
Test infrastructure and fixtures `tests/integration/test_round22_td192_latency.py`	An in-memory cost repository provides async cost-tracking methods. The _SpyLLMGateway wrapper increments call_count on each `complete()` call while delegating to the real LiteLLMGateway. Module-scoped fixtures manage the asyncio event loop, validate Ollama availability (including a qwen3 model check), and construct the LiteLLMGateway with CostTracker and Settings.
Test cases for call count validation `tests/integration/test_round22_td192_latency.py`	TestTD192CallCountReduction runs three async tests: separate tests for the artifact goal and the text Q&A goal each assert exactly one LLM call plus the expected bypass decision and output requirement; `test_round22_summary` runs both goals end-to-end, measures elapsed time, asserts that the combined call count matches the TD-192 reduced total, and prints a baseline vs. saved call summary.
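The Ollama skip condition could look roughly like the sketch below. It assumes detection via `shutil.which("ollama")` plus a scan of `ollama list` output, taking the first line as a header and the first column as the model name; `qwen3_available` and `has_model` are hypothetical helpers, not the test module's fixtures.

```python
import shutil
import subprocess


def has_model(list_output: str, prefix: str) -> bool:
    """Return True if any model name in `ollama list` output starts with prefix."""
    lines = list_output.strip().splitlines()
    for line in lines[1:]:  # assumed: first line is the NAME/ID/SIZE/MODIFIED header
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            return True
    return False


def qwen3_available() -> bool:
    """True only if the ollama CLI is on PATH and lists a qwen3 model."""
    if shutil.which("ollama") is None:
        return False
    try:
        proc = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0 and has_model(proc.stdout, "qwen3")
```

In a pytest module, the result could gate the live tests with `pytest.mark.skipif(not qwen3_available(), reason="qwen3 not available in Ollama")`, so CI machines without Ollama skip rather than fail.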

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A spy counts calls with whispered care,
One per goal floats through the air,
TD-192's latency saved,
Integration tests now, bravely paved! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.67%, which is below the required 80.00% threshold.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding a test for TD-192 that verifies fractal-entry LLM call reduction from 2 to 1 calls.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

🧹 Nitpick comments (1)

tests/integration/test_round22_td192_latency.py (1)

49-67: ⚡ Quick win

Add type annotations to test double.

The _InMemoryCostRepo class is missing type hints on the record parameter and the _records list. Adding these would improve type safety and make the interface contract clearer.

♻️ Proposed type annotations

+from typing import Any
+
 class _InMemoryCostRepo:
     def __init__(self) -> None:
-        self._records: list = []
+        self._records: list[Any] = []
 
-    async def save(self, record) -> None:
+    async def save(self, record: Any) -> None:
         self._records.append(record)

If the actual cost record type is available (e.g., from a domain model), use that instead of Any.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_round22_td192_latency.py` around lines 49-67: the
test double _InMemoryCostRepo should declare types for its internal list and the
save parameter. Annotate self._records as list[CostRecord] (or list[Any] if
CostRecord isn't available) and change save(self, record) to save(self, record:
CostRecord) -> None (or record: Any); add the appropriate typing import (from
typing import Any) or the domain CostRecord import so the signatures on
_InMemoryCostRepo, save, and the _records field clearly express the expected
record type.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 81545cb0-4417-4e18-9605-761722353db7

📥 Commits

Reviewing files that changed from the base of the PR and between e7d4fc9 and d730396.

📒 Files selected for processing (1)

tests/integration/test_round22_td192_latency.py

Covers 33 commits since v0.6.1:
- TD-194 Council Pilot full merge (#20) + 5 post-merge fix-ups (#22-#26)
- TD-189 steps 1-4: per-task cache_hit_rate plumbing (#27-#30)
- TD-192: fold OutputRequirementClassifier into FractalBypassClassifier (#31)
- Round 22 live latency regression: fractal-entry 2 → 1 LLM calls (#32)
- Haiku 4.5 cache threshold pinned at ~4096 tokens via --pad-entries (#33)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot reviewed May 15, 2026


engkimo merged commit 97f11a7 into main May 15, 2026
6 checks passed

engkimo deleted the test/round22-td192-latency branch May 15, 2026 09:08

engkimo mentioned this pull request May 18, 2026

release: v0.6.2 — council pilot merge + TD-189 plumbing + TD-192 latency cut #34

Merged

4 tasks

engkimo merged 1 commit into main from test/round22-td192-latency
