release: v0.19.1 — Advanced Agent Patterns + Quality Infrastructure#34

Merged
johnnichev merged 10 commits into main from fix/v0192-bug-hunt
Mar 30, 2026

@johnnichev
Owner

Summary

  • 4 advanced agent patterns: PlanAndExecuteAgent, ReflectiveAgent, DebateAgent, TeamLeadAgent
  • 50 evaluators (39→50, +11 new)
  • Quality infrastructure: ralph loop skill, bandit CI, property tests, concurrency tests, simulations
  • Ralph loop: ~90 bugs fixed across 8 passes (2664→2918 tests)
  • Temperature=0.0 fallback for newer OpenAI models (gpt-4o-mini and newer)
  • RAG Phase 3 bug fixes (8 edge-case fixes, 28 regression tests)

Checklist

  • All tests pass (2918 non-e2e, 3094 with e2e)
  • Lint clean (black, isort, flake8, mypy)
  • Bandit scan clean (0 HIGH/CRITICAL)
  • Ralph loop clean (all 7 modules, pass 8)
  • Docs updated (README, ROADMAP, CHANGELOG, module docs, CLAUDE.md, CONTRIBUTING.md)
  • mkdocs build clean
  • Real API e2e evals passed

Critical:
- astream(): guard response_msg.content with `or ""` (pitfall #7)
- FileCheckpointStore.save(): atomic write via temp file + os.replace()
- llm_evaluators: fence all user-controlled content in judge prompts to prevent injection
- evals/regression.py, snapshot.py: atomic baseline/snapshot writes; sanitise suite_name path
- tools/base.py: shared module-level executor instead of per-call ThreadPoolExecutor
- rag/stores/memory.py: threading.Lock() on InMemoryVectorStore add/search/delete/clear
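
The atomic-write items above (temp file + os.replace()) can be sketched roughly as follows. This is a minimal illustration, not the code in FileCheckpointStore.save(); the function name `atomic_write_json` and the JSON payload are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload to path so readers never observe a half-written file.

    Hypothetical helper for illustration; not the library's API.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace() stays on
    # one filesystem, where the rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # Swap the finished file into place in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A crash before os.replace() leaves the previous file untouched, which is the property the checkpoint, baseline, and snapshot writers need.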

High:
- fallback.py: threading.Lock() on circuit breaker _failures/_circuit_open_until dicts
- memory.py: ConversationMemory.branch() deep-copies tool_calls to prevent shared mutation
- evals/report.py: fix p95/p99 off-by-one (use math.ceil, not int)
- tools/registry.py: warn on silent tool name collision instead of silently overwriting
- H5: policy.py — deny empty tool_name immediately instead of falling
  through pattern matching where fnmatch("","*") would match allow/deny/*
- H6: pii.py — extend SSN regex to detect space-separated format
  (123 45 6789 was not detected; only dash-separated and 9-digit bare)
- H9: decorators.py — _unwrap_type now handles Python 3.10+ X | None
  syntax (types.UnionType); previously str | None annotations raised
  ToolValidationError on Python 3.10/3.11/3.12/3.13
- CLAUDE.md — add pitfalls #19 (eval judge prompt injection fencing),
  #20 (ThreadPoolExecutor singleton), #21 (types.UnionType in tools)
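
For the p95/p99 item above, one common way to use math.ceil is the nearest-rank percentile. The sketch below assumes that definition; the function name is hypothetical and the actual code in evals/report.py may differ.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Return the nearest-rank percentile of values (pct in (0, 100])."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank; ceil rounds a fractional rank up to the next sample,
    # whereas int() would truncate it downward.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

With ten samples, the fractional rank for p95 is 9.5, so ceil selects the tenth (largest) sample.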

SQLiteVectorStore (C1, C2, H1, L2):
- Lock now wraps the entire DB operation (connect → commit → close) in
  add_documents(), search(), delete(), clear(), _init_db(); previously
  only sqlite3.connect() was inside with self._lock, leaving cursors and
  commits unprotected under concurrent access
- IDs switched from sha256+batch-index to uuid4 — the old scheme produced
  identical IDs for the same document order across batches (silent overwrite)
  and different IDs for different orderings (phantom duplicates)
- Removed dead _connect() method whose comment falsely implied callers
  held the lock
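
The two SQLiteVectorStore changes above (lock held for the whole connect → commit → close sequence, and uuid4 IDs) might look roughly like this. `TinySQLiteStore` and its schema are hypothetical stand-ins for illustration, not the library's class.

```python
import sqlite3
import threading
import uuid
from typing import List


class TinySQLiteStore:
    """Hypothetical stand-in showing lock scope and ID generation."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._lock = threading.Lock()
        self._run("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, text TEXT NOT NULL)", [()])

    def _run(self, sql: str, rows: list) -> None:
        # The lock covers connect, execute, commit, and close, not only connect().
        with self._lock:
            conn = sqlite3.connect(self._db_path)
            try:
                conn.executemany(sql, rows)
                conn.commit()
            finally:
                conn.close()

    def add_documents(self, texts: List[str]) -> List[str]:
        # uuid4 IDs do not depend on content or batch position, so a second
        # batch cannot collide with (and overwrite) rows from the first.
        ids = [str(uuid.uuid4()) for _ in texts]
        self._run("INSERT INTO docs (id, text) VALUES (?, ?)", list(zip(ids, texts)))
        return ids

    def count(self) -> int:
        with self._lock:
            conn = sqlite3.connect(self._db_path)
            try:
                return conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
            finally:
                conn.close()
```

Adding the same texts twice yields four distinct rows here, where a content-plus-index ID scheme would have reused the first batch's IDs.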

loaders.py (H2, M3):
- pypdf page.extract_text() can return None for image/encrypted pages;
  changed text.strip() → (text or "").strip()
- recursive=False now strips ** from caller-supplied glob patterns that
  contain ** instead of silently recursing anyway

chroma.py (H3):
- chromadb.Client() removed in chromadb ≥ 0.4; changed to EphemeralClient()
- Updated test mock from Client to EphemeralClient

hybrid.py (H4, M6):
- Removed dead doc_scores[key] = 0.0 line before get(key, 0.0) + ...
  (the get default made the explicit assignment redundant)
- Added ValueError for top_k < 1

bm25.py (M1):
- _score_document divided by _avg_doc_len which is 0.0 when only empty-text
  documents have been indexed; guarded with max(avg_doc_len, 1e-8)
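
A rough sketch of where the max(avg_doc_len, 1e-8) guard sits in a BM25 score. The function below is a textbook Okapi BM25 variant written for illustration; its signature, the k1 and b defaults, and the idf form are assumptions, not the contents of _score_document.

```python
import math
from typing import Dict, List


def bm25_score(
    query_terms: List[str],
    doc_terms: List[str],
    doc_freqs: Dict[str, int],
    num_docs: int,
    avg_doc_len: float,
    k1: float = 1.5,
    b: float = 0.75,
) -> float:
    """Score one document against a query (illustrative BM25 variant)."""
    # avg_doc_len is 0.0 when every indexed document is empty; the guard
    # keeps the length normalisation from dividing by zero.
    safe_avg = max(avg_doc_len, 1e-8)
    norm = 1.0 - b + b * (len(doc_terms) / safe_avg)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freqs.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return score
```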

stores/memory.py (M2):
- Capacity check (TOCTOU) moved inside with self._lock so the count read
  and the subsequent add are atomic
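
The check-then-act fix above can be illustrated with a small bounded container. `BoundedStore` is a hypothetical class for the example; the point is only that the capacity read and the append share one critical section.

```python
import threading
from typing import List


class BoundedStore:
    """Hypothetical bounded container; not the library's InMemoryVectorStore."""

    def __init__(self, max_items: int) -> None:
        self._max_items = max_items
        self._items: List[str] = []
        self._lock = threading.Lock()

    def add(self, item: str) -> bool:
        # Check and append under the same lock. Checking outside the lock
        # would let two threads both pass the check and overshoot the cap.
        with self._lock:
            if len(self._items) >= self._max_items:
                return False
            self._items.append(item)
            return True

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```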

chunking.py (M4, M5, L1):
- RecursiveTextSplitter overlap now built from complete segments (walk
  backward through current_chunk) instead of raw character slice, so
  multi-char separators like "\n\n" are never split mid-sequence
- ContextualChunker escapes </document> and </chunk> in user content
  before interpolating into the LLM prompt template
- TextSplitter raises ValueError if length_function("a") != 1 so
  token-counting functions are caught at construction time
- Added .hypothesis/ to .gitignore to exclude Hypothesis test artifacts.
- Updated CHANGELOG.md to document the addition of new advanced agent patterns (PlanAndExecuteAgent, ReflectiveAgent, DebateAgent, TeamLeadAgent) and expanded evaluators in the eval framework, bringing the total to 50.
- Incremented version to 0.19.1 in pyproject.toml and __init__.py.
- Added new example scripts demonstrating the usage of the new agent patterns.
- Updated CHANGELOG.md to reflect the addition of new advanced agent patterns (PlanAndExecuteAgent, ReflectiveAgent, DebateAgent, TeamLeadAgent) and expanded evaluators in the eval framework, increasing the total to 50.
- Added new entry in mkdocs.yml for the Patterns module.
- Revised README.md to include new features and examples, and updated evaluator count.
- Incremented test count in documentation to 2664 and examples to 73.
- Updated ROADMAP.md to mark v0.19.1 as complete and highlight advanced agent patterns.
- Removed the provider parameter from the PlanAndExecuteAgent instantiation.
- Updated evaluator count to 11 new evaluators, including the addition of ForbiddenWordsEvaluator.
- Marked several features as complete, indicating progress in the development roadmap.
- Added a section detailing new quality infrastructure initiatives, including Ralph loop, Bandit in CI, and various testing enhancements.
…AG bug fixes

- Added new quality infrastructure initiatives including the Ralph loop, Bandit integration in CI, property-based tests, thread-safety smoke suite, and production simulations.
- Documented 8 edge-case bug fixes in the RAG subsystem, improving stability and error handling across various components.
- Updated test and example counts to reflect recent additions.

8-pass convergence system run across agent, providers, tools, rag, memory,
evals, and security. All modules achieved a clean pass on pass 8.

Key fixes:
- ThreadPoolExecutor singletons for parallel dispatch + timeout enforcement
- Async observer events on LLM cache hits in arun()/astream()
- Prompt injection fencing on all LLM evaluators + coherence judge
- Non-atomic file writes → tmp+replace (html, junit, snapshot, history)
- Path traversal in BaselineStore, SnapshotStore, HistoryStore, policy
- BM25.search() atomic snapshot under lock (concurrent access safety)
- TextSplitter infinite loop guard; RecursiveTextSplitter empty chunk filter
- Optional[List[T]] type unwrapping for tool parameters
- Tool None optional params no longer raise ToolValidationError
- Naive datetime normalization in 6+ knowledge/session locations
- FallbackProvider mid-stream corruption, _is_retriable word-boundary regex
- Gemini timeout parameter applied to all 4 methods
- KnowledgeGraph confidence=null silent triple discard fixed
- Policy from_dict type coercion and validation at construction time
- GuardrailsPipeline trace step false-negative on WARN/REWRITE actions
- compress_keep_recent=0 current message drop fixed
- 254 new regression tests
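
The Optional[List[T]] unwrapping item, together with the earlier X | None fix, comes down to treating typing.Union and types.UnionType the same way. A minimal sketch, assuming a helper named `unwrap_optional` (hypothetical; the library's _unwrap_type may be structured differently):

```python
import types
from typing import Any, List, Optional, Tuple, Union, get_args, get_origin


def unwrap_optional(annotation: Any) -> Tuple[Any, bool]:
    """Return (inner_type, is_optional) for Optional[X] or X | None."""
    origin = get_origin(annotation)
    # Optional[X] reports typing.Union; on Python 3.10 to 3.13 the X | None
    # syntax reports types.UnionType instead.
    union_type = getattr(types, "UnionType", None)
    is_union = origin is Union or (union_type is not None and origin is union_type)
    if not is_union:
        return annotation, False
    args = get_args(annotation)
    non_none = [a for a in args if a is not type(None)]
    if len(non_none) == len(args):
        return annotation, False  # a union that does not include None
    if len(non_none) == 1:
        return non_none[0], True
    return Union[tuple(non_none)], True
```

Under this sketch, Optional[List[int]] unwraps to List[int] with the optional flag set, so a None argument can be accepted instead of failing validation.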

Test count: 2664 → 2918
… models

- _openai_compat: retry without temperature if API rejects it (gpt-4o-mini
  and newer models no longer accept temperature=0.0)
- test_regression: replace deprecated asyncio.get_event_loop() with asyncio.run()
- test_evals_e2e: update snapshot key assertion (greeting -> greeting_0)
  after pass-5 snapshot key uniqueness fix
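
The temperature fallback described above can be sketched as a retry wrapper. Everything here is hypothetical: `TemperatureRejected` stands in for whatever error the API returns, and `send` stands in for the provider call; how _openai_compat detects the rejection is not shown in this PR text.

```python
from typing import Any, Callable


class TemperatureRejected(Exception):
    """Hypothetical error standing in for an 'unsupported temperature' response."""


def call_with_temperature_fallback(send: Callable[..., Any], **params: Any) -> Any:
    """Call send(**params); if temperature is rejected, retry once without it."""
    try:
        return send(**params)
    except TemperatureRejected:
        if "temperature" not in params:
            raise  # nothing to strip, so the error is not ours to handle
        retry_params = {k: v for k, v in params.items() if k != "temperature"}
        return send(**retry_params)
```

Retrying only once, and only for this specific error, keeps unrelated failures visible to the caller.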

All 3094 tests pass (e2e included).
johnnichev merged commit 7696b0b into main Mar 30, 2026
9 checks passed
johnnichev deleted the fix/v0192-bug-hunt branch March 30, 2026 15:19