release: v0.19.1 — Advanced Agent Patterns + Quality Infrastructure by johnnichev · Pull Request #34 · johnnichev/selectools

johnnichev · 2026-03-30T15:15:35Z

Summary

4 advanced agent patterns: PlanAndExecuteAgent, ReflectiveAgent, DebateAgent, TeamLeadAgent
50 evaluators (39→50, +11 new)
Quality infrastructure: ralph loop skill, bandit CI, property tests, concurrency tests, simulations
Ralph loop: ~90 bugs fixed across 8 passes (2664→2918 tests)
Temperature=0.0 fallback for newer OpenAI models (gpt-4o-mini and newer)
RAG Phase 3 bug fixes (8 edge-case fixes, 28 regression tests)

Checklist

All tests pass (2918 non-e2e, 3094 with e2e)
Lint clean (black, isort, flake8, mypy)
Bandit scan clean (0 HIGH/CRITICAL)
Ralph loop clean (all 7 modules, pass 8)
Docs updated (README, ROADMAP, CHANGELOG, module docs, CLAUDE.md, CONTRIBUTING.md)
mkdocs build clean
Real API e2e evals passed

Critical:
- astream(): guard response_msg.content with `or ""` (pitfall #7)
- FileCheckpointStore.save(): atomic write via temp file + os.replace()
- llm_evaluators: fence all user-controlled content in judge prompts to prevent injection
- evals/regression.py, snapshot.py: atomic baseline/snapshot writes; sanitise suite_name path
- tools/base.py: shared module-level executor instead of per-call ThreadPoolExecutor
- rag/stores/memory.py: threading.Lock() on InMemoryVectorStore add/search/delete/clear

High:
- fallback.py: threading.Lock() on circuit breaker _failures/_circuit_open_until dicts
- memory.py: ConversationMemory.branch() deep-copies tool_calls to prevent shared mutation
- evals/report.py: fix p95/p99 off-by-one (use math.ceil, not int)
- tools/registry.py: warn on silent tool name collision instead of silently overwriting
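The p95/p99 off-by-one can be illustrated with a nearest-rank percentile. This is a minimal sketch, assuming the report uses the nearest-rank definition; `percentile_nearest_rank` is a hypothetical helper and not the actual function in evals/report.py.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the smallest sample with at least pct% of samples at or below it.

    Hypothetical helper for illustration; pct is a number in (0, 100].
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank. math.ceil rounds a fractional rank up; int() would
    # truncate it and select the sample one position too low.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

With ten samples 1..10, the p95 rank is 9.5: math.ceil selects the 10th sample, while int() would truncate to the 9th.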

- H5 (policy.py): deny empty tool_name immediately instead of falling through pattern matching where fnmatch("","*") would match allow/deny/*
- H6 (pii.py): extend SSN regex to detect space-separated format (123 45 6789 was not detected; only dash-separated and 9-digit bare)
- H9 (decorators.py): _unwrap_type now handles Python 3.10+ X | None syntax (types.UnionType); previously str | None annotations raised ToolValidationError on Python 3.10/3.11/3.12/3.13
- CLAUDE.md: add pitfalls #19 (eval judge prompt injection fencing), #20 (ThreadPoolExecutor singleton), #21 (types.UnionType in tools)
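The H5 fix can be sketched as follows. The snippet assumes a policy built from allow and deny glob lists; `is_allowed` and its parameters are hypothetical names for illustration and do not reproduce the code in policy.py.

```python
from fnmatch import fnmatch


def is_allowed(tool_name, allow_patterns, deny_patterns):
    """Hypothetical policy check: deny wins over allow, empty names are denied."""
    # Reject an empty name up front. Without this guard, fnmatch("", "*")
    # is True, so a catch-all "*" pattern would match the empty name.
    if not tool_name:
        return False
    if any(fnmatch(tool_name, pattern) for pattern in deny_patterns):
        return False
    return any(fnmatch(tool_name, pattern) for pattern in allow_patterns)
```

Checking the empty name before any pattern is consulted keeps the outcome independent of which wildcard patterns happen to be configured.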

SQLiteVectorStore (C1, C2, H1, L2):
- Lock now wraps the entire DB operation (connect → commit → close) in add_documents(), search(), delete(), clear(), _init_db(); previously only sqlite3.connect() was inside with self._lock, leaving cursors and commits unprotected under concurrent access
- IDs switched from sha256+batch-index to uuid4; the old scheme produced identical IDs for the same document order across batches (silent overwrite) and different IDs for different orderings (phantom duplicates)
- Removed dead _connect() method whose comment falsely implied callers held the lock

loaders.py (H2, M3):
- pypdf page.extract_text() can return None for image/encrypted pages; changed text.strip() → (text or "").strip()
- recursive=False now strips ** from caller-supplied glob patterns that contain ** instead of silently recursing anyway

chroma.py (H3):
- chromadb.Client() removed in chromadb ≥ 0.4; changed to EphemeralClient()
- Updated test mock from Client to EphemeralClient

hybrid.py (H4, M6):
- Removed dead doc_scores[key] = 0.0 line before get(key, 0.0) + ... (the get default made the explicit assignment unreachable)
- Added ValueError for top_k < 1

bm25.py (M1):
- _score_document divided by _avg_doc_len which is 0.0 when only empty-text documents have been indexed; guarded with max(avg_doc_len, 1e-8)

stores/memory.py (M2):
- Capacity check (TOCTOU) moved inside with self._lock so the count read and the subsequent add are atomic

chunking.py (M4, M5, L1):
- RecursiveTextSplitter overlap now built from complete segments (walk backward through current_chunk) instead of raw character slice, so multi-char separators like "\n\n" are never split mid-sequence
- ContextualChunker escapes </document> and </chunk> in user content before interpolating into the LLM prompt template
- TextSplitter raises ValueError if length_function("a") != 1 so token-counting functions are caught at construction time
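The bm25.py guard (M1) can be illustrated with the standard Okapi BM25 per-term formula. This is a sketch under assumptions: `bm25_term_score`, its parameter names, and the k1/b defaults are illustrative and are not taken from the library's _score_document.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.5, b=0.75):
    """Okapi BM25 contribution of one query term to one document (illustrative)."""
    # avg_doc_len is 0.0 when only empty-text documents have been indexed;
    # clamping it avoids ZeroDivisionError in the length normalisation.
    safe_avg = max(avg_doc_len, 1e-8)
    length_norm = 1 - b + b * (doc_len / safe_avg)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)
```

For an empty document in an all-empty index (tf=0, doc_len=0, avg_doc_len=0.0), the clamped version returns 0.0 rather than raising.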

- Added .hypothesis/ to .gitignore to exclude Hypothesis test artifacts.
- Updated CHANGELOG.md to document the addition of new advanced agent patterns (PlanAndExecuteAgent, ReflectiveAgent, DebateAgent, TeamLeadAgent) and expanded evaluators in the eval framework, bringing the total to 50.
- Incremented version to 0.19.1 in pyproject.toml and __init__.py.
- Added new example scripts demonstrating the usage of the new agent patterns.

- Updated CHANGELOG.md to reflect the addition of new advanced agent patterns (PlanAndExecuteAgent, ReflectiveAgent, DebateAgent, TeamLeadAgent) and expanded evaluators in the eval framework, increasing the total to 50.
- Added new entry in mkdocs.yml for the Patterns module.
- Revised README.md to include new features and examples, and updated evaluator count.
- Incremented test count in documentation to 2664 and examples to 73.
- Updated ROADMAP.md to mark v0.19.1 as complete and highlight advanced agent patterns.

- Removed the provider parameter from the PlanAndExecuteAgent instantiation.
- Updated evaluator count to 11 new evaluators, including the addition of ForbiddenWordsEvaluator.
- Marked several features as complete, indicating progress in the development roadmap.
- Added a section detailing new quality infrastructure initiatives, including Ralph loop, Bandit in CI, and various testing enhancements.

…AG bug fixes
- Added new quality infrastructure initiatives including the Ralph loop, Bandit integration in CI, property-based tests, thread-safety smoke suite, and production simulations.
- Documented 8 edge-case bug fixes in the RAG subsystem, improving stability and error handling across various components.
- Updated test and example counts to reflect recent additions.

8-pass convergence system run across agent, providers, tools, rag, memory, evals, security. All modules achieved clean pass on pass 8.

Key fixes:
- ThreadPoolExecutor singletons for parallel dispatch + timeout enforcement
- Async observer events on LLM cache hits in arun()/astream()
- Prompt injection fencing on all LLM evaluators + coherence judge
- Non-atomic file writes → tmp+replace (html, junit, snapshot, history)
- Path traversal in BaselineStore, SnapshotStore, HistoryStore, policy
- BM25.search() atomic snapshot under lock (concurrent access safety)
- TextSplitter infinite loop guard; RecursiveTextSplitter empty chunk filter
- Optional[List[T]] type unwrapping for tool parameters
- Tool None optional params no longer raise ToolValidationError
- Naive datetime normalization in 6+ knowledge/session locations
- FallbackProvider mid-stream corruption, _is_retriable word-boundary regex
- Gemini timeout parameter applied to all 4 methods
- KnowledgeGraph confidence=null silent triple discard fixed
- Policy from_dict type coercion and validation at construction time
- GuardrailsPipeline trace step false-negative on WARN/REWRITE actions
- compress_keep_recent=0 current message drop fixed
- 254 new regression tests

Test count: 2664 → 2918
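The tmp+replace pattern named among the key fixes can be sketched as below. `atomic_write_text` is a hypothetical helper written for illustration; the writers in the html, junit, snapshot, and history modules may be structured differently.

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    """Write text so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so os.replace() stays on
    # one filesystem, where the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swaps in the new file in one step
    except BaseException:
        # Remove the partial temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A crash in the middle of the write leaves the previous file untouched, which is the property a direct open(path, "w") cannot provide.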

… models
- _openai_compat: retry without temperature if API rejects it (gpt-4o-mini and newer models no longer accept temperature=0.0)
- test_regression: replace deprecated asyncio.get_event_loop() with asyncio.run()
- test_evals_e2e: update snapshot key assertion (greeting -> greeting_0) after pass-5 snapshot key uniqueness fix

All 3094 tests pass with e2e included.
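The temperature fallback can be sketched as a single retry with the parameter removed. Everything here is an assumption for illustration: `TemperatureRejected` stands in for whatever error the provider raises, and `complete_with_temperature_fallback` and `send` are hypothetical names, not the code in _openai_compat.

```python
class TemperatureRejected(Exception):
    """Hypothetical stand-in for the provider's unsupported-parameter error."""


def complete_with_temperature_fallback(send, **params):
    """Call send(**params); if temperature is rejected, retry once without it."""
    try:
        return send(**params)
    except TemperatureRejected:
        if "temperature" not in params:
            raise  # nothing to strip, so the error is not ours to handle
        retry_params = {k: v for k, v in params.items() if k != "temperature"}
        return send(**retry_params)
```

Retrying only once, and only when temperature was actually sent, keeps unrelated failures from being masked by the fallback.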

johnnichev added 10 commits March 29, 2026 02:42

docs: update test counts (2529→2918) + mark v0.19.1 ✅ in CLAUDE.md

983246c

johnnichev merged commit 7696b0b into main Mar 30, 2026
9 checks passed

johnnichev deleted the fix/v0192-bug-hunt branch March 30, 2026 15:19

johnnichev merged 10 commits into main from fix/v0192-bug-hunt
