johnnichev · Mar 24, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,54 @@ All notable changes to selectools will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.17.5] - 2026-03-23
+
+### Fixed — Bug Hunt (91 validated fixes across 7 subsystems)
+
+#### Critical (13)
+- **Path traversal in `JsonFileSessionStore`** — session IDs now validated against directory escape
+- **Unicode homoglyph bypass** in prompt injection screening — NFKD normalization + zero-width stripping
+- **`FallbackProvider` stream** records success after consumption, not before — circuit breaker works for streaming
+- **Gemini `response.text` ValueError** on tool-call-only responses — caught and handled
+- **`astream()` model_selector** was using `self.config.model` — now uses `self._effective_model`
+- **Sync `_check_policy`** silently approved async `confirm_action` — now rejects with clear error
+- **`aexecute()` ThreadPoolExecutor per call** — replaced with shared executor via `run_in_executor(None)`
+- **`execute()` on async tools** returned coroutine string repr — now awaits via `asyncio.run`
+- **Hybrid search O(n²)** `_find_matching_key` — replaced with O(1) `text_to_key` dict lookup
+- **`SQLiteVectorStore`** no thread safety — added `threading.Lock` + WAL mode
+- **`FileKnowledgeStore._save_all()`** not crash-safe — atomic write via tmp + `os.replace`
+- **`OutputEvaluator`** crashed on invalid regex — wrapped in `try/except re.error`
+- **`JsonValidityEvaluator`** ignored `expect_json=False` — guard now checks falsy, not just None
+
+#### High (26)
+- **`astream()` cancellation/budget paths** now build proper trace steps + fire async observer events
+- **`arun()` early exits** now fire `_anotify_observers("on_run_end")` for cancel/budget/max-iter
+- **`_aexecute_tools_parallel`** fires async observer events + tracks `tool_usage`/`tool_tokens`
+- **Sync `_streaming_call`** no longer stringifies `ToolCall` objects (pitfall #2)
+- **16 LLM evaluators** silently passed on unparseable scores — now return `EvalFailure`
+- **XSS in eval dashboard** — `innerHTML` replaced with `createElement`/`textContent`
+- **Donut SVG 360° arc** renders nothing — now draws two semicircles for full annulus
+- **SSN regex** matched ZIP+4 codes — now requires consistent separators
+- **Coherence LLM costs** tracked in `CoherenceResult.usage` + merged into agent usage
+- **Coherence `fail_closed`** option added (default: fail-open for backward compat)
+- Plus 16 more HIGH fixes across tools, RAG, memory, and security subsystems
+
+#### Medium (30) and Low (22)
+- `datetime.utcnow()` → `datetime.now(timezone.utc)` throughout knowledge stores
+- `ConversationMemory.clear()` now resets `_summary`
+- SQLite WAL mode + indexes for knowledge and session stores
+- Non-deterministic `hash()` → `hashlib.sha256` for document IDs in 3 vector stores
+- OpenAI `embed_texts()` batching at 2048 per request
+- Tool result caching: `_serialize_result` returns `""` for None, not `"None"`
+- Bool values rejected for int/float tool parameters
+- `ToolRegistry.tool()` now forwards `screen_output`, `terminal`, `requires_approval`
+- Plus 40+ more fixes (see `.private/BUG_HUNT_VALIDATED.md` for complete list)
+
+### Added
+- **Async guardrails** — `Guardrail.acheck()` with `asyncio.to_thread` default, `GuardrailsPipeline.acheck_input()`/`acheck_output()`, `Agent._arun_input_guardrails()`. `arun()`/`astream()` no longer block the event loop during guardrail checks.
+- 40 new regression tests covering all critical and high-severity fixes
+- 5 new entries in CLAUDE.md Common Pitfalls (#14-#18)
+
 ## [0.17.4] - 2026-03-22
 
 ### Added
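
Two of the critical fixes listed in the 0.17.5 entry above (NFKD normalization with zero-width stripping, and session ID validation against directory escape) can be illustrated with a minimal sketch. The helper names `normalize_for_screening` and `session_path`, and the allowed ID character set, are assumptions for illustration and are not taken from the selectools source. NFKD folds compatibility forms such as fullwidth letters; cross-script lookalikes would need separate handling.

```python
import re
import unicodedata
from pathlib import Path

# Hypothetical names and limits; the actual selectools code may differ.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,128}")


def normalize_for_screening(text: str) -> str:
    """Fold compatibility characters, then delete zero-width code points."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.translate(_ZERO_WIDTH)


def session_path(base_dir: Path, session_id: str) -> Path:
    """Build the JSON path for a session, rejecting IDs that could leave base_dir."""
    # Separators and dots are outside the allowed set, so "../x" never matches.
    if not _SAFE_ID.fullmatch(session_id):
        raise ValueError(f"invalid session id: {session_id!r}")
    return base_dir / f"{session_id}.json"
```

Screening patterns would then run against the normalized text rather than the raw input, and the store would call `session_path` before any file access.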

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -94,7 +94,7 @@ src/selectools/
     ├── junit.py             # JUnit XML for CI
     └── __main__.py          # CLI: python -m selectools.evals
 
-tests/                       # 2113 tests (unit, integration, regression, E2E)
+tests/                       # 2183 tests (unit, integration, regression, E2E)
 ├── agent/                   # Agent core tests
 ├── providers/               # Provider-specific tests
 ├── rag/                     # RAG pipeline tests
@@ -304,6 +304,16 @@ Every `AgentTrace` contains `TraceStep` entries with one of these types:
 
 13. **Hooks are deprecated — use observers**: `AgentConfig.hooks` (a plain dict of callbacks) is deprecated. Passing `hooks` emits a `DeprecationWarning` and internally wraps the dict via `_HooksAdapter(AgentObserver)`. New code should always use `AgentObserver` or `AsyncAgentObserver` instead.
 
+14. **FallbackProvider `stream()` / `astream()` must record success AFTER consumption**: The generator must be fully consumed before calling `_record_success()`. Recording before consumption means the circuit breaker never trips on streaming errors. Fixed in v0.17.5.
+
+15. **`astream()` direct provider calls must use `self._effective_model`**: Unlike `run()`/`arun()` which go through `_call_provider`/`_acall_provider`, `astream()` calls providers directly. All model references in `astream()` must use `self._effective_model`, not `self.config.model`.
+
+16. **Async observer events must fire in all exit paths**: The shared `_build_cancelled_result`, `_build_budget_exceeded_result`, and `_build_max_iterations_result` only fire sync observers. In `arun()`/`astream()`, always add `await self._anotify_observers(...)` after calling these helpers.
+
+17. **`datetime.utcnow()` is deprecated — use `datetime.now(timezone.utc)`**: All datetime defaults in dataclasses must use `field(default_factory=lambda: datetime.now(timezone.utc))`, not `default_factory=datetime.utcnow`. The `is_expired` property and pruning code must also use aware datetimes.
+
+18. **Guardrails have async support**: `Guardrail.acheck()` runs sync `check()` via `asyncio.to_thread` by default. `GuardrailsPipeline` has `acheck_input()`/`acheck_output()`. `arun()`/`astream()` use `_arun_input_guardrails()` with `skip_guardrails=True` in `_prepare_run()` to avoid blocking the event loop.
+
 ## Current Roadmap
 
 - **v0.15.0** ✅ Enterprise Reliability (guardrails, audit, screening, coherence)
@@ -319,7 +329,7 @@ Every `AgentTrace` contains `TraceStep` entries with one of these types:
 - **v0.17.1** ✅ MCP Client/Server — MCPClient, mcp_tools(), MCPServer, MultiMCPClient, circuit breaker
 - **v0.17.3** ✅ Agent Runtime Controls — token budget, cancellation, cost attribution, structured results, approval gate, SimpleStepObserver
 - **v0.17.4** ✅ Agent Intelligence — token estimation, model switching, knowledge memory enhancement (4 store backends)
-- **v0.17.5** 🟡 Tech Debt & Quick Wins — bug fixes, ReAct/CoT strategies, tool result caching, Python 3.9–3.13 CI
+- **v0.17.5** ✅ Bug Hunt & Async Guardrails — 91 validated fixes, async guardrails, 40 regression tests
 - **v0.17.6** 🟡 Caching & Context — semantic caching, prompt compression, conversation branching
 - **v0.18.0** 🟡 Multi-Agent Orchestration — see `MULTI_AGENT_PLAN.md`
 - **v0.18.x** 🟡 Composability Layer — Pipeline with `@step` + `|` operator (LCEL alternative)
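
Pitfall 14 in the CLAUDE.md hunk above (record success only after the stream is consumed) can be sketched as a wrapping generator. This is an illustration under assumptions: `stream_with_breaker` and the two callbacks are hypothetical names, not the `FallbackProvider` API.

```python
from typing import Callable, Iterable, Iterator


def stream_with_breaker(
    chunks: Iterable[str],
    record_success: Callable[[], None],
    record_failure: Callable[[], None],
) -> Iterator[str]:
    """Yield chunks and report the outcome only once the stream has ended."""
    try:
        for chunk in chunks:
            yield chunk
    except Exception:
        # A mid-stream provider error counts against the circuit breaker.
        record_failure()
        raise
    # Reached only after the underlying stream is exhausted without error.
    record_success()
```

Calling `record_success()` before the loop would mark the provider healthy even when iteration later fails, which is the behaviour the pitfall describes. If the consumer abandons the generator early, neither callback fires in this sketch.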

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -3,7 +3,7 @@
 Thank you for your interest in contributing to Selectools! We welcome contributions from the community.
 
 **Current Version:** v0.17.4
-**Test Status:** 2113 tests passing (100%)
+**Test Status:** 2183 tests passing (100%)
 **Python:** 3.13+
 
 ## Getting Started
@@ -74,7 +74,7 @@ Similar to `npm run` scripts, here are the common commands for this project:
 ### Testing
 
 ```bash
-# Run all tests (2113 tests)
+# Run all tests (2183 tests)
 pytest tests/ -v
 
 # Run tests quietly (summary only)
@@ -264,7 +264,7 @@ selectools/
 │   ├── embeddings/             # Embedding providers
 │   ├── rag/                    # RAG: vector stores, chunking, loaders
 │   └── toolbox/                # 24 pre-built tools
-├── tests/                      # Test suite (2113 tests)
+├── tests/                      # Test suite (2183 tests)
 │   ├── agent/                  # Agent tests
 │   ├── rag/                    # RAG tests
 │   ├── tools/                  # Tool tests
@@ -370,7 +370,7 @@ We especially welcome contributions in these areas:
 - Add comparison guides (vs LangChain, LlamaIndex)
 
 ### 🧪 **Testing**
-- Increase test coverage (currently 2113 tests passing!)
+- Increase test coverage (currently 2183 tests passing!)
 - Add performance benchmarks
 - Improve E2E test stability with retry/rate-limit handling
 

diff --git a/README.md b/README.md
@@ -171,7 +171,7 @@ report.to_html("report.html")
 - **49 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
 - **Built-in Eval Framework**: 39 evaluators (21 deterministic + 18 LLM-as-judge), A/B testing, regression detection, HTML reports, JUnit XML, snapshot testing
 - **AgentObserver Protocol**: 31 lifecycle events with `run_id` correlation, `LoggingObserver`, `SimpleStepObserver`, OTel export
-- **2113 Tests**: Unit, integration, regression, and E2E with real API calls
+- **2183 Tests**: Unit, integration, regression, and E2E with real API calls
 
 ## Install
 
@@ -740,7 +740,7 @@ pytest tests/ -x -q          # All tests
 pytest tests/ -k "not e2e"   # Skip E2E (no API keys needed)
 ```
 
-2082 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.
+2183 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.
 
 ## License
 

diff --git a/ROADMAP.md b/ROADMAP.md
@@ -208,9 +208,9 @@ v0.17.3  ✅ Agent Runtime Controls
 v0.17.4  ✅ Agent Intelligence
          Token estimation → Model switching → Knowledge memory enhancement (4 store backends)
 
-v0.17.5  🟡 Tech Debt & Quick Wins
-         Stream fallback fix → abatch thread safety → async guardrails → ReAct/CoT strategies
-         → Tool result caching → Python 3.9–3.13 CI matrix
+v0.17.5  ✅ Bug Hunt & Async Guardrails
+         91 validated fixes (13 critical, 26 high, 52 medium+low) → Async guardrails
+         → 40 regression tests → 5 new Common Pitfalls
 
 v0.17.6  🟡 Caching & Context
          Semantic caching → Prompt compression → Conversation branching

diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
@@ -5,6 +5,54 @@ All notable changes to selectools will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.17.5] - 2026-03-23
+
+### Fixed — Bug Hunt (91 validated fixes across 7 subsystems)
+
+#### Critical (13)
+- **Path traversal in `JsonFileSessionStore`** — session IDs now validated against directory escape
+- **Unicode homoglyph bypass** in prompt injection screening — NFKD normalization + zero-width stripping
+- **`FallbackProvider` stream** records success after consumption, not before — circuit breaker works for streaming
+- **Gemini `response.text` ValueError** on tool-call-only responses — caught and handled
+- **`astream()` model_selector** was using `self.config.model` — now uses `self._effective_model`
+- **Sync `_check_policy`** silently approved async `confirm_action` — now rejects with clear error
+- **`aexecute()` ThreadPoolExecutor per call** — replaced with shared executor via `run_in_executor(None)`
+- **`execute()` on async tools** returned coroutine string repr — now awaits via `asyncio.run`
+- **Hybrid search O(n²)** `_find_matching_key` — replaced with O(1) `text_to_key` dict lookup
+- **`SQLiteVectorStore`** no thread safety — added `threading.Lock` + WAL mode
+- **`FileKnowledgeStore._save_all()`** not crash-safe — atomic write via tmp + `os.replace`
+- **`OutputEvaluator`** crashed on invalid regex — wrapped in `try/except re.error`
+- **`JsonValidityEvaluator`** ignored `expect_json=False` — guard now checks falsy, not just None
+
+#### High (26)
+- **`astream()` cancellation/budget paths** now build proper trace steps + fire async observer events
+- **`arun()` early exits** now fire `_anotify_observers("on_run_end")` for cancel/budget/max-iter
+- **`_aexecute_tools_parallel`** fires async observer events + tracks `tool_usage`/`tool_tokens`
+- **Sync `_streaming_call`** no longer stringifies `ToolCall` objects (pitfall #2)
+- **16 LLM evaluators** silently passed on unparseable scores — now return `EvalFailure`
+- **XSS in eval dashboard** — `innerHTML` replaced with `createElement`/`textContent`
+- **Donut SVG 360° arc** renders nothing — now draws two semicircles for full annulus
+- **SSN regex** matched ZIP+4 codes — now requires consistent separators
+- **Coherence LLM costs** tracked in `CoherenceResult.usage` + merged into agent usage
+- **Coherence `fail_closed`** option added (default: fail-open for backward compat)
+- Plus 16 more HIGH fixes across tools, RAG, memory, and security subsystems
+
+#### Medium (30) and Low (22)
+- `datetime.utcnow()` → `datetime.now(timezone.utc)` throughout knowledge stores
+- `ConversationMemory.clear()` now resets `_summary`
+- SQLite WAL mode + indexes for knowledge and session stores
+- Non-deterministic `hash()` → `hashlib.sha256` for document IDs in 3 vector stores
+- OpenAI `embed_texts()` batching at 2048 per request
+- Tool result caching: `_serialize_result` returns `""` for None, not `"None"`
+- Bool values rejected for int/float tool parameters
+- `ToolRegistry.tool()` now forwards `screen_output`, `terminal`, `requires_approval`
+- Plus 40+ more fixes (see `.private/BUG_HUNT_VALIDATED.md` for complete list)
+
+### Added
+- **Async guardrails** — `Guardrail.acheck()` with `asyncio.to_thread` default, `GuardrailsPipeline.acheck_input()`/`acheck_output()`, `Agent._arun_input_guardrails()`. `arun()`/`astream()` no longer block the event loop during guardrail checks.
+- 40 new regression tests covering all critical and high-severity fixes
+- 5 new entries in CLAUDE.md Common Pitfalls (#14-#18)
+
 ## [0.17.4] - 2026-03-22
 
 ### Added
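
The async guardrails entry above describes an `acheck()` that runs the sync `check()` through `asyncio.to_thread` by default. A minimal sketch of that pattern follows. `SketchGuardrail` is a hypothetical stand-in, and its boolean return value is an assumption; the real `Guardrail` class may return a richer result.

```python
import asyncio
import time
from typing import List


class SketchGuardrail:
    """Hypothetical stand-in for a guardrail whose check() blocks."""

    def check(self, text: str) -> bool:
        time.sleep(0.05)  # simulate blocking work such as a regex scan or I/O
        return "forbidden" not in text

    async def acheck(self, text: str) -> bool:
        # Default async path: run the sync check in a worker thread so the
        # event loop stays free for other tasks.
        return await asyncio.to_thread(self.check, text)


async def main() -> List[bool]:
    guard = SketchGuardrail()
    # Both checks run concurrently in worker threads.
    return await asyncio.gather(
        guard.acheck("hello"),
        guard.acheck("a forbidden word"),
    )
```

Running `asyncio.run(main())` evaluates both inputs without blocking the loop. A guardrail with a natively async backend could override `acheck()` instead of relying on the thread hop.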

diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
@@ -3,7 +3,7 @@
 Thank you for your interest in contributing to Selectools! We welcome contributions from the community.
 
 **Current Version:** v0.17.4
-**Test Status:** 2113 tests passing (100%)
+**Test Status:** 2183 tests passing (100%)
 **Python:** 3.13+
 
 ## Getting Started
@@ -74,7 +74,7 @@ Similar to `npm run` scripts, here are the common commands for this project:
 ### Testing
 
 ```bash
-# Run all tests (2113 tests)
+# Run all tests (2183 tests)
 pytest tests/ -v
 
 # Run tests quietly (summary only)
@@ -264,7 +264,7 @@ selectools/
 │   ├── embeddings/             # Embedding providers
 │   ├── rag/                    # RAG: vector stores, chunking, loaders
 │   └── toolbox/                # 24 pre-built tools
-├── tests/                      # Test suite (2113 tests)
+├── tests/                      # Test suite (2183 tests)
 │   ├── agent/                  # Agent tests
 │   ├── rag/                    # RAG tests
 │   ├── tools/                  # Tool tests
@@ -370,7 +370,7 @@ We especially welcome contributions in these areas:
 - Add comparison guides (vs LangChain, LlamaIndex)
 
 ### 🧪 **Testing**
-- Increase test coverage (currently 2113 tests passing!)
+- Increase test coverage (currently 2183 tests passing!)
 - Add performance benchmarks
 - Improve E2E test stability with retry/rate-limit handling
 

diff --git a/docs/index.md b/docs/index.md
@@ -139,7 +139,7 @@ print(result.reasoning)       # Why the agent chose get_weather
 | **AgentObserver Protocol** | 31-event lifecycle observer with run/call ID correlation, `SimpleStepObserver`, and OTel export |
 | **Runtime Controls** | Token/cost budget limits, cooperative cancellation, per-tool approval gates, model switching per iteration |
 | **Eval Framework** | 39 built-in evaluators, A/B testing, regression detection, HTML reports, JUnit XML |
-| **2113 Tests** | Unit, integration, regression, and E2E |
+| **2183 Tests** | Unit, integration, regression, and E2E |
 
 ---
 

diff --git a/landing/index.html b/landing/index.html
@@ -89,7 +89,7 @@ <h1 class="text-5xl md:text-6xl font-extrabold leading-tight mb-6">
           <span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">Ollama</span>
           <span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">146 Models</span>
           <span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">39 Evaluators</span>
-          <span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">2113 Tests</span>
+          <span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">2183 Tests</span>
         </div>
       </div>
     </div>

diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "selectools"
-version = "0.17.4"
+version = "0.17.5"
 description = "Production-ready AI agents with tool calling, structured output, execution traces, and RAG. Provider-agnostic (OpenAI, Anthropic, Gemini, Ollama) with fallback chains, batch processing, tool policies, streaming, caching, and cost tracking."
 readme = "README.md"
 requires-python = ">=3.9"

diff --git a/src/selectools/__init__.py b/src/selectools/__init__.py
@@ -1,6 +1,6 @@
 """Public exports for the selectools package."""
 
-__version__ = "0.17.4"
+__version__ = "0.17.5"
 
 # Import submodules (lazy loading for optional dependencies)
 from . import embeddings, evals, guardrails, models, rag, toolbox

diff --git a/src/selectools/agent/_provider_caller.py b/src/selectools/agent/_provider_caller.py
@@ -208,10 +208,10 @@ def _streaming_call(self, stream_handler: Optional[Callable[[str], None]] = None
             max_tokens=self.config.max_tokens,
             timeout=self.config.request_timeout,
         ):
-            if chunk:
-                aggregated.append(str(chunk))
+            if isinstance(chunk, str) and chunk:
+                aggregated.append(chunk)
                 if stream_handler:
-                    stream_handler(str(chunk))
+                    stream_handler(chunk)
 
         return "".join(aggregated)
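
The effect of the `isinstance(chunk, str)` guard in the hunk above can be shown in isolation. The sketch below is standalone and assumes a stream that interleaves text with non-string objects. `FakeToolCall` and `aggregate_text` are hypothetical names, not part of selectools.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Union


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for a ToolCall object emitted mid-stream."""

    name: str


def aggregate_text(
    chunks: Iterable[Union[str, FakeToolCall]],
    stream_handler: Optional[Callable[[str], None]] = None,
) -> str:
    """Join only the non-empty string chunks, forwarding each to the handler."""
    aggregated: List[str] = []
    for chunk in chunks:
        # Non-string chunks are skipped instead of being passed through str().
        if isinstance(chunk, str) and chunk:
            aggregated.append(chunk)
            if stream_handler:
                stream_handler(chunk)
    return "".join(aggregated)
```

With the previous `str(chunk)` approach, the object's repr would have been appended to the response text and sent to the stream handler.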