Merged
48 changes: 48 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,54 @@ All notable changes to selectools will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.5] - 2026-03-23

### Fixed — Bug Hunt (91 validated fixes across 7 subsystems)

#### Critical (13)
- **Path traversal in `JsonFileSessionStore`** — session IDs now validated against directory escape
- **Unicode homoglyph bypass** in prompt injection screening — NFKD normalization + zero-width stripping
- **`FallbackProvider` stream** records success after consumption, not before — circuit breaker works for streaming
- **Gemini `response.text` ValueError** on tool-call-only responses — caught and handled
- **`astream()` model_selector** was using `self.config.model` — now uses `self._effective_model`
- **Sync `_check_policy`** silently approved async `confirm_action` — now rejects with clear error
- **`aexecute()` ThreadPoolExecutor per call** — replaced with shared executor via `run_in_executor(None)`
- **`execute()` on async tools** returned coroutine string repr — now awaits via `asyncio.run`
- **Hybrid search O(n²)** `_find_matching_key` — replaced with O(1) `text_to_key` dict lookup
- **`SQLiteVectorStore`** no thread safety — added `threading.Lock` + WAL mode
- **`FileKnowledgeStore._save_all()`** not crash-safe — atomic write via tmp + `os.replace`
- **`OutputEvaluator`** crashed on invalid regex — wrapped in `try/except re.error`
- **`JsonValidityEvaluator`** ignored `expect_json=False` — guard now checks falsy, not just None
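
The path-traversal fix for `JsonFileSessionStore` can be sketched as follows. This is a minimal illustration, not the selectools code: the helper name `safe_session_path` and the `<session_id>.json` file layout are assumptions. The idea is to resolve the candidate path and reject any session ID that ends up outside the store directory.

```python
from pathlib import Path


def safe_session_path(base_dir: str, session_id: str) -> Path:
    """Hypothetical helper: map a session ID to <base_dir>/<session_id>.json."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments; an absolute ID replaces base entirely
    candidate = (base / f"{session_id}.json").resolve()
    # Accept only files that sit directly inside base_dir
    if candidate.parent != base:
        raise ValueError(f"invalid session id: {session_id!r}")
    return candidate


try:
    safe_session_path("/tmp/sessions", "../../etc/passwd")
except ValueError as exc:
    print(exc)  # invalid session id: '../../etc/passwd'
```

Comparing the resolved parent with the resolved base also rejects nested IDs such as `a/b`, which is stricter than a prefix check.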

#### High (26)
- **`astream()` cancellation/budget paths** now build proper trace steps + fire async observer events
- **`arun()` early exits** now fire `_anotify_observers("on_run_end")` for cancel/budget/max-iter
- **`_aexecute_tools_parallel`** fires async observer events + tracks `tool_usage`/`tool_tokens`
- **Sync `_streaming_call`** no longer stringifies `ToolCall` objects (pitfall #2)
- **16 LLM evaluators** silently passed on unparseable scores — now return `EvalFailure`
- **XSS in eval dashboard** — `innerHTML` replaced with `createElement`/`textContent`
- **Donut SVG 360° arc** renders nothing — now draws two semicircles for full annulus
- **SSN regex** matched ZIP+4 codes — now requires consistent separators
- **Coherence LLM costs** tracked in `CoherenceResult.usage` + merged into agent usage
- **Coherence `fail_closed`** option added (default: fail-open for backward compat)
- Plus 16 more HIGH fixes across tools, RAG, memory, and security subsystems
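
The SSN change can be illustrated with a regex backreference. A loose pattern such as `\d{3}[- ]?\d{2}[- ]?\d{4}` (assumed here; the original regex is not shown) matches the ZIP+4 code `12345-6789` by taking an empty first separator and `-` as the second. Capturing the first separator and requiring `\1` for the second closes that gap. `SSN_RE` is a hypothetical name.

```python
import re

# Hypothetical pattern: \1 forces the second separator to equal the first
# ("-", " ", or nothing), so "12345-6789" can no longer be split as 123|45|-6789.
SSN_RE = re.compile(r"\b\d{3}([- ]?)\d{2}\1\d{4}\b")

for sample in ["SSN 123-45-6789", "SSN 123 45 6789", "ZIP 12345-6789", "123-45 6789"]:
    print(sample, "->", bool(SSN_RE.search(sample)))
# SSN 123-45-6789 -> True
# SSN 123 45 6789 -> True
# ZIP 12345-6789 -> False
# 123-45 6789 -> False
```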

#### Medium (30) and Low (22)
- `datetime.utcnow()` → `datetime.now(timezone.utc)` throughout knowledge stores
- `ConversationMemory.clear()` now resets `_summary`
- SQLite WAL mode + indexes for knowledge and session stores
- Non-deterministic `hash()` → `hashlib.sha256` for document IDs in 3 vector stores
- OpenAI `embed_texts()` batching at 2048 per request
- Tool result caching: `_serialize_result` returns `""` for None, not `"None"`
- Bool values rejected for int/float tool parameters
- `ToolRegistry.tool()` now forwards `screen_output`, `terminal`, `requires_approval`
- Plus 40+ more fixes (see `.private/BUG_HUNT_VALIDATED.md` for complete list)
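
The bool rule exists because `bool` is a subclass of `int` in Python, so a plain `isinstance(value, int)` check accepts `True`. A minimal sketch of the stricter check follows; `validate_numeric` is hypothetical, and accepting ints for float parameters is an assumption rather than documented selectools behavior.

```python
def validate_numeric(value: object, expected: type) -> None:
    """Hypothetical check for a tool parameter declared as int or float."""
    # bool subclasses int (isinstance(True, int) is True), so test it first;
    # otherwise True/False would slip through as 1/0.
    if isinstance(value, bool):
        raise TypeError(f"expected {expected.__name__}, got bool")
    # Assumption: a float parameter also accepts ints (like JSON Schema "number").
    allowed = (int, float) if expected is float else (expected,)
    if not isinstance(value, allowed):
        raise TypeError(f"expected {expected.__name__}, got {type(value).__name__}")


validate_numeric(3, int)    # ok
validate_numeric(2, float)  # ok under the int-widening assumption
try:
    validate_numeric(True, int)
except TypeError as exc:
    print(exc)  # expected int, got bool
```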

### Added
- **Async guardrails** — `Guardrail.acheck()` with `asyncio.to_thread` default, `GuardrailsPipeline.acheck_input()`/`acheck_output()`, `Agent._arun_input_guardrails()`. `arun()`/`astream()` no longer block the event loop during guardrail checks.
- 40 new regression tests covering all critical and high-severity fixes
- 5 new entries in CLAUDE.md Common Pitfalls (#14-#18)
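
A rough sketch of the async guardrail default described above: `acheck()` hands the synchronous `check()` to `asyncio.to_thread`, so a slow check no longer blocks the event loop. The classes are simplified stand-ins. Here `check()` returns a plain `bool`, which is an assumption (the changelog does not show the real return type), and `SlowKeywordGuardrail` is purely illustrative.

```python
import asyncio
import time


class Guardrail:
    """Simplified stand-in; only the sync-to-async bridge is shown."""

    def check(self, text: str) -> bool:  # True means the text passes
        raise NotImplementedError

    async def acheck(self, text: str) -> bool:
        # Default: run the blocking check() in a worker thread so the
        # event loop keeps serving other tasks meanwhile.
        return await asyncio.to_thread(self.check, text)


class SlowKeywordGuardrail(Guardrail):
    def check(self, text: str) -> bool:
        time.sleep(0.2)  # stands in for a blocking call, e.g. a sync HTTP client
        return "password" not in text.lower()


async def main() -> None:
    guard = SlowKeywordGuardrail()
    # The two checks overlap in worker threads (roughly 0.2 s rather than 0.4 s)
    results = await asyncio.gather(guard.acheck("hello"), guard.acheck("my PASSWORD"))
    print(results)  # [True, False]


asyncio.run(main())
```

Subclasses with a natively async check can still override `acheck()` directly; the thread hop is only the fallback.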

## [0.17.4] - 2026-03-22

### Added
14 changes: 12 additions & 2 deletions CLAUDE.md
@@ -94,7 +94,7 @@ src/selectools/
├── junit.py # JUnit XML for CI
└── __main__.py # CLI: python -m selectools.evals

tests/ # 2113 tests (unit, integration, regression, E2E)
tests/ # 2183 tests (unit, integration, regression, E2E)
├── agent/ # Agent core tests
├── providers/ # Provider-specific tests
├── rag/ # RAG pipeline tests
@@ -304,6 +304,16 @@ Every `AgentTrace` contains `TraceStep` entries with one of these types:

13. **Hooks are deprecated — use observers**: `AgentConfig.hooks` (a plain dict of callbacks) is deprecated. Passing `hooks` emits a `DeprecationWarning` and internally wraps the dict via `_HooksAdapter(AgentObserver)`. New code should always use `AgentObserver` or `AsyncAgentObserver` instead.

14. **FallbackProvider `stream()` / `astream()` must record success AFTER consumption**: The generator must be fully consumed before calling `_record_success()`. Recording before consumption means the circuit breaker never trips on streaming errors. Fixed in v0.17.5.

15. **`astream()` direct provider calls must use `self._effective_model`**: Unlike `run()`/`arun()` which go through `_call_provider`/`_acall_provider`, `astream()` calls providers directly. All model references in `astream()` must use `self._effective_model`, not `self.config.model`.

16. **Async observer events must fire in all exit paths**: The shared `_build_cancelled_result`, `_build_budget_exceeded_result`, and `_build_max_iterations_result` only fire sync observers. In `arun()`/`astream()`, always add `await self._anotify_observers(...)` after calling these helpers.

17. **`datetime.utcnow()` is deprecated — use `datetime.now(timezone.utc)`**: All datetime defaults in dataclasses must use `field(default_factory=lambda: datetime.now(timezone.utc))`, not `default_factory=datetime.utcnow`. The `is_expired` property and pruning code must also use aware datetimes.

18. **Guardrails have async support**: `Guardrail.acheck()` runs sync `check()` via `asyncio.to_thread` by default. `GuardrailsPipeline` has `acheck_input()`/`acheck_output()`. `arun()`/`astream()` use `_arun_input_guardrails()` with `skip_guardrails=True` in `_prepare_run()` to avoid blocking the event loop.
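
Pitfall 14 can be illustrated with a generator wrapper that records the outcome only after the stream is drained. `CircuitBreaker`, `guarded_stream`, and `flaky_stream` are hypothetical stand-ins for `FallbackProvider` internals; only the success/failure bookkeeping is shown, not the fallback to the next provider.

```python
from typing import Callable, Iterator


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after max_failures consecutive failures."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1


def guarded_stream(
    source: Callable[[], Iterator[str]], breaker: CircuitBreaker
) -> Iterator[str]:
    """Re-yield chunks from source(); record the outcome once it is known."""
    try:
        yield from source()
    except Exception:
        breaker.record_failure()  # a mid-stream error now counts as a failure
        raise
    # Reached only after the last chunk was consumed. Recording success before
    # iterating would reset the count on every call, so the breaker never trips.
    breaker.record_success()


def flaky_stream() -> Iterator[str]:
    yield "partial "
    raise ConnectionError("stream dropped")


breaker = CircuitBreaker(max_failures=2)
for _ in range(2):
    try:
        list(guarded_stream(flaky_stream, breaker))
    except ConnectionError:
        pass
print(breaker.is_open)  # True
```

If the consumer stops early, `GeneratorExit` is not an `Exception`, so this wrapper records neither outcome.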

## Current Roadmap

- **v0.15.0** ✅ Enterprise Reliability (guardrails, audit, screening, coherence)
@@ -319,7 +329,7 @@ Every `AgentTrace` contains `TraceStep` entries with one of these types:
- **v0.17.1** ✅ MCP Client/Server — MCPClient, mcp_tools(), MCPServer, MultiMCPClient, circuit breaker
- **v0.17.3** ✅ Agent Runtime Controls — token budget, cancellation, cost attribution, structured results, approval gate, SimpleStepObserver
- **v0.17.4** ✅ Agent Intelligence — token estimation, model switching, knowledge memory enhancement (4 store backends)
- **v0.17.5** 🟡 Tech Debt & Quick Winsbug fixes, ReAct/CoT strategies, tool result caching, Python 3.9–3.13 CI
- **v0.17.5** ✅ Bug Hunt & Async Guardrails91 validated fixes, async guardrails, 40 regression tests
- **v0.17.6** 🟡 Caching & Context — semantic caching, prompt compression, conversation branching
- **v0.18.0** 🟡 Multi-Agent Orchestration — see `MULTI_AGENT_PLAN.md`
- **v0.18.x** 🟡 Composability Layer — Pipeline with `@step` + `|` operator (LCEL alternative)
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -3,7 +3,7 @@
Thank you for your interest in contributing to Selectools! We welcome contributions from the community.

**Current Version:** v0.17.4
**Test Status:** 2113 tests passing (100%)
**Test Status:** 2183 tests passing (100%)
**Python:** 3.13+

## Getting Started
@@ -74,7 +74,7 @@ Similar to `npm run` scripts, here are the common commands for this project:
### Testing

```bash
# Run all tests (2113 tests)
# Run all tests (2183 tests)
pytest tests/ -v

# Run tests quietly (summary only)
@@ -264,7 +264,7 @@ selectools/
│ ├── embeddings/ # Embedding providers
│ ├── rag/ # RAG: vector stores, chunking, loaders
│ └── toolbox/ # 24 pre-built tools
├── tests/ # Test suite (2113 tests)
├── tests/ # Test suite (2183 tests)
│ ├── agent/ # Agent tests
│ ├── rag/ # RAG tests
│ ├── tools/ # Tool tests
@@ -370,7 +370,7 @@ We especially welcome contributions in these areas:
- Add comparison guides (vs LangChain, LlamaIndex)

### 🧪 **Testing**
- Increase test coverage (currently 2113 tests passing!)
- Increase test coverage (currently 2183 tests passing!)
- Add performance benchmarks
- Improve E2E test stability with retry/rate-limit handling

4 changes: 2 additions & 2 deletions README.md
@@ -171,7 +171,7 @@ report.to_html("report.html")
- **49 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
- **Built-in Eval Framework**: 39 evaluators (21 deterministic + 18 LLM-as-judge), A/B testing, regression detection, HTML reports, JUnit XML, snapshot testing
- **AgentObserver Protocol**: 31 lifecycle events with `run_id` correlation, `LoggingObserver`, `SimpleStepObserver`, OTel export
- **2113 Tests**: Unit, integration, regression, and E2E with real API calls
- **2183 Tests**: Unit, integration, regression, and E2E with real API calls

## Install

@@ -740,7 +740,7 @@ pytest tests/ -x -q # All tests
pytest tests/ -k "not e2e" # Skip E2E (no API keys needed)
```

2082 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.
2183 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.

## License

6 changes: 3 additions & 3 deletions ROADMAP.md
@@ -208,9 +208,9 @@ v0.17.3 ✅ Agent Runtime Controls
v0.17.4 ✅ Agent Intelligence
Token estimation → Model switching → Knowledge memory enhancement (4 store backends)

v0.17.5 🟡 Tech Debt & Quick Wins
Stream fallback fix → abatch thread safety → async guardrails → ReAct/CoT strategies
Tool result cachingPython 3.9–3.13 CI matrix
v0.17.5 ✅ Bug Hunt & Async Guardrails
91 validated fixes (13 critical, 26 high, 52 medium+low) → Async guardrails
40 regression tests5 new Common Pitfalls

v0.17.6 🟡 Caching & Context
Semantic caching → Prompt compression → Conversation branching
48 changes: 48 additions & 0 deletions docs/CHANGELOG.md
@@ -5,6 +5,54 @@ All notable changes to selectools will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.5] - 2026-03-23

### Fixed — Bug Hunt (91 validated fixes across 7 subsystems)

#### Critical (13)
- **Path traversal in `JsonFileSessionStore`** — session IDs now validated against directory escape
- **Unicode homoglyph bypass** in prompt injection screening — NFKD normalization + zero-width stripping
- **`FallbackProvider` stream** records success after consumption, not before — circuit breaker works for streaming
- **Gemini `response.text` ValueError** on tool-call-only responses — caught and handled
- **`astream()` model_selector** was using `self.config.model` — now uses `self._effective_model`
- **Sync `_check_policy`** silently approved async `confirm_action` — now rejects with clear error
- **`aexecute()` ThreadPoolExecutor per call** — replaced with shared executor via `run_in_executor(None)`
- **`execute()` on async tools** returned coroutine string repr — now awaits via `asyncio.run`
- **Hybrid search O(n²)** `_find_matching_key` — replaced with O(1) `text_to_key` dict lookup
- **`SQLiteVectorStore`** no thread safety — added `threading.Lock` + WAL mode
- **`FileKnowledgeStore._save_all()`** not crash-safe — atomic write via tmp + `os.replace`
- **`OutputEvaluator`** crashed on invalid regex — wrapped in `try/except re.error`
- **`JsonValidityEvaluator`** ignored `expect_json=False` — guard now checks falsy, not just None
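
A minimal sketch of the homoglyph fix, assuming the screener normalizes text before pattern matching. NFKD folds compatibility forms such as fullwidth letters and ligatures into their base characters, and zero-width code points are then dropped. The function name and the exact zero-width set are assumptions; the changelog does not list which characters selectools strips. NFKD does not map cross-script look-alikes (for example Cyrillic `а`, U+0430) to Latin, so those would need a separate mapping such as a confusables table.

```python
import unicodedata

# Common zero-width / invisible code points (an assumed, non-exhaustive set)
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_for_screening(text: str) -> str:
    """Fold compatibility forms with NFKD, then drop zero-width characters."""
    folded = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in folded if ch not in ZERO_WIDTH)


# Fullwidth letters plus a zero-width space no longer slip past a keyword check
print(normalize_for_screening("ｉｇｎｏｒｅ\u200b previous instructions"))
# ignore previous instructions
```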

#### High (26)
- **`astream()` cancellation/budget paths** now build proper trace steps + fire async observer events
- **`arun()` early exits** now fire `_anotify_observers("on_run_end")` for cancel/budget/max-iter
- **`_aexecute_tools_parallel`** fires async observer events + tracks `tool_usage`/`tool_tokens`
- **Sync `_streaming_call`** no longer stringifies `ToolCall` objects (pitfall #2)
- **16 LLM evaluators** silently passed on unparseable scores — now return `EvalFailure`
- **XSS in eval dashboard** — `innerHTML` replaced with `createElement`/`textContent`
- **Donut SVG 360° arc** renders nothing — now draws two semicircles for full annulus
- **SSN regex** matched ZIP+4 codes — now requires consistent separators
- **Coherence LLM costs** tracked in `CoherenceResult.usage` + merged into agent usage
- **Coherence `fail_closed`** option added (default: fail-open for backward compat)
- Plus 16 more HIGH fixes across tools, RAG, memory, and security subsystems
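
Why the donut needs two arcs: in SVG, an elliptical arc whose start and end points are identical is omitted entirely, so a single 360° arc draws nothing. The sketch below builds a full ring from two 180° arcs per circle. It assumes the dashboard emits raw `<path>` data; `full_annulus_path` is a hypothetical helper, not the function in selectools.

```python
def full_annulus_path(cx: float, cy: float, r_outer: float, r_inner: float) -> str:
    """Return an SVG path 'd' string for a complete ring (a 100% donut slice)."""
    top_o, bottom_o = cy - r_outer, cy + r_outer
    top_i, bottom_i = cy - r_inner, cy + r_inner
    return (
        # Outer circle, clockwise (sweep-flag 1): top -> bottom -> top
        f"M {cx} {top_o} "
        f"A {r_outer} {r_outer} 0 1 1 {cx} {bottom_o} "
        f"A {r_outer} {r_outer} 0 1 1 {cx} {top_o} Z "
        # Inner circle, counter-clockwise (sweep-flag 0), so the default
        # nonzero fill rule leaves a hole in the middle
        f"M {cx} {top_i} "
        f"A {r_inner} {r_inner} 0 1 0 {cx} {bottom_i} "
        f"A {r_inner} {r_inner} 0 1 0 {cx} {top_i} Z"
    )


print(f'<path d="{full_annulus_path(50, 50, 40, 25)}" fill="#3b82f6"/>')
```

Drawing the inner circle with the opposite winding avoids needing `fill-rule="evenodd"` on the element.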

#### Medium (30) and Low (22)
- `datetime.utcnow()` → `datetime.now(timezone.utc)` throughout knowledge stores
- `ConversationMemory.clear()` now resets `_summary`
- SQLite WAL mode + indexes for knowledge and session stores
- Non-deterministic `hash()` → `hashlib.sha256` for document IDs in 3 vector stores
- OpenAI `embed_texts()` batching at 2048 per request
- Tool result caching: `_serialize_result` returns `""` for None, not `"None"`
- Bool values rejected for int/float tool parameters
- `ToolRegistry.tool()` now forwards `screen_output`, `terminal`, `requires_approval`
- Plus 40+ more fixes (see `.private/BUG_HUNT_VALIDATED.md` for complete list)
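
Why `hash()` was replaced: Python salts `str` hashes per interpreter process (see `PYTHONHASHSEED`), so the same text gets a different `hash()` after a restart and persisted document IDs stop matching. A content digest is stable across runs. The function below is a hypothetical sketch; which fields selectools actually feeds into the digest is not stated in the changelog.

```python
import hashlib


def document_id(text: str) -> str:
    """Hypothetical content-derived ID that is identical in every process."""
    # hash(text) would differ between interpreter runs because str hashing is
    # randomized per process; a SHA-256 digest of the UTF-8 bytes is not.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(document_id("hello"))
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```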

### Added
- **Async guardrails** — `Guardrail.acheck()` with `asyncio.to_thread` default, `GuardrailsPipeline.acheck_input()`/`acheck_output()`, `Agent._arun_input_guardrails()`. `arun()`/`astream()` no longer block the event loop during guardrail checks.
- 40 new regression tests covering all critical and high-severity fixes
- 5 new entries in CLAUDE.md Common Pitfalls (#14-#18)

## [0.17.4] - 2026-03-22

### Added
8 changes: 4 additions & 4 deletions docs/CONTRIBUTING.md
@@ -3,7 +3,7 @@
Thank you for your interest in contributing to Selectools! We welcome contributions from the community.

**Current Version:** v0.17.4
**Test Status:** 2113 tests passing (100%)
**Test Status:** 2183 tests passing (100%)
**Python:** 3.13+

## Getting Started
@@ -74,7 +74,7 @@ Similar to `npm run` scripts, here are the common commands for this project:
### Testing

```bash
# Run all tests (2113 tests)
# Run all tests (2183 tests)
pytest tests/ -v

# Run tests quietly (summary only)
@@ -264,7 +264,7 @@ selectools/
│ ├── embeddings/ # Embedding providers
│ ├── rag/ # RAG: vector stores, chunking, loaders
│ └── toolbox/ # 24 pre-built tools
├── tests/ # Test suite (2113 tests)
├── tests/ # Test suite (2183 tests)
│ ├── agent/ # Agent tests
│ ├── rag/ # RAG tests
│ ├── tools/ # Tool tests
@@ -370,7 +370,7 @@ We especially welcome contributions in these areas:
- Add comparison guides (vs LangChain, LlamaIndex)

### 🧪 **Testing**
- Increase test coverage (currently 2113 tests passing!)
- Increase test coverage (currently 2183 tests passing!)
- Add performance benchmarks
- Improve E2E test stability with retry/rate-limit handling

2 changes: 1 addition & 1 deletion docs/index.md
@@ -139,7 +139,7 @@ print(result.reasoning) # Why the agent chose get_weather
| **AgentObserver Protocol** | 31-event lifecycle observer with run/call ID correlation, `SimpleStepObserver`, and OTel export |
| **Runtime Controls** | Token/cost budget limits, cooperative cancellation, per-tool approval gates, model switching per iteration |
| **Eval Framework** | 39 built-in evaluators, A/B testing, regression detection, HTML reports, JUnit XML |
| **2113 Tests** | Unit, integration, regression, and E2E |
| **2183 Tests** | Unit, integration, regression, and E2E |

---

2 changes: 1 addition & 1 deletion landing/index.html
@@ -89,7 +89,7 @@ <h1 class="text-5xl md:text-6xl font-extrabold leading-tight mb-6">
<span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">Ollama</span>
<span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">146 Models</span>
<span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">39 Evaluators</span>
<span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">2113 Tests</span>
<span class="pill text-brand-blue text-xs font-medium px-3 py-1 rounded-full">2183 Tests</span>
</div>
</div>
</div>
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "selectools"
version = "0.17.4"
version = "0.17.5"
description = "Production-ready AI agents with tool calling, structured output, execution traces, and RAG. Provider-agnostic (OpenAI, Anthropic, Gemini, Ollama) with fallback chains, batch processing, tool policies, streaming, caching, and cost tracking."
readme = "README.md"
requires-python = ">=3.9"
2 changes: 1 addition & 1 deletion src/selectools/__init__.py
@@ -1,6 +1,6 @@
"""Public exports for the selectools package."""

__version__ = "0.17.4"
__version__ = "0.17.5"

# Import submodules (lazy loading for optional dependencies)
from . import embeddings, evals, guardrails, models, rag, toolbox
6 changes: 3 additions & 3 deletions src/selectools/agent/_provider_caller.py
@@ -208,10 +208,10 @@ def _streaming_call(self, stream_handler: Optional[Callable[[str], None]] = None
max_tokens=self.config.max_tokens,
timeout=self.config.request_timeout,
):
if chunk:
aggregated.append(str(chunk))
if isinstance(chunk, str) and chunk:
aggregated.append(chunk)
if stream_handler:
stream_handler(str(chunk))
stream_handler(chunk)

return "".join(aggregated)
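
The `_streaming_call` change above keeps only non-empty `str` chunks. The old `str(chunk)` turned any non-text chunk, such as a tool call, into its string repr and leaked it into the response text. The same logic in isolation, with a hypothetical `ToolCall` dataclass standing in for the real type (its fields are assumptions):

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Union


@dataclass
class ToolCall:
    """Hypothetical stand-in for the real ToolCall; only its type matters here."""

    tool_name: str
    parameters: dict


def aggregate_text(
    chunks: Iterable[Union[str, ToolCall]],
    stream_handler: Optional[Callable[[str], None]] = None,
) -> str:
    """Join text chunks; skip ToolCall objects instead of stringifying them."""
    aggregated: List[str] = []
    for chunk in chunks:
        if isinstance(chunk, str) and chunk:
            aggregated.append(chunk)
            if stream_handler:
                stream_handler(chunk)
    return "".join(aggregated)


seen: List[str] = []
chunks = ["Checking ", ToolCall("get_weather", {"city": "Paris"}), "weather."]
print(repr(aggregate_text(chunks, seen.append)))  # 'Checking weather.'
```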
