Merged
55 changes: 55 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,61 @@ All notable changes to selectools will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.7] - 2026-03-25

### Added

#### SemanticCache
- New `SemanticCache` in `src/selectools/cache_semantic.py` — drop-in replacement for `InMemoryCache`
- Embeds cache keys with any `EmbeddingProvider` and serves hits based on cosine similarity
- Configurable `similarity_threshold` (default 0.92), `max_size` (LRU), `default_ttl`
- Thread-safe (internal `threading.Lock`); pure-Python cosine similarity (no NumPy)
- `stats` property returns `CacheStats` with hit/miss/eviction counters and `hit_rate`

```python
from selectools.cache_semantic import SemanticCache
from selectools.embeddings.openai import OpenAIEmbeddingProvider

cache = SemanticCache(
    embedding_provider=OpenAIEmbeddingProvider(),
    similarity_threshold=0.92,
    max_size=500,
)
config = AgentConfig(cache=cache)
# "What's the weather in NYC?" hits cache for "Weather in New York City?"
```
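The matching behind these hits is plain cosine similarity over the key embeddings. A minimal pure-Python sketch of that test — illustrative only, not the library's internal code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity without NumPy, as the changelog entry describes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector shares no direction with anything
    return dot / (norm_a * norm_b)

# A cached answer is served when the query embedding's similarity to a
# stored key's embedding reaches the threshold (default 0.92).
is_hit = cosine_similarity([0.9, 0.1, 1.1], [1.0, 0.0, 1.0]) >= 0.92
```

Identical vectors score 1.0 and orthogonal ones 0.0, so thresholds near 0.9 only match close paraphrases.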

#### Prompt Compression
- New `compress_context`, `compress_threshold`, `compress_keep_recent` fields on `AgentConfig`
- Fires before each LLM call when estimated fill-rate ≥ threshold; summarises old messages into a `[Compressed context]` system message
- Only modifies `self._history` (per-call view) — `self.memory` is never touched
- New `StepType.PROMPT_COMPRESSED` added to `AgentTrace`
- New `on_prompt_compressed(run_id, before_tokens, after_tokens, messages_compressed)` observer event on both `AgentObserver` and `AsyncAgentObserver`

```python
config = AgentConfig(
    compress_context=True,
    compress_threshold=0.75,  # trigger at 75% context fill
    compress_keep_recent=4,   # keep last 4 turns verbatim
)
```
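The fill-rate trigger described above can be sketched as follows. The helper name and message shapes are hypothetical stand-ins — the real implementation summarises via the LLM and operates on the agent's per-call history:

```python
def maybe_compress(messages, estimated_tokens, context_window,
                   threshold=0.75, keep_recent=4):
    """Compress old messages once estimated fill-rate reaches the threshold."""
    if context_window <= 0 or estimated_tokens / context_window < threshold:
        return messages  # below threshold: history passes through untouched
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in for the LLM-produced summary of the old turns.
    summary = "; ".join(m["content"] for m in old)
    return [{"role": "system", "content": f"[Compressed context] {summary}"}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compressed = maybe_compress(history, estimated_tokens=800, context_window=1000)
# -> one "[Compressed context]" system message followed by the last 4 turns
```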

#### Conversation Branching
- New `ConversationMemory.branch()` — returns an independent snapshot; changes to branch don't affect original
- New `SessionStore.branch(source_id, new_id)` — forks a persisted session; supported by all three backends (`JsonFileSessionStore`, `SQLiteSessionStore`, `RedisSessionStore`)
- Raises `ValueError` if `source_id` not found

```python
checkpoint = agent.memory.branch() # snapshot in-memory
store.branch("main", "experiment") # fork a persisted session
```
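Branch semantics are copy-on-create: the branch holds its own deep copy of the history, so later turns on either side stay independent. A toy model of that behaviour — not the library's actual class body:

```python
import copy

class Memory:
    """Toy stand-in for ConversationMemory's branch() semantics."""
    def __init__(self, messages=None):
        self.messages = list(messages or [])

    def branch(self) -> "Memory":
        # Deep-copy so mutations on either side never leak to the other.
        return Memory(copy.deepcopy(self.messages))

main = Memory([{"role": "user", "content": "hi"}])
experiment = main.branch()
experiment.messages.append({"role": "assistant", "content": "alt reply"})
# main still holds 1 message; experiment holds 2
```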

### Stats
- **55 new tests** (total: 2275)
- **3 new examples** (52: semantic cache, 53: prompt compression, 54: conversation branching; total: 54)
- **1 new StepType** — `prompt_compressed` (total: 17)
- **1 new observer event** — `on_prompt_compressed` (total: 32 sync / 29 async)

## [0.17.6] - 2026-03-24

### Added
9 changes: 5 additions & 4 deletions CLAUDE.md
@@ -62,8 +62,8 @@ src/selectools/
├── models.py # 152-model registry with pricing (single source of truth)
├── pricing.py # Derives pricing from models.py
├── usage.py # Token + cost tracking
-├── trace.py # AgentTrace, TraceStep (16 step types — see list below)
-├── observer.py # AgentObserver (31 sync events) + AsyncAgentObserver (28 async events) + LoggingObserver + SimpleStepObserver
+├── trace.py # AgentTrace, TraceStep (17 step types — see list below)
+├── observer.py # AgentObserver (32 sync events) + AsyncAgentObserver (29 async events) + LoggingObserver + SimpleStepObserver
├── policy.py # ToolPolicy (allow/review/deny rules)
├── parser.py # ToolCallParser (JSON extraction from LLM responses)
├── prompt.py # PromptBuilder (system prompt generation)
@@ -94,7 +94,7 @@ src/selectools/
├── junit.py # JUnit XML for CI
└── __main__.py # CLI: python -m selectools.evals

-tests/ # 2220 tests (unit, integration, regression, E2E)
+tests/ # 2275 tests (unit, integration, regression, E2E)
├── agent/ # Agent core tests
├── providers/ # Provider-specific tests
├── rag/ # RAG pipeline tests
@@ -103,7 +103,7 @@ tests/ # 2220 tests (unit, integration, regression, E2E)
├── core/ # Framework-level tests
└── test_*.py # Module-level unit tests

-examples/ # 38 numbered example scripts (01-38)
+examples/ # 54 numbered example scripts (01-54)
notebooks/getting_started.ipynb # Interactive getting-started guide

docs/ # MkDocs Material documentation
@@ -275,6 +275,7 @@ Every `AgentTrace` contains `TraceStep` entries with one of these types:
| `kg_extraction` | v0.16.0 | Knowledge graph triples extracted |
| `budget_exceeded` | v0.17.3 | Agent stopped due to token/cost budget limit |
| `cancelled` | v0.17.3 | Agent run cancelled via CancellationToken |
| `prompt_compressed` | v0.17.7 | Older history summarised to free context window |

## Common Pitfalls (from past bugs)

41 changes: 35 additions & 6 deletions README.md
@@ -12,6 +12,32 @@ An open-source project from **[NichevLabs](https://nichevlabs.com)**.

## What's New in v0.17

### v0.17.7 — Caching & Context

```python
from selectools.cache_semantic import SemanticCache
from selectools.embeddings.openai import OpenAIEmbeddingProvider

# Semantic cache — cache hits for paraphrased queries
cache = SemanticCache(
    embedding_provider=OpenAIEmbeddingProvider(),
    similarity_threshold=0.92,
)
config = AgentConfig(cache=cache)
# "Weather in NYC?" hits cache for "What's the weather in New York City?"

# Prompt compression — prevent context-window overflow
config = AgentConfig(
    compress_context=True,
    compress_threshold=0.75,  # trigger at 75% context fill
    compress_keep_recent=4,   # keep last 4 turns verbatim
)

# Conversation branching — fork history for A/B exploration
branch = agent.memory.branch() # independent snapshot
store.branch("main", "experiment") # fork a persisted session
```
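Prompt compression also emits the new `on_prompt_compressed` observer event. A duck-typed sketch of a consumer — a real observer would extend selectools' `AgentObserver`, whose other hooks are omitted here:

```python
class CompressionLogger:
    """Records each on_prompt_compressed event for later inspection."""
    def __init__(self):
        self.events = []

    def on_prompt_compressed(self, run_id, before_tokens,
                             after_tokens, messages_compressed):
        tokens_saved = before_tokens - after_tokens
        self.events.append((run_id, tokens_saved, messages_compressed))

obs = CompressionLogger()
obs.on_prompt_compressed("run-1", before_tokens=1200,
                         after_tokens=400, messages_compressed=6)
# obs.events == [("run-1", 800, 6)]
```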

### v0.17.6 — Quick Wins

@@ -158,7 +184,7 @@ report.to_html("report.html")
| **Cross-Session Knowledge** | Daily logs + persistent facts with auto-registered `remember` tool. |
| **MCP Integration** | Connect to any MCP tool server (stdio + HTTP). MCPClient, MultiMCPClient, MCPServer. Circuit breaker, retry, graceful degradation. |
| **Eval Framework** | 39 built-in evaluators (21 deterministic + 18 LLM-as-judge). A/B testing, regression detection, snapshot testing, HTML reports, JUnit XML, CI integration. |
-| **AgentObserver Protocol** | 31-event lifecycle observer with `run_id`/`call_id` correlation. Built-in `LoggingObserver` + `SimpleStepObserver`. |
+| **AgentObserver Protocol** | 32-event lifecycle observer with `run_id`/`call_id` correlation. Built-in `LoggingObserver` + `SimpleStepObserver`. |
| **Runtime Controls** | Token/cost budget limits, cooperative cancellation, per-tool approval gates, model switching per iteration. |
| **Production Hardened** | Retries with backoff, per-tool timeouts, iteration caps, cost warnings, observability hooks + observers. |
| **Library-First** | Not a framework. No magic globals, no hidden state. Use as much or as little as you need. |
@@ -186,10 +212,13 @@ report.to_html("report.html")
- **Token Budget & Cancellation**: `max_total_tokens`, `max_cost_usd` hard limits; `CancellationToken` for cooperative stopping
- **Token Estimation**: `estimate_run_tokens()` for pre-execution budget checks
- **Model Switching**: `model_selector` callback for per-iteration model selection
-- **51 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
+- **Semantic Cache**: `SemanticCache` — embedding-based cache hits for paraphrased queries (cosine similarity, LRU + TTL)
+- **Prompt Compression**: Auto-summarise old history when context window fills up; `compress_context`, `compress_threshold`, `compress_keep_recent`
+- **Conversation Branching**: `ConversationMemory.branch()` and `SessionStore.branch()` for A/B exploration and checkpointing
+- **54 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
- **Built-in Eval Framework**: 39 evaluators (21 deterministic + 18 LLM-as-judge), A/B testing, regression detection, HTML reports, JUnit XML, snapshot testing
-- **AgentObserver Protocol**: 31 lifecycle events with `run_id` correlation, `LoggingObserver`, `SimpleStepObserver`, OTel export
-- **2220 Tests**: Unit, integration, regression, and E2E with real API calls
+- **AgentObserver Protocol**: 32 lifecycle events with `run_id` correlation, `LoggingObserver`, `SimpleStepObserver`, OTel export
+- **2275 Tests**: Unit, integration, regression, and E2E with real API calls

## Install

@@ -513,7 +542,7 @@ agent = Agent(
)

-28 lifecycle events: run, LLM, tool, iteration, batch, policy, structured output, fallback, retry, memory trim, guardrail, coherence, screening, session, entity, KG. See `observer.py` for full reference.
+32 lifecycle events: run, LLM, tool, iteration, batch, policy, structured output, fallback, retry, memory trim, guardrail, coherence, screening, session, entity, KG, budget exceeded, cancelled, prompt compressed. See `observer.py` for full reference.

### E2E Streaming & Parallel Execution

@@ -758,7 +787,7 @@ pytest tests/ -x -q # All tests
pytest tests/ -k "not e2e" # Skip E2E (no API keys needed)

-2220 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.
+2275 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.

## License

6 changes: 3 additions & 3 deletions ROADMAP.md
@@ -215,9 +215,9 @@ v0.17.5 ✅ Bug Hunt & Async Guardrails
v0.17.6 ✅ Quick Wins
ReAct/CoT reasoning strategies → Tool result caching → Python 3.9–3.13 CI matrix

-v0.17.7 🟡 Caching & Context
+v0.17.7 Caching & Context
Semantic caching → Prompt compression → Conversation branching
-(~50 tests, 3 examples)
+(55 tests, 3 examples)

v0.18.0 🟡 Multi-Agent Orchestration
AgentGraph → GraphState → Typed reducers → Resume-from-yield interrupts
@@ -387,7 +387,7 @@ First-class MCP support lets Selectools agents use any MCP-compatible tool serve

---

-## v0.17.7: Caching & Context 🟡
+## v0.17.7: Caching & Context

Focus: Smarter token management and memory exploration.

55 changes: 55 additions & 0 deletions docs/CHANGELOG.md
@@ -5,6 +5,61 @@ All notable changes to selectools will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.7] - 2026-03-25

### Added

#### SemanticCache
- New `SemanticCache` in `src/selectools/cache_semantic.py` — drop-in replacement for `InMemoryCache`
- Embeds cache keys with any `EmbeddingProvider` and serves hits based on cosine similarity
- Configurable `similarity_threshold` (default 0.92), `max_size` (LRU), `default_ttl`
- Thread-safe (internal `threading.Lock`); pure-Python cosine similarity (no NumPy)
- `stats` property returns `CacheStats` with hit/miss/eviction counters and `hit_rate`

```python
from selectools.cache_semantic import SemanticCache
from selectools.embeddings.openai import OpenAIEmbeddingProvider

cache = SemanticCache(
    embedding_provider=OpenAIEmbeddingProvider(),
    similarity_threshold=0.92,
    max_size=500,
)
config = AgentConfig(cache=cache)
# "What's the weather in NYC?" hits cache for "Weather in New York City?"
```

#### Prompt Compression
- New `compress_context`, `compress_threshold`, `compress_keep_recent` fields on `AgentConfig`
- Fires before each LLM call when estimated fill-rate ≥ threshold; summarises old messages into a `[Compressed context]` system message
- Only modifies `self._history` (per-call view) — `self.memory` is never touched
- New `StepType.PROMPT_COMPRESSED` added to `AgentTrace`
- New `on_prompt_compressed(run_id, before_tokens, after_tokens, messages_compressed)` observer event on both `AgentObserver` and `AsyncAgentObserver`

```python
config = AgentConfig(
    compress_context=True,
    compress_threshold=0.75,  # trigger at 75% context fill
    compress_keep_recent=4,   # keep last 4 turns verbatim
)
```

#### Conversation Branching
- New `ConversationMemory.branch()` — returns an independent snapshot; changes to branch don't affect original
- New `SessionStore.branch(source_id, new_id)` — forks a persisted session; supported by all three backends (`JsonFileSessionStore`, `SQLiteSessionStore`, `RedisSessionStore`)
- Raises `ValueError` if `source_id` not found

```python
checkpoint = agent.memory.branch() # snapshot in-memory
store.branch("main", "experiment") # fork a persisted session
```

### Stats
- **55 new tests** (total: 2275)
- **3 new examples** (52: semantic cache, 53: prompt compression, 54: conversation branching; total: 54)
- **1 new StepType** — `prompt_compressed` (total: 17)
- **1 new observer event** — `on_prompt_compressed` (total: 32 sync / 29 async)

## [0.17.6] - 2026-03-24

### Added
7 changes: 5 additions & 2 deletions docs/index.md
@@ -136,12 +136,15 @@ print(result.reasoning) # Why the agent chose get_weather
| **Entity Memory** | Auto-extract named entities with LRU-pruned registry and context injection |
| **Knowledge Graph** | Relationship triple extraction with in-memory and SQLite storage |
| **Cross-Session Knowledge** | Daily logs + persistent facts with auto-registered `remember` tool |
-| **AgentObserver Protocol** | 31-event lifecycle observer with run/call ID correlation, `SimpleStepObserver`, and OTel export |
+| **AgentObserver Protocol** | 32-event lifecycle observer with run/call ID correlation, `SimpleStepObserver`, and OTel export |
| **Runtime Controls** | Token/cost budget limits, cooperative cancellation, per-tool approval gates, model switching per iteration |
| **Reasoning Strategies** | Built-in ReAct, Chain-of-Thought, and Plan-Then-Act via `reasoning_strategy` config |
| **Tool Result Caching** | `@tool(cacheable=True, cache_ttl=60)` — skip re-execution for identical tool calls |
| **Semantic Cache** | `SemanticCache` — embedding-based cache hits for paraphrased queries via cosine similarity |
| **Prompt Compression** | Proactive context-window management — summarises old messages when fill-rate exceeds threshold |
| **Conversation Branching** | `memory.branch()` and `store.branch()` — fork history for A/B exploration and checkpointing |
| **Eval Framework** | 39 built-in evaluators, A/B testing, regression detection, HTML reports, JUnit XML |
-| **2220 Tests** | Unit, integration, regression, and E2E |
+| **2275 Tests** | Unit, integration, regression, and E2E |

---
