Merged
55 changes: 55 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,61 @@ All notable changes to selectools will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.7] - 2026-03-25

### Added

#### SemanticCache
- New `SemanticCache` in `src/selectools/cache_semantic.py` — drop-in replacement for `InMemoryCache`
- Embeds cache keys with any `EmbeddingProvider` and serves hits based on cosine similarity
- Configurable `similarity_threshold` (default 0.92), `max_size` (LRU), `default_ttl`
- Thread-safe (internal `threading.Lock`); pure-Python cosine similarity (no NumPy)
- `stats` property returns `CacheStats` with hit/miss/eviction counters and `hit_rate`

```python
from selectools.cache_semantic import SemanticCache
from selectools.embeddings.openai import OpenAIEmbeddingProvider

cache = SemanticCache(
    embedding_provider=OpenAIEmbeddingProvider(),
    similarity_threshold=0.92,
    max_size=500,
)
config = AgentConfig(cache=cache)
# "What's the weather in NYC?" hits cache for "Weather in New York City?"
```
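The matching behind these hits is plain cosine similarity over the key embeddings. A minimal pure-Python sketch of that test — illustrative only, not the library's internal code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity without NumPy, as the changelog entry describes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector shares no direction with anything
    return dot / (norm_a * norm_b)

# A cached answer is served when the query embedding's similarity to a
# stored key's embedding reaches the threshold (default 0.92).
is_hit = cosine_similarity([0.9, 0.1, 1.1], [1.0, 0.0, 1.0]) >= 0.92
```

Identical vectors score 1.0 and orthogonal ones 0.0, so thresholds near 0.9 only match close paraphrases.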

#### Prompt Compression
- New `compress_context`, `compress_threshold`, `compress_keep_recent` fields on `AgentConfig`
- Fires before each LLM call when estimated fill-rate ≥ threshold; summarises old messages into a `[Compressed context]` system message
- Only modifies `self._history` (per-call view) — `self.memory` is never touched
- New `StepType.PROMPT_COMPRESSED` added to `AgentTrace`
- New `on_prompt_compressed(run_id, before_tokens, after_tokens, messages_compressed)` observer event on both `AgentObserver` and `AsyncAgentObserver`

```python
config = AgentConfig(
    compress_context=True,
    compress_threshold=0.75,  # trigger at 75% context fill
    compress_keep_recent=4,   # keep last 4 turns verbatim
)
```
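The fill-rate trigger described above can be sketched as follows. The helper name and message shapes are hypothetical stand-ins — the real implementation summarises via the LLM and operates on the agent's per-call history:

```python
def maybe_compress(messages, estimated_tokens, context_window,
                   threshold=0.75, keep_recent=4):
    """Compress old messages once estimated fill-rate reaches the threshold."""
    if context_window <= 0 or estimated_tokens / context_window < threshold:
        return messages  # below threshold: history passes through untouched
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in for the LLM-produced summary of the old turns.
    summary = "; ".join(m["content"] for m in old)
    return [{"role": "system", "content": f"[Compressed context] {summary}"}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compressed = maybe_compress(history, estimated_tokens=800, context_window=1000)
# -> one "[Compressed context]" system message followed by the last 4 turns
```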

#### Conversation Branching
- New `ConversationMemory.branch()` — returns an independent snapshot; changes to branch don't affect original
- New `SessionStore.branch(source_id, new_id)` — forks a persisted session; supported by all three backends (`JsonFileSessionStore`, `SQLiteSessionStore`, `RedisSessionStore`)
- Raises `ValueError` if `source_id` not found

```python
checkpoint = agent.memory.branch() # snapshot in-memory
store.branch("main", "experiment") # fork a persisted session
```
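Branch semantics are copy-on-create: the branch holds its own deep copy of the history, so later turns on either side stay independent. A toy model of that behaviour — not the library's actual class body:

```python
import copy

class Memory:
    """Toy stand-in for ConversationMemory's branch() semantics."""
    def __init__(self, messages=None):
        self.messages = list(messages or [])

    def branch(self) -> "Memory":
        # Deep-copy so mutations on either side never leak to the other.
        return Memory(copy.deepcopy(self.messages))

main = Memory([{"role": "user", "content": "hi"}])
experiment = main.branch()
experiment.messages.append({"role": "assistant", "content": "alt reply"})
# main still holds 1 message; experiment holds 2
```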

### Stats
- **55 new tests** (total: 2275)
- **3 new examples** (52: semantic cache, 53: prompt compression, 54: conversation branching; total: 54)
- **1 new StepType** — `prompt_compressed` (total: 17)
- **1 new observer event** — `on_prompt_compressed` (total: 32 sync / 29 async)

## [0.17.6] - 2026-03-24

### Added
9 changes: 5 additions & 4 deletions CLAUDE.md
@@ -62,8 +62,8 @@ src/selectools/
├── models.py # 152-model registry with pricing (single source of truth)
├── pricing.py # Derives pricing from models.py
├── usage.py # Token + cost tracking
-├── trace.py # AgentTrace, TraceStep (16 step types — see list below)
-├── observer.py # AgentObserver (31 sync events) + AsyncAgentObserver (28 async events) + LoggingObserver + SimpleStepObserver
+├── trace.py # AgentTrace, TraceStep (17 step types — see list below)
+├── observer.py # AgentObserver (32 sync events) + AsyncAgentObserver (29 async events) + LoggingObserver + SimpleStepObserver
├── policy.py # ToolPolicy (allow/review/deny rules)
├── parser.py # ToolCallParser (JSON extraction from LLM responses)
├── prompt.py # PromptBuilder (system prompt generation)
@@ -94,7 +94,7 @@ src/selectools/
├── junit.py # JUnit XML for CI
└── __main__.py # CLI: python -m selectools.evals

-tests/ # 2220 tests (unit, integration, regression, E2E)
+tests/ # 2275 tests (unit, integration, regression, E2E)
├── agent/ # Agent core tests
├── providers/ # Provider-specific tests
├── rag/ # RAG pipeline tests
@@ -103,7 +103,7 @@ tests/ # 2220 tests (unit, integration, regression, E2E)
├── core/ # Framework-level tests
└── test_*.py # Module-level unit tests

-examples/ # 38 numbered example scripts (01-38)
+examples/ # 54 numbered example scripts (01-54)
notebooks/getting_started.ipynb # Interactive getting-started guide

docs/ # MkDocs Material documentation
@@ -275,6 +275,7 @@ Every `AgentTrace` contains `TraceStep` entries with one of these types:
| `kg_extraction` | v0.16.0 | Knowledge graph triples extracted |
| `budget_exceeded` | v0.17.3 | Agent stopped due to token/cost budget limit |
| `cancelled` | v0.17.3 | Agent run cancelled via CancellationToken |
| `prompt_compressed` | v0.17.7 | Older history summarised to free context window |

## Common Pitfalls (from past bugs)

41 changes: 35 additions & 6 deletions README.md
@@ -12,6 +12,32 @@ An open-source project from **[NichevLabs](https://nichevlabs.com)**.

## What's New in v0.17

### v0.17.7 — Caching & Context

```python
from selectools.cache_semantic import SemanticCache
from selectools.embeddings.openai import OpenAIEmbeddingProvider

# Semantic cache — cache hits for paraphrased queries
cache = SemanticCache(
    embedding_provider=OpenAIEmbeddingProvider(),
    similarity_threshold=0.92,
)
config = AgentConfig(cache=cache)
# "Weather in NYC?" hits cache for "What's the weather in New York City?"

# Prompt compression — prevent context-window overflow
config = AgentConfig(
    compress_context=True,
    compress_threshold=0.75,  # trigger at 75% context fill
    compress_keep_recent=4,   # keep last 4 turns verbatim
)

# Conversation branching — fork history for A/B exploration
branch = agent.memory.branch() # independent snapshot
store.branch("main", "experiment") # fork a persisted session
```
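Prompt compression also emits the new `on_prompt_compressed` observer event. A duck-typed sketch of a consumer — a real observer would extend selectools' `AgentObserver`, whose other hooks are omitted here:

```python
class CompressionLogger:
    """Records each on_prompt_compressed event for later inspection."""
    def __init__(self):
        self.events = []

    def on_prompt_compressed(self, run_id, before_tokens,
                             after_tokens, messages_compressed):
        tokens_saved = before_tokens - after_tokens
        self.events.append((run_id, tokens_saved, messages_compressed))

obs = CompressionLogger()
obs.on_prompt_compressed("run-1", before_tokens=1200,
                         after_tokens=400, messages_compressed=6)
# obs.events == [("run-1", 800, 6)]
```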

### v0.17.6 — Quick Wins

@@ -158,7 +184,7 @@ report.to_html("report.html")
| **Cross-Session Knowledge** | Daily logs + persistent facts with auto-registered `remember` tool. |
| **MCP Integration** | Connect to any MCP tool server (stdio + HTTP). MCPClient, MultiMCPClient, MCPServer. Circuit breaker, retry, graceful degradation. |
| **Eval Framework** | 39 built-in evaluators (21 deterministic + 18 LLM-as-judge). A/B testing, regression detection, snapshot testing, HTML reports, JUnit XML, CI integration. |
-| **AgentObserver Protocol** | 31-event lifecycle observer with `run_id`/`call_id` correlation. Built-in `LoggingObserver` + `SimpleStepObserver`. |
+| **AgentObserver Protocol** | 32-event lifecycle observer with `run_id`/`call_id` correlation. Built-in `LoggingObserver` + `SimpleStepObserver`. |
| **Runtime Controls** | Token/cost budget limits, cooperative cancellation, per-tool approval gates, model switching per iteration. |
| **Production Hardened** | Retries with backoff, per-tool timeouts, iteration caps, cost warnings, observability hooks + observers. |
| **Library-First** | Not a framework. No magic globals, no hidden state. Use as much or as little as you need. |
@@ -186,10 +212,13 @@ report.to_html("report.html")
- **Token Budget & Cancellation**: `max_total_tokens`, `max_cost_usd` hard limits; `CancellationToken` for cooperative stopping
- **Token Estimation**: `estimate_run_tokens()` for pre-execution budget checks
- **Model Switching**: `model_selector` callback for per-iteration model selection
-- **51 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
+- **Semantic Cache**: `SemanticCache` — embedding-based cache hits for paraphrased queries (cosine similarity, LRU + TTL)
+- **Prompt Compression**: Auto-summarise old history when context window fills up; `compress_context`, `compress_threshold`, `compress_keep_recent`
+- **Conversation Branching**: `ConversationMemory.branch()` and `SessionStore.branch()` for A/B exploration and checkpointing
+- **54 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
- **Built-in Eval Framework**: 39 evaluators (21 deterministic + 18 LLM-as-judge), A/B testing, regression detection, HTML reports, JUnit XML, snapshot testing
-- **AgentObserver Protocol**: 31 lifecycle events with `run_id` correlation, `LoggingObserver`, `SimpleStepObserver`, OTel export
-- **2220 Tests**: Unit, integration, regression, and E2E with real API calls
+- **AgentObserver Protocol**: 32 lifecycle events with `run_id` correlation, `LoggingObserver`, `SimpleStepObserver`, OTel export
+- **2275 Tests**: Unit, integration, regression, and E2E with real API calls

## Install

@@ -513,7 +542,7 @@ agent = Agent(
)

-28 lifecycle events: run, LLM, tool, iteration, batch, policy, structured output, fallback, retry, memory trim, guardrail, coherence, screening, session, entity, KG. See `observer.py` for full reference.
+32 lifecycle events: run, LLM, tool, iteration, batch, policy, structured output, fallback, retry, memory trim, guardrail, coherence, screening, session, entity, KG, budget exceeded, cancelled, prompt compressed. See `observer.py` for full reference.

### E2E Streaming & Parallel Execution

@@ -758,7 +787,7 @@ pytest tests/ -x -q # All tests
pytest tests/ -k "not e2e" # Skip E2E (no API keys needed)

-2220 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.
+2275 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.

## License

6 changes: 3 additions & 3 deletions ROADMAP.md
@@ -215,9 +215,9 @@ v0.17.5 ✅ Bug Hunt & Async Guardrails
v0.17.6 ✅ Quick Wins
ReAct/CoT reasoning strategies → Tool result caching → Python 3.9–3.13 CI matrix

-v0.17.7 🟡 Caching & Context
+v0.17.7 Caching & Context
Semantic caching → Prompt compression → Conversation branching
-(~50 tests, 3 examples)
+(55 tests, 3 examples)

v0.18.0 🟡 Multi-Agent Orchestration
AgentGraph → GraphState → Typed reducers → Resume-from-yield interrupts
@@ -387,7 +387,7 @@ First-class MCP support lets Selectools agents use any MCP-compatible tool serve

---

-## v0.17.7: Caching & Context 🟡
+## v0.17.7: Caching & Context

Focus: Smarter token management and memory exploration.

55 changes: 55 additions & 0 deletions docs/CHANGELOG.md
@@ -5,6 +5,61 @@ All notable changes to selectools will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.7] - 2026-03-25

### Added

#### SemanticCache
- New `SemanticCache` in `src/selectools/cache_semantic.py` — drop-in replacement for `InMemoryCache`
- Embeds cache keys with any `EmbeddingProvider` and serves hits based on cosine similarity
- Configurable `similarity_threshold` (default 0.92), `max_size` (LRU), `default_ttl`
- Thread-safe (internal `threading.Lock`); pure-Python cosine similarity (no NumPy)
- `stats` property returns `CacheStats` with hit/miss/eviction counters and `hit_rate`

```python
from selectools.cache_semantic import SemanticCache
from selectools.embeddings.openai import OpenAIEmbeddingProvider

cache = SemanticCache(
    embedding_provider=OpenAIEmbeddingProvider(),
    similarity_threshold=0.92,
    max_size=500,
)
config = AgentConfig(cache=cache)
# "What's the weather in NYC?" hits cache for "Weather in New York City?"
```

#### Prompt Compression
- New `compress_context`, `compress_threshold`, `compress_keep_recent` fields on `AgentConfig`
- Fires before each LLM call when estimated fill-rate ≥ threshold; summarises old messages into a `[Compressed context]` system message
- Only modifies `self._history` (per-call view) — `self.memory` is never touched
- New `StepType.PROMPT_COMPRESSED` added to `AgentTrace`
- New `on_prompt_compressed(run_id, before_tokens, after_tokens, messages_compressed)` observer event on both `AgentObserver` and `AsyncAgentObserver`

```python
config = AgentConfig(
    compress_context=True,
    compress_threshold=0.75,  # trigger at 75% context fill
    compress_keep_recent=4,   # keep last 4 turns verbatim
)
```

#### Conversation Branching
- New `ConversationMemory.branch()` — returns an independent snapshot; changes to branch don't affect original
- New `SessionStore.branch(source_id, new_id)` — forks a persisted session; supported by all three backends (`JsonFileSessionStore`, `SQLiteSessionStore`, `RedisSessionStore`)
- Raises `ValueError` if `source_id` not found

```python
checkpoint = agent.memory.branch() # snapshot in-memory
store.branch("main", "experiment") # fork a persisted session
```

### Stats
- **55 new tests** (total: 2275)
- **3 new examples** (52: semantic cache, 53: prompt compression, 54: conversation branching; total: 54)
- **1 new StepType** — `prompt_compressed` (total: 17)
- **1 new observer event** — `on_prompt_compressed` (total: 32 sync / 29 async)

## [0.17.6] - 2026-03-24

### Added
7 changes: 5 additions & 2 deletions docs/index.md
@@ -136,12 +136,15 @@ print(result.reasoning) # Why the agent chose get_weather
| **Entity Memory** | Auto-extract named entities with LRU-pruned registry and context injection |
| **Knowledge Graph** | Relationship triple extraction with in-memory and SQLite storage |
| **Cross-Session Knowledge** | Daily logs + persistent facts with auto-registered `remember` tool |
-| **AgentObserver Protocol** | 31-event lifecycle observer with run/call ID correlation, `SimpleStepObserver`, and OTel export |
+| **AgentObserver Protocol** | 32-event lifecycle observer with run/call ID correlation, `SimpleStepObserver`, and OTel export |
| **Runtime Controls** | Token/cost budget limits, cooperative cancellation, per-tool approval gates, model switching per iteration |
| **Reasoning Strategies** | Built-in ReAct, Chain-of-Thought, and Plan-Then-Act via `reasoning_strategy` config |
| **Tool Result Caching** | `@tool(cacheable=True, cache_ttl=60)` — skip re-execution for identical tool calls |
| **Semantic Cache** | `SemanticCache` — embedding-based cache hits for paraphrased queries via cosine similarity |
| **Prompt Compression** | Proactive context-window management — summarises old messages when fill-rate exceeds threshold |
| **Conversation Branching** | `memory.branch()` and `store.branch()` — fork history for A/B exploration and checkpointing |
| **Eval Framework** | 39 built-in evaluators, A/B testing, regression detection, HTML reports, JUnit XML |
-| **2220 Tests** | Unit, integration, regression, and E2E |
+| **2275 Tests** | Unit, integration, regression, and E2E |

---
