Merged
4 changes: 2 additions & 2 deletions .claude/skills/audit/SKILL.md
@@ -76,7 +76,7 @@ Verify each v0.17.3+ feature has:
- [ ] Example script in `examples/`
- [ ] Entry in `mkdocs.yml` nav

Features to check: Budget, Cancellation, Token Estimation, Model Switching, SimpleStepObserver, Structured Results, Approval Gate
Features to check: Budget, Cancellation, Token Estimation, Model Switching, SimpleStepObserver, Structured Results, Approval Gate, Reasoning Strategies, Tool Result Caching

### 5. Link Check
```bash
@@ -91,7 +91,7 @@ diff CHANGELOG.md docs/CHANGELOG.md
```

### 7. Private Docs (if accessible)
Check `.private/comparison-table.md` and `.private/competitive-gaps.md` for stale eval counts, test counts, version references.
Check `.private/master-competitive-plan.md`, `.private/competitive-analysis.md`, and `.private/growth-plan.md` for stale test counts, example counts, version references, and competitive scorecard accuracy.

### 8. Output

137 changes: 113 additions & 24 deletions .claude/skills/release/SKILL.md
@@ -16,6 +16,8 @@ Preparing release version: $ARGUMENTS
- Examples: !`ls examples/*.py | wc -l | tr -d ' '`
- Models: !`grep -c "ModelInfo(" src/selectools/models.py`
- StepTypes: !`python3 -c "from selectools.trace import StepType; print(len(StepType))" 2>/dev/null`
- Observer events (sync): !`python3 -c "from selectools.observer import AgentObserver; print(len([m for m in dir(AgentObserver) if m.startswith('on_')]))" 2>/dev/null`
- Last example: !`ls examples/*.py | tail -1`

## CRITICAL: Git Workflow Rules

@@ -24,46 +26,110 @@ Preparing release version: $ARGUMENTS
- **Keep feature work on one branch** — don't merge WIP to main
- **No co-author lines** in commits

## 1. Pre-Release Checks
---

## Phase 1: Quality Gate — Lint & Tests

Run `/lint` (fix mode) to auto-format and check code quality. ALL four checks must pass:
- black
- isort
- flake8
- mypy

Then run the full test suite:
```bash
pytest tests/ -x -q
black src/ tests/ --line-length=100 --check
isort src/ tests/ --profile=black --line-length=100 --check
flake8 src/
mypy src/
cp CHANGELOG.md docs/CHANGELOG.md && mkdocs build
```

## 2. Version Bump
**STOP if any lint check or test fails. Fix it before proceeding.**

---

## Phase 2: Version Bump

Update version in TWO files (must match):
- `src/selectools/__init__.py`: `__version__ = "X.Y.Z"`
- `pyproject.toml`: `version = "X.Y.Z"`
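The "must match" rule above can be checked mechanically. The following is a minimal sketch, assuming both files declare the version on a single line with double quotes as shown; the helper names `extract` and `versions_match` are hypothetical and not part of selectools.

```python
import re
from pathlib import Path
from typing import Optional, Pattern

# Assumed single-line declarations, mirroring the two files listed above.
INIT_PATTERN = re.compile(r'^__version__\s*=\s*"([^"]+)"', re.MULTILINE)
PYPROJECT_PATTERN = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def extract(pattern: Pattern[str], text: str) -> Optional[str]:
    """Return the first captured version string, or None if absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


def versions_match(init_text: str, pyproject_text: str) -> bool:
    """True only when both files declare a version and the two are equal."""
    init_version = extract(INIT_PATTERN, init_text)
    pyproject_version = extract(PYPROJECT_PATTERN, pyproject_text)
    return init_version is not None and init_version == pyproject_version


if __name__ == "__main__":
    init_file = Path("src/selectools/__init__.py")
    pyproject_file = Path("pyproject.toml")
    # Only run the check when executed from the repository root.
    if init_file.exists() and pyproject_file.exists():
        ok = versions_match(
            init_file.read_text(encoding="utf-8"),
            pyproject_file.read_text(encoding="utf-8"),
        )
        print("versions match" if ok else "version mismatch")
```

A regex is used instead of a TOML parser so the sketch also runs on Python 3.9, where `tomllib` is not available.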

## 3. CHANGELOG.md
---

## Phase 3: CHANGELOG.md

Add entry at the top following existing format. Include:
- Feature summary with code examples
- Bug fixes (if any)
- New test count, example count
- Migration notes (if breaking)

Add entry at the top following existing format. Then sync:
Then sync:
```bash
cp CHANGELOG.md docs/CHANGELOG.md
```

## 4. README.md Updates
---

## Phase 4: Documentation Sweep

This is the most commonly missed step. For EVERY new feature in this release, verify ALL of these exist. Use `/docs` guidance for each missing item.

### 4a. Feature-Level Doc Checklist

For each new feature, verify:
- [ ] **Module doc** exists in `docs/modules/<FEATURE>.md`
- [ ] **mkdocs.yml** nav includes the module doc
- [ ] **Example script** exists in `examples/`
- [ ] **docs/index.md** feature table updated (if user-facing)
- [ ] **docs/QUICKSTART.md** updated (if it changes the getting-started flow)
- [ ] **docs/ARCHITECTURE.md** updated (if it adds a new component)

- Update "What's New" section
- Update feature table if new capabilities
- Update stats: test count, example count
### 4b. Cross-Cutting Doc Updates

## 5. ROADMAP.md + CLAUDE.md Updates
These docs reference counts and features that change with every release:

- Mark completed version with ✅
- Update any stale counts in CLAUDE.md
| Document | What to check |
|----------|---------------|
| `README.md` | "What's New" section, feature table, stats (test count, example count, model count) |
| `ROADMAP.md` | Mark completed version ✅, update Implementation Order section |
| `CLAUDE.md` | Codebase Structure tree, StepType table, Observer counts, Current Roadmap section, Common Pitfalls |
| `docs/index.md` | Feature table, model count, test count |
| `CONTRIBUTING.md` + `docs/CONTRIBUTING.md` | Test count |
| `landing/index.html` | Badge counts (if it exists) |

## 6. Count Audit
### 4c. Private Docs

Run `/audit` to verify all hardcoded counts match live values.
Update `.private/` tracking docs:
- `.private/session.md` — update Current State table, record what shipped
- `.private/master-competitive-plan.md` — mark completed items, update scorecard/counts
- `.private/competitive-analysis.md` — update comparison matrix, test count, advantages list
- `.private/growth-plan.md` — update product features list

## 7. Commit (DO NOT push yet)
### 4d. Doc Build Verification

```bash
cp CHANGELOG.md docs/CHANGELOG.md && mkdocs build
```

Report any warnings. Fix broken links.

---

## Phase 5: Cross-Reference Audit

Run `/audit` to catch anything missed in Phase 4. This is a HARD GATE — do not proceed to commit if there are count mismatches or content drift.

The audit checks:
- Version consistency (`__init__.py` vs `pyproject.toml`)
- Hardcoded counts across all docs (tests, examples, models, observers, StepTypes)
- Content drift (CLAUDE.md structure tree, StepType table, observer counts)
- New feature doc coverage (module docs, examples, mkdocs nav)
- CHANGELOG sync
- Link validity (mkdocs build)

**Fix ALL mismatches before proceeding.**
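To illustrate the hardcoded-count check, here is a rough sketch of how stale test counts might be detected. It is not the `/audit` implementation: the function name `find_stale_counts`, the regex, and the in-memory `docs` mapping are all assumptions made for the example.

```python
import re
from typing import Dict, List, Tuple

# Matches phrases such as "2220 tests" or "2183 Tests" (heuristic, assumed wording).
TEST_COUNT = re.compile(r"\b(\d+)\s+tests\b", re.IGNORECASE)


def find_stale_counts(docs: Dict[str, str], live_count: int) -> List[Tuple[str, int, int]]:
    """Return (document name, line number, found count) for every mismatch."""
    stale: List[Tuple[str, int, int]] = []
    for name, text in docs.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in TEST_COUNT.finditer(line):
                found = int(match.group(1))
                if found != live_count:
                    stale.append((name, line_no, found))
    return stale


docs = {
    "README.md": "- **2183 Tests**: Unit, integration, regression, and E2E",
    "CLAUDE.md": "tests/ # 2220 tests (unit, integration, regression, E2E)",
}
print(find_stale_counts(docs, live_count=2220))
```

A heuristic like this would also flag historical figures, for example counts quoted in old CHANGELOG entries, so a real audit would need to exclude those files or sections.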

---

## Phase 6: Commit (DO NOT push yet)

```bash
git checkout -b release/vX.Y.Z # or feat/<name> for feature releases
@@ -73,16 +139,33 @@ git commit -m "release: vX.Y.Z — Feature Theme Name"

**Stop here and tell the user the commit is ready. Wait for explicit push approval.**

## 8. After User Approves Push
---

## Phase 7: After User Approves Push

```bash
git push -u origin HEAD
gh pr create --title "release: vX.Y.Z" --body "..."
gh pr create --title "release: vX.Y.Z — Theme" --body "$(cat <<'EOF'
## Summary
- Feature 1
- Feature 2
- ...

## Checklist
- [ ] All tests pass (N tests)
- [ ] Lint clean (black, isort, flake8, mypy)
- [ ] Docs updated (README, ROADMAP, CHANGELOG, module docs, index, architecture)
- [ ] Audit passed (counts consistent)
- [ ] mkdocs build clean
EOF
)"
```

Wait for user to approve merge.

## 9. After PR Merged
---

## Phase 8: After PR Merged

```bash
gh pr merge <number> --merge --delete-branch
@@ -91,20 +174,26 @@ git tag -a vX.Y.Z -m "vX.Y.Z — Feature Theme Name"
git push origin main --tags
```

## 10. PyPI Publish (after user confirms)
---

## Phase 9: PyPI Publish (after user confirms)

```bash
rm -rf dist/
python3 -m build
python3 -m twine upload dist/*
```

## 11. Post-Release Verification
---

## Phase 10: Post-Release Verification

- Verify GitHub Pages auto-deploys docs
- Verify PyPI page shows new version
- `pip install selectools==X.Y.Z` in a clean env

---

## Version Numbering

- **Patch** (0.X.Y): Bug fixes, small features
7 changes: 6 additions & 1 deletion .github/workflows/ci.yml
@@ -10,11 +10,16 @@ on:
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
    name: test (py${{ matrix.python-version }})
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.13"
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - name: Install dependencies
        run: |
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,45 @@ All notable changes to selectools will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.6] - 2026-03-24

### Added

#### Reasoning Strategies
- New `reasoning_strategy` field on `AgentConfig`: `"react"`, `"cot"`, `"plan_then_act"`
- Injects structured reasoning instructions into the system prompt via `PromptBuilder`
- Works with existing `result.reasoning` extraction for full visibility into agent thought process
- New export: `REASONING_STRATEGIES` dict for discovering available strategies

```python
config = AgentConfig(reasoning_strategy="react") # Thought → Action → Observation
config = AgentConfig(reasoning_strategy="cot") # Chain-of-Thought step-by-step
config = AgentConfig(reasoning_strategy="plan_then_act") # Plan first, then execute
```
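As a rough illustration of what injecting reasoning instructions into the system prompt could look like, here is a sketch under stated assumptions. The instruction texts, the `STRATEGY_INSTRUCTIONS` dict, and `build_system_prompt` are hypothetical stand-ins; the actual `PromptBuilder` logic and `REASONING_STRATEGIES` contents are not shown in this diff.

```python
from typing import Dict, Optional

# Hypothetical instruction texts keyed by the three strategy names above.
STRATEGY_INSTRUCTIONS: Dict[str, str] = {
    "react": "Reason in a loop of Thought, Action, Observation.",
    "cot": "Think step by step before giving the final answer.",
    "plan_then_act": "Write a numbered plan first, then execute each step.",
}


def build_system_prompt(base_prompt: str, reasoning_strategy: Optional[str] = None) -> str:
    """Append the instructions for the chosen strategy to the base prompt."""
    if reasoning_strategy is None:
        return base_prompt
    if reasoning_strategy not in STRATEGY_INSTRUCTIONS:
        valid = ", ".join(sorted(STRATEGY_INSTRUCTIONS))
        raise ValueError(
            f"Unknown reasoning strategy {reasoning_strategy!r}; expected one of: {valid}"
        )
    return f"{base_prompt}\n\n{STRATEGY_INSTRUCTIONS[reasoning_strategy]}"


print(build_system_prompt("You are a helpful agent.", "plan_then_act"))
```

Rejecting unknown names early, rather than silently ignoring them, keeps a typo in the config from quietly disabling the strategy.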

#### Tool Result Caching
- New `cacheable` and `cache_ttl` parameters on `Tool` and `@tool()` decorator
- Cacheable tools skip re-execution when called with the same arguments within TTL
- Cache key: `tool_result:{tool_name}:{sha256(sorted_params)}`
- Wired into all 4 execution paths (single sync/async, parallel sync/async)
- Records `StepType.CACHE_HIT` trace step on cache hits
- Reuses the agent's existing `config.cache` (InMemoryCache, RedisCache)

```python
@tool(description="Search the web", cacheable=True, cache_ttl=60)
def web_search(query: str) -> str:
    return expensive_api_call(query)
```
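The cache key format and TTL behaviour listed above can be sketched as follows. This is an illustration only: JSON serialization with sorted keys is an assumed reading of `sha256(sorted_params)`, and `TTLCache` and `call_cached` are hypothetical stand-ins for the agent's `config.cache` and execution path.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def make_cache_key(tool_name: str, params: Dict[str, Any]) -> str:
    # Sorting keys makes the digest independent of argument order.
    # JSON encoding is an assumption, not the library's confirmed format.
    payload = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"tool_result:{tool_name}:{digest}"


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (hypothetical stand-in)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # Expired: drop it and report a miss.
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def call_cached(
    cache: TTLCache,
    tool_name: str,
    func: Callable[..., Any],
    params: Dict[str, Any],
    ttl: float,
) -> Tuple[Any, bool]:
    """Return (result, cache_hit); only run the tool on a miss."""
    key = make_cache_key(tool_name, params)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    result = func(**params)
    cache.set(key, result, ttl)
    return result, False
```

In this sketch a tool that returns `None` is never treated as a hit, because `None` doubles as the miss marker; a fuller version would use a dedicated sentinel.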

#### Python 3.9–3.13 CI Matrix
- GitHub Actions now tests against Python 3.9, 3.10, 3.11, 3.12, and 3.13
- Full codebase audit confirmed zero 3.10+-only syntax (all `X | Y` unions guarded by `from __future__ import annotations`)
- Added `Programming Language :: Python :: 3.13` classifier to pyproject.toml
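A check for unguarded `X | Y` annotations could be approximated with the standard `ast` module. The sketch below is an assumption about how such an audit might work, not the audit that was actually run; the function names are hypothetical, and it only inspects annotations, so runtime uses such as `isinstance(x, int | str)` would go unnoticed.

```python
import ast
from typing import Iterator


def _annotations(tree: ast.AST) -> Iterator[ast.expr]:
    """Yield every parameter, return, and variable annotation in the tree."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.returns is not None:
                yield node.returns
        elif isinstance(node, ast.arg) and node.annotation is not None:
            yield node.annotation
        elif isinstance(node, ast.AnnAssign):
            yield node.annotation


def has_future_annotations(tree: ast.Module) -> bool:
    """True if the module starts with `from __future__ import annotations`."""
    return any(
        isinstance(node, ast.ImportFrom)
        and node.module == "__future__"
        and any(alias.name == "annotations" for alias in node.names)
        for node in tree.body
    )


def uses_union_operator(tree: ast.AST) -> bool:
    """True if any annotation contains a `|` binary operation."""
    return any(
        isinstance(sub, ast.BinOp) and isinstance(sub.op, ast.BitOr)
        for annotation in _annotations(tree)
        for sub in ast.walk(annotation)
    )


def is_py39_unsafe(source: str) -> bool:
    """Flag modules whose `X | Y` annotations would be evaluated on Python 3.9."""
    tree = ast.parse(source)
    return uses_union_operator(tree) and not has_future_annotations(tree)
```

With the future import in place, annotations are stored as strings and never evaluated at definition time, which is why guarded modules import cleanly on 3.9.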

### Stats
- **37 new tests** (total: 2220)
- **2 new examples** (50: reasoning strategies, 51: tool result caching; total: 51)

## [0.17.5] - 2026-03-23

### Fixed — Bug Hunt (91 validated fixes across 7 subsystems)
4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -94,7 +94,7 @@ src/selectools/
├── junit.py # JUnit XML for CI
└── __main__.py # CLI: python -m selectools.evals

tests/ # 2183 tests (unit, integration, regression, E2E)
tests/ # 2220 tests (unit, integration, regression, E2E)
├── agent/ # Agent core tests
├── providers/ # Provider-specific tests
├── rag/ # RAG pipeline tests
@@ -330,7 +330,7 @@ Every `AgentTrace` contains `TraceStep` entries with one of these types:
- **v0.17.3** ✅ Agent Runtime Controls — token budget, cancellation, cost attribution, structured results, approval gate, SimpleStepObserver
- **v0.17.4** ✅ Agent Intelligence — token estimation, model switching, knowledge memory enhancement (4 store backends)
- **v0.17.5** ✅ Bug Hunt & Async Guardrails — 91 validated fixes, async guardrails, 40 regression tests
- **v0.17.6** 🟡 Quick Wins — ReAct/CoT reasoning strategies, tool result caching, Python 3.9–3.13 CI matrix
- **v0.17.6** ✅ Quick Wins — ReAct/CoT reasoning strategies, tool result caching, Python 3.9–3.13 CI matrix
- **v0.17.7** 🟡 Caching & Context — semantic caching, prompt compression, conversation branching
- **v0.18.0** 🟡 Multi-Agent Orchestration — see `MULTI_AGENT_PLAN.md`
- **v0.18.x** 🟡 Composability Layer — Pipeline with `@step` + `|` operator (LCEL alternative)
12 changes: 6 additions & 6 deletions CONTRIBUTING.md
@@ -2,9 +2,9 @@

Thank you for your interest in contributing to Selectools! We welcome contributions from the community.

**Current Version:** v0.17.4
**Test Status:** 2183 tests passing (100%)
**Python:** 3.13+
**Current Version:** v0.17.6
**Test Status:** 2220 tests passing (100%)
**Python:** 3.9+

## Getting Started

@@ -74,7 +74,7 @@ Similar to `npm run` scripts, here are the common commands for this project:
### Testing

```bash
# Run all tests (2183 tests)
# Run all tests (2220 tests)
pytest tests/ -v

# Run tests quietly (summary only)
@@ -264,7 +264,7 @@ selectools/
│ ├── embeddings/ # Embedding providers
│ ├── rag/ # RAG: vector stores, chunking, loaders
│ └── toolbox/ # 24 pre-built tools
├── tests/ # Test suite (2183 tests)
├── tests/ # Test suite (2220 tests)
│ ├── agent/ # Agent tests
│ ├── rag/ # RAG tests
│ ├── tools/ # Tool tests
@@ -370,7 +370,7 @@ We especially welcome contributions in these areas:
- Add comparison guides (vs LangChain, LlamaIndex)

### 🧪 **Testing**
- Increase test coverage (currently 2183 tests passing!)
- Increase test coverage (currently 2220 tests passing!)
- Add performance benchmarks
- Improve E2E test stability with retry/rate-limit handling

24 changes: 21 additions & 3 deletions README.md
@@ -12,6 +12,24 @@ An open-source project from **[NichevLabs](https://nichevlabs.com)**.

## What's New in v0.17

### v0.17.6 — Quick Wins

```python
from selectools import AgentConfig, REASONING_STRATEGIES, tool

# Reasoning strategies — guide the LLM's thought process
config = AgentConfig(reasoning_strategy="react") # Thought → Action → Observation
config = AgentConfig(reasoning_strategy="cot") # Chain-of-Thought step-by-step
config = AgentConfig(reasoning_strategy="plan_then_act") # Plan first, then execute

# Tool result caching — skip re-execution for identical calls
@tool(description="Search the web", cacheable=True, cache_ttl=60)
def web_search(query: str) -> str:
    return expensive_api_call(query)
```

Also: Python 3.9–3.13 CI matrix (verified zero compatibility issues).

### v0.17.4 — Agent Intelligence

```python
@@ -168,10 +186,10 @@ report.to_html("report.html")
- **Token Budget & Cancellation**: `max_total_tokens`, `max_cost_usd` hard limits; `CancellationToken` for cooperative stopping
- **Token Estimation**: `estimate_run_tokens()` for pre-execution budget checks
- **Model Switching**: `model_selector` callback for per-iteration model selection
- **49 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
- **51 Examples**: RAG, hybrid search, streaming, structured output, traces, batch, policy, observer, guardrails, audit, sessions, entity memory, knowledge graph, eval framework, and more
- **Built-in Eval Framework**: 39 evaluators (21 deterministic + 18 LLM-as-judge), A/B testing, regression detection, HTML reports, JUnit XML, snapshot testing
- **AgentObserver Protocol**: 31 lifecycle events with `run_id` correlation, `LoggingObserver`, `SimpleStepObserver`, OTel export
- **2183 Tests**: Unit, integration, regression, and E2E with real API calls
- **2220 Tests**: Unit, integration, regression, and E2E with real API calls

## Install

@@ -740,7 +758,7 @@ pytest tests/ -x -q # All tests
pytest tests/ -k "not e2e" # Skip E2E (no API keys needed)
```

2183 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.
2220 tests covering parsing, agent loop, providers, RAG pipeline, hybrid search, advanced chunking, dynamic tools, caching, streaming, guardrails, sessions, memory, eval framework, budget/cancellation, knowledge stores, and E2E integration.

## License
