feat: v0.17.0 release prep — 10 items across code, docs, and landing page by johnnichev · Pull Request #21 · johnnichev/selectools

johnnichev · 2026-03-22T18:06:40Z

Summary

10 items to complete the v0.17.0 release:

| # | Item | Type |
|---|---|---|
| 1 | Notebook eval section (Step 19) | Docs |
| 2 | README eval showcase + code block | Docs |
| 3 | Landing page: dual comparison tables (frameworks + eval tools) | Marketing |
| 4 | `report.to_markdown()`: paste into GitHub/Slack/PRs | Code |
| 5 | Evaluator count badge in README | Docs |
| 6 | CHANGELOG.md: comprehensive v0.17.0 entry | Release |
| 7 | Blog draft in `.private/` | Content |
| 8 | `pip install selectools[evals]`: optional PyYAML | Code |
| 9 | Observer events: `on_eval_start`, `on_eval_case_end`, `on_eval_end` | Code |
| 10 | Trend chart sparkline in HTML report | Code |

17 new tests (309 eval tests in total). All pre-commit hooks pass. MkDocs builds cleanly.
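Item 4 adds `report.to_markdown()` so eval results can be pasted into GitHub, Slack, or PRs. The sketch below shows one way such a summary could be rendered as a GitHub-flavored Markdown table. It is only an illustration under stated assumptions: `CaseResult` and `to_markdown` here are hypothetical stand-ins, not the selectools report API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    # Hypothetical per-case record; the real selectools report type may differ.
    name: str
    passed: bool
    score: float


def to_markdown(results: List[CaseResult]) -> str:
    """Render eval results as a GitHub-flavored Markdown summary (hypothetical)."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    accuracy = passed / total if total else 0.0
    lines = [
        f"**Accuracy:** {accuracy:.1%} ({passed}/{total} passed)",
        "",
        "| Case | Result | Score |",
        "|---|---|---|",
    ]
    for r in results:
        # Escape pipes so a case name cannot break the table columns.
        name = r.name.replace("|", "\\|")
        status = "pass" if r.passed else "fail"
        lines.append(f"| {name} | {status} | {r.score:.2f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown([CaseResult("greeting", True, 1.0), CaseResult("refund", False, 0.4)]))
```

Plain "pass"/"fail" text keeps the output readable in any Markdown renderer, including ones that strip emoji.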

Test plan

- 309 eval tests pass
- All pre-commit hooks pass
- MkDocs build passes
- Notebook updated with eval section

…charts, pip extra, notebook, README showcase, badge, CHANGELOG, landing page

10 items for the v0.17.0 release:

1. Notebook: eval section in getting_started.ipynb (Step 19)
2. README: eval showcase with code block, evaluator badge
3. Landing page: expanded comparison tables (frameworks + eval tools), updated evaluator counts (22→39), added new evaluator pills
4. report.to_markdown(): markdown summary for GitHub issues/PRs
5. Evaluator count badge in README badges
6. CHANGELOG.md: comprehensive v0.17.0 entry
7. Blog draft in .private/blog-v0.17.0-eval.md
8. pip install selectools[evals]: optional PyYAML dependency
9. Observer events: on_eval_start, on_eval_case_end, and on_eval_end wired into EvalSuite, LoggingObserver, and AsyncAgentObserver
10. Trend chart: accuracy sparkline SVG in HTML report when HistoryTrend is provided

17 new tests (total eval: 309).
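Item 10 draws an accuracy sparkline in the HTML report when a HistoryTrend is provided. A minimal sketch of how such a sparkline can be built as an inline SVG polyline from a series of accuracy values in [0, 1]; `accuracy_sparkline` and its `width`/`height` parameters are hypothetical and not the selectools implementation.

```python
from typing import Sequence


def accuracy_sparkline(values: Sequence[float], width: int = 120, height: int = 24) -> str:
    """Return an inline SVG sparkline for accuracy values in [0, 1].

    Hypothetical helper for illustration; not the selectools API.
    """
    if len(values) < 2:
        # A single run has no trend to draw, so render nothing.
        return ""
    step = width / (len(values) - 1)
    coords = []
    for i, value in enumerate(values):
        clamped = min(max(value, 0.0), 1.0)  # keep points inside the box
        x = i * step
        y = height - clamped * height  # SVG y grows downward, so invert
        coords.append(f"{x:.1f},{y:.1f}")
    points = " ".join(coords)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">'
        f'<polyline fill="none" stroke="currentColor" stroke-width="1.5" '
        f'points="{points}"/></svg>'
    )
```

Using `stroke="currentColor"` lets the sparkline inherit the surrounding text color, so it works in both light and dark report themes without extra CSS.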

Agent core observers (6 fixes):
- astream() cancellation/budget paths now build proper results with trace steps and async observer events (#14)
- arun() fires async observers for cancel/budget/max-iter (#15)
- _aexecute_tools_parallel fires async observer events (#16)
- _aexecute_tools_parallel tracks tool_usage/tool_tokens (#17)
- _acheck_policy fires async on_policy_decision observer (#10M)
- astream() max-iter path fires async on_run_end (#12M)

Tools + providers (7 fixes):
- Anthropic empty content list guard (#19)
- Bool rejected for int/float params (#20)
- ToolRegistry.tool() supports screen_output/terminal/requires_approval (#21)
- MultiMCPClient list_all_tools() copies tools before prefixing (#22)
- Robust handling of streamable-http 3-tuple unpacking (#23)
- _serialize_result returns "" for None (#24)
- StructuredOutputEvaluator handles __slots__ (#45)

RAG (6 fixes):
- SQLiteVectorStore search limitation documented (#25)
- InMemoryVectorStore max_documents warning (#26)
- Pinecone uses metadata.get instead of .pop (#27)
- ContextualChunker None content guard (#28)
- Filter overfetch: top_k*4 when a filter is present (#29)
- OpenAI embed_texts batching at 2048 (#30)

Memory (5 fixes):
- FileKnowledgeStore reads under lock (#32)
- SQLiteSessionStore WAL mode (#33)
- SQLiteKnowledgeStore indexes on query columns (#34)
- query() applies LIMIT after the TTL filter (#35)
- Redis save() category update in pipeline (#36)

Evals (4 fixes):
- 16 LLM evaluators now fail on an unparseable score (#37)
- XSS fix: textContent instead of innerHTML (#38)
- Donut SVG 360° arc drawn as two semicircles (#39)
- Suite completed counter under threading.Lock (#46)

Security (5 fixes):
- REWRITE/WARN guardrails tracked in trace (#40)
- SSN regex requires consistent separators (#41)
- Topic guardrail Unicode normalization (#42)
- Coherence usage tracked in agent costs (#43)
- Coherence fail_closed option (#44)

Full suite: 2013 passed.
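Fix #41 requires consistent separators in the SSN regex. A minimal sketch of one way to enforce that, assuming a backreference-based pattern (the actual regex in pii.py may differ): capture the first separator (dash, space, or nothing) and require the same one between the last two groups, so mixed forms such as `123-45 6789` are not matched. `SSN_RE` and `find_ssns` are hypothetical names.

```python
import re
from typing import List

# Hypothetical pattern: group 1 captures the first separator ("-", " ", or
# nothing) and the backreference \1 forces the second separator to match it.
SSN_RE = re.compile(r"\b\d{3}([- ]?)\d{2}\1\d{4}\b")


def find_ssns(text: str) -> List[str]:
    """Return every SSN-shaped substring with consistent separators."""
    return [m.group(0) for m in SSN_RE.finditer(text)]
```

The word boundaries keep the pattern from firing inside longer digit runs such as a 10-digit phone number.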

- H5: policy.py: deny an empty tool_name immediately instead of falling through to pattern matching, where fnmatch("", "*") would match an allow/deny "*" pattern
- H6: pii.py: extend the SSN regex to detect the space-separated format (123 45 6789 was not detected; only the dash-separated and bare 9-digit forms were)
- H9: decorators.py: _unwrap_type now handles the Python 3.10+ X | None syntax (types.UnionType); previously, str | None annotations raised ToolValidationError on Python 3.10/3.11/3.12/3.13
- CLAUDE.md: add pitfalls #19 (eval judge prompt injection fencing), #20 (ThreadPoolExecutor singleton), and #21 (types.UnionType in tools)
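H9 concerns `X | None` annotations. On Python 3.10+, `str | None` has origin `types.UnionType` rather than `typing.Union`, so a check like `get_origin(tp) is Union` alone misses it. A minimal sketch of an unwrap helper that accepts both forms, assuming the goal is to strip `None` from optional annotations; `unwrap_optional` is a hypothetical name, not the actual `_unwrap_type` in decorators.py.

```python
import types
import typing
from typing import Any, Optional, Union


def unwrap_optional(tp: Any) -> Any:
    """Return T for Optional[T], Union[T, None], or T | None; else tp unchanged."""
    origin = typing.get_origin(tp)
    union_origins = [Union]
    # types.UnionType only exists on Python 3.10+ (PEP 604 `X | Y` syntax).
    if hasattr(types, "UnionType"):
        union_origins.append(types.UnionType)
    if origin in union_origins:
        # Drop NoneType; only unwrap when exactly one real type remains.
        args = [a for a in typing.get_args(tp) if a is not type(None)]
        if len(args) == 1:
            return args[0]
    return tp
```

Multi-member unions such as `Union[int, str]` are returned unchanged, leaving it to the caller to decide how to validate them.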

johnnichev merged commit e498cb4 into main Mar 22, 2026

johnnichev deleted the feat/eval-release-prep branch March 22, 2026 18:06

johnnichev merged 1 commit into main from feat/eval-release-prep