release: v0.17.0 — Built-in Eval Framework #23
Merged
johnnichev merged 1 commit into main on Mar 22, 2026
Version bump 0.16.7 → 0.17.0 across __init__.py and pyproject.toml.

Count fixes across 14 files:
- Tests: 1758 → 1960 (README, CLAUDE.md, CONTRIBUTING, docs/index, landing)
- Evaluators: 22 → 39 (README, CLAUDE.md, EVALS.md)
- Examples: 39 → 40 (README, docs/index)
- Observer events: 25 → 28 (CLAUDE.md, README, docs/index, ARCHITECTURE, AGENT)
- Eval tests: 309 → 340 (README, CHANGELOG)
- CONTRIBUTING version: v0.16.7 → v0.17.0
- ROADMAP: v0.17.0 marked complete

CHANGELOG updated with comprehensive v0.17.0 entry. MkDocs build clean. 1906 tests passing.
johnnichev added a commit that referenced this pull request on Mar 24, 2026
Agent core observers (6 fixes):
- astream() cancellation/budget paths now build proper results with trace steps and async observer events (#14)
- arun() fires async observers for cancel/budget/max-iter (#15)
- _aexecute_tools_parallel fires async observer events (#16)
- _aexecute_tools_parallel tracks tool_usage/tool_tokens (#17)
- _acheck_policy fires async on_policy_decision observer (#10M)
- astream() max-iter path fires async on_run_end (#12M)

Tools + providers (7 fixes):
- Anthropic empty content list guard (#19)
- Bool rejected for int/float params (#20)
- ToolRegistry.tool() has screen_output/terminal/requires_approval (#21)
- MultiMCPClient list_all_tools() copies tools before prefixing (#22)
- Streamable-http 3-tuple unpacking robust handling (#23)
- _serialize_result returns "" for None (#24)
- StructuredOutputEvaluator handles __slots__ (#45)

RAG (6 fixes):
- SQLiteVectorStore search documented limitation (#25)
- InMemoryVectorStore max_documents warning (#26)
- Pinecone metadata.get instead of .pop (#27)
- ContextualChunker None content guard (#28)
- Filter overfetch: top_k*4 when filter present (#29)
- OpenAI embed_texts batching at 2048 (#30)

Memory (5 fixes):
- FileKnowledgeStore reads under lock (#32)
- SQLiteSessionStore WAL mode (#33)
- SQLiteKnowledgeStore indexes on query columns (#34)
- query() LIMIT after TTL filter (#35)
- Redis save() category update in pipeline (#36)

Evals (4 fixes):
- 16 LLM evaluators fail on unparseable score (#37)
- XSS fix: textContent instead of innerHTML (#38)
- Donut SVG 360° arc: two semicircles (#39)
- Suite completed counter under threading.Lock (#46)

Security (5 fixes):
- REWRITE/WARN guardrails tracked in trace (#40)
- SSN regex requires consistent separators (#41)
- Topic guardrail Unicode normalization (#42)
- Coherence usage tracked in agent costs (#43)
- Coherence fail_closed option (#44)

Full suite: 2013 passed.
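The "Bool rejected for int/float params" fix (#20) addresses a common Python pitfall: `bool` is a subclass of `int`, so a plain `isinstance(value, int)` check accepts `True` and `False`. The sketch below shows one way such a check could be written. `validate_number` is a hypothetical helper used for illustration only, not the selectools implementation.

```python
from typing import Any


def validate_number(value: Any, expected: type) -> bool:
    """Return True if `value` is acceptable for an int or float parameter.

    Hypothetical helper for illustration; not taken from selectools.
    """
    # bool is a subclass of int, so isinstance(True, int) is True.
    # Reject bools explicitly before any numeric check.
    if isinstance(value, bool):
        return False
    if expected is int:
        return isinstance(value, int)
    if expected is float:
        # Assumption: ints are accepted for float parameters, since JSON
        # payloads often carry 1 where 1.0 is meant.
        return isinstance(value, (int, float))
    raise ValueError(f"unsupported numeric type: {expected!r}")


print(validate_number(True, int))   # bool is rejected
print(validate_number(3, int))      # plain int is accepted
print(validate_number(3, float))    # int is accepted for a float parameter
```

Placing the `bool` check first keeps the rule in one spot, so every numeric branch that follows can assume it is not looking at `True` or `False`.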
johnnichev added a commit that referenced this pull request on Apr 11, 2026
BUG-16: _build_cancelled_result called _session_save but was missing _extract_entities and _extract_kg_triples. When a run was cancelled via CancellationToken, any entities/KG triples that had been collected during the turn were silently lost. Now mirrors _build_max_iterations_result and _build_budget_exceeded_result, which call all three persistence methods.

BUG-22: @tool() treated Optional[T] without a default value as required. Some LLMs refuse to call a tool when an 'optional' parameter has no way to represent None. Now detects Optional types via Union[T, None] and marks them is_optional=True even without a default value. Cross-referenced from CLAUDE.md pitfall #23 and Agno #7066.
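The BUG-22 description says Optional types are detected via `Union[T, None]`. A minimal sketch of that kind of detection using the standard `typing` introspection helpers follows. The names `is_optional`, `optional_params`, and `search` are hypothetical and chosen for illustration; the actual `@tool()` internals are not shown in this PR.

```python
import types
from typing import Optional, Union, get_args, get_origin, get_type_hints

# typing.Union covers Optional[T]; types.UnionType covers the `T | None`
# syntax on Python 3.10+. getattr keeps this importable on older versions.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def is_optional(annotation: object) -> bool:
    """True when the annotation is a union that includes None."""
    return (
        get_origin(annotation) in _UNION_ORIGINS
        and type(None) in get_args(annotation)
    )


def optional_params(func) -> set:
    """Names of parameters annotated as Optional, with or without a default."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    return {name for name, hint in hints.items() if is_optional(hint)}


# Hypothetical tool function: `limit` is Optional but has no default value.
def search(query: str, limit: Optional[int], tag: Union[str, None] = None) -> list:
    return []


print(sorted(optional_params(search)))
```

Because the check looks at the annotation rather than the presence of a default, `limit` is reported as optional even though the signature gives it no default value.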
johnnichev added a commit that referenced this pull request on Apr 13, 2026
…amples

## Summary

Three rounds of competitive bug mining across 9 repositories (~325k combined stars) surfaced and shipped 34 confirmed-live bugs with TDD regression tests.

### Round 1 (Agno + PraisonAI): BUG-01 – BUG-22

22 bugs: streaming tool-call drops, typing.Literal crashes, asyncio.run re-entry, HITL interrupt propagation, ConversationMemory thread safety, think-tag stripping, RAG batch limits, MCP concurrent race, str→typed coercion, Union typing, multi-interrupt generators, GraphState fail-fast, session namespace, summary cap, cancelled-result persistence, AgentTrace lock, async observer logging, clone isolation, OTel/Langfuse locks, vector store dedup, Optional[T] handling.

### Round 2 (LangChain + LangGraph + CrewAI + n8n + LlamaIndex + AutoGen): BUG-23 – BUG-26

4 bugs: reranker top_k=0 falsy fallback, _dedup_search_results text-only keying, in-memory filter operator-dict silent-ignore, Gemini provider or-0 usage metadata.

### Round 3 (LiteLLM + Pydantic AI + Haystack): BUG-27 – BUG-34

8 bugs: FallbackProvider retry list incomplete (529/504/408/522/524), Azure deployment-name family detection bypass, bare list/dict tool schemas missing items/properties, pipeline.parallel shared input mutation, malformed tool-call JSON silent drop, run_in_executor drops contextvars at 5 sites, astream missing aclosing on provider generators, max_iterations consumed by structured-retry budget.
### Documentation & Content

- Cookbook expanded from 7 to 30 recipes
- 6 new examples (89–94)
- Module docs updated: TOOLS, VECTOR_STORES, PROVIDERS, AGENT, PIPELINE
- CLAUDE.md pitfalls 27–30 added
- Stale counts swept across 13 files

### Stats

- **5,064 tests** (up from 5,015; +104 regression tests)
- **94 examples** (up from 88)
- **30 cookbook recipes** (up from 7)
- Cross-referenced: Agno (16), PraisonAI (5), LlamaIndex (3), LangChain (1), LiteLLM (2), Pydantic AI (4), Haystack (2), CLAUDE.md pitfall #23 (1)
- First cross-round compound validation: CrewAI round-2 candidate confirmed by Haystack round-3 grep

## Test plan

- [x] Full non-E2E suite: 5064 passed, 3 skipped, 0 failed
- [x] All 6 new examples verified with PYTHONPATH=src
- [x] Pre-commit hooks (black, isort, flake8, mypy, bandit) green on every commit
- [x] CI green on PR
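One of the Round 3 items, "run_in_executor drops contextvars", reflects how `loop.run_in_executor` works: the callable runs in a worker thread that does not receive the caller's `contextvars` context. A common remedy is to capture the context with `contextvars.copy_context()` and run the callable inside it, which is also what `asyncio.to_thread` does internally. The sketch below illustrates the pattern; `request_id`, `read_request_id`, and `run_with_context` are hypothetical names, and the five affected call sites in selectools are not shown here.

```python
import asyncio
import contextvars
import functools

# Hypothetical context variable standing in for per-run state.
request_id = contextvars.ContextVar("request_id", default="unset")


def read_request_id() -> str:
    """Blocking function that reads the context variable."""
    return request_id.get()


async def run_with_context(func, *args):
    """Run `func` in the default executor inside a copy of the caller's context."""
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()  # snapshot taken in the calling task
    # ctx.run executes func with the snapshot active in the worker thread.
    return await loop.run_in_executor(None, functools.partial(ctx.run, func, *args))


async def main() -> str:
    request_id.set("req-42")
    return await run_with_context(read_request_id)


result = asyncio.run(main())
print(result)
```

Changes the callable makes to context variables stay inside the copied context, so the worker thread cannot leak state back into the calling task.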
Selectools v0.17.0 — Built-in Eval Framework
The only AI agent framework with a comprehensive evaluation suite built in. No separate install, no SaaS account, no external dependencies.
Highlights
- `python -m selectools.evals run cases.json`
- `suite.estimate_cost()` before running
- `pip install selectools[evals]`: optional PyYAML dependency
- `report.to_markdown()`: paste into GitHub issues/Slack/PRs

Files changed
- `__init__.py` + `pyproject.toml` (0.16.7 → 0.17.0)

Test plan