
release: v0.18.0b1 — Multi-Agent Orchestration + Composable Pipelines (Beta) #33

Merged
johnnichev merged 16 commits into main from feat/v0.18.0-multi-agent
Mar 26, 2026

@johnnichev
Owner

Summary

The biggest release since launch. Two headline features, both in plain Python:

Multi-Agent Orchestration

  • AgentGraph — directed graph of agent nodes with plain Python routing
  • SupervisorAgent — 4 strategies (plan_and_execute, round_robin, dynamic, magentic)
  • HITL via generator nodes — resumes at exact yield point (LangGraph restarts the node)
  • Parallel execution with 3 merge policies, checkpointing with 3 backends
  • Subgraph composition, loop/stall detection, budget propagation
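
As a rough illustration of the generator-based HITL idea above, the sketch below uses only plain Python generators. The names `approval_node`, `run_until_interrupt`, and `resume` are hypothetical and are not the selectools API; how AgentGraph persists a paused node through a checkpoint is not shown in this PR.

```python
from typing import Any, Dict, Generator

# Hypothetical node: a plain generator that pauses for human input.
def approval_node(state: Dict[str, Any]) -> Generator[str, str, Dict[str, Any]]:
    # Work before the yield runs exactly once, even though the node pauses.
    state["draft"] = f"Refund {state['amount']} to customer"
    # Execution stops here until a human response is sent in.
    answer = yield f"Approve: {state['draft']}?"
    state["approved"] = answer.strip().lower() == "yes"
    return state


def run_until_interrupt(gen: Generator[str, str, Dict[str, Any]]) -> str:
    """Advance the node to its first yield and return the prompt it raised."""
    return next(gen)


def resume(gen: Generator[str, str, Dict[str, Any]], response: str) -> Dict[str, Any]:
    """Send the response in; the node continues right after its yield."""
    try:
        gen.send(response)
    except StopIteration as done:
        return done.value
    raise RuntimeError("node yielded again; another response is needed")


gen = approval_node({"amount": 40})
prompt = run_until_interrupt(gen)
final_state = resume(gen, "yes")
```

Because the generator keeps its own frame, the draft is built once and the code after the `yield` sees the human answer directly, which is the behavior the bullet contrasts with restarting the node.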

Composable Pipelines

  • @step decorator on plain functions — zero learning curve
  • | operator for composition — thin sugar, not a DSL
  • parallel() and branch() primitives
  • Auto-tracing, retry with skip, kwargs filtering
  • Bridge to AgentGraph (pipeline as graph node)

Also in this release

  • License switched from LGPL-3.0 to Apache-2.0
  • 35 bug fixes across 7 bug hunt rounds
  • Anthropic SYSTEM role fix (400 error with prompt compression/entity memory)
  • 31 real-API e2e tests across OpenAI, Anthropic, Gemini
  • 5 config.model → _effective_model fixes (coherence, summarization, entity/KG extraction)

Stats

  • Tests: 2275 → 2435 (+160)
  • Examples: 54 → 61 (+7)
  • StepTypes: 17 → 27 (+10)
  • Observer events: 32/29 → 45/42 sync/async (+13 each)
  • Bug fixes: 35
  • New source files: 7

Beta

This is a pre-release (0.18.0b1). Install with `pip install selectools==0.18.0b1`.
Regular `pip install selectools` stays on 0.17.7 until stable.

Checklist

  • All tests pass (2435 passed, 85 skipped)
  • Lint clean (black, isort, flake8, mypy, bandit)
  • Docs updated (README, ROADMAP, CHANGELOG, ARCHITECTURE, QUICKSTART, module docs, index)
  • Cross-reference audit passed (all counts consistent)
  • mkdocs build clean
  • 31 real-API evals pass across all 3 providers
  • 7 bug hunt rounds — codebase clean

AgentGraph engine with plain Python routing, SupervisorAgent with 4
strategies (plan_and_execute, round_robin, dynamic, magentic), HITL
via generator nodes that resume at exact yield point, parallel execution
with 3 merge policies, checkpointing with 3 backends, subgraph composition,
loop/stall detection, AgentGraph.chain() one-liner.

License switched from LGPL-3.0 to Apache-2.0.

10 new StepTypes (27 total), 13 new observer events (45/42 sync/async),
7 examples, 2 module docs, full roadmap detail through v0.21.0.

Tests: 2397 passing (+122), Examples: 61 (+7)

Prompt compression, entity memory, knowledge graph, and knowledge memory
all inject Message(role=Role.SYSTEM) into conversation history. Anthropic
requires system instructions in the top-level `system` parameter only —
passing role="system" in the messages array returns 400.

Fix: AnthropicProvider._format_messages() now converts SYSTEM messages to
user role. GeminiProvider._format_contents() gets the same explicit
handling (was accidentally working via the else fallback).

2 regression tests added.
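
A minimal sketch of the conversion described above, under stated assumptions: `Role`, `Message`, and `format_messages` here are simplified stand-ins written for this example, not the actual selectools classes or the body of `AnthropicProvider._format_messages()`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Role(str, Enum):  # stand-in for the library's Role enum
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class Message:  # stand-in for the library's Message type
    role: Role
    content: str


def format_messages(history: List[Message]) -> List[Dict[str, str]]:
    """Build a messages array that never contains role="system"."""
    formatted = []
    for msg in history:
        # SYSTEM messages injected mid-conversation are sent as user turns.
        role = Role.USER if msg.role is Role.SYSTEM else msg.role
        formatted.append({"role": role.value, "content": msg.content})
    return formatted


history = [
    Message(Role.USER, "What is the capital of France?"),
    Message(Role.SYSTEM, "Known entities: France (country)"),
    Message(Role.ASSISTANT, "Paris."),
]
payload = format_messages(history)
```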
…Gemini)

Validates AgentGraph with actual API calls against all 3 providers:
- Linear graphs, conditional routing, callable nodes, cross-provider graphs
- SYSTEM message injection (the exact bug reported by user)
- Mixed OpenAI+Anthropic pipeline in one graph

All 9 pass. Run with: pytest tests/test_orchestration_e2e.py --run-e2e
…cks)

Replaced all mock-based evals with actual API calls against OpenAI,
Anthropic, and Gemini. Every test parametrized across all 3 providers.

Validates real scenarios:
- Tool calling accuracy (correct tool for capital/math questions)
- Multi-step pipelines (2-agent chains produce coherent output)
- SYSTEM message survival (prompt compression, entity memory injection)
- Cross-provider pipelines (OpenAI -> Anthropic in one graph)
- HITL interrupt/resume with real LLM agent after gate
- Parallel execution with real agents

22 tests, 0 mocks, all pass. Run: pytest tests/test_orchestration_evals.py --run-e2e

astream()/arun() parity (5 fixes):
- astream() now fires all observer events (on_graph_start/end, on_node_start/end, etc.)
- astream() enforces max_visits per node
- astream() increments stall_count and records GRAPH_STALL trace steps
- astream() records GRAPH_LOOP_DETECTED trace steps
- astream() wraps _resolve_next_node in try-except
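
To make the max_visits item concrete, here is a small sketch of a per-node visit cap in a graph walk. The `walk` function, the router signature, and the choice to raise `RuntimeError` are assumptions for illustration; the PR does not show what AgentGraph does when the cap is hit.

```python
from collections import Counter
from typing import Callable, List, Optional

Router = Callable[[str], Optional[str]]


def walk(start: str, next_node: Router, max_visits: int = 3) -> List[str]:
    """Follow the router from `start`, refusing to exceed max_visits per node."""
    visits: Counter = Counter()
    order: List[str] = []
    node: Optional[str] = start
    while node is not None:
        visits[node] += 1
        if visits[node] > max_visits:
            raise RuntimeError(f"node {node!r} exceeded max_visits={max_visits}")
        order.append(node)
        node = next_node(node)
    return order


# A linear graph finishes normally.
edges = {"a": "b", "b": None}
linear_order = walk("a", lambda n: edges[n])


# A router that bounces between two nodes forever is cut off by the cap.
def ping_pong(node: str) -> str:
    return "b" if node == "a" else "a"
```

Putting the cap inside the one loop that both the streaming and non-streaming paths share is one way to avoid the kind of parity drift listed above.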

Routing fixes (2):
- _Update return type from routers now handled (applies patch, follows edge)
- Router returning list of non-Scatter objects raises clear error

Security (1):
- FileCheckpointStore sanitizes graph_id to prevent path traversal
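
One way such a guard can look, as a sketch only: `checkpoint_path` is a hypothetical helper, and the allow-list pattern and the decision to reject (rather than rewrite) unsafe ids are assumptions, not the actual FileCheckpointStore code.

```python
import re
from pathlib import Path

# Assumed allow-list: letters, digits, underscore, hyphen.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def checkpoint_path(base_dir: str, graph_id: str) -> Path:
    """Map a graph_id to a file under base_dir, rejecting traversal attempts."""
    if not _SAFE_ID.fullmatch(graph_id):
        raise ValueError(f"unsafe graph_id: {graph_id!r}")
    base = Path(base_dir).resolve()
    candidate = (base / f"{graph_id}.json").resolve()
    # Defense in depth: the resolved file must sit directly inside base_dir.
    if candidate.parent != base:
        raise ValueError(f"graph_id escapes checkpoint directory: {graph_id!r}")
    return candidate


safe = checkpoint_path("checkpoints", "run_42")
```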

Observer/trace (2):
- Sync generator nodes now fire on_graph_interrupt event
- on_graph_resume event fires when loading from checkpoint

Supervisor (3):
- _looks_complete() uses end-of-string matching (no more false positives)
- _call_planner() logs exceptions instead of swallowing silently
- Error messages updated for _Update support

Tests that would have caught the 15 bugs from the bug hunt:
- astream() observer events fire (caught bugs 1-4)
- astream() enforces max_visits (caught bug 2)
- astream() tracks stall count (caught bug 3)
- astream() routing error yields ERROR event (caught bug 10)
- astream() result matches arun() output (catches parity drift)
- update() routing applies patch and follows edge (caught bug 5)
- Router returning list of strings raises (caught bug 6)
- on_graph_resume fires on checkpoint load (caught bug 15)
- FileCheckpointStore rejects path traversal (caught bug 7)

These are framework tests (no API calls) that verify plumbing,
not LLM behavior. They complement the real-API evals.

Critical:
- SimpleStepObserver: swapped arg order in 13 graph event callbacks
  (was run_id, event_name — should be event_name, run_id)

High:
- Supervisor: "DONE" check now case-insensitive (handles "done", "Done")
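
The two supervisor completion-check fixes in this PR (end-of-string matching and the case-insensitive "DONE" check) can be pictured with a sketch like the following. The regex and the `looks_complete` function are illustrative assumptions, not the body of `_looks_complete()`.

```python
import re

# Match DONE only as a whole word at the end of the text, allowing
# trailing whitespace or simple punctuation, in any letter case.
_DONE_AT_END = re.compile(r"\bDONE[\s.!]*$", re.IGNORECASE)


def looks_complete(text: str) -> bool:
    """Return True when the reply ends with a DONE marker."""
    return bool(_DONE_AT_END.search(text))


examples = {
    "All tasks finished. DONE": looks_complete("All tasks finished. DONE"),
    "done.": looks_complete("done."),
    "Not DONE yet, two steps remain": looks_complete("Not DONE yet, two steps remain"),
    "The task was abandoned": looks_complete("The task was abandoned"),
}
```

A plain substring check would accept the third and fourth inputs, which is the false-positive class the earlier fix describes.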

Medium:
- Magentic strategy: tries "task" field before "reason" from LLM
- Removed unused ThreadPoolExecutor import in graph.py
- _scatter_patches cleaned up on error (try-finally)
- Checkpoint: deep copy _interrupt_responses (was shallow)
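
Why the shallow copy matters for checkpoints, shown with a toy structure. The shape of `interrupt_responses` here is an assumption made for the example; only the shallow-versus-deep behavior is the point.

```python
import copy

# Assumed shape: node name -> nested, mutable response data.
interrupt_responses = {"approval_node": {"answers": ["yes"]}}

shallow_snapshot = dict(interrupt_responses)          # copies only the top level
deep_snapshot = copy.deepcopy(interrupt_responses)    # copies nested objects too

# The live graph keeps running and mutates its state after the checkpoint.
interrupt_responses["approval_node"]["answers"].append("no")

# The shallow snapshot now reflects the later mutation; the deep one does not.
shallow_answers = shallow_snapshot["approval_node"]["answers"]
deep_answers = deep_snapshot["approval_node"]["answers"]
```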
Agent core:
- _acall_provider() fired BOTH sync and async on_llm_start/end — now
  only fires async observers in async path (was double-notifying)
- _astreaming_call() sync fallback used bare `if chunk:` instead of
  `isinstance(chunk, str)` — ToolCall objects could be stringified
- Entity/KG extraction used config.model instead of _effective_model
  (wrong model when model_selector active)

Providers:
- OpenAI/Ollama _format_content() now guards None content with `or ""`
- Gemini tool_result guards None content with `or ""`
- Anthropic tool_result guards None content with `or ""`

Reverted the _acall_provider observer change: sync and async observers
are DIFFERENT types that both need to fire in async paths. The "duplicate
observer" report was a false positive.

Remaining 5 real fixes from full-system bug hunt:
- _astreaming_call() sync fallback: isinstance(chunk, str) guard
- Entity/KG extraction: _effective_model instead of config.model
- OpenAI/Ollama _format_content(): None guard with `or ""`
- Gemini tool_result: None guard with `or ""`
- Anthropic tool_result: None guard with `or ""`
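
The two guard patterns named above, in isolation. `ToolCall`, `collect_text`, and `format_content` are stand-ins written for this sketch and do not reproduce the provider or agent code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Union


@dataclass
class ToolCall:  # stand-in for the library's ToolCall type
    name: str


def collect_text(chunks: Iterable[Union[str, ToolCall]]) -> str:
    """Join only string chunks; a bare `if chunk:` would also admit ToolCall objects."""
    parts: List[str] = []
    for chunk in chunks:
        if isinstance(chunk, str):
            parts.append(chunk)
    return "".join(parts)


def format_content(content: Optional[str]) -> str:
    """Guard None content so downstream string handling never sees None."""
    return content or ""


text = collect_text(["Hel", ToolCall("search"), "lo"])
empty = format_content(None)
```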
Plain Python composability layer (anti-LCEL). No Runnable protocol,
no base class, no paid debugger required.

- @step decorator wraps plain functions (callable as normal Python)
- | operator creates Pipeline (thin list of callables)
- parallel() fans out to multiple steps, returns dict of results
- branch() routes to named steps via classifier function
- retry and on_error="skip" per step
- Pipeline.__call__ bridges to AgentGraph (usable as graph node)
- Async support: pipeline.arun() awaits async steps
- Auto-tracing: every step records name, duration, status

36 tests, all pass.
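
To show how little machinery the `|` idea needs, here is a self-contained sketch. It is not the selectools implementation: the `Step` and `Pipeline` classes and the `step` decorator below are hypothetical stand-ins, and retry, tracing, `parallel()`, and `branch()` are left out.

```python
from typing import Any, Callable, List


class Step:
    """Wraps a plain function; still callable like the original."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn
        self.__name__ = fn.__name__

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.fn(*args, **kwargs)

    def __or__(self, other: Callable[..., Any]) -> "Pipeline":
        return Pipeline([self, other])


class Pipeline:
    """A thin list of callables run left to right."""

    def __init__(self, steps: List[Callable[..., Any]]) -> None:
        self.steps = list(steps)

    def __or__(self, other: Callable[..., Any]) -> "Pipeline":
        return Pipeline(self.steps + [other])

    def run(self, value: Any) -> Any:
        for current in self.steps:
            value = current(value)
        return value


def step(fn: Callable[..., Any]) -> Step:
    return Step(fn)


@step
def strip(text: str) -> str:
    return text.strip()


@step
def shout(text: str) -> str:
    return text.upper()


@step
def exclaim(text: str) -> str:
    return text + "!"


pipeline = strip | shout | exclaim
result = pipeline.run("  hello ")
```

Each decorated function stays usable on its own (`strip("  x ")` returns `"x"`), which matches the "callable as normal Python" point in the list above.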
- on_error="skip" no longer increments steps_run (was counting skipped steps)
- Removed unreachable else blocks in retry for-loop (dead code)
- branch() no longer calls asyncio.run() (was crashing in async contexts)
- Removed dead code in async arun() retry path
- _filter_kwargs() inspects function signature before passing kwargs.
  Steps without **kwargs no longer crash when pipeline has extra kwargs.
  Applied to _execute_step, _aexecute_step, parallel(), and branch().
- Replaced global test counter with make_flaky() factory for isolation.
- Added 5 kwargs tests.
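
The signature-inspection approach can be sketched with the standard library's `inspect` module. This `filter_kwargs` is an illustrative version written for this note, not the `_filter_kwargs()` body from the PR.

```python
import inspect
from typing import Any, Callable, Dict

_NAMED_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def filter_kwargs(fn: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the kwargs that fn can actually accept."""
    params = inspect.signature(fn).parameters
    # A **kwargs parameter means fn accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in _NAMED_KINDS
    }


def greet(name: str, punctuation: str = "!") -> str:
    return f"Hello {name}{punctuation}"


def accepts_anything(name: str, **extra: Any) -> str:
    return name


pipeline_kwargs = {"punctuation": "?", "trace_id": 7}
greeting = greet("Ada", **filter_kwargs(greet, pipeline_kwargs))
```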

Tests: 2435 passed (+5)

Same pattern as the entity/KG extraction fix: coherence_model and
summarize_model fallbacks used config.model (static) instead of
_effective_model (respects model_selector). 3 locations fixed.
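
The fallback pattern, reduced to a toy. Everything here is assumed for illustration: the `Config` fields, a `model_selector` modeled as a zero-argument callable, and the helper names. The real `_effective_model` logic is not shown in this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Config:  # hypothetical, simplified config
    model: str
    model_selector: Optional[Callable[[], str]] = None
    summarize_model: Optional[str] = None


def effective_model(config: Config) -> str:
    """Prefer the selector's choice when one is configured."""
    if config.model_selector is not None:
        return config.model_selector()
    return config.model


def resolve_summarize_model(config: Config) -> str:
    # Fixed pattern: fall back to the effective model, not the static one.
    return config.summarize_model or effective_model(config)


cfg = Config(model="static-model", model_selector=lambda: "selected-model")
chosen = resolve_summarize_model(cfg)
```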
- CLAUDE.md: added pipeline.py to tree, test count 2435, roadmap updated
- README.md: test count 2435, composable pipelines in What's New + features
- CHANGELOG.md: added composable pipelines section, 35 bug fixes, stats
- docs/index.md: test count 2435, pipeline in feature table + learning path
- ROADMAP.md: v0.18.0 includes pipelines, v0.18.x now "Advanced Composition"
- landing/index.html: test 2435, composable pipelines card, pills updated
- mkdocs build: clean (no warnings)

Version 0.18.0b1 (beta). Regular pip install stays on 0.17.7.
Install with: pip install selectools==0.18.0b1
Stable 0.18.0 will ship after real-world validation.

- README.md: added beta install note at top of What's New
- CHANGELOG.md: header changed to [0.18.0b1] with beta note
- landing/index.html: footer version updated to v0.18.0b1
- docs/CHANGELOG.md: synced

johnnichev merged commit ad60a6b into main on Mar 26, 2026
3 of 8 checks passed
johnnichev deleted the feat/v0.18.0-multi-agent branch on March 26, 2026 16:16
johnnichev added a commit that referenced this pull request Apr 7, 2026
Three small fixes from the latest review pass.

1. **pip install terminal full-width on desktop** (Image #32):
   Removed `max-width: 440px` from `.terminal-install`. The terminal
   now spans 100% of the hero-content column, matching the width of
   the "Try the Builder" + "Read the Docs" button row directly below
   it. The hero grid (1fr 1fr split until 1024px) still constrains
   the column width itself, so this doesn't span the entire viewport
   on wide screens — it spans the column where the buttons live,
   which is what aligns them.

2. **SVG flow lines lingering after nodes fade out** (Image #33):
   Root cause: in the hero flow scene transitions, nodes fade via
   CSS `transition: opacity 0.3s` when `el.style.opacity = '0'`, but
   the SVG `<line>` elements (and `<circle>` pulses inside the same
   `<svg>`) had no fade animation — they stayed fully opaque until
   the next scene's `buildScene()` cleared the SVG via
   `svg.textContent = ''`. The visible result: nodes vanish at
   t=300ms, lines hang in the air alone for ~100ms, then snap to the
   next scene at t=400ms.

   Fix: added `transition: opacity 0.3s var(--ease)` to `.hf-svg`,
   and in playScene's "all done" handler set `svg.style.opacity = '0'`
   at the same time as `el.style.opacity = '0'` on the nodes. In
   buildScene, reset `svg.style.opacity = '1'` before drawing the
   next scene's lines so the new content fades in cleanly. Lines and
   nodes now vanish on the same curve.

3. **CONTRIBUTING.md stale facts** (user pointed out the version):
   The user noticed v0.19.2 in the header. While I was there I caught
   several other stale items and fixed them in the same pass:

   - Header version: v0.19.2 → v0.20.1 (current)
   - Header Python: "3.9+" → "3.9 – 3.13" (matches actual classifiers)
   - Header test status: "100%" → "95% coverage" (matches reality)
   - Project structure: "24 pre-built tools" → "33 pre-built tools"
     (verified via `grep -c '^@tool' src/selectools/toolbox/*.py`)
   - Project structure: "61 numbered examples (01–61)" → "88 numbered
     examples" (verified via `find examples -maxdepth 1 -name '*.py'
     | wc -l`)
   - Release script examples: 0.5.1 → 0.20.2 (current minor + 1)
   - Test command: `python tests/test_framework.py` → `pytest tests/`
     (the legacy single-file runner doesn't exist anymore)
   - Provider test path: `tests/test_framework.py` → `tests/providers/
     test_your_provider.py` (current convention)
   - Section header: "Adding RAG Features (New in v0.8.0!)" → "Adding
     RAG Features" (v0.8.0 was many releases ago)

   The biggest substantive fix: rewrote the "Adding a New Tool"
   example. The old example used `Tool(name=..., parameters=[
   ToolParameter(...)])`, which is the legacy class-based API.
   Selectools has used the `@tool()` decorator pattern for many
   versions now, where the function signature and docstring are
   introspected automatically. The old example was actively
   misleading new contributors into using a deprecated API. New
   example shows the modern decorator pattern with proper docstring
   conventions.
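
For readers unfamiliar with signature and docstring introspection, the sketch below shows the general technique using the standard `inspect` module. `describe_tool` and the schema layout are hypothetical; this is not how the selectools `@tool()` decorator is implemented, and it is not the example added to CONTRIBUTING.md.

```python
import inspect
from typing import Any, Callable, Dict


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Derive a simple tool description from a function's signature and docstring."""
    doc = inspect.getdoc(fn) or ""
    parameters: Dict[str, Dict[str, Any]] = {}
    for name, param in inspect.signature(fn).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            type_name = "any"
        else:
            type_name = getattr(annotation, "__name__", str(annotation))
        parameters[name] = {
            "type": type_name,
            # A parameter without a default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    summary = doc.splitlines()[0] if doc else ""
    return {"name": fn.__name__, "description": summary, "parameters": parameters}


def get_weather(city: str, units: str = "metric") -> str:
    """Return a short weather summary for a city."""
    return f"{city}: 21 degrees ({units})"


schema = describe_tool(get_weather)
```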

What's NOT in this PR:

- Full rewrite of the project structure block — only the genuinely
  stale numeric facts (24, 61) were fixed; the listed file names are
  still mostly accurate and a full architectural audit is out of
  scope for "the version is outdated"
- No CHANGELOG entry — these are doc fixes, not user-facing code
- No version bump