Skip to content

Releases: robot-accomplice/roboticus-rust

v0.11.4

06 Apr 10:03

Choose a tag to compare

Theme: Adaptive Intelligence

The router learns from every turn. The pipeline owns its own crate.

Added

  • Pipeline decomposition: run_pipeline(), core inference engine, stage functions, decomposition, heuristics, and task state moved to roboticus-pipeline crate (50 source files, ~6,500 lines).
  • Capability traits: PipelineSecurity, InferenceRunner, PipelineDeps — trait-scoped dependency injection replaces &AppState across the entire inference path.
  • Shadow routing predictions: Live recording of metascore vs heuristic routing decisions for offline outcome evaluation.
  • Metascore routing fitness tests: 16 profile-level + 4 end-to-end routing tests verifying metascore overrides, breaker exclusion, cost weight, session penalty, accuracy floor.
  • Startup baselining fitness tests: 5 tests verifying cold-start model quality seeding and its influence on routing.
  • Architecture fitness tests: 14 tests enforcing connector thinness, dependency direction, no AppState in pipeline/core/stages.
  • Dashboard markdown rendering: Consistent ##, ###, *italic*, **bold** rendering across memory page entries.

Changed

  • PipelineRequest uses PipelineDeps (trait refs) instead of &AppState.
  • PipelineError uses u16 status codes (framework-agnostic).
  • Bot command dispatch moved from pipeline to connector pre-pipeline step.
  • Nickname refinement uses InferenceRunner trait instead of direct provider key resolution.

Fixed

  • Degraded response on follow-up requests: Cache discard falls through to fresh inference; protocol rescue tries stripping before fallback.
  • Migration 31+33 crash on fresh databases: Guard has_table() before ALTER TABLE pipeline_traces.
  • All #[ignore] classifier tests: Fixed to pass with n-gram fallback.
  • MSRV compliance: is_multiple_of() replaced with modulo for Rust 1.85.

v0.11.3

03 Apr 08:41

Choose a tag to compare

Theme: Complete the Operating Picture

The agent can perceive everything, act everywhere, and explain every decision.

Added

  • Matrix E2EE (Olm/Megolm): vodozemac-based encryption for Matrix adapter — device key lifecycle, Olm session establishment, Megolm group encrypt/decrypt, to-device key exchange, JSON-file key persistence.
  • Windows AppContainer sandboxing: Job Object confinement for script execution — memory limits, kill-on-close, process count caps with graceful degradation.
  • Cognitive scaffold architecture principle (ARCHITECTURE.md §4): durable context injection, structured self-knowledge, continuity preservation, learning from failure, referenceability.
  • Topic segmentation: Embedding-free topic detection at message storage, topic-aware context assembly (current-topic full, off-topic summarized), topic_tag column on session_messages.
  • Context compaction: Deduplication, compression, budget enforcement between retrieval and context assembly — 25% L0 budget cap for memory.
  • Ambient recency injection: Last 2 hours of episodic memories injected into every turn regardless of query similarity.
  • SVG pipeline flow graph: Interactive flow visualization with connected nodes, directional arrows, click-to-inspect floating popovers. Full pipeline trace from input_validation through guard_chain.
  • Guard stage in pipeline traces: Guard outcomes visible in flow visualization with per-guard annotations.
  • TASK_DEFERRAL + FALSE_COMPLETION semantic banks: 10 exemplars each for detecting narrated-future-action and unverified completion claims.
  • EXECUTION intent exemplars expanded: Task-management verbs (close out, implement, carry out, handle, process, complete).
  • DELEGATION intent exemplars expanded: Indirect delegation patterns (implement recommendations using agents, direct agents to carry out plan).
  • Delegation workflow Report step: Mandatory step 6 — orchestrator must present delegation results to user since subagents run in isolated sessions.
  • FIRMWARE.toml rules migration: roboticus update auto-migrates [[rules]] array to [rules] table format.
  • Generic channel poll loop: Single channel_poll_loop replaces 4 near-identical platform loops.
  • Database indexes: tasks(status), cron_jobs(enabled, next_run_at), transactions(created_at DESC).
  • sandbox_required config flag: Abort script execution if OS-level sandboxing unavailable.
  • #[instrument] on pipeline functions: Automatic span tracing on run_pipeline, agent_message_stream, process_channel_message.
  • CORS layer: tower-http CorsLayer on the API router.
  • CSP Google Fonts: fonts.googleapis.com + fonts.gstatic.com in Content-Security-Policy.
  • God file splits: main.rs 3408→1454, run.rs 3113→2432, update.rs 3058→973, transform.rs 2876→289.

Fixed

  • Dashboard JS syntax error: Orphaned renderWallet in efficiency.js + duplicate IIFE close in websocket.js — killed all dashboard interactivity since 2.27 decomposition.
  • InternalJargonGuard: Migrated from 11-string word list to NARRATED_DELEGATION semantic classifier (threshold 0.8). Subagent leak check respects user prompt context.
  • TaskDeferralGuard: Migrated from "let me"/"I'll" word list to TASK_DEFERRAL semantic bank.
  • ExecutionTruthGuard: Uses FALSE_COMPLETION score, not bare intent trigger. Recommendations no longer treated as false delegation claims.
  • ModelIdentityTruth: ||&& — stops replacing 1,649-char responses that have ≤3 lines. Redacts model name in substantive responses instead of replacing entire content.
  • Deterministic fallback: Preserves user's topic snippet instead of generic "did not meet quality standards."
  • Tool retry loop: Surfaces actual tool error instead of "same tool call kept repeating."
  • Behavioral contract §1.5: Latest user message takes priority over stale plans.
  • Cron/subagent session isolation: Dedicated sessions per invocation prevent pollution of user conversations.
  • Cron query optimization: WHERE enabled=1 pushed to SQL; uses new index.
  • Delivery queue in-flight recovery: Stale in-flight items auto-recovered after 5 minutes.
  • CapacityTracker bounded: Hard cap at 10,000 events per vector.
  • Wallet HTTP timeout: 30s + 5s connect_timeout (was unbounded).
  • Matrix crypto file permissions: 0o600 on crypto_state.json.
  • FIRMWARE.toml schema: Accepts both [[rules]] array and [rules] table formats.
  • Codex CLI plugin: Removed invalid --non-interactive flag, fail-fast error handling.
  • Channel health recovery: Removed stale recompute override in channel_status().
  • Theme dedup: Reversed retain order so catalog overrides built-in.
  • prune_old_backups dedup: Moved to roboticus-core, imported in roboticus-api.
  • Rate-limit headers: .unwrap().expect("numeric header value").
  • Embedding/classifier log levels: DEBUG → TRACE for per-request embedding and centroid computation.
  • Streaming guard dedup: GuardContext::for_streaming() replaces triplicated construction.
  • Discord intents: Magic numbers replaced with named constants.
  • Intent classification: "close out those 23 revenue tasks" now correctly matches EXECUTION.

v0.11.2+hotfix.1

30 Mar 09:45

Choose a tag to compare

v0.11.2 hotfix 1

v0.11.1

29 Mar 13:53

Choose a tag to compare

Added

  • TaskOperatingState: Planner as sole authority for action dispatch — replaces 10 retired shortcuts with structured decision-making.
  • Streaming guard chain: Buffer-first + post-stream guard checks with client-side recovery (stream_retry, stream_replace, stream_blocked).
  • SemanticClassifier hardening: Typed category constants, per-bank thresholds, TrustLevel (Neural/NGram), AbstainPolicy, ClassificationTrace.
  • Platform behavioral contract: User intent sovereignty, voice boundaries, output originality, behavioral self-awareness.
  • App installer: roboticus apps install/uninstall/list commands with manifest-driven profile creation, skill/theme/subagent deployment.
  • Profile system: roboticus profile create/switch/list/delete for isolated agent configurations.
  • GM soak framework: 100-turn soak tests with recommendation engine for FIRMWARE, OS, and context health diagnostics.
  • Adaptive retrieval strategy: Decision layer for cache-only, indexed, live discovery, and direct verification modes.
  • Post-turn observer dispatch: Subagents can observe conversation turns for background processing.
  • Dashboard: Pipeline Traces, Flight Recorder, Delegation Outcomes tabs; theme skinning; skill create/edit with markdown rendering.

Changed

  • Semantic intent classification replaces keyword matching across intent registry, guard detection, and specialist workflow routing.
  • File decomposition: guard_registry (2399→940), tests (10205→1119), core (2450→submodules), shortcuts (2264→submodules).
  • Shortcut retirement: 10 high-damage shortcuts removed; CronTool replaces CronShortcut; AcknowledgementShortcut retained.

Fixed

  • SubagentRegistry startup race (WEB-174): atomic start_agent transition eliminates agents stuck in "booting".
  • Keystore test flakes: mutex serialization for tests sharing global machine-id file.
  • Dashboard blank bubble: provider errors now preserve error state instead of showing empty message.
  • Specialist workflow dead code: case mismatch in category lookup (lowercase vs uppercase) fixed via typed constants.
  • Infrastructure leaks: guard chain catches subagent name exposure, capability dumps, and narrated delegation.
  • Stream finalization: preserves interception notices instead of overwriting with empty content.

v0.11.0

25 Mar 08:09

Choose a tag to compare

Added

  • Agent efficacy push: Memory introspection, selective forgetting, relationship-memory automation, and task operating state are now first-class parts of the shared pipeline.
  • Introspection-driven execution: Task-oriented turns now begin from introspected runtime truth, can compose specialists from a clean slate, and preserve executed state across normalization retries.
  • MCP release-grade management: Shared /api/mcp/servers management surface aligned across dashboard and CLI.
  • Skill and subagent utilization telemetry: Usage count and last-used signals exposed for skills and subagents.
  • Release vetting automation: Added scripts/run-v0110-vetting.sh and docs/testing/v0110-vetting-matrix.md to lock in the v0.11.0 regression contract.

Changed

  • Delegation and composition flow: Empty-roster specialist requests now progress into composition and delegation instead of collapsing into narrated intent or centralization.
  • Task-path truthfulness: Filesystem/runtime blockers now surface as real blockers instead of canned fallback prose.
  • Prompt and planner behavior: Introspection is now treated as the first operational step for task work and feeds a shared task operating state/action planner.
  • Release documentation: Active docs, architecture notes, roadmap entries, and release gates now reflect shipped v0.11.0 behavior.

Fixed

  • Pipeline trace drift: Legacy databases missing pipeline_traces.session_id and related fields are repaired on boot.
  • Prompt Performance persistence: Routing weights and context budget now persist and rehydrate correctly.
  • Dashboard regressions: Repaired session archive, raw TOML editing, Observability nav, semantic memory navigation, roster skill drill-down, workspace symmetry, and related operator-surface gaps.
  • Analysis/recommendation soak failures: Live deep-analysis surfaces validated against a stronger provider path during soak.

v0.10.0: Correctness, Safety & Operational Maturity

23 Mar 08:30

Choose a tag to compare

Highlights

  • Model Categorization (Phase 1): 10 task categories × 29 model profiles × 8 providers → category-aware routing with category_fit metascore dimension
  • Skill Authoring API: Create, validate, and publish Markdown instruction skills via POST /api/skills/author
  • Landlock & Job Object Confinement: Linux filesystem sandboxing (landlock) and Windows process isolation (Job Objects) for script execution
  • Typestate Sessions: Compile-time session lifecycle enforcement — Session<Created>Session<Active>Session<Closed>
  • Delegation Scoring Engine: score_agent_fit(), composite_fit_ratio(), and utility_margin_for_delegation() for principled decomposition decisions
  • TUI Client: Interactive terminal interface via roboticus tui with streaming responses and WebSocket transport
  • Codex CLI Plugin: Delegate coding tasks to OpenAI Codex CLI with structured output

Critical Fixes

  • Signal adapter: replaced std::sync::Mutex in async context with bounded tokio::sync::mpsc — eliminates runtime thread blocking
  • Signal adapter: added governor::RateLimiter (5 req/s default) to prevent signal-cli daemon DoS
  • Delivery queue: proper HTTP status code extraction for permanent vs transient error classification
  • Formatter: 3-allocation chain → single-pass clean_content() across all channel formatters

Infrastructure

  • DedupTracker moved from LlmService (requiring write lock) to AppState (standard Mutex) — reduced lock contention
  • L0 context budget increased from 4,000 → 8,000 tokens
  • Codecov upload made best-effort (quality gates still enforced at 80% minimum + regression check)
  • Cross-platform CI: Windows HANDLE type migration for windows-sys 0.59+
  • Dependabot, Docker Compose, backup/restore runbook, observability guide

Stats

  • 302 files changed across 12 crates
  • 36,542 additions, 4,809 deletions
  • 83.84% test coverage (above 80% gate)
  • All 28 CI jobs green (Linux, macOS, Windows)

Full changelog · Release notes

v0.9.9

20 Mar 15:41

Choose a tag to compare

Added

  • Terminal user interface (roboticus tui): New roboticus-tui crate providing an interactive terminal UI with chat, log viewer, and status bar. Connects to a running server via REST API + WebSocket, supports session create/resume, streaming responses, and keyboard navigation.
  • Context budget tuning: Configurable L0-L3 token budgets via [context_budget] config section, dashboard range sliders, and per-channel minimum complexity level. Replaces hardcoded 4k/8k/16k/32k defaults.
  • Integrations management: Unified channel health monitoring with POST /api/channels/{platform}/test probe endpoint, dashboard integrations panel with per-channel test buttons, and roboticus integrations CLI command group.
  • Tool output noise filter: New ToolOutputFilterChain with four filters (AnsiStripper, ProgressLineFilter, DuplicateLineDeduper, WhitespaceNormalizer) applied to tool output before LLM observation, reducing token waste from ANSI escapes, progress bars, and duplicate lines.
  • Dashboard config exposure: All configuration sections now appear in the web dashboard form even when not yet configured. Unconfigured channels show "Enable" buttons. Powered by a CONFIG_SCHEMA merge before rendering.
  • Routing profile polish: Validation warning when slider weights exceed 1.0, persistence toast on apply, and default profile display (0.33/0.33/0.33) when no routing config exists.

Changed

  • Default bind address: Changed from 127.0.0.1 to localhost across all defaults, config generation, documentation, and justfile. OAuth loopback callbacks and test ephemeral ports remain as IP literals per RFC 8252.

Fixed

  • Script runner tests on Windows: Added #[cfg(unix)] guard on script_runner::tests module that uses std::os::unix::fs::PermissionsExt, preventing compilation errors on Windows.

v0.9.7

14 Mar 18:53

Choose a tag to compare

Added

  • DB fitness hardening (DF-1–DF-18): 18-item SQLite performance audit resolved — retention pruning for 5 high-growth tables, orphan cleanup sweeps (working memory + embeddings), auto_vacuum=INCREMENTAL, 6 missing indexes, episodic dead-entry pruning, cache NULL-expiry fix, PRAGMA synchronous=NORMAL under WAL, CHECK constraints on 11 columns, and dead proxy_stats table removal.
  • Memory hygiene mechanic: ironclad mechanic detects and (with --repair) purges contaminated memory entries using 7 deterministic LIKE-prefix patterns across 3 tiers, with JSON-structured findings.
  • Sandbox boundary management: Filesystem confinement for skill scripts (skills_dir + $IRONCLAD_WORKSPACE, no traversal/symlink escape), configurable network isolation (unshare(CLONE_NEWNET) on Linux), memory ceiling via RLIMIT_AS, interpreter allowlist via absolute-path resolution, and mechanic sandbox health reporting.
  • Filesystem security overhaul: FilesystemSecurityConfig with workspace_only mode, ~25 default protected path patterns, tool_allowed_paths whitelist (auto-populated from Obsidian vault path), macOS sandbox-exec write-denial confinement, and dashboard UI toggles.
  • Unified pipeline architecture: IntentRegistry (22-variant Intent enum), GuardChain (12 guards with full()/cached()/streaming() presets), ShortcutDispatcher (15 handlers replacing 983-line god function), PipelineConfig (4 presets: api/streaming/channel/cron), and DedupGuard RAII replacing 11 manual release patterns. Net ~653 lines removed.
  • ChannelFormatter trait: Per-platform output formatting with static dispatch registry — TelegramFormatter (Markdown→MarkdownV2), DiscordFormatter, WhatsAppFormatter, SignalFormatter, WebFormatter, EmailFormatter — wired into channel_message.rs delivery path. 31 unit tests.
  • Configurable inference timeouts: Per-provider timeout_seconds setting ([providers.*.timeout_seconds]) with 300-second default, surfaced in dashboard provider configuration.
  • Dashboard session ID copy button: One-click copy-to-clipboard for session IDs in the Sessions panel.

Fixed

  • Circuit breaker window reset: record_failure() now tracks window_start for rolling-window accumulation — failures spaced ~60s apart correctly accumulate instead of resetting.
  • Embedding auth for local providers: EmbeddingConfig.is_local skips API key resolution and auth headers for Ollama/llama.cpp.
  • Cron schedule_kind: "once" support: Runtime maps "once" → "at" dispatch, calls DurableScheduler::evaluate_at(), auto-disables after single execution.
  • Vault path whitelisting: tool_allowed_paths auto-populated from obsidian.vault_path during config normalization — workspace-only mode no longer blocks configured external paths.
  • Fleet activity chart capacity model: Stacked area normalizes per-agent scores by 1/agentCount with fixedMax: 1.0.
  • Cache guard parity: cached() guard set now includes SubagentClaim + LiteraryQuoteRetry (previously missing).
  • ExecutionTruthGuard: Tool-results bypass bug removed.
  • Collapsible if lint: Updated impl_core.rs to use if let chain (edition 2024).
  • Wallet RPC rate-limit backoff: get_all_balances() detects rate-limit error codes (-32016, -32005, 429) and stops iterating remaining tokens instead of repeatedly hitting the provider.
  • Cron once-type orphan jobs: Jobs with schedule_kind: "once" and no schedule_expr are now auto-disabled on first encounter instead of emitting a warning every 60s.
  • Dashboard sidebar footer: Navigation bar footer now stays pinned to the bottom of the viewport (added height: 100% to sidebar container).
  • Dashboard custom model Add button: Custom model text input row now has its own Add button; both Add buttons use a shared class selector.
  • Telegram double-underscore italic: __text__ was incorrectly emitted as Telegram underline instead of italic — formatter now maps to _text_.
  • Config hot-reload path divergence: normalize_paths() and merge_bundled_providers() were skipped during hot-reload — reloaded configs now match boot-time normalization.
  • Routing audit fixes: Attempt counter not incrementing on retry, u32 truncation on cost metrics, misleading timeout error message wording.
  • Dashboard UI stall during inference: 4 RwLock guard-scope fixes release locks before async I/O, preventing cascading reader starvation.
  • Cron semaphore hot-reload race: Semaphore not released when cron runtime reloads config, causing phantom permit exhaustion. Dead LlmService method removed, lock consolidation in admin routes.
  • Agent audit fixes: Tautological always-true test condition, timeout hint parsing edge case, unreachable branch removal.

v0.9.6

12 Mar 14:56

Choose a tag to compare

Added

  • Compliance-first self-funding control plane: Complete revenue opportunity lifecycle (intake → qualify → score → plan → fulfill → settle) with DB-backed restart safety, strategy-level scoring (confidence/effort/risk/priority/recommendation), feedback persistence per opportunity and summary by strategy, configurable post-settlement asset routing (default PALM_USD), EVM swap submission with tx-hash tracking and on-chain receipt reconciliation, tax payout lifecycle mirroring swap tasks, and operator-visible accounting (net profit, attributable costs, retained earnings, tax allocation) across API, CLI, and mechanic surfaces.
  • Revenue mechanic integration: Mechanic can probe, reconcile, and repair orphaned or stale revenue jobs and swap/tax reconciliation mismatches via run_gateway_provider_and_revenue_checks and run_gateway_integrated_repair_sweep.
  • Skills catalog: PluginCatalog with CLI flows (ironclad skills catalog list/install/activate) and API endpoints (GET/POST /api/skills/catalog, /install, /activate). Registry manifest fetch from remote URL.
  • Skill registry protocol: Migration 022 adds version, author, registry_source columns to skills table. Multi-registry support via RegistrySource { name, url, priority, enabled } with backward-compatible fallback from legacy single-URL registry_url.
  • Multi-registry fetch: Registry sync iterates all configured sources, namespaces skills as {registry_name}/{skill_name} for non-local sources, applies semver comparison to skip redundant downloads, and resolves conflicts by registry priority.
  • Learning loop closure: Agent now detects repeating multi-step tool sequences on session close and synthesizes reusable SKILL.md procedure files. learned_skills table (migration 021) tracks reinforcement history (success/failure counts, priority). LearningConfig exposes tuneable thresholds for minimum sequence length, success ratio, priority boost/decay, and skill cap. Inspired by recent work on autonomous tool-use learning in LLM agents (arXiv:2603.05344).
  • Procedural failure recording: record_procedural_failure() (previously dead code in the DB layer) is now called from ingest_turn() when tool results indicate failure, closing the procedural memory feedback loop.
  • Skill priority adjustment: Governor tick() now runs adjust_learned_skill_priorities() after episodic decay — learned skills with high success ratios get priority boosts; those with poor ratios get decayed.
  • Skill subdirectory loading: SkillLoader now recurses into learned/ subdirectory, loading machine-synthesized skills alongside hand-authored ones.
  • Progressive context compaction: 5-stage compaction (TrimSummarizeArchiveEvictEmergency) in compact_before_archive() with CompactionStage::from_excess() selector.
  • Decay-weighted episodic retrieval: rerank_episodic_by_decay() applies time-based decay at retrieval time, preventing stale context from dominating memory budget.
  • Instruction anti-fade micro-reminders: Event-driven system prompt reinforcement at agent decision points to combat instruction-following drift.
  • x402 autonomous payment: LLM HTTP client now handles 402 Payment Required responses with autonomous on-chain payment and request retry.
  • Homebrew & Winget packaging: release.yml contains complete update-homebrew (SHA256 extraction, formula generation, tap push) and update-winget (vedantmgoyal9/winget-releaser@v2) jobs. Activation requires tap repo creation and secrets provisioning.

v0.9.5

06 Mar 22:06
ea076fb

Choose a tag to compare

Changed

  • Behavior soak hardening: scripts/run-agent-behavior-soak.py now includes regression checks for filesystem capability truthfulness, subagent capability response quality, and affirmative continuation quality, with rubric updates to score substantive outcomes over brittle phrase matching.
  • Roadmap/release traceability: docs/releases/v0.9.5.md and docs/ROADMAP.md updated with current v0.9.5 prep status for speculative execution, browser runtime support, CLI skill roadmap slice, and behavior continuity validation.
  • Architecture documentation: Added explicit v0.9.5-prep control/dataflow coverage for deterministic execution shortcuts and guarded response sanitization in docs/architecture/ironclad-dataflow.md and docs/architecture/ironclad-sequences.md.
  • Browser runtime continuity: Browser action execution now attempts a single stop/start session recovery when CDP disconnect/closed-socket errors are detected, limited to idempotent actions to avoid duplicate side effects on replay.
  • Autonomy turn-budget controls: Added configurable agent-level ReAct budget controls (autonomy_max_react_turns, autonomy_max_turn_duration_seconds) and wired enforcement into the runtime loop.
  • CLI adapter response contract: run_script now emits stable typed metadata (adapter, schema_version, status, error_class) and normalized script error classes for downstream handling.
  • Speculative policy invariants: Added explicit test coverage enforcing Safe-only speculative eligibility (Caution/Dangerous/Forbidden remain excluded from speculative execution).

Fixed

  • Internal protocol fallback leakage: response sanitization no longer surfaces protocol-placeholder fallback text; empty/degraded sanitized content now resolves through deterministic user-facing quality fallback.
  • Markdown count execution reliability: execution shortcut path now handles recursive markdown-file count prompts deterministically, including strict numeric-only responses when requested (count only / only the number style prompts).
  • Delegation shortcut boundary: markdown-count shortcut no longer hijacks explicitly delegated prompts, preserving delegation intent handling.
  • Speculative branch cleanup safety: introduced RAII speculation slot guards and abort-path tests to guarantee no slot leakage when speculative tasks are canceled.
  • CLI skill sandbox isolation coverage: added explicit tests that secret env vars are stripped while only allowlisted runtime vars are propagated under skills.sandbox_env=true.