Releases: robot-accomplice/roboticus-rust
v0.11.4
Theme: Adaptive Intelligence
The router learns from every turn. The pipeline owns its own crate.
Added
- Pipeline decomposition: `run_pipeline()`, core inference engine, stage functions, decomposition, heuristics, and task state moved to the `roboticus-pipeline` crate (50 source files, ~6,500 lines).
- Capability traits: `PipelineSecurity`, `InferenceRunner`, `PipelineDeps`. Trait-scoped dependency injection replaces `&AppState` across the entire inference path.
- Shadow routing predictions: Live recording of metascore vs heuristic routing decisions for offline outcome evaluation.
- Metascore routing fitness tests: 16 profile-level + 4 end-to-end routing tests verifying metascore overrides, breaker exclusion, cost weight, session penalty, accuracy floor.
- Startup baselining fitness tests: 5 tests verifying cold-start model quality seeding and its influence on routing.
- Architecture fitness tests: 14 tests enforcing connector thinness, dependency direction, no `AppState` in pipeline/core/stages.
- Dashboard markdown rendering: Consistent `##`, `###`, `*italic*`, `**bold**` rendering across memory page entries.
Changed
- `PipelineRequest` uses `PipelineDeps` (trait refs) instead of `&AppState`.
- `PipelineError` uses `u16` status codes (framework-agnostic).
- Bot command dispatch moved from pipeline to connector pre-pipeline step.
- Nickname refinement uses `InferenceRunner` trait instead of direct provider key resolution.
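The trait-scoped injection described above can be pictured with a minimal sketch. Only the type names `PipelineSecurity`, `InferenceRunner`, `PipelineDeps`, `PipelineRequest`, `PipelineError`, and `run_pipeline` come from these notes; every method, field, and test double below is a hypothetical stand-in, not the crate's actual API.

```rust
// Hypothetical sketch: method names and fields are assumptions for illustration.

/// Framework-agnostic error carrying a plain `u16` status code.
#[derive(Debug, PartialEq)]
pub struct PipelineError {
    pub status: u16,
    pub message: String,
}

/// Narrow capability: decide whether input may enter the pipeline.
pub trait PipelineSecurity {
    fn is_allowed(&self, input: &str) -> bool;
}

/// Narrow capability: run one inference call.
pub trait InferenceRunner {
    fn infer(&self, prompt: &str) -> Result<String, PipelineError>;
}

/// Aggregates the capabilities the pipeline needs, replacing `&AppState`.
pub trait PipelineDeps {
    fn security(&self) -> &dyn PipelineSecurity;
    fn runner(&self) -> &dyn InferenceRunner;
}

/// The request holds a trait reference rather than the whole application state.
pub struct PipelineRequest<'a> {
    pub deps: &'a dyn PipelineDeps,
    pub input: &'a str,
}

pub fn run_pipeline(req: &PipelineRequest<'_>) -> Result<String, PipelineError> {
    if !req.deps.security().is_allowed(req.input) {
        return Err(PipelineError { status: 403, message: "input rejected".into() });
    }
    req.deps.runner().infer(req.input)
}

// Test doubles: no server, database, or provider keys are needed.
pub struct DenyEmpty;
impl PipelineSecurity for DenyEmpty {
    fn is_allowed(&self, input: &str) -> bool {
        !input.trim().is_empty()
    }
}

pub struct EchoRunner;
impl InferenceRunner for EchoRunner {
    fn infer(&self, prompt: &str) -> Result<String, PipelineError> {
        Ok(format!("echo: {prompt}"))
    }
}

pub struct TestDeps {
    pub security: DenyEmpty,
    pub runner: EchoRunner,
}
impl PipelineDeps for TestDeps {
    fn security(&self) -> &dyn PipelineSecurity { &self.security }
    fn runner(&self) -> &dyn InferenceRunner { &self.runner }
}
```

Because the pipeline only sees the traits, a test can pass `TestDeps` where production code would pass an adapter over the real application state.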
Fixed
- Degraded response on follow-up requests: Cache discard falls through to fresh inference; protocol rescue tries stripping before fallback.
- Migration 31+33 crash on fresh databases: Guard `has_table()` before `ALTER TABLE pipeline_traces`.
- All `#[ignore]` classifier tests: Fixed to pass with n-gram fallback.
- MSRV compliance: `is_multiple_of()` replaced with modulo for Rust 1.85.
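The MSRV fix above amounts to expressing divisibility with `%`. The helper below is a sketch of one way to do that, not the project's code; the free-function name and the zero-divisor handling are assumptions chosen so that a zero divisor does not panic.

```rust
/// Returns true when `n` is a multiple of `k`, using only the `%` operator.
/// `n % 0` would panic, so a zero divisor is handled explicitly:
/// only 0 is treated as a multiple of 0.
pub fn is_multiple_of(n: u64, k: u64) -> bool {
    if k == 0 {
        n == 0
    } else {
        n % k == 0
    }
}
```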
v0.11.3
Theme: Complete the Operating Picture
The agent can perceive everything, act everywhere, and explain every decision.
Added
- Matrix E2EE (Olm/Megolm): vodozemac-based encryption for Matrix adapter — device key lifecycle, Olm session establishment, Megolm group encrypt/decrypt, to-device key exchange, JSON-file key persistence.
- Windows AppContainer sandboxing: Job Object confinement for script execution — memory limits, kill-on-close, process count caps with graceful degradation.
- Cognitive scaffold architecture principle (ARCHITECTURE.md §4): durable context injection, structured self-knowledge, continuity preservation, learning from failure, referenceability.
- Topic segmentation: Embedding-free topic detection at message storage, topic-aware context assembly (current-topic full, off-topic summarized), `topic_tag` column on `session_messages`.
- Context compaction: Deduplication, compression, budget enforcement between retrieval and context assembly, with a 25% L0 budget cap for memory.
- Ambient recency injection: Last 2 hours of episodic memories injected into every turn regardless of query similarity.
- SVG pipeline flow graph: Interactive flow visualization with connected nodes, directional arrows, click-to-inspect floating popovers. Full pipeline trace from input_validation through guard_chain.
- Guard stage in pipeline traces: Guard outcomes visible in flow visualization with per-guard annotations.
- TASK_DEFERRAL + FALSE_COMPLETION semantic banks: 10 exemplars each for detecting narrated-future-action and unverified completion claims.
- EXECUTION intent exemplars expanded: Task-management verbs (close out, implement, carry out, handle, process, complete).
- DELEGATION intent exemplars expanded: Indirect delegation patterns (implement recommendations using agents, direct agents to carry out plan).
- Delegation workflow Report step: Mandatory step 6 — orchestrator must present delegation results to user since subagents run in isolated sessions.
- FIRMWARE.toml rules migration: `roboticus update` auto-migrates `[[rules]]` array to `[rules]` table format.
- Generic channel poll loop: Single `channel_poll_loop` replaces 4 near-identical platform loops.
- Database indexes: `tasks(status)`, `cron_jobs(enabled, next_run_at)`, `transactions(created_at DESC)`.
- `sandbox_required` config flag: Abort script execution if OS-level sandboxing unavailable.
- `#[instrument]` on pipeline functions: Automatic span tracing on `run_pipeline`, `agent_message_stream`, `process_channel_message`.
- CORS layer: `tower-http` CorsLayer on the API router.
- CSP Google Fonts: `fonts.googleapis.com` + `fonts.gstatic.com` in Content-Security-Policy.
- God file splits: `main.rs` 3408→1454, `run.rs` 3113→2432, `update.rs` 3058→973, `transform.rs` 2876→289.
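As a rough illustration of the ambient recency injection listed above, the sketch below selects every episodic memory from the last two hours without consulting query similarity. The `EpisodicMemory` struct, its fields, and the `ambient_recent` function are hypothetical; only the two-hour window comes from the notes.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical record shape; the real schema is not shown in these notes.
pub struct EpisodicMemory {
    pub created_at: SystemTime,
    pub text: String,
}

/// The two-hour window mentioned in the release notes.
pub const AMBIENT_WINDOW: Duration = Duration::from_secs(2 * 60 * 60);

/// Returns every memory created within `AMBIENT_WINDOW` of `now`,
/// regardless of how similar it is to the current query.
pub fn ambient_recent(memories: &[EpisodicMemory], now: SystemTime) -> Vec<&EpisodicMemory> {
    memories
        .iter()
        .filter(|m| match now.duration_since(m.created_at) {
            Ok(age) => age <= AMBIENT_WINDOW,
            // A timestamp ahead of `now` (clock skew) is treated as recent.
            Err(_) => true,
        })
        .collect()
}
```

Passing `now` as a parameter keeps the filter deterministic in tests.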
Fixed
- Dashboard JS syntax error: Orphaned `renderWallet` in `efficiency.js` + duplicate IIFE close in `websocket.js`, which killed all dashboard interactivity since 2.27 decomposition.
- InternalJargonGuard: Migrated from 11-string word list to NARRATED_DELEGATION semantic classifier (threshold 0.8). Subagent leak check respects user prompt context.
- TaskDeferralGuard: Migrated from "let me"/"I'll" word list to TASK_DEFERRAL semantic bank.
- ExecutionTruthGuard: Uses FALSE_COMPLETION score, not bare intent trigger. Recommendations no longer treated as false delegation claims.
- ModelIdentityTruth: `||` → `&&`, which stops replacing 1,649-char responses that have ≤3 lines. Redacts model name in substantive responses instead of replacing entire content.
- Deterministic fallback: Preserves user's topic snippet instead of generic "did not meet quality standards."
- Tool retry loop: Surfaces actual tool error instead of "same tool call kept repeating."
- Behavioral contract §1.5: Latest user message takes priority over stale plans.
- Cron/subagent session isolation: Dedicated sessions per invocation prevent pollution of user conversations.
- Cron query optimization: `WHERE enabled=1` pushed to SQL; uses new index.
- Delivery queue in-flight recovery: Stale in-flight items auto-recovered after 5 minutes.
- CapacityTracker bounded: Hard cap at 10,000 events per vector.
- Wallet HTTP timeout: 30s + 5s connect_timeout (was unbounded).
- Matrix crypto file permissions: 0o600 on `crypto_state.json`.
- FIRMWARE.toml schema: Accepts both `[[rules]]` array and `[rules]` table formats.
- Codex CLI plugin: Removed invalid `--non-interactive` flag, fail-fast error handling.
- Channel health recovery: Removed stale recompute override in `channel_status()`.
- Theme dedup: Reversed retain order so catalog overrides built-in.
- prune_old_backups dedup: Moved to `roboticus-core`, imported in `roboticus-api`.
- Rate-limit headers: `.unwrap()` → `.expect("numeric header value")`.
- Embedding/classifier log levels: DEBUG → TRACE for per-request embedding and centroid computation.
- Streaming guard dedup: `GuardContext::for_streaming()` replaces triplicated construction.
- Discord intents: Magic numbers replaced with named constants.
- Intent classification: "close out those 23 revenue tasks" now correctly matches EXECUTION.
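The ModelIdentityTruth fix above swaps `||` for `&&`. The exact predicate is not given in these notes, so the sketch below assumes it combines a line-count check with a length check; both thresholds and both function names are hypothetical.

```rust
// Hypothetical thresholds; the notes only mention "≤3 lines".
const MAX_LINES: usize = 3;
const MAX_CHARS: usize = 200;

/// Pre-fix shape: either condition alone triggers replacement, so a long
/// response that happens to have few lines is replaced wholesale.
pub fn should_replace_buggy(response: &str) -> bool {
    response.lines().count() <= MAX_LINES || response.chars().count() <= MAX_CHARS
}

/// Post-fix shape: the response must be both short in lines and short in length.
pub fn should_replace_fixed(response: &str) -> bool {
    response.lines().count() <= MAX_LINES && response.chars().count() <= MAX_CHARS
}
```

Under these assumed thresholds, a single-line 1,649-character response trips the `||` version but not the `&&` version, which matches the symptom described in the fix.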
v0.11.2+hotfix.1
v0.11.2 hotfix 1
v0.11.1
Added
- TaskOperatingState: Planner as sole authority for action dispatch — replaces 10 retired shortcuts with structured decision-making.
- Streaming guard chain: Buffer-first + post-stream guard checks with client-side recovery (stream_retry, stream_replace, stream_blocked).
- SemanticClassifier hardening: Typed category constants, per-bank thresholds, TrustLevel (Neural/NGram), AbstainPolicy, ClassificationTrace.
- Platform behavioral contract: User intent sovereignty, voice boundaries, output originality, behavioral self-awareness.
- App installer: `roboticus apps install/uninstall/list` commands with manifest-driven profile creation, skill/theme/subagent deployment.
- Profile system: `roboticus profile create/switch/list/delete` for isolated agent configurations.
- GM soak framework: 100-turn soak tests with recommendation engine for FIRMWARE, OS, and context health diagnostics.
- Adaptive retrieval strategy: Decision layer for cache-only, indexed, live discovery, and direct verification modes.
- Post-turn observer dispatch: Subagents can observe conversation turns for background processing.
- Dashboard: Pipeline Traces, Flight Recorder, Delegation Outcomes tabs; theme skinning; skill create/edit with markdown rendering.
Changed
- Semantic intent classification replaces keyword matching across intent registry, guard detection, and specialist workflow routing.
- File decomposition: guard_registry (2399→940), tests (10205→1119), core (2450→submodules), shortcuts (2264→submodules).
- Shortcut retirement: 10 high-damage shortcuts removed; CronTool replaces CronShortcut; AcknowledgementShortcut retained.
Fixed
- SubagentRegistry startup race (WEB-174): atomic start_agent transition eliminates agents stuck in "booting".
- Keystore test flakes: mutex serialization for tests sharing global machine-id file.
- Dashboard blank bubble: provider errors now preserve error state instead of showing empty message.
- Specialist workflow dead code: case mismatch in category lookup (lowercase vs uppercase) fixed via typed constants.
- Infrastructure leaks: guard chain catches subagent name exposure, capability dumps, and narrated delegation.
- Stream finalization: preserves interception notices instead of overwriting with empty content.
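The specialist-workflow fix above replaces free-form strings with typed constants so that a lowercase/uppercase mismatch cannot silently miss a lookup. A minimal sketch of the idea follows; the constant names, workflow names, and `build_workflows` function are assumptions for illustration.

```rust
use std::collections::HashMap;

// Hypothetical category constants: the code that registers a workflow and the
// code that looks it up both refer to the same symbol, so casing cannot drift.
pub const CATEGORY_DELEGATION: &str = "DELEGATION";
pub const CATEGORY_EXECUTION: &str = "EXECUTION";

pub fn build_workflows() -> HashMap<&'static str, &'static str> {
    let mut workflows = HashMap::new();
    workflows.insert(CATEGORY_DELEGATION, "delegation_workflow");
    workflows.insert(CATEGORY_EXECUTION, "execution_workflow");
    workflows
}
```

A hand-typed lowercase key such as `"delegation"` would miss the entry, which is the kind of dead-code path the fix describes.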
v0.11.0
Added
- Agent efficacy push: Memory introspection, selective forgetting, relationship-memory automation, and task operating state are now first-class parts of the shared pipeline.
- Introspection-driven execution: Task-oriented turns now begin from introspected runtime truth, can compose specialists from a clean slate, and preserve executed state across normalization retries.
- MCP release-grade management: Shared `/api/mcp/servers` management surface aligned across dashboard and CLI.
- Skill and subagent utilization telemetry: Usage count and last-used signals exposed for skills and subagents.
- Release vetting automation: Added `scripts/run-v0110-vetting.sh` and `docs/testing/v0110-vetting-matrix.md` to lock in the v0.11.0 regression contract.
Changed
- Delegation and composition flow: Empty-roster specialist requests now progress into composition and delegation instead of collapsing into narrated intent or centralization.
- Task-path truthfulness: Filesystem/runtime blockers now surface as real blockers instead of canned fallback prose.
- Prompt and planner behavior: Introspection is now treated as the first operational step for task work and feeds a shared task operating state/action planner.
- Release documentation: Active docs, architecture notes, roadmap entries, and release gates now reflect shipped v0.11.0 behavior.
Fixed
- Pipeline trace drift: Legacy databases missing `pipeline_traces.session_id` and related fields are repaired on boot.
- Prompt Performance persistence: Routing weights and context budget now persist and rehydrate correctly.
- Dashboard regressions: Repaired session archive, raw TOML editing, Observability nav, semantic memory navigation, roster skill drill-down, workspace symmetry, and related operator-surface gaps.
- Analysis/recommendation soak failures: Live deep-analysis surfaces validated against a stronger provider path during soak.
v0.10.0: Correctness, Safety & Operational Maturity
Highlights
- Model Categorization (Phase 1): 10 task categories × 29 model profiles × 8 providers → category-aware routing with `category_fit` metascore dimension
- Skill Authoring API: Create, validate, and publish Markdown instruction skills via `POST /api/skills/author`
- Landlock & Job Object Confinement: Linux filesystem sandboxing (landlock) and Windows process isolation (Job Objects) for script execution
- Typestate Sessions: Compile-time session lifecycle enforcement: `Session<Created>` → `Session<Active>` → `Session<Closed>`
- Delegation Scoring Engine: `score_agent_fit()`, `composite_fit_ratio()`, and `utility_margin_for_delegation()` for principled decomposition decisions
- TUI Client: Interactive terminal interface via `roboticus tui` with streaming responses and WebSocket transport
- Codex CLI Plugin: Delegate coding tasks to OpenAI Codex CLI with structured output
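The typestate pattern named in the highlights can be sketched as follows. Only the `Session<Created>` → `Session<Active>` → `Session<Closed>` shape comes from the notes; the fields and method names are assumptions, not the project's actual session type.

```rust
use std::marker::PhantomData;

// Zero-sized marker types, one per lifecycle state.
pub struct Created;
pub struct Active;
pub struct Closed;

/// Hypothetical session; the state lives only in the type parameter.
pub struct Session<State> {
    id: String,
    messages: Vec<String>,
    _state: PhantomData<State>,
}

impl<State> Session<State> {
    /// Available in every state.
    pub fn id(&self) -> &str {
        &self.id
    }

    // Private helper: moves the data into a session tagged with the next state.
    fn transition<Next>(self) -> Session<Next> {
        Session { id: self.id, messages: self.messages, _state: PhantomData }
    }
}

impl Session<Created> {
    pub fn new(id: &str) -> Self {
        Session { id: id.to_string(), messages: Vec::new(), _state: PhantomData }
    }

    /// Consumes the created session, so it cannot be activated twice.
    pub fn activate(self) -> Session<Active> {
        self.transition()
    }
}

impl Session<Active> {
    /// Only an active session accepts messages.
    pub fn push_message(&mut self, msg: &str) {
        self.messages.push(msg.to_string());
    }

    pub fn close(self) -> Session<Closed> {
        self.transition()
    }
}

impl Session<Closed> {
    /// A closed session is read-only.
    pub fn transcript(&self) -> &[String] {
        &self.messages
    }
}
```

With this shape, calling `push_message` on a `Session<Created>` or `Session<Closed>` is a compile error rather than a runtime check.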
Critical Fixes
- Signal adapter: replaced `std::sync::Mutex` in async context with bounded `tokio::sync::mpsc`, which eliminates runtime thread blocking
- Signal adapter: added `governor::RateLimiter` (5 req/s default) to prevent signal-cli daemon DoS
- Delivery queue: proper HTTP status code extraction for permanent vs transient error classification
- Formatter: 3-allocation chain → single-pass `clean_content()` across all channel formatters
Infrastructure
- DedupTracker moved from `LlmService` (requiring write lock) to `AppState` (standard `Mutex`), which reduced lock contention
- L0 context budget increased from 4,000 → 8,000 tokens
- Codecov upload made best-effort (quality gates still enforced at 80% minimum + regression check)
- Cross-platform CI: Windows `HANDLE` type migration for `windows-sys` 0.59+
- Dependabot, Docker Compose, backup/restore runbook, observability guide
Stats
- 302 files changed across 12 crates
- 36,542 additions, 4,809 deletions
- 83.84% test coverage (above 80% gate)
- All 28 CI jobs green (Linux, macOS, Windows)
v0.9.9
Added
- Terminal user interface (`roboticus tui`): New `roboticus-tui` crate providing an interactive terminal UI with chat, log viewer, and status bar. Connects to a running server via REST API + WebSocket, supports session create/resume, streaming responses, and keyboard navigation.
- Context budget tuning: Configurable L0-L3 token budgets via `[context_budget]` config section, dashboard range sliders, and per-channel minimum complexity level. Replaces hardcoded 4k/8k/16k/32k defaults.
- Integrations management: Unified channel health monitoring with `POST /api/channels/{platform}/test` probe endpoint, dashboard integrations panel with per-channel test buttons, and `roboticus integrations` CLI command group.
- Tool output noise filter: New `ToolOutputFilterChain` with four filters (AnsiStripper, ProgressLineFilter, DuplicateLineDeduper, WhitespaceNormalizer) applied to tool output before LLM observation, reducing token waste from ANSI escapes, progress bars, and duplicate lines.
- Dashboard config exposure: All configuration sections now appear in the web dashboard form even when not yet configured. Unconfigured channels show "Enable" buttons. Powered by a `CONFIG_SCHEMA` merge before rendering.
- Routing profile polish: Validation warning when slider weights exceed 1.0, persistence toast on apply, and default profile display (0.33/0.33/0.33) when no routing config exists.
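The tool output noise filter above can be approximated with a small chain of string filters. The four filter names and `ToolOutputFilterChain` come from the notes; the trait, the `standard()` constructor, and each filter's heuristic are assumptions made for this sketch and may differ from the real implementation.

```rust
/// Hypothetical filter interface.
pub trait ToolOutputFilter {
    fn apply(&self, input: &str) -> String;
}

/// Removes ANSI CSI sequences such as `ESC [ 32 m`.
pub struct AnsiStripper;
impl ToolOutputFilter for AnsiStripper {
    fn apply(&self, input: &str) -> String {
        let mut out = String::with_capacity(input.len());
        let mut chars = input.chars().peekable();
        while let Some(c) = chars.next() {
            if c == '\u{1b}' && chars.peek() == Some(&'[') {
                chars.next(); // consume '['
                // Skip parameter characters up to and including the final byte.
                for p in chars.by_ref() {
                    if ('\u{40}'..='\u{7e}').contains(&p) {
                        break;
                    }
                }
            } else {
                out.push(c);
            }
        }
        out
    }
}

/// Keeps only the last carriage-return redraw of each line (assumed heuristic).
pub struct ProgressLineFilter;
impl ToolOutputFilter for ProgressLineFilter {
    fn apply(&self, input: &str) -> String {
        input
            .lines()
            .map(|line| line.rsplit('\r').next().unwrap_or(line))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Collapses consecutive identical lines into one.
pub struct DuplicateLineDeduper;
impl ToolOutputFilter for DuplicateLineDeduper {
    fn apply(&self, input: &str) -> String {
        let mut out: Vec<&str> = Vec::new();
        for line in input.lines() {
            if out.last() != Some(&line) {
                out.push(line);
            }
        }
        out.join("\n")
    }
}

/// Trims trailing whitespace and collapses runs of blank lines.
pub struct WhitespaceNormalizer;
impl ToolOutputFilter for WhitespaceNormalizer {
    fn apply(&self, input: &str) -> String {
        let mut out: Vec<&str> = Vec::new();
        for line in input.lines().map(str::trim_end) {
            let blank_run = line.is_empty() && out.last().map_or(true, |l| l.is_empty());
            if !blank_run {
                out.push(line);
            }
        }
        out.join("\n")
    }
}

pub struct ToolOutputFilterChain {
    filters: Vec<Box<dyn ToolOutputFilter>>,
}

impl ToolOutputFilterChain {
    /// The four filters in the order listed in the release notes.
    pub fn standard() -> Self {
        Self {
            filters: vec![
                Box::new(AnsiStripper),
                Box::new(ProgressLineFilter),
                Box::new(DuplicateLineDeduper),
                Box::new(WhitespaceNormalizer),
            ],
        }
    }

    /// Feeds the output of each filter into the next.
    pub fn apply(&self, input: &str) -> String {
        self.filters.iter().fold(input.to_string(), |acc, f| f.apply(&acc))
    }
}
```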
Changed
- Default bind address: Changed from `127.0.0.1` to `localhost` across all defaults, config generation, documentation, and justfile. OAuth loopback callbacks and test ephemeral ports remain as IP literals per RFC 8252.
Fixed
- Script runner tests on Windows: Added `#[cfg(unix)]` guard on `script_runner::tests` module that uses `std::os::unix::fs::PermissionsExt`, preventing compilation errors on Windows.
v0.9.7
Added
- DB fitness hardening (DF-1–DF-18): 18-item SQLite performance audit resolved: retention pruning for 5 high-growth tables, orphan cleanup sweeps (working memory + embeddings), `auto_vacuum=INCREMENTAL`, 6 missing indexes, episodic dead-entry pruning, cache NULL-expiry fix, `PRAGMA synchronous=NORMAL` under WAL, CHECK constraints on 11 columns, and dead `proxy_stats` table removal.
- Memory hygiene mechanic: `ironclad mechanic` detects and (with `--repair`) purges contaminated memory entries using 7 deterministic LIKE-prefix patterns across 3 tiers, with JSON-structured findings.
- Sandbox boundary management: Filesystem confinement for skill scripts (skills_dir + `$IRONCLAD_WORKSPACE`, no traversal/symlink escape), configurable network isolation (`unshare(CLONE_NEWNET)` on Linux), memory ceiling via `RLIMIT_AS`, interpreter allowlist via absolute-path resolution, and mechanic sandbox health reporting.
- Filesystem security overhaul: `FilesystemSecurityConfig` with `workspace_only` mode, ~25 default protected path patterns, `tool_allowed_paths` whitelist (auto-populated from Obsidian vault path), macOS `sandbox-exec` write-denial confinement, and dashboard UI toggles.
- Unified pipeline architecture: `IntentRegistry` (22-variant `Intent` enum), `GuardChain` (12 guards with `full()`/`cached()`/`streaming()` presets), `ShortcutDispatcher` (15 handlers replacing 983-line god function), `PipelineConfig` (4 presets: `api`/`streaming`/`channel`/`cron`), and `DedupGuard` RAII replacing 11 manual release patterns. Net ~653 lines removed.
- ChannelFormatter trait: Per-platform output formatting with static dispatch registry (`TelegramFormatter` (Markdown→MarkdownV2), `DiscordFormatter`, `WhatsAppFormatter`, `SignalFormatter`, `WebFormatter`, `EmailFormatter`), wired into `channel_message.rs` delivery path. 31 unit tests.
- Configurable inference timeouts: Per-provider `timeout_seconds` setting (`[providers.*.timeout_seconds]`) with 300-second default, surfaced in dashboard provider configuration.
- Dashboard session ID copy button: One-click copy-to-clipboard for session IDs in the Sessions panel.
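The `DedupGuard` RAII change mentioned above replaces manual release calls with a guard whose `Drop` does the release. A minimal sketch of that pattern follows; apart from the `DedupGuard` and `DedupTracker` names, the fields and methods are assumptions rather than the project's API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical tracker of request keys currently being processed.
#[derive(Clone, Default)]
pub struct DedupTracker {
    in_flight: Arc<Mutex<HashSet<String>>>,
}

/// Holding this guard marks the key as in flight; dropping it releases the key.
pub struct DedupGuard {
    tracker: DedupTracker,
    key: String,
}

impl DedupTracker {
    /// Returns a guard if `key` is not already in flight, otherwise `None`.
    pub fn try_acquire(&self, key: &str) -> Option<DedupGuard> {
        let mut set = self.in_flight.lock().unwrap();
        if set.insert(key.to_string()) {
            Some(DedupGuard { tracker: self.clone(), key: key.to_string() })
        } else {
            None
        }
    }

    pub fn is_in_flight(&self, key: &str) -> bool {
        self.in_flight.lock().unwrap().contains(key)
    }
}

impl Drop for DedupGuard {
    fn drop(&mut self) {
        // Recover from a poisoned lock so a panic elsewhere cannot leak the key.
        let mut set = self
            .tracker
            .in_flight
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        set.remove(&self.key);
    }
}
```

Early returns, `?` propagation, and panics all run `Drop`, which is why a guard can stand in for release calls scattered across exit paths.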
Fixed
- Circuit breaker window reset: `record_failure()` now tracks `window_start` for rolling-window accumulation, so failures spaced ~60s apart correctly accumulate instead of resetting.
- Embedding auth for local providers: `EmbeddingConfig.is_local` skips API key resolution and auth headers for Ollama/llama.cpp.
- Cron `schedule_kind: "once"` support: Runtime maps "once" → "at" dispatch, calls `DurableScheduler::evaluate_at()`, auto-disables after single execution.
- Vault path whitelisting: `tool_allowed_paths` auto-populated from `obsidian.vault_path` during config normalization, so workspace-only mode no longer blocks configured external paths.
- Fleet activity chart capacity model: Stacked area normalizes per-agent scores by `1/agentCount` with `fixedMax: 1.0`.
- Cache guard parity: `cached()` guard set now includes `SubagentClaim` + `LiteraryQuoteRetry` (previously missing).
- ExecutionTruthGuard: Tool-results bypass bug removed.
- Collapsible if lint: Updated `impl_core.rs` to use `if let` chain (edition 2024).
- Wallet RPC rate-limit backoff: `get_all_balances()` detects rate-limit error codes (`-32016`, `-32005`, `429`) and stops iterating remaining tokens instead of repeatedly hitting the provider.
- Cron once-type orphan jobs: Jobs with `schedule_kind: "once"` and no `schedule_expr` are now auto-disabled on first encounter instead of emitting a warning every 60s.
- Dashboard sidebar footer: Navigation bar footer now stays pinned to the bottom of the viewport (added `height: 100%` to sidebar container).
- Dashboard custom model Add button: Custom model text input row now has its own Add button; both Add buttons use a shared class selector.
- Telegram double-underscore italic: `__text__` was incorrectly emitted as Telegram underline instead of italic; formatter now maps to `_text_`.
- Config hot-reload path divergence: `normalize_paths()` and `merge_bundled_providers()` were skipped during hot-reload; reloaded configs now match boot-time normalization.
- Routing audit fixes: Attempt counter not incrementing on retry, `u32` truncation on cost metrics, misleading timeout error message wording.
- Dashboard UI stall during inference: 4 `RwLock` guard-scope fixes release locks before async I/O, preventing cascading reader starvation.
- Cron semaphore hot-reload race: Semaphore not released when cron runtime reloads config, causing phantom permit exhaustion. Dead `LlmService` method removed, lock consolidation in admin routes.
- Agent audit fixes: Tautological always-true test condition, timeout hint parsing edge case, unreachable branch removal.
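The circuit breaker fix above hinges on anchoring the failure window at `window_start` rather than resetting on each failure. The sketch below shows one way such accumulation could work; the struct layout, the `now` parameter, and the window and threshold values are assumptions, and only `record_failure()` and `window_start` are named in the notes.

```rust
use std::time::{Duration, Instant};

/// Hypothetical breaker state.
pub struct CircuitBreaker {
    window: Duration,
    threshold: u32,
    window_start: Option<Instant>,
    failures: u32,
}

impl CircuitBreaker {
    pub fn new(window: Duration, threshold: u32) -> Self {
        Self { window, threshold, window_start: None, failures: 0 }
    }

    /// Records a failure observed at `now` and returns true when the breaker
    /// should open. Failures accumulate while they fall inside the window that
    /// began at `window_start`; a failure outside it starts a new window.
    pub fn record_failure(&mut self, now: Instant) -> bool {
        match self.window_start {
            Some(start) if now.duration_since(start) <= self.window => {
                self.failures += 1;
            }
            _ => {
                self.window_start = Some(now);
                self.failures = 1;
            }
        }
        self.failures >= self.threshold
    }
}
```

With an assumed 5-minute window and a threshold of 3, failures at 0s, 60s, and 120s open the breaker on the third call, while a failure 400s after the first starts a fresh window.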
v0.9.6
Added
- Compliance-first self-funding control plane: Complete revenue opportunity lifecycle (intake → qualify → score → plan → fulfill → settle) with DB-backed restart safety, strategy-level scoring (confidence/effort/risk/priority/recommendation), feedback persistence per opportunity and summary by strategy, configurable post-settlement asset routing (default `PALM_USD`), EVM swap submission with tx-hash tracking and on-chain receipt reconciliation, tax payout lifecycle mirroring swap tasks, and operator-visible accounting (net profit, attributable costs, retained earnings, tax allocation) across API, CLI, and mechanic surfaces.
- Revenue mechanic integration: Mechanic can probe, reconcile, and repair orphaned or stale revenue jobs and swap/tax reconciliation mismatches via `run_gateway_provider_and_revenue_checks` and `run_gateway_integrated_repair_sweep`.
- Skills catalog: `PluginCatalog` with CLI flows (`ironclad skills catalog list/install/activate`) and API endpoints (`GET/POST /api/skills/catalog`, `/install`, `/activate`). Registry manifest fetch from remote URL.
- Skill registry protocol: Migration 022 adds `version`, `author`, `registry_source` columns to skills table. Multi-registry support via `RegistrySource { name, url, priority, enabled }` with backward-compatible fallback from legacy single-URL `registry_url`.
- Multi-registry fetch: Registry sync iterates all configured sources, namespaces skills as `{registry_name}/{skill_name}` for non-local sources, applies semver comparison to skip redundant downloads, and resolves conflicts by registry priority.
- Learning loop closure: Agent now detects repeating multi-step tool sequences on session close and synthesizes reusable SKILL.md procedure files. `learned_skills` table (migration 021) tracks reinforcement history (success/failure counts, priority). `LearningConfig` exposes tuneable thresholds for minimum sequence length, success ratio, priority boost/decay, and skill cap. Inspired by recent work on autonomous tool-use learning in LLM agents (arXiv:2603.05344).
- Procedural failure recording: `record_procedural_failure()` (previously dead code in the DB layer) is now called from `ingest_turn()` when tool results indicate failure, closing the procedural memory feedback loop.
- Skill priority adjustment: Governor `tick()` now runs `adjust_learned_skill_priorities()` after episodic decay; learned skills with high success ratios get priority boosts, and those with poor ratios get decayed.
- Skill subdirectory loading: `SkillLoader` now recurses into `learned/` subdirectory, loading machine-synthesized skills alongside hand-authored ones.
- Progressive context compaction: 5-stage compaction (`Trim` → `Summarize` → `Archive` → `Evict` → `Emergency`) in `compact_before_archive()` with `CompactionStage::from_excess()` selector.
- Decay-weighted episodic retrieval: `rerank_episodic_by_decay()` applies time-based decay at retrieval time, preventing stale context from dominating memory budget.
- Instruction anti-fade micro-reminders: Event-driven system prompt reinforcement at agent decision points to combat instruction-following drift.
- x402 autonomous payment: LLM HTTP client now handles `402 Payment Required` responses with autonomous on-chain payment and request retry.
- Homebrew & Winget packaging: `release.yml` contains complete `update-homebrew` (SHA256 extraction, formula generation, tap push) and `update-winget` (`vedantmgoyal9/winget-releaser@v2`) jobs. Activation requires tap repo creation and secrets provisioning.
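The progressive compaction entry above names five stages and a `CompactionStage::from_excess()` selector. The sketch below shows what such a selector might look like; the input (a used-to-budget ratio), the `Option` return type, and every threshold are assumptions, since the notes give only the stage names.

```rust
/// The five stages named in the release notes, ordered from mildest to most drastic.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CompactionStage {
    Trim,
    Summarize,
    Archive,
    Evict,
    Emergency,
}

impl CompactionStage {
    /// Picks a stage from how far usage exceeds the budget, expressed as
    /// `used_tokens / budget_tokens`. All thresholds are illustrative guesses.
    pub fn from_excess(ratio: f64) -> Option<Self> {
        if ratio.is_nan() || ratio <= 1.0 {
            return None; // within budget: no compaction needed
        }
        Some(if ratio <= 1.10 {
            Self::Trim
        } else if ratio <= 1.25 {
            Self::Summarize
        } else if ratio <= 1.50 {
            Self::Archive
        } else if ratio <= 2.00 {
            Self::Evict
        } else {
            Self::Emergency
        })
    }
}
```

Deriving `Ord` lets callers compare stages, for example to run every stage up to and including the selected one.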
v0.9.5
Changed
- Behavior soak hardening: `scripts/run-agent-behavior-soak.py` now includes regression checks for filesystem capability truthfulness, subagent capability response quality, and affirmative continuation quality, with rubric updates to score substantive outcomes over brittle phrase matching.
- Roadmap/release traceability: `docs/releases/v0.9.5.md` and `docs/ROADMAP.md` updated with current v0.9.5 prep status for speculative execution, browser runtime support, CLI skill roadmap slice, and behavior continuity validation.
- Architecture documentation: Added explicit v0.9.5-prep control/dataflow coverage for deterministic execution shortcuts and guarded response sanitization in `docs/architecture/ironclad-dataflow.md` and `docs/architecture/ironclad-sequences.md`.
- Browser runtime continuity: Browser action execution now attempts a single stop/start session recovery when CDP disconnect/closed-socket errors are detected, limited to idempotent actions to avoid duplicate side effects on replay.
- Autonomy turn-budget controls: Added configurable agent-level ReAct budget controls (`autonomy_max_react_turns`, `autonomy_max_turn_duration_seconds`) and wired enforcement into the runtime loop.
- CLI adapter response contract: `run_script` now emits stable typed metadata (`adapter`, `schema_version`, `status`, `error_class`) and normalized script error classes for downstream handling.
- Speculative policy invariants: Added explicit test coverage enforcing Safe-only speculative eligibility (Caution/Dangerous/Forbidden remain excluded from speculative execution).
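The Safe-only speculative invariant listed above is easy to express as an exhaustive match. The four level names come from the notes; the `RiskLevel` enum and `is_speculation_eligible` function are hypothetical names for this sketch.

```rust
/// Risk levels as named in the release notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel {
    Safe,
    Caution,
    Dangerous,
    Forbidden,
}

/// Only `Safe` actions may run speculatively. The match is exhaustive on
/// purpose: adding a new level fails to compile until it is classified here.
pub fn is_speculation_eligible(risk: RiskLevel) -> bool {
    match risk {
        RiskLevel::Safe => true,
        RiskLevel::Caution | RiskLevel::Dangerous | RiskLevel::Forbidden => false,
    }
}
```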
Fixed
- Internal protocol fallback leakage: response sanitization no longer surfaces protocol-placeholder fallback text; empty/degraded sanitized content now resolves through deterministic user-facing quality fallback.
- Markdown count execution reliability: execution shortcut path now handles recursive markdown-file count prompts deterministically, including strict numeric-only responses when requested (`count only` / `only the number` style prompts).
- Delegation shortcut boundary: markdown-count shortcut no longer hijacks explicitly delegated prompts, preserving delegation intent handling.
- Speculative branch cleanup safety: introduced RAII speculation slot guards and abort-path tests to guarantee no slot leakage when speculative tasks are canceled.
- CLI skill sandbox isolation coverage: added explicit tests that secret env vars are stripped while only allowlisted runtime vars are propagated under `skills.sandbox_env=true`.
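The sandbox environment behavior covered by the last fix can be sketched as an allowlist filter over the parent environment. The allowlist contents and the `sandbox_env` function below are assumptions; the notes do not list which runtime variables are propagated.

```rust
use std::collections::BTreeMap;

/// Hypothetical allowlist; the real set of propagated variables is not given.
pub const ALLOWED_ENV: &[&str] = &["PATH", "HOME", "LANG"];

/// Builds the child environment from the parent by keeping only allowlisted
/// keys, so secrets are dropped by default rather than by enumeration.
pub fn sandbox_env(parent: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    parent
        .iter()
        .filter(|(key, _)| ALLOWED_ENV.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

When spawning a script, a map like this could be applied with `std::process::Command::env_clear()` followed by `envs(...)`, so nothing outside the allowlist reaches the child process.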