v3.23.0 - Provider Runtime, Memory Recall & UI Performance Hardening

siddsachar released this on 28 May at 17:16
39b0de2

This release hardens the runtime paths that were expanded in v3.22.0. The headline work is provider compatibility: Thoth now preserves provider-qualified model identity end to end, routes incompatible models into a safer chat-only path, probes custom OpenAI-compatible endpoints before trusting tool support, and normalizes tricky provider transcripts before replay. It also ships a major memory recall uplift, with deterministic bounded recall, lexical and graph-expanded candidates, audit metadata, review states, and provenance surfaces. Around that, v3.23.0 makes large transcripts and Settings screens lighter, adds task database recovery, improves local/self-hosted setup, and expands regression coverage around real provider behavior.

Provider Runtime & Custom Endpoints

  • Provider-qualified model identity — model choices now keep their provider identity across Settings, catalog pinning, defaults, thread overrides, status displays, setup wizard choices, and runtime construction.
  • No accidental OpenRouter fallback — unknown bare model IDs no longer silently route to OpenRouter when the original provider cannot be inferred.
  • Runtime readiness routing — provider/runtime checks now distinguish full agent mode, chat-only mode, and blocked configurations before a broken run starts.
  • Context-window guardrails — small context windows block agent mode with clearer guidance, while medium windows can use chat-only mode when tool schemas would not fit reliably.
  • Unified context policy — local, cloud, and custom endpoint context caps now flow through one policy path with model maximums, user caps, and request-time context parameters where supported.
  • Context cache invalidation — changing local or cloud context settings clears stale LLM clients so subsequent turns use the new limits.
  • Custom endpoint profiles — OpenAI-compatible endpoints can use profile behavior for common local and proxy servers such as LM Studio, vLLM, llama.cpp, LocalAI, LiteLLM, SGLang, oMLX-style servers, and generic OpenAI-compatible backends.
  • Custom endpoint probing — self-hosted endpoints can be probed for catalog availability, streaming support, tool-call behavior, and model compatibility, with probe results persisted for later readiness decisions.
  • Native metadata discovery — LM Studio and llama.cpp metadata paths are used when available to discover context windows and native tool support more accurately.
  • No-auth endpoint support — local endpoints that do not require API keys can refresh catalogs without unnecessary secret lookups.
  • OpenAI-compatible transport — adds a dedicated transport for custom OpenAI-compatible chat, streaming, tool serialization, tool-call chunks, reasoning fields, runtime context overrides, and clearer HTTP error messages.
  • Unsupported payload cleanup — custom endpoint profiles can drop unsupported parameters such as tools, tool choice, parallel tool calls, reasoning, response formats, or tool history when a backend cannot accept them.
  • Tool-call recovery — local models that emit tool-call envelopes as text or reasoning can be recovered into structured tool calls when safe.
  • Reasoning-only response handling — reasoning-only outputs after successful tool calls can be promoted into final visible content, while reasoning-only failures after tool errors produce actionable errors instead of silent empty replies.
  • Custom tool validation repair — local/custom providers can receive a repair message when a tool call misses required fields such as query, reducing dead-end schema failures.
  • Ollama tool probing — unknown or uncertain local Ollama models can be promoted to agent mode only after a real tool round-trip succeeds.
  • Ollama launch cleanup — the launcher now starts Ollama only when saved Brain or Vision settings actually need local Ollama, and the --no-ollama flag always skips it.
  • Ollama reasoning behavior — Ollama reasoning is enabled only for detected reasoning models instead of being forced globally.
  • Vision provider refs — local Vision calls strip provider-qualified Ollama refs at the runtime edge while provider/cloud refs still route through the correct path.
  • Designer runtime readiness — Designer text refinement and speaker-note generation now use the active model override and verify that the selected model is agent-ready.
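The readiness routing and context-window guardrails described above amount to a decision made before any run starts. The sketch below illustrates that idea under stated assumptions. `RuntimeMode`, `ModelCapabilities`, `route_runtime`, and both threshold constants are hypothetical names and values. They are not code from `providers/readiness.py`, and these notes do not publish the real cutoffs.

```python
from dataclasses import dataclass
from enum import Enum


class RuntimeMode(Enum):
    AGENT = "agent"
    CHAT_ONLY = "chat_only"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class ModelCapabilities:
    context_window: int        # usable tokens, from catalog metadata or a probe
    tool_calls_verified: bool  # True only after a real tool round-trip succeeded


# Hypothetical thresholds: the release notes do not publish the real values.
MIN_CHAT_CONTEXT = 2_048   # below this, even a compact chat prompt is unsafe
RESPONSE_HEADROOM = 4_096  # room reserved for history and the reply


def route_runtime(caps: ModelCapabilities, tool_schema_tokens: int) -> RuntimeMode:
    """Choose a runtime mode before a run starts instead of failing mid-turn."""
    if caps.context_window < MIN_CHAT_CONTEXT:
        return RuntimeMode.BLOCKED
    if not caps.tool_calls_verified:
        # Unknown tool support: answer in chat-only mode rather than guessing.
        return RuntimeMode.CHAT_ONLY
    if tool_schema_tokens + RESPONSE_HEADROOM > caps.context_window:
        # Tools work, but their schemas would not fit reliably.
        return RuntimeMode.CHAT_ONLY
    return RuntimeMode.AGENT


print(route_runtime(ModelCapabilities(8_192, True), tool_schema_tokens=6_000))
# RuntimeMode.CHAT_ONLY
```

The sketch checks conditions in a fixed order: first the hard floor, then verified tool support, then schema fit. A model reaches agent mode only when every check passes. This matches the notes' rule that uncertain models are promoted only after a real tool round-trip.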

Chat-Only Runtime & Transcript Compatibility

  • Chat-only runtime path — non-tool or tool-incompatible models can answer normal chat without building the full tool graph.
  • Compact chat-only prompt — chat-only mode uses a smaller prompt that avoids implying tools, workflows, or task actions are available.
  • Tool-free history shaping — prior tool turns are summarized for chat-only context without replaying full tool bodies or invalid protocol shapes.
  • Chat-only streaming persistence — chat-only responses stream and persist through the normal conversation paths.
  • Runtime surface tagging — chat, channels, workflow approvals, Designer, and forced agent surfaces now tag their runtime mode so provider readiness can make the right routing decision.
  • Provider transcript diagnostics — model-facing transcripts are inspected for invalid tool calls, duplicate tool IDs, orphan tool results, and reasoning-field hazards.
  • Transcript normalization — provider-facing messages drop no-op assistant turns, strip invalid tool calls, rewrite duplicate tool-call IDs, drop orphan tool results, and remove unsafe reasoning fields for custom-tool artifacts.
  • Thinking retention — non-empty thinking/reasoning text is preserved through streaming, reattach, persisted transcript rendering, and final message display.
  • Reasoning-only final guard — reasoning-only chunks are no longer mistaken for final assistant content when there is no visible answer.
  • Checkpoint transcript loading — transcript loading can read checkpoint messages and token usage without importing or constructing the agent graph.
  • Legacy checkpoint repair — checkpoint version values are normalized when older integer versions are encountered.
  • Detached stream finalization — detached clients can finalize with scoped transcript refreshes instead of rebuilding the full main UI.
  • Optimistic message preservation — user messages remain visible during detached finalize and reconnect flows.
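The transcript normalization rules above can be illustrated with a minimal sketch over OpenAI-style message dicts. In that shape, assistant turns carry `tool_calls` and tool results carry `tool_call_id`. The function names, the `-1` suffix scheme for rewritten IDs, and the message shape are assumptions for illustration. The sketch also leaves out the reasoning-field cleanup and is not the code in `agent.py`.

```python
from typing import Any

Message = dict[str, Any]


def _has_text(content: Any) -> bool:
    """True when content carries something visible (string or content-part list)."""
    return bool(content.strip()) if isinstance(content, str) else bool(content)


def normalize_transcript(messages: list[Message]) -> list[Message]:
    """Clean an OpenAI-style message list before replaying it to a strict provider."""
    used_ids: set[str] = set()    # every tool-call ID emitted so far
    pending: dict[str, str] = {}  # original ID -> emitted ID, still awaiting a result
    out: list[Message] = []

    for msg in messages:
        role = msg.get("role")
        if role == "assistant":
            calls = []
            for call in msg.get("tool_calls") or []:
                name = (call.get("function") or {}).get("name")
                if not call.get("id") or not name:
                    continue  # strip invalid tool calls
                original = new_id = call["id"]
                suffix = 1
                while new_id in used_ids:  # rewrite duplicate tool-call IDs
                    new_id = f"{original}-{suffix}"
                    suffix += 1
                used_ids.add(new_id)
                pending[original] = new_id
                calls.append({**call, "id": new_id})
            if not calls and not _has_text(msg.get("content")):
                continue  # drop no-op assistant turns
            cleaned = {k: v for k, v in msg.items() if k != "tool_calls"}
            if calls:
                cleaned["tool_calls"] = calls
            out.append(cleaned)
        elif role == "tool":
            emitted = pending.pop(msg.get("tool_call_id"), None)
            if emitted is None:
                continue  # drop orphan tool results
            out.append({**msg, "tool_call_id": emitted})
        else:
            out.append(msg)
    return out
```

When the sketch rewrites an ID, it also remaps the tool result that answers it. The pair therefore stays valid for providers that reject repeated or unmatched IDs.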

Memory Recall & Knowledge Audit

  • Bounded auto-recall policy — Agent turns now use deterministic memory recall with query building, context-aware token budgeting, scoring, filtering, and trace output.
  • Hybrid recall candidates — recall combines semantic search, FTS5 lexical search, keyword fallback, and graph-neighbor expansion.
  • Graph-expanded recall — strong seed memories can pull in related graph nodes with relation confidence and hop metadata.
  • Recall-safe candidate retrieval — candidate inspection no longer mutates recall timestamps until the final selected memories are injected.
  • Recall reinforcement — selected memories are touched with recalled_at and recall-count metadata after they are actually used.
  • Memory tier scoring — recall ranks core, semantic, episodic, and resource memories differently based on source, confidence, evidence, recency, and query fit.
  • Status-aware filtering — archived, needs-review, superseded, stale, weak, greeting-only, runtime-status, and unanchored resource memories are filtered out of normal auto-recall.
  • Recall traces — recent recall decisions are written to a compact trace file for debugging why memories were included or rejected.
  • FTS5 memory index — knowledge graph entities now maintain a lexical search index for faster exact/keyword recall.
  • Memory evolution helpers — new integrity helpers normalize status, tier, confidence, evidence, source context, manual edits, review state, superseding, archival, and journal entries.
  • Memory review states — memories can now be active, needs review, superseded, or archived without losing the underlying entity.
  • Audit metadata — extracted, document-derived, wiki-synced, and manually edited memories preserve stronger provenance, confidence, evidence, and source context.
  • Conflict handling — extraction can mark conflicting memories for review instead of overwriting high-authority user facts.
  • Low-confidence relation filtering — background extraction skips weak inferred relations instead of adding noisy graph edges.
  • Extraction journal — memory extraction records run summaries, per-thread details, skipped relations, and extraction outcomes.
  • Resource hub memories — document extraction creates or updates resource-style hub memories with provenance and audit fields.
  • Wiki sync provenance — wiki vault sync preserves audit/status metadata and appends memory-evolution journal entries.
  • Knowledge audit UI — Settings and entity editor surfaces now expose audit badges, filters, review queues, recall traces, and evolution journal entries.
  • Entity review actions — individual memories/entities can be archived, marked for review, superseded, restored to active, or marked as user-modified from the editor.
  • Memory tool output — memory search/list/save/update output now includes IDs, status, confidence, tier, and recall-aware results so agents can modify the right memory.
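The bounded auto-recall policy can be pictured as three steps: filter, score, and pack under a token budget, with a trace entry for every decision. This is a hypothetical sketch. The `Memory` fields, tier weights, score formula, and cutoff are invented for illustration and are not the values or functions in `memory_policy.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    id: str
    tier: str          # "core" | "semantic" | "episodic" | "resource"
    status: str        # "active" | "needs_review" | "superseded" | "archived"
    confidence: float  # 0..1, from extraction or manual edits
    relevance: float   # 0..1 query fit from semantic, FTS5, or graph retrieval
    tokens: int        # cost of injecting this memory into the prompt


# Hypothetical weights and cutoffs, not the values used by memory_policy.py.
TIER_WEIGHT = {"core": 1.0, "semantic": 0.8, "episodic": 0.6, "resource": 0.5}
EXCLUDED_STATUSES = {"archived", "needs_review", "superseded"}
MIN_SCORE = 0.2


def score(m: Memory) -> float:
    return TIER_WEIGHT.get(m.tier, 0.5) * m.confidence * m.relevance


def bounded_recall(
    candidates: list[Memory], token_budget: int
) -> tuple[list[Memory], dict[str, str]]:
    """Deterministically pick memories under a token budget and record why."""
    trace: dict[str, str] = {}
    eligible = []
    for m in candidates:
        if m.status in EXCLUDED_STATUSES:
            trace[m.id] = f"status:{m.status}"
        else:
            eligible.append(m)

    # Sort by score, then by ID, so equal scores always resolve the same way.
    chosen, used = [], 0
    for m in sorted(eligible, key=lambda m: (-score(m), m.id)):
        if score(m) < MIN_SCORE:
            trace[m.id] = "weak"
        elif used + m.tokens > token_budget:
            trace[m.id] = "over_budget"
        else:
            chosen.append(m)
            used += m.tokens
            trace[m.id] = "selected"
    return chosen, trace
```

The function only reads its candidates. In line with the recall-safe retrieval note, any `recalled_at` update would happen afterward, and only for the memories actually injected.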

UI Performance & Transcript Loading

  • UI performance utilities — adds generation tokens, timed UI sections, slow-section logging, and safe UI callback/task wrappers.
  • Bounded transcript windows — large conversations render a bounded visible window with an explicit load-earlier path instead of rebuilding every message at once.
  • Async model picker cache — model picker options are cached and refreshed asynchronously so chat inputs can appear quickly.
  • Model surface placeholders — chat can render lightweight model/provider placeholders while detailed model status resolves in the background.
  • Generation-safe token counters — token counter updates are debounced and ignored when they belong to an older render generation.
  • Lazy Home panels — Home tab panels defer heavier Developer, Designer, Knowledge, and Activity work until opened.
  • Coalesced status refreshes — Home status pill refreshes are cached and coalesced to reduce repeated expensive checks.
  • Settings generation guards — Settings tab renders use generation tokens and local error boundaries so stale async work cannot overwrite newer UI.
  • Deferred Settings tabs — heavier Settings tab content is scheduled lazily instead of blocking the shell.
  • Lazy Knowledge sections — memory browsing, audit details, relationship loading, recall traces, and journal rows load on demand.
  • Off-UI-loop entity saves — entity editor saves run off the UI loop and refresh Knowledge state in staged steps.
  • Render instrumentation — graph chat, streaming, Mermaid rendering, text embeds, transcript rendering, and blank-thread startup now include performance instrumentation.
  • Performance harness — adds a local harness for profiling real transcripts and blank-thread shells.
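Generation tokens are a general pattern for keeping stale async work from overwriting newer UI. The sketch below shows the pattern with `asyncio`. `GenerationGuard`, `render_panel`, and the dict standing in for a UI panel are hypothetical and are not the API of `ui/performance.py`.

```python
import asyncio
import itertools


class GenerationGuard:
    """Issues increasing tokens; only the newest token may commit a UI update."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._current = 0

    def advance(self) -> int:
        self._current = next(self._counter)
        return self._current

    def is_current(self, token: int) -> bool:
        return token == self._current


async def render_panel(guard: GenerationGuard, panel: dict, label: str, delay: float) -> bool:
    token = guard.advance()     # claim a generation before any async work
    await asyncio.sleep(delay)  # stands in for slow I/O such as a catalog refresh
    if not guard.is_current(token):
        return False            # stale: a newer render started meanwhile
    panel["content"] = label
    return True


async def demo() -> tuple[dict, list]:
    guard = GenerationGuard()
    panel = {"content": None}
    # A slow render starts first; a fast render supersedes it.
    results = await asyncio.gather(
        render_panel(guard, panel, "old", 0.05),
        render_panel(guard, panel, "new", 0.01),
    )
    return panel, results


print(asyncio.run(demo()))  # ({'content': 'new'}, [False, True])
```

The same check could gate debounced token-counter updates. A callback that arrives with an older token would simply be dropped.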

Task Database Recovery

  • Shared data path helpers — local database paths now resolve through a shared data-path module for tasks, memory, threads, and diagnostics.
  • Task schema validation — startup/task operations validate required tables and columns before use.
  • In-place schema repair — partial task databases can be repaired in place while preserving existing rows when possible.
  • Corrupt DB recovery — corrupt task databases are backed up and recreated with a clean schema.
  • Schema retry wrappers — task operations retry once after repairing schema-related SQLite errors.
  • Malformed migration tolerance — workflow-to-task migration skips malformed legacy rows after the destination schema exists.
  • Launcher recovery commands — launcher.py --reset-tasks-db, --reset-db, and --restore-data can back up and recreate local SQLite stores.
  • WAL/SHM backup coverage — task, memory, and thread DB backup/restore handles SQLite companion files.
  • Support diagnostics — Home, Command Center, and thoth_status show task-schema state, recovery guidance, last repair, and schema errors.
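Schema validation with in-place repair and a single retry can be sketched with the standard `sqlite3` module. The table layout, function names, and error-message matching below are assumptions for illustration, not the actual schema or helpers in `tasks.py`. Corrupt-file backup and recreation are left out.

```python
import sqlite3

# Hypothetical schema for illustration; these notes do not list the real
# tables and columns that tasks.py validates.
REQUIRED_SCHEMA = {
    "tasks": {
        "id": "TEXT PRIMARY KEY",
        "name": "TEXT NOT NULL DEFAULT ''",
        "schedule": "TEXT",
        "enabled": "INTEGER NOT NULL DEFAULT 1",
    },
}
SCHEMA_ERRORS = ("no such table", "no such column")


def missing_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each required table to the columns it lacks (all of them if absent)."""
    problems = {}
    for table, columns in REQUIRED_SCHEMA.items():
        # Table names come from the constant above, never from user input.
        present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        missing = [c for c in columns if c not in present]
        if missing:
            problems[table] = missing
    return problems


def repair_schema(conn: sqlite3.Connection) -> None:
    """Create missing tables and add missing columns, keeping existing rows."""
    for table, missing in missing_schema(conn).items():
        columns = REQUIRED_SCHEMA[table]
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
        ).fetchone()
        if exists is None:
            ddl = ", ".join(f"{name} {decl}" for name, decl in columns.items())
            conn.execute(f"CREATE TABLE {table} ({ddl})")
            continue
        for name in missing:
            if "PRIMARY KEY" in columns[name]:
                # SQLite cannot add a key column in place; caller must recreate.
                raise sqlite3.DatabaseError(f"{table}.{name} cannot be repaired in place")
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {columns[name]}")
    conn.commit()


def with_schema_retry(conn: sqlite3.Connection, operation):
    """Run operation(conn); on a schema error, repair once and retry once."""
    try:
        return operation(conn)
    except sqlite3.OperationalError as exc:
        if not any(marker in str(exc) for marker in SCHEMA_ERRORS):
            raise
        repair_schema(conn)
        return operation(conn)
```

SQLite's `ALTER TABLE ... ADD COLUMN` fills existing rows with the column default, which is what lets a partial database keep its rows. It cannot add a primary-key column, so the sketch raises that case for the caller to handle by recreating the database.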

Tools, Channels & Runtime Reliability

  • Channel runtime routing — Telegram, WhatsApp, Discord, Slack, and SMS now mark channel turns as channel/auto runtime, while approval resumes force agent mode.
  • Approval resume routing — channel approval resumes explicitly request agent mode so tool continuations do not fall into chat-only routing.
  • Wikipedia HTTPS endpoint — the Wikipedia tool forces the legacy client onto the HTTPS API endpoint.
  • Wikipedia recoverable errors — upstream JSON/API failures now return a recoverable tool result that tells the agent not to retry the same query blindly.
  • Wikipedia usage guidance — the tool description now steers broad conceptual questions away from unnecessary encyclopedia lookups.
  • Thoth Status model reporting — status output reports the effective runtime model/mode more accurately.
  • Thoth Status task reporting — scheduled-task status now includes schema diagnostics before listing configured tasks.
  • Command Center recovery copy — task-schema failures point users toward the new launcher recovery command.
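The recoverable-error behavior for the Wikipedia tool follows a common pattern. Upstream failures are caught and returned as a structured result the agent can reason about, instead of being raised. In this sketch, `wikipedia_summary` and the injected `fetch` callable are hypothetical stand-ins; a real fetch would call the HTTPS API endpoint. The result keys are not the tool's actual output format.

```python
import json
from typing import Callable


def wikipedia_summary(query: str, fetch: Callable[[str], str]) -> dict:
    """Return a summary, or a recoverable error the agent can act on."""
    try:
        payload = json.loads(fetch(query))
        extract = payload["extract"]
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # OSError covers network failures (urllib's URLError subclasses it);
        # ValueError covers json.JSONDecodeError from non-JSON upstream pages;
        # KeyError/TypeError cover JSON that lacks the expected shape.
        return {
            "ok": False,
            "recoverable": True,
            "error": f"Wikipedia lookup failed ({type(exc).__name__})",
            "guidance": "Do not retry the same query unchanged; rephrase it "
                        "or answer from what you already know.",
        }
    return {"ok": True, "extract": extract}
```

Because `fetch` is injected, the error handling can be tested offline.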

Tests & Release Checks

  • Provider readiness coverage — tests cover agent/chat-only/block routing, context floors, cached capability snapshots, OpenRouter metadata, Ollama probing, and custom endpoint probing.
  • Custom provider coverage — tests cover profiles, no-auth endpoints, native metadata discovery, streaming probes, context overrides, and setup wizard payloads.
  • OpenAI-compatible transport coverage — tests cover request payloads, tool calls, streaming, reasoning-only finals, unsupported parameters, and provider error handling.
  • Provider selection coverage — tests cover provider-qualified refs, duplicate model IDs across providers, Ollama refs, Quick Choices, and stale capability refresh.
  • Chat-only and transcript coverage — tests cover chat-only streaming, forced agent surfaces, checkpoint transcript loading, checkpoint version repair, detached finalize, and thinking retention.
  • Memory recall coverage — tests cover auto-recall scoring, filtering, graph expansion, recall traces, evolution helpers, audit helpers, and memory extraction metadata.
  • UI performance coverage — tests cover generation tokens, safe UI callbacks, bounded transcript windows, lazy Knowledge surfaces, staged refreshes, and performance harness wiring.
  • Task recovery coverage — tests cover empty data dirs, partial schemas, corrupt DB recreation, migration tolerance, launcher reset/restore args, and DB-family backup.
  • Tool/runtime regressions — tests cover Wikipedia recovery, Vision provider refs, Designer routing, Home performance, model picker regressions, and opt-in live provider matrix behavior.
  • Live provider marker — adds a live_provider pytest marker for real configured-provider calls that remain opt-in.

Release Notes & Risk Notes

  • Custom endpoint compatibility depends on the server — profiles and probes improve behavior for common OpenAI-compatible servers, but local/proxy backends can still vary in tool syntax, streaming behavior, and context parameter names.
  • Chat-only mode is intentionally limited — models routed to chat-only mode can answer normal conversation but should not be expected to run tools, workflows, or structured agent actions.
  • Memory recall is more selective — archived, superseded, weak, or unanchored memories may stop appearing automatically; users can still review and restore memory state from Knowledge surfaces.
  • Task DB recovery backs up before reset — recovery commands preserve old SQLite files under the local recovery directory, but reset flows can remove active scheduled-task rows from the live DB until restored.
  • Live provider tests are opt-in — the new live matrix is useful for release validation with configured credentials, but it is not part of the normal offline unit suite.

Files Changed

  • agent.py, models.py, prompts.py, threads.py: Runtime readiness routing, chat-only execution, provider transcript normalization, thinking retention, context policy usage, and checkpoint transcript helpers
  • providers/custom.py, providers/readiness.py, providers/resolution.py, providers/runtime.py, providers/selection.py, providers/tool_protocol.py, providers/transports/openai_compatible.py, providers/ollama.py: Provider-qualified resolution, custom endpoint profiles/probes, OpenAI-compatible transport, Ollama probing/reasoning behavior, context overrides, and tool validation repair
  • ui/setup_wizard.py, ui/provider_settings.py, ui/model_catalog.py, vision.py, designer/ai_content.py: Custom endpoint setup fields, provider-qualified setup selections, async model-picker behavior, Vision provider-ref routing, and Designer model readiness
  • memory_policy.py, memory_evolution.py, knowledge_graph.py, memory.py, memory_extraction.py, document_extraction.py, wiki_vault.py, tools/memory_tool.py: Bounded recall policy, lexical/graph recall candidates, memory audit metadata, evolution journal, extraction provenance, and memory tool output
  • ui/knowledge_audit.py, ui/entity_editor.py, ui/settings.py, ui/graph_panel.py: Knowledge audit helpers, entity review actions, lazy Knowledge settings surfaces, recall traces, and memory evolution journal UI
  • ui/performance.py, ui/transcript.py, ui/chat.py, ui/chat_components.py, ui/render.py, ui/streaming.py, ui/home.py, ui/status_bar.py, ui/command_center.py: UI performance instrumentation, bounded transcript rendering, detached finalize improvements, async picker loading, lazy Home panels, cached status refresh, and task recovery copy
  • tasks.py, data_paths.py, launcher.py, tools/thoth_status_tool.py: Task DB schema validation/repair, recovery commands, data path helpers, backup/restore support, and support diagnostics
  • channels/approval.py, channels/telegram.py, channels/whatsapp.py, channels/discord_channel.py, channels/slack.py, channels/sms.py: Runtime surface tagging for channel turns and approval resumes
  • tools/wikipedia_tool.py: HTTPS API endpoint forcing, recoverable Wikipedia errors, and safer tool usage guidance
  • scripts/reasoning_completion_harness.py, scripts/ui_performance_harness.py, pytest.ini, tests/: Reasoning/runtime harnesses, UI performance harness, live-provider marker, and focused regressions for provider runtime, memory recall, UI performance, task recovery, transcript loading, Vision, and Wikipedia

schema: 1
files:
  Thoth-3.23.0-Linux-x86_64.tar.gz: sha256=22e5e46e883418dffea8ac394f80a59e648c2b7c820944ab3715b4b41b2fd458
  Thoth-3.23.0-macOS-arm64.dmg: sha256=c298995e741940a79e71e2af4e9e4fd85b53af34de058990eedf6ba27ca3d724
  ThothSetup_3.23.0.exe: sha256=908cf9563b2dc0b2ebc9dd0d6e685ef4b56204b93d0fd30ca4a8f9be2551ae8e
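
The manifest above publishes a SHA-256 digest for each installer, so a download can be checked before it is run. The sketch below uses only `hashlib` and `pathlib`. `PUBLISHED_SHA256`, `sha256_file`, and `verify_download` are hypothetical helper names, not part of Thoth.

```python
import hashlib
from pathlib import Path

# Digests copied from the manifest above.
PUBLISHED_SHA256 = {
    "Thoth-3.23.0-Linux-x86_64.tar.gz": "22e5e46e883418dffea8ac394f80a59e648c2b7c820944ab3715b4b41b2fd458",
    "Thoth-3.23.0-macOS-arm64.dmg": "c298995e741940a79e71e2af4e9e4fd85b53af34de058990eedf6ba27ca3d724",
    "ThothSetup_3.23.0.exe": "908cf9563b2dc0b2ebc9dd0d6e685ef4b56204b93d0fd30ca4a8f9be2551ae8e",
}


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large installers never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, published: dict[str, str] = PUBLISHED_SHA256) -> bool:
    """Compare a local file against the published digest for its file name."""
    expected = published.get(path.name)
    if expected is None:
        raise KeyError(f"no published checksum for {path.name}")
    return sha256_file(path) == expected.lower()
```

For example, `verify_download(Path("ThothSetup_3.23.0.exe"))` returns `True` only when the local file's digest matches the published value.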