Release v3.23.0 - Provider Runtime, Memory Recall & UI Performance Hardening · siddsachar/row-bot

v3.23.0 - Provider Runtime, Memory Recall & UI Performance Hardening

This release hardens the runtime paths that were expanded in v3.22.0. The headline work is provider compatibility: Thoth now preserves provider-qualified model identity end to end, routes incompatible models into a safer chat-only path, probes custom OpenAI-compatible endpoints before trusting tool support, and normalizes tricky provider transcripts before replay. It also ships a major memory recall uplift, with deterministic bounded recall, lexical and graph-expanded candidates, audit metadata, review states, and provenance surfaces. Around that, v3.23.0 makes large transcripts and Settings screens lighter, adds task database recovery, improves local/self-hosted setup, and expands regression coverage around real provider behavior.

Provider Runtime & Custom Endpoints

Provider-qualified model identity — model choices now keep their provider identity across Settings, catalog pinning, defaults, thread overrides, status displays, setup wizard choices, and runtime construction.
No accidental OpenRouter fallback — unknown bare model IDs no longer silently route to OpenRouter when the original provider cannot be inferred.
Runtime readiness routing — provider/runtime checks now distinguish full agent mode, chat-only mode, and blocked configurations before a broken run starts.
Context-window guardrails — small context windows block agent mode with clearer guidance, while medium windows can use chat-only mode when tool schemas would not fit reliably.
Unified context policy — local, cloud, and custom endpoint context caps now flow through one policy path with model maximums, user caps, and request-time context parameters where supported.
Context cache invalidation — changing local or cloud context settings clears stale LLM clients so subsequent turns use the new limits.
Custom endpoint profiles — OpenAI-compatible endpoints can use profile behavior for common local and proxy servers such as LM Studio, vLLM, llama.cpp, LocalAI, LiteLLM, SGLang, oMLX-style servers, and generic OpenAI-compatible backends.
Custom endpoint probing — self-hosted endpoints can be probed for catalog availability, streaming support, tool-call behavior, and model compatibility, with probe results persisted for later readiness decisions.
Native metadata discovery — LM Studio and llama.cpp metadata paths are used when available to discover context windows and native tool support more accurately.
No-auth endpoint support — local endpoints that do not require API keys can refresh catalogs without unnecessary secret lookups.
OpenAI-compatible transport — adds a dedicated transport for custom OpenAI-compatible chat, streaming, tool serialization, tool-call chunks, reasoning fields, runtime context overrides, and clearer HTTP error messages.
Unsupported payload cleanup — custom endpoint profiles can drop unsupported parameters such as tools, tool choice, parallel tool calls, reasoning, response formats, or tool history when a backend cannot accept them.
Tool-call recovery — local models that emit tool-call envelopes as text or reasoning can be recovered into structured tool calls when safe.
Reasoning-only response handling — reasoning-only outputs after successful tool calls can be promoted into final visible content, while reasoning-only failures after tool errors produce actionable errors instead of silent empty replies.
Custom tool validation repair — local/custom providers can receive a repair message when a tool call misses required fields such as query, reducing dead-end schema failures.
Ollama tool probing — unknown or uncertain local Ollama models can be promoted to agent mode only after a real tool round-trip succeeds.
Ollama launch cleanup — the launcher now starts Ollama only when saved Brain or Vision settings actually need local Ollama, and --no-ollama forces the skip.
Ollama reasoning behavior — Ollama reasoning is enabled only for detected reasoning models instead of being forced globally.
Vision provider refs — local Vision calls strip provider-qualified Ollama refs at the runtime edge while provider/cloud refs still route through the correct path.
Designer runtime readiness — Designer text refinement and speaker-note generation now use the active model override and verify that the selected model is agent-ready.

Chat-Only Runtime & Transcript Compatibility

Chat-only runtime path — non-tool or tool-incompatible models can answer normal chat without building the full tool graph.
Compact chat-only prompt — chat-only mode uses a smaller prompt that avoids implying tools, workflows, or task actions are available.
Tool-free history shaping — prior tool turns are summarized for chat-only context without replaying full tool bodies or invalid protocol shapes.
Chat-only streaming persistence — chat-only responses stream and persist through the normal conversation paths.
Runtime surface tagging — chat, channels, workflow approvals, Designer, and forced agent surfaces now tag their runtime mode so provider readiness can make the right routing decision.
Provider transcript diagnostics — model-facing transcripts are inspected for invalid tool calls, duplicate tool IDs, orphan tool results, and reasoning-field hazards.
Transcript normalization — provider-facing messages drop no-op assistant turns, strip invalid tool calls, rewrite duplicate tool-call IDs, drop orphan tool results, and remove unsafe reasoning fields for custom-tool artifacts.
Thinking retention — non-empty thinking/reasoning text is preserved through streaming, reattach, persisted transcript rendering, and final message display.
Reasoning-only final guard — reasoning-only chunks are no longer mistaken for final assistant content when there is no visible answer.
Checkpoint transcript loading — transcript loading can read checkpoint messages and token usage without importing or constructing the agent graph.
Legacy checkpoint repair — checkpoint version values are normalized when older integer versions are encountered.
Detached stream finalization — detached clients can finalize with scoped transcript refreshes instead of rebuilding the full main UI.
Optimistic message preservation — user messages remain visible during detached finalize and reconnect flows.

Memory Recall & Knowledge Audit

Bounded auto-recall policy — Agent turns now use deterministic memory recall with query building, context-aware token budgeting, scoring, filtering, and trace output.
Hybrid recall candidates — recall combines semantic search, FTS5 lexical search, keyword fallback, and graph-neighbor expansion.
Graph-expanded recall — strong seed memories can pull in related graph nodes with relation confidence and hop metadata.
Recall-safe candidate retrieval — candidate inspection no longer mutates recall timestamps until the final selected memories are injected.
Recall reinforcement — selected memories are touched with recalled_at and recall-count metadata after they are actually used.
Memory tier scoring — recall ranks core, semantic, episodic, and resource memories differently based on source, confidence, evidence, recency, and query fit.
Status-aware filtering — archived, needs-review, superseded, stale, weak, greeting-only, runtime-status, and unanchored resource memories are filtered out of normal auto-recall.
Recall traces — recent recall decisions are written to a compact trace file for debugging why memories were included or rejected.
FTS5 memory index — knowledge graph entities now maintain a lexical search index for faster exact/keyword recall.
Memory evolution helpers — new integrity helpers normalize status, tier, confidence, evidence, source context, manual edits, review state, superseding, archival, and journal entries.
Memory review states — memories can now be active, needs review, superseded, or archived without losing the underlying entity.
Audit metadata — extracted, document-derived, wiki-synced, and manually edited memories preserve stronger provenance, confidence, evidence, and source context.
Conflict handling — extraction can mark conflicting memories for review instead of overwriting high-authority user facts.
Low-confidence relation filtering — background extraction skips weak inferred relations instead of adding noisy graph edges.
Extraction journal — memory extraction records run summaries, per-thread details, skipped relations, and extraction outcomes.
Resource hub memories — document extraction creates or updates resource-style hub memories with provenance and audit fields.
Wiki sync provenance — wiki vault sync preserves audit/status metadata and appends memory-evolution journal entries.
Knowledge audit UI — Settings and entity editor surfaces now expose audit badges, filters, review queues, recall traces, and evolution journal entries.
Entity review actions — individual memories/entities can be archived, marked for review, superseded, restored to active, or marked as user-modified from the editor.
Memory tool output — memory search/list/save/update output now includes IDs, status, confidence, tier, and recall-aware results so agents can modify the right memory.

UI Performance & Transcript Loading

UI performance utilities — adds generation tokens, timed UI sections, slow-section logging, and safe UI callback/task wrappers.
Bounded transcript windows — large conversations render a bounded visible window with an explicit load-earlier path instead of rebuilding every message at once.
Async model picker cache — model picker options are cached and refreshed asynchronously so chat inputs can appear quickly.
Model surface placeholders — chat can render lightweight model/provider placeholders while detailed model status resolves in the background.
Generation-safe token counters — token counter updates are debounced and ignored when they belong to an older render generation.
Lazy Home panels — Home tab panels defer heavier Developer, Designer, Knowledge, and Activity work until opened.
Coalesced status refreshes — Home status pill refreshes are cached and coalesced to reduce repeated expensive checks.
Settings generation guards — Settings tab renders use generation tokens and local error boundaries so stale async work cannot overwrite newer UI.
Deferred Settings tabs — heavier Settings tab content is scheduled lazily instead of blocking the shell.
Lazy Knowledge sections — memory browsing, audit details, relationship loading, recall traces, and journal rows load on demand.
Off-UI-loop entity saves — entity editor saves run off the UI loop and refresh Knowledge state in staged steps.
Render instrumentation — graph chat, streaming, Mermaid rendering, text embeds, transcript rendering, and blank-thread startup now include performance instrumentation.
Performance harness — adds a local harness for profiling real transcripts and blank-thread shells.

Task Database Recovery

Shared data path helpers — local database paths now resolve through a shared data-path module for tasks, memory, threads, and diagnostics.
Task schema validation — startup/task operations validate required tables and columns before use.
In-place schema repair — partial task databases can be repaired in place while preserving existing rows when possible.
Corrupt DB recovery — corrupt task databases are backed up and recreated with a clean schema.
Schema retry wrappers — task operations retry once after repairing schema-related SQLite errors.
Malformed migration tolerance — workflow-to-task migration skips malformed legacy rows after the destination schema exists.
Launcher recovery commands — launcher.py --reset-tasks-db, --reset-db, and --restore-data can back up and recreate local SQLite stores.
WAL/SHM backup coverage — task, memory, and thread DB backup/restore handles SQLite companion files.
Support diagnostics — Home, Command Center, and thoth_status show task-schema state, recovery guidance, last repair, and schema errors.

Tools, Channels & Runtime Reliability

Channel runtime routing — Telegram, WhatsApp, Discord, Slack, and SMS now mark channel turns as channel/auto runtime, while approval resumes force agent mode.
Approval resume routing — channel approval resumes explicitly request agent mode so tool continuations do not fall into chat-only routing.
Wikipedia HTTPS endpoint — the Wikipedia tool forces the legacy client onto the HTTPS API endpoint.
Wikipedia recoverable errors — upstream JSON/API failures now return a recoverable tool result that tells the agent not to retry the same query blindly.
Wikipedia usage guidance — the tool description now steers broad conceptual questions away from unnecessary encyclopedia lookups.
Thoth Status model reporting — status output reports the effective runtime model/mode more accurately.
Thoth Status task reporting — scheduled-task status now includes schema diagnostics before listing configured tasks.
Command Center recovery copy — task-schema failures point users toward the new launcher recovery command.

Tests & Release Checks

Provider readiness coverage — tests cover agent/chat-only/block routing, context floors, cached capability snapshots, OpenRouter metadata, Ollama probing, and custom endpoint probing.
Custom provider coverage — tests cover profiles, no-auth endpoints, native metadata discovery, streaming probes, context overrides, and setup wizard payloads.
OpenAI-compatible transport coverage — tests cover request payloads, tool calls, streaming, reasoning-only finals, unsupported parameters, and provider error handling.
Provider selection coverage — tests cover provider-qualified refs, duplicate model IDs across providers, Ollama refs, Quick Choices, and stale capability refresh.
Chat-only and transcript coverage — tests cover chat-only streaming, forced agent surfaces, checkpoint transcript loading, checkpoint version repair, detached finalize, and thinking retention.
Memory recall coverage — tests cover auto-recall scoring, filtering, graph expansion, recall traces, evolution helpers, audit helpers, and memory extraction metadata.
UI performance coverage — tests cover generation tokens, safe UI callbacks, bounded transcript windows, lazy Knowledge surfaces, staged refreshes, and performance harness wiring.
Task recovery coverage — tests cover empty data dirs, partial schemas, corrupt DB recreation, migration tolerance, launcher reset/restore args, and DB-family backup.
Tool/runtime regressions — tests cover Wikipedia recovery, Vision provider refs, Designer routing, Home performance, model picker regressions, and opt-in live provider matrix behavior.
Live provider marker — adds a live_provider pytest marker for real configured-provider calls that remain opt-in.

Release Notes & Risk Notes

Custom endpoint compatibility depends on the server — profiles and probes improve behavior for common OpenAI-compatible servers, but local/proxy backends can still vary in tool syntax, streaming behavior, and context parameter names.
Chat-only mode is intentionally limited — models routed to chat-only mode can answer normal conversation but should not be expected to run tools, workflows, or structured agent actions.
Memory recall is more selective — archived, superseded, weak, or unanchored memories may stop appearing automatically; users can still review and restore memory state from Knowledge surfaces.
Task DB recovery backs up before reset — recovery commands preserve old SQLite files under the local recovery directory, but reset flows can remove active scheduled-task rows from the live DB until restored.
Live provider tests are opt-in — the new live matrix is useful for release validation with configured credentials, but it is not part of the normal offline unit suite.

Files Changed

File	Change
`agent.py`, `models.py`, `prompts.py`, `threads.py`	Runtime readiness routing, chat-only execution, provider transcript normalization, thinking retention, context policy usage, and checkpoint transcript helpers
`providers/custom.py`, `providers/readiness.py`, `providers/resolution.py`, `providers/runtime.py`, `providers/selection.py`, `providers/tool_protocol.py`, `providers/transports/openai_compatible.py`, `providers/ollama.py`	Provider-qualified resolution, custom endpoint profiles/probes, OpenAI-compatible transport, Ollama probing/reasoning behavior, context overrides, and tool validation repair
`ui/setup_wizard.py`, `ui/provider_settings.py`, `ui/model_catalog.py`, `vision.py`, `designer/ai_content.py`	Custom endpoint setup fields, provider-qualified setup selections, async model-picker behavior, Vision provider-ref routing, and Designer model readiness
`memory_policy.py`, `memory_evolution.py`, `knowledge_graph.py`, `memory.py`, `memory_extraction.py`, `document_extraction.py`, `wiki_vault.py`, `tools/memory_tool.py`	Bounded recall policy, lexical/graph recall candidates, memory audit metadata, evolution journal, extraction provenance, and memory tool output
`ui/knowledge_audit.py`, `ui/entity_editor.py`, `ui/settings.py`, `ui/graph_panel.py`	Knowledge audit helpers, entity review actions, lazy Knowledge settings surfaces, recall traces, and memory evolution journal UI
`ui/performance.py`, `ui/transcript.py`, `ui/chat.py`, `ui/chat_components.py`, `ui/render.py`, `ui/streaming.py`, `ui/home.py`, `ui/status_bar.py`, `ui/command_center.py`	UI performance instrumentation, bounded transcript rendering, detached finalize improvements, async picker loading, lazy Home panels, cached status refresh, and task recovery copy
`tasks.py`, `data_paths.py`, `launcher.py`, `tools/thoth_status_tool.py`	Task DB schema validation/repair, recovery commands, data path helpers, backup/restore support, and support diagnostics
`channels/approval.py`, `channels/telegram.py`, `channels/whatsapp.py`, `channels/discord_channel.py`, `channels/slack.py`, `channels/sms.py`	Runtime surface tagging for channel turns and approval resumes
`tools/wikipedia_tool.py`	HTTPS API endpoint forcing, recoverable Wikipedia errors, and safer tool usage guidance
`scripts/reasoning_completion_harness.py`, `scripts/ui_performance_harness.py`, `pytest.ini`, `tests/`	Reasoning/runtime harnesses, UI performance harness, live-provider marker, and focused regressions for provider runtime, memory recall, UI performance, task recovery, transcript loading, Vision, and Wikipedia

schema: 1
files:
  Thoth-3.23.0-Linux-x86_64.tar.gz: sha256=22e5e46e883418dffea8ac394f80a59e648c2b7c820944ab3715b4b41b2fd458
  Thoth-3.23.0-macOS-arm64.dmg: sha256=c298995e741940a79e71e2af4e9e4fd85b53af34de058990eedf6ba27ca3d724
  ThothSetup_3.23.0.exe: sha256=908cf9563b2dc0b2ebc9dd0d6e685ef4b56204b93d0fd30ca4a8f9be2551ae8e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v3.23.0 - Provider Runtime, Memory Recall & UI Performance Hardening

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

v3.23.0 - Provider Runtime, Memory Recall & UI Performance Hardening

Provider Runtime & Custom Endpoints

Chat-Only Runtime & Transcript Compatibility

Memory Recall & Knowledge Audit

UI Performance & Transcript Loading

Task Database Recovery

Tools, Channels & Runtime Reliability

Tests & Release Checks

Release Notes & Risk Notes

Files Changed

Uh oh!