Roadmap

Bob edited this page Jun 24, 2026 · 9 revisions

This page reflects where the project actually stands, not a static wishlist — it gets revisited and corrected as work lands, the same way a stale README section gets caught and fixed rather than left to drift.

Capability Expansion — complete

The five original items that defined the project's early feature set are all done:

  1. ✅ Configurable thresholds
  2. ✅ Kiwix search term disambiguation — see Kiwix Disambiguation
  3. ✅ Multi-book Kiwix fusion — see Multi-Book Fusion
  4. ✅ Confidence-aware fusion with expanded ingest — see Confidence-Aware Fusion
  5. ✅ Conditional query detection — see Conditional Query Detection

Battle Testing & Operational Maturity — complete

Four real gaps, found through deliberate review rather than reported failures, all closed:

  • ✅ Discourse-framing routing bypass — see The Discourse-Framing Investigation
  • ✅ Fallback visibility in /logs/stats
  • ✅ Routing cache size bounding + visibility in /health
  • ✅ Background snapshot job health

Full mechanism detail for the operational maturity work lives in Health & Observability and Caching.
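The roadmap doesn't say how the routing cache is bounded, so what follows is only a minimal sketch of the general idea. It assumes an in-process LRU eviction policy with a per-entry TTL: once the cap is reached, the least recently used entry is dropped, and the size and eviction counts are the kind of numbers a /health endpoint could expose. The class name, the `stats()` fields, and both limits are hypothetical and do not describe Mnemolis's actual code.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class BoundedTTLCache:
    """LRU cache with a hard entry cap and a per-entry TTL (hypothetical sketch)."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (stored_at, value); insertion order doubles as recency order
        self._data: OrderedDict = OrderedDict()
        self.evictions = 0

    def get(self, key: Hashable) -> Optional[Any]:
        # Note: a cached value of None is indistinguishable from a miss here.
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self.ttl_seconds:
            # Expired: drop it so a stale routing decision is never served.
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1

    def stats(self) -> dict:
        # Illustrative shape for a health report; field names are assumptions.
        return {"size": len(self._data), "max_entries": self.max_entries,
                "evictions": self.evictions}
```

LRU is one reasonable eviction choice because repeated phrasings of the same query tend to cluster in time; the size cap on its own is what guarantees memory can't grow without bound.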

Bulletproofing Pass — complete

A deliberate, top-to-bottom read of every file in app/, ignoring complexity scores on purpose and focusing on the kind of small, simple-looking code that score-driven review tends to skip. It found and fixed real bugs in nearly every file touched, several of them significant:

  • home_assistant.py — a severe word-boundary bug ("is the front door locked" silently returning no results) and four related fixes
  • kiwix.py — non-deterministic book selection, broken table-of-contents stripping, a single-character search-term bug, and an unbounded retry loop with a real multi-minute worst case
  • fusion.py — a real crash on FUSION_MAX_SOURCES=0
  • snapshots.py — uptime history only covering 9.6 real hours instead of a full week
  • router.py / fusion.py — a cross-file drift in the shared "did this source actually fail" logic that silently disabled the newsweb fallback for unconfigured sources
  • forecast.py — an unconfigured deployment silently returning real weather data for the wrong place on Earth
  • llm.py — thinking models on the OpenAI-compatible backend silently returning no answer at all
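The kiwix.py retry fix isn't spelled out on this page. As a hedged illustration of the general technique rather than the actual fix, a retry loop can be made bounded by construction by capping both the number of attempts and the total wall-clock budget, so the worst case no longer depends on how the upstream happens to behave. `retry_bounded` and its default limits are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_bounded(fn: Callable[[], T], *, max_attempts: int = 3,
                  base_delay: float = 0.5, deadline_seconds: float = 10.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> T:
    """Call fn until it succeeds, giving up after max_attempts OR deadline_seconds."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    last_exc: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:  # in real code, narrow to the client's error types
            last_exc = exc
        delay = base_delay * (2 ** attempt)  # exponential backoff
        remaining = deadline_seconds - (clock() - start)
        if attempt == max_attempts - 1 or remaining <= delay:
            break  # another sleep would exceed the attempt cap or the time budget
        sleep(delay)
    raise TimeoutError(f"gave up after {attempt + 1} attempt(s)") from last_exc
```

Injecting `sleep` and `clock` keeps the function testable without real waiting; the worst case is at most `max_attempts` calls and roughly `deadline_seconds` of sleeping.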

mcp_server.py, query_expansion.py, and searxng.py were read with the same scrutiny and came back genuinely clean — a real, useful outcome in its own right, confirming prior work in those files holds up.

Documentation Restructuring — complete

  • ✅ Full wiki review — every page checked against actual current code, factual drift fixed, user-useful information reordered to lead before mechanism detail
  • ✅ README restructured — a "Why Mnemolis" section added before the feature list, Architecture's deep-dive diagrams moved after installation/configuration, MCP moved up near Integrations, several stale facts and one broken anchor link fixed
  • ✅ First real benchmark run since v3.17.0, covering the entire battle-testing and bulletproofing campaign — aggregated median held at 24ms across roughly 25 releases, with two honest findings reported (one now traced to a real cause, one still genuinely open)

Config-Completeness Audit — complete

A systematic search across every file in app/ for hardcoded values a real homelab deployment might genuinely want to tune; a configurability audit rather than a bug hunt. 16 new settings were added, including one for a hardcoded value that directly undermined existing documentation:

  • ✅ SEARXNG_REQUEST_TIMEOUT_SECONDS: Mnemolis's own client-side timeout was hardcoded shorter than the SearXNG-side timeout that the README's SearXNG Timeout Lesson tells people to configure, so that documented fix couldn't fully work
  • ✅ The actual, central multi-book fusion threshold, previously hardcoded despite being documented in 3 places as the real mechanism
  • ✅ All 8 per-source result cache TTLs and the routing cache TTL, previously presented as deliberate, reasoned defaults but impossible to actually adjust
  • ✅ Kiwix article truncation length, web/news scoring's raw-result budget, query expansion's minimum word count, and the snapshot job staleness grace multiplier

Full detail in Configuration Reference and Caching.
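A sketch of the pattern, assuming settings are read from environment variables: the variable name comes from this page, while the helper names and the 10-second default used in the usage note are hypothetical. Each value is read with a typed fallback, nonsense values are rejected at startup, and the one cross-setting invariant the SearXNG bug violated is checked explicitly: the client has to wait longer than the upstream it calls.

```python
import os


def env_float(name: str, default: float, minimum: float = 0.0) -> float:
    """Read a float setting from the environment, falling back to a default.

    Rejects values below `minimum` so a typo can't silently produce, say,
    a negative timeout or TTL.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError as exc:
        raise ValueError(f"{name} must be a number, got {raw!r}") from exc
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


def check_client_outlives_upstream(client_timeout: float,
                                   upstream_timeout: float) -> None:
    # If the client gives up first, raising the upstream's own timeout has no
    # visible effect: the failure mode described for SearXNG above.
    if client_timeout <= upstream_timeout:
        raise ValueError(
            f"client timeout ({client_timeout}s) should exceed the upstream "
            f"timeout ({upstream_timeout}s)"
        )
```

Usage would look like `env_float("SEARXNG_REQUEST_TIMEOUT_SECONDS", 10.0)`, with the same helper covering TTL-style settings. In a stock SearXNG install the upstream side of the comparison is `request_timeout` under `outgoing:` in settings.yml; whether to enforce the check at startup is a design choice, not something Mnemolis is known to do.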

Known limitations (tracked, accepted, not blocking)

These are real, understood boundaries — not bugs waiting for a fix, but deliberate scope decisions or honest, accepted ceilings. A reader-facing version of this same list, written for evaluating fit rather than tracking status, lives at Known Limitations:

  • Single ambiguous bare words (e.g. "galaxy") can land on a thematically-related but imprecise match when the index genuinely contains multiple comparably-relevant senses of the word. See Kiwix Scoring.
  • Conditional phrasing without an explicit comma ("if the front door is unlocked tell me") is intentionally not detected — a real grammatical-parsing problem, not a pattern-matching one. See Conditional Query Detection.
  • A decomposed segment merging two unrelated topics may route to a single source that doesn't serve both well — an accepted, minor side effect of the proper-noun-pair guard's content-preservation fix, not a regression.
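The comma limitation is easier to see with a concrete pattern. Below is a hypothetical regex-based splitter, not the project's actual detector. With a comma, the boundary between condition and action is explicit; without one, a pattern matcher has no reliable way to know where "the front door is unlocked" ends and "tell me" begins, which is exactly the parsing problem the limitation describes.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "if <condition>, [then] <action>".
# The condition may not contain a comma, so the first comma is the split point.
CONDITIONAL = re.compile(
    r"^\s*if\s+(?P<condition>[^,]+?)\s*,\s*(?:then\s+)?(?P<action>.+?)\s*$",
    re.IGNORECASE,
)


def split_conditional(query: str) -> Optional[Tuple[str, str]]:
    """Return (condition, action) for comma-delimited conditionals, else None."""
    m = CONDITIONAL.match(query)
    if m is None:
        return None  # includes the no-comma form, deliberately left undetected
    return m.group("condition"), m.group("action")
```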

🔬 Speculative — no obligation to succeed

These two are deliberately framed differently from everything else on this page. They're permitted to fail; "found nothing interesting" or "didn't pan out" are acceptable, informative outcomes here, not wasted effort.

Cross-Source Temporal Pattern Detection — extend the snapshot engine to surface correlations across sources over time, not just per-source diffs. Recurring timing relationships between events (a door event consistently preceding a motion event, a particular weather shift consistently preceding a service hiccup) — closer to lightweight pattern-mining than search. Buildable on infrastructure that already exists; the actual risk is finding nothing beyond noise, which is a fine, honest result.
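A minimal sketch of the core measurement, assuming events can be flattened into `(timestamp, kind)` pairs; that shape and the function name are hypothetical, and the snapshot engine's real records may look different. For each occurrence of one event kind, it checks whether an event of another kind follows within a time window.

```python
from bisect import bisect_right
from typing import Iterable, Tuple

Event = Tuple[float, str]  # (unix timestamp, event kind); hypothetical shape


def precedence_rate(events: Iterable[Event], first: str, then: str,
                    window_seconds: float) -> float:
    """Fraction of `first` events followed by a `then` event within the window."""
    events = list(events)  # allow any iterable; it's scanned twice below
    firsts = sorted(t for t, kind in events if kind == first)
    thens = sorted(t for t, kind in events if kind == then)
    if not firsts:
        return 0.0
    hits = 0
    for t in firsts:
        i = bisect_right(thens, t)  # index of the first `then` strictly after t
        if i < len(thens) and thens[i] - t <= window_seconds:
            hits += 1
    return hits / len(firsts)
```

The raw rate alone says little, since a frequent `then` event will follow almost anything. Comparing it against a baseline, for example the same rate computed after shifting the `first` timestamps by a random offset, is what would separate a real relationship from noise.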

Adversarial Self-Testing — a background job (reusing the same apscheduler infrastructure the snapshot engine already runs on) that periodically generates messy, compound, edge-case-shaped queries via the local LLM — seeded with the actual patterns that broke things during this project's testing history — runs them through the real pipeline, and logs results for periodic review. Institutionalizes the adversarial megaquery testing approach that found most of the bugs documented in Design History, instead of relying on someone doing it by hand each time. Open design question worth solving first: what makes a generated query actually useful versus trivial.
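One way the job body could be shaped, sketched under assumptions: `generate` stands in for a call to the local LLM, `search` for a call into the real /search pipeline returning a dict with `answer` and `sources` keys, and `is_trivial` is one naive guess at the open "useful versus trivial" question, not a settled answer. All names here are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AdversarialResult:
    seed: str
    query: str
    answered: bool
    sources: List[str] = field(default_factory=list)


def is_trivial(query: str, seed: str, min_words: int = 4) -> bool:
    """One guess at 'trivial': too short, or just the seed echoed back."""
    normalized = " ".join(query.lower().split())
    return (len(normalized.split()) < min_words
            or normalized == " ".join(seed.lower().split()))


def run_adversarial_batch(seeds: List[str],
                          generate: Callable[[str], str],
                          search: Callable[[str], Dict],
                          n: int = 5,
                          rng: Optional[random.Random] = None) -> List[AdversarialResult]:
    rng = rng or random.Random()
    results: List[AdversarialResult] = []
    for _ in range(n):
        seed = rng.choice(seeds)  # a pattern that broke things historically
        query = generate(seed)    # e.g. ask the local LLM for a messier variant
        if is_trivial(query, seed):
            continue  # not worth a pipeline run or a reviewer's time
        response = search(query)
        results.append(AdversarialResult(
            seed=seed,
            query=query,
            answered=bool(response.get("answer")),
            sources=list(response.get("sources", [])),
        ))
    return results
```

Scheduling could reuse the existing apscheduler setup, e.g. with APScheduler 3.x something like `scheduler.add_job(run_adversarial_batch, "interval", hours=24, kwargs={...})`, with the returned results written wherever periodic review happens.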

Tabled, revisit in ~1 year

Cross-modal grounding — correlating a camera snapshot with a text answer ("did anything weird happen at the back door" pulling the actual image alongside the sensor log) would be a genuine "wow" capability, not just well-executed plumbing. Deliberately not pursued yet — the current camera setup (Ring) isn't infrastructure worth building on top of long-term; revisit once a self-controlled NVR solution exists instead.

Still tracked, lower priority

  • New source modules — see Contributing for the current list of proposed ones looking for contributors
  • HA/voice pipeline architecture question — whether to bypass Home Assistant's own conversation/intent layer for non-device-control voice queries, piping STT output more directly to Mnemolis's /search instead, and keeping HA for device control and audio I/O only. Raised, never designed — a genuinely different kind of work (infrastructure/integration) than anything else on this list.
