Configuration Reference

Bob edited this page Jun 25, 2026 · 8 revisions

Every setting is an environment variable, set in docker-compose.yml. Unlike the README's single flat table, this page groups the settings by what they actually control, which is useful for understanding why a default is what it is, not just what it is.

Backend connections

Variable Default Notes
KIWIX_URL http://kiwix:8080
FRESHRSS_URL http://freshrss
FRESHRSS_USER (blank)
FRESHRSS_API_PASSWORD (blank) A separate password from your normal FreshRSS login — generated specifically for API access
FRESHRSS_MAX_ARTICLES 10
SEARXNG_URL http://searxng:8080
SEARXNG_REQUEST_TIMEOUT_SECONDS 15 Mnemolis's own client-side wait time for a SearXNG response — separate from SearXNG's own server-side request_timeout setting (see The SearXNG Timeout Lesson). Set this to match or exceed whatever SearXNG is itself configured to wait, or the documented timeout fix on the SearXNG side won't fully take effect — Mnemolis would still cut the connection first
UPTIME_KUMA_URL (blank) Leaving this blank disables the uptime source entirely, rather than erroring — /health will simply not report a status for it
UPTIME_KUMA_USERNAME / UPTIME_KUMA_PASSWORD (blank)
HA_URL (blank) Same graceful-disable behavior as UPTIME_KUMA_URL
HA_TOKEN (blank) See Home Assistant Integration for how to generate this
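
The graceful-disable behavior and the timeout relationship described in the notes above can be sketched roughly as follows. This is an illustration only, not Mnemolis's actual code: the helper names `optional_url`, `enabled_optional_sources`, and `client_timeout_is_sufficient` are hypothetical, and the server-side timeout is assumed to be a value you read out of your own SearXNG configuration.

```python
from typing import List, Mapping, Optional


def optional_url(env: Mapping[str, str], name: str) -> Optional[str]:
    """Return the configured URL, or None when the variable is blank or unset."""
    value = env.get(name, "").strip()
    return value or None


def enabled_optional_sources(env: Mapping[str, str]) -> List[str]:
    """List the optional sources whose URL is set (hypothetical helper).

    A blank URL disables the source instead of raising an error.
    """
    optional = {"uptime": "UPTIME_KUMA_URL", "ha": "HA_URL"}
    return [source for source, var in optional.items() if optional_url(env, var)]


def client_timeout_is_sufficient(client_seconds: float, server_seconds: float) -> bool:
    """True when the client waits at least as long as the server may take."""
    return client_seconds >= server_seconds
```

With the default SEARXNG_REQUEST_TIMEOUT_SECONDS of 15, `client_timeout_is_sufficient(15, 30)` is false: a server allowed 30 seconds would be cut off by the client first.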

LLM backend

Variable Default Notes
LLM_URL (blank) Leaving this blank disables every LLM-assisted feature: Routing falls back to keyword-only matching, Kiwix Disambiguation and Query Expansion never trigger, and Kiwix book selection falls back to a fixed "search Wikipedia first" rule. Mnemolis still works, but with noticeably less of its intelligence available
LLM_MODEL qwen3:8b
LLM_API_TYPE ollama The other supported value is openai, for any OpenAI-compatible endpoint

Weather (forecast)

Variable Default Notes
FORECAST_LATITUDE / FORECAST_LONGITUDE (blank) Required for forecast to work at all
FORECAST_LOCATION_NAME (blank) Used to prefix forecast responses, so a fused response can't be misread as weather somewhere else
FORECAST_TIMEZONE UTC
FORECAST_PRECIP_THRESHOLD_PCT 20 Precipitation probability above which the forecast text actually mentions rain chance
FORECAST_WIND_THRESHOLD_MPH 15 Wind speed above which the forecast text mentions wind
FORECAST_TEMP_CHANGE_THRESHOLD 5.0 How large a temperature shift between snapshots needs to be before changes reports it as meaningful — a half-degree difference between two snapshots isn't worth surfacing
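
A rough sketch of how the three thresholds above could gate what the forecast text mentions. The function names are hypothetical, and two details are assumptions: "above" is read as strictly greater than, and a temperature shift exactly equal to the threshold is treated as meaningful.

```python
from typing import List


def forecast_mentions(
    precip_pct: float,
    wind_mph: float,
    precip_threshold_pct: float = 20.0,
    wind_threshold_mph: float = 15.0,
) -> List[str]:
    """Return which conditions are worth mentioning in the forecast text."""
    mentions = []
    if precip_pct > precip_threshold_pct:  # assumption: strictly above
        mentions.append("rain")
    if wind_mph > wind_threshold_mph:  # assumption: strictly above
        mentions.append("wind")
    return mentions


def temp_change_is_meaningful(
    previous: float, current: float, threshold: float = 5.0
) -> bool:
    """True when the shift between two snapshots reaches the threshold."""
    # Assumption: a shift exactly equal to the threshold counts.
    return abs(current - previous) >= threshold
```

With the defaults, a half-degree difference such as `temp_change_is_meaningful(70.0, 70.5)` is not reported.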

Timezone conversion

Variable Default Notes
LOCAL_TIMEZONE inherits TZ, or UTC if TZ is unset Converts stored UTC timestamps (every database timestamp in Mnemolis is UTC internally) into real local time, for any feature bucketing activity by local hour-of-day or day-of-week. See Timezone Conversion. Most deployments never need to set this directly — setting TZ (see README's Timezone configuration) is enough; LOCAL_TIMEZONE exists only for the rare case where this conversion should use a different zone than TZ
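
The fallback order and the UTC-to-local conversion described above might look something like this. It is a sketch under assumptions: the helper names are hypothetical, and stored timestamps are assumed to be naive datetimes that represent UTC.

```python
from datetime import datetime, timezone, tzinfo
from typing import Mapping, Tuple
from zoneinfo import ZoneInfo


def resolve_zone_name(env: Mapping[str, str]) -> str:
    """LOCAL_TIMEZONE wins, then TZ, then UTC; blank values count as unset."""
    return env.get("LOCAL_TIMEZONE") or env.get("TZ") or "UTC"


def load_zone(name: str) -> tzinfo:
    """Look up an IANA zone; requires a zone database on the host."""
    return ZoneInfo(name)


def local_hour_and_weekday(stored_utc: datetime, zone: tzinfo) -> Tuple[int, int]:
    """Convert a stored UTC timestamp to (local hour, local weekday)."""
    # Assumption: stored_utc is a naive datetime holding a UTC value.
    local = stored_utc.replace(tzinfo=timezone.utc).astimezone(zone)
    return local.hour, local.weekday()
```

A typical call would be `local_hour_and_weekday(ts, load_zone(resolve_zone_name(os.environ)))`. Note that the weekday can differ from the UTC weekday when the conversion crosses midnight.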

Time-window phrase resolution

Variable Default Notes
MORNING_START_HOUR 6 What hour (local time, 0-23) "this morning" looks back to in changes queries. A value outside 0-23 (e.g. 24 for midnight, a natural mistake) is wrapped via modulo rather than rejected — 24 is treated as 0
WORK_START_HOUR 9 Same, for "while at work" / "since work"
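
The modulo wrapping described above is simple enough to show directly. The `window_start` helper is a hypothetical addition: how Mnemolis actually resolves the phrase when the current time is earlier than the start hour is not documented here, so looking back to the previous day is an assumption.

```python
from datetime import datetime, timedelta


def normalize_hour(raw_hour: int) -> int:
    """Wrap an out-of-range hour into 0-23 instead of rejecting it."""
    return raw_hour % 24


def window_start(now_local: datetime, start_hour: int) -> datetime:
    """Most recent local time at which the given start hour began."""
    hour = normalize_hour(start_hour)
    start = now_local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if start > now_local:
        # Assumption: before the start hour, look back to yesterday's start.
        start -= timedelta(days=1)
    return start
```

So `normalize_hour(24)` yields 0, and a MORNING_START_HOUR of 24 behaves the same as midnight.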

Fusion

Variable Default Notes
FUSION_MAX_SOURCES 4 Hard cap on how many sources one fusion query can touch. Setting this to 0 doesn't disable fusion — it correctly returns "no valid sources specified" rather than the raw crash it used to produce
FUSION_MAX_CHARS_PER_SOURCE 1500 Per-source truncation before merging
FUSION_TIMEOUT_SECONDS 15 How long any single source gets before fusion moves on without it
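
A minimal sketch of the source cap and per-source truncation described above. The function names and the `[source]` prefix format are hypothetical, and the per-source timeout is left out to keep the example short.

```python
from typing import Dict, List, Optional, Tuple


def plan_fusion(
    requested: List[str], max_sources: int = 4
) -> Tuple[List[str], Optional[str]]:
    """Apply the hard cap; return (sources, error message or None)."""
    chosen = requested[: max(max_sources, 0)]
    if not chosen:
        # A cap of 0 yields a clean error instead of a crash.
        return [], "no valid sources specified"
    return chosen, None


def merge_responses(
    responses: Dict[str, str], max_chars_per_source: int = 1500
) -> str:
    """Truncate each source's text, then join the pieces."""
    parts = [
        f"[{source}] {text[:max_chars_per_source]}"
        for source, text in responses.items()
    ]
    return "\n\n".join(parts)
```

Here `plan_fusion(["kiwix"], 0)` returns an empty list together with the "no valid sources specified" message.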

Caching

Variable Default Notes
CACHE_MAX_SIZE 500 Max result cache entries before oldest-eviction
ROUTING_CACHE_MAX_SIZE 1000 Max routing cache entries before oldest-eviction — larger than the result cache's default, since the routing cache's real key space (every unique conditional query, discourse-framing phrase, and disambiguation candidate set) is genuinely bigger
ROUTING_CACHE_TTL_SECONDS 3600 How long a routing decision (source, Kiwix book, disambiguation candidates) stays cached before the LLM gets asked again
CACHE_TTL_KIWIX_SECONDS 86400 Result cache TTL for kiwix (24 hours — offline encyclopedic content barely changes within a day)
CACHE_TTL_FORECAST_SECONDS 1800 Result cache TTL for forecast
CACHE_TTL_NEWS_SECONDS 900 Result cache TTL for news
CACHE_TTL_WEB_SECONDS 3600 Result cache TTL for web
CACHE_TTL_UPTIME_SECONDS 60 Result cache TTL for uptime
CACHE_TTL_HA_SECONDS 30 Result cache TTL for ha (the shortest of any source — lights and locks change state constantly)
CACHE_TTL_CHANGES_SECONDS 120 Result cache TTL for changes
CACHE_TTL_FUSION_SECONDS 1800 Result cache TTL for fusion
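
To make the interaction between the size cap and the per-source TTLs concrete, here is a small sketch of a cache with oldest-first eviction and per-entry expiry. It is not Mnemolis's implementation: the `TTLCache` class is hypothetical, "oldest" is assumed to mean oldest by insertion, and the TTL values are simply the defaults listed in the table.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional

# Defaults copied from the table above, in seconds.
DEFAULT_TTL_SECONDS = {
    "kiwix": 86400, "forecast": 1800, "news": 900, "web": 3600,
    "uptime": 60, "ha": 30, "changes": 120, "fusion": 1800,
}


class TTLCache:
    """Size-capped cache with per-entry expiry (illustrative only)."""

    def __init__(self, max_size: int, clock: Callable[[], float] = time.monotonic):
        self._max_size = max_size
        self._clock = clock
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries.pop(key, None)  # re-inserting moves the key to the end
        self._entries[key] = (self._clock() + ttl_seconds, value)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the oldest insertion

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value
```

The injectable `clock` argument exists only so expiry can be exercised without sleeping.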

Kiwix tuning

Variable Default Notes
KIWIX_SEARCH_LIMIT 15 Results requested per book per search. Higher values give scoring more candidates to choose from when common terms collide with brand-name results
KIWIX_MAX_BOOKS 2 Max books the LLM can select for one query — raise this to allow broader multi-book fusion, at the cost of more searches per query
KIWIX_ARTICLE_MAX_CHARS 3000 How many characters of a fetched article's body to keep before scoring/fusion ever sees it — distinct from FUSION_MAX_CHARS_PER_SOURCE, which truncates the already-combined multi-source response, not an individual Kiwix article on its own
KIWIX_MULTI_BOOK_FUSION_THRESHOLD_PCT 0.5 The actual, central decision threshold for multi-book fusion: a second book's best result must score at least this fraction of the leading book's top score to be included. Lower for more aggressive fusion, raise for more conservative
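
The fusion threshold rule above can be expressed in a few lines. This is a sketch under assumptions: `books_to_fuse` is a hypothetical name, each book is represented only by its best score, and KIWIX_MAX_BOOKS is applied here as a cap on the fused set even though the table describes it as a limit on LLM selection.

```python
from typing import List, Mapping


def books_to_fuse(
    top_scores: Mapping[str, float],
    threshold: float = 0.5,
    max_books: int = 2,
) -> List[str]:
    """Pick the leading book plus any runner-up scoring close enough to it."""
    ranked = sorted(top_scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    leader_name, leader_score = ranked[0]
    chosen = [leader_name]
    for name, score in ranked[1:]:
        if len(chosen) >= max_books:
            break
        # Include a book only if it reaches the given fraction of the leader.
        if score >= threshold * leader_score:
            chosen.append(name)
    return chosen
```

With a leader scoring 10.0 and a second book scoring 6.0, the default 0.5 includes the second book, while raising the threshold to 0.7 excludes it.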

Web & news scoring

Variable Default Notes
WEB_NEWS_SCORE_THRESHOLD 0 Results that confidence-aware fusion scoring rates at or below this value are dropped
WEB_NEWS_TOP_N 10 Max results kept after scoring
WEB_NEWS_RAW_RESULT_BUDGET 25 How many raw, unscored results to pull from each web search before scoring filters them down — the scoring pipeline's input budget, distinct from WEB_NEWS_TOP_N's output cap
QUERY_EXPANSION_MIN_WORDS 3 Minimum query length (in words) for web search query expansion to trigger
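
The input budget, score threshold, and output cap form a small pipeline, which might be sketched like this. The function names are hypothetical and the scoring function is passed in as a parameter, since the real scoring logic is not described on this page.

```python
from typing import Any, Callable, List, Sequence


def filter_results(
    raw_results: Sequence[Any],
    score: Callable[[Any], float],
    threshold: float = 0.0,
    top_n: int = 10,
    raw_budget: int = 25,
) -> List[Any]:
    """Budget the input, drop low scores, then keep the best top_n."""
    candidates = raw_results[:raw_budget]          # input budget
    scored = [(score(result), result) for result in candidates]
    kept = [pair for pair in scored if pair[0] > threshold]  # at or below: dropped
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in kept[:top_n]]  # output cap


def should_expand(query: str, min_words: int = 3) -> bool:
    """True when the query is long enough for expansion to trigger."""
    return len(query.split()) >= min_words
```

A two-word query such as "hiking boots" falls below the default minimum, so `should_expand` returns false for it.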

Snapshot diff thresholds

Variable Default Notes
BATTERY_LOW_THRESHOLD_PCT 20.0 Battery level below which a snapshot diff reports "low"
SNAPSHOT_STALE_GRACE_MULTIPLIER 3 How many multiples of a job's own expected interval can pass before /health flags it as "stale" rather than "ok" — lower for tighter alerting on flakier hardware, raise if normal scheduler jitter on your own hardware is wider than the default assumes
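
Both thresholds above reduce to one comparison each. A minimal sketch with hypothetical function names, assuming a job is flagged only once the elapsed time strictly exceeds the grace period:

```python
def job_status(
    seconds_since_last_run: float,
    expected_interval_seconds: float,
    grace_multiplier: float = 3,
) -> str:
    """Report "stale" once a job has overrun its grace period, else "ok"."""
    allowed = expected_interval_seconds * grace_multiplier
    return "stale" if seconds_since_last_run > allowed else "ok"


def battery_is_low(level_pct: float, threshold_pct: float = 20.0) -> bool:
    """True when the battery level is below the threshold."""
    return level_pct < threshold_pct
```

For a job expected every 60 seconds, the default multiplier tolerates up to 180 seconds of silence before the status flips to stale.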

Adversarial self-testing

Variable Default Notes
ADVERSARIAL_TEST_ENABLED true Master on/off switch for adversarial self-testing. false skips DB init, never registers the scheduler job, and makes POST /adversarial/trigger a safe no-op
ADVERSARIAL_TEST_INTERVAL_MINUTES 60 How often the scheduler tick fires
ADVERSARIAL_TEST_BATCH_SIZE 8 Queries generated per tick — cheap to raise, since generation is pure combinatorics with no LLM calls in the hot path
ADVERSARIAL_TEST_LATENCY_OUTLIER_MULTIPLIER 1.5 How many multiples of a recipe's own historical p95 latency a run must exceed to count as a real outlier
ADVERSARIAL_TEST_LATENCY_OUTLIER_FLOOR_MS 1000 A floor below which latency is never flagged regardless of the multiplier
ADVERSARIAL_TEST_LATENCY_OUTLIER_MIN_SAMPLES 10 How many historical samples a recipe needs before the latency-outlier check engages at all
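
The three latency-outlier settings combine into a single check, sketched below. The function names are hypothetical, and the nearest-rank p95 is an assumption, since this page does not say how the percentile is computed.

```python
import math
from typing import Sequence


def p95_nearest_rank(samples_ms: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (an assumed choice)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def is_latency_outlier(
    latency_ms: float,
    history_ms: Sequence[float],
    multiplier: float = 1.5,
    floor_ms: float = 1000,
    min_samples: int = 10,
) -> bool:
    """Flag a latency only when all three conditions are satisfied."""
    if not history_ms or len(history_ms) < min_samples:
        return False  # not enough history for the check to engage
    if latency_ms < floor_ms:
        return False  # below the floor: never flagged
    return latency_ms > multiplier * p95_nearest_rank(history_ms)
```

The floor is what keeps fast recipes quiet: against a 400 ms history, a 700 ms run exceeds 1.5 times the p95 but is still under the 1000 ms floor, so it is not flagged.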

Cross-source temporal pattern detection

Variable Default Notes
TEMPORAL_PATTERN_DETECTION_ENABLED true Master on/off switch for temporal pattern detection. false skips DB init, never registers the scheduler job, and makes POST /temporal-patterns/trigger a safe no-op — checked both at scheduler-registration time and inside the cycle function itself
TEMPORAL_PATTERN_MINING_INTERVAL_HOURS 24 How often the mining cycle runs. Deliberately far longer than every other scheduler job in this codebase — mining over a short window is statistically meaningless given how infrequently real structured events actually occur
TEMPORAL_PATTERN_LAG_WINDOW_MINUTES 30 The maximum lag within which event B must follow event A to count as one real occurrence of that pair
TEMPORAL_PATTERN_MIN_OCCURRENCES 5 A hard floor below which a pair is never even significance-tested, regardless of what the math would say. Raise this for a stricter bar on real homelab data volumes; lowering it below the default trades real statistical confidence for catching potential patterns sooner
TEMPORAL_PATTERN_SIGNIFICANCE_LEVEL 0.05 The per-comparison significance level, before Bonferroni correction divides it by the number of pairs actually tested in a given pass
TEMPORAL_PATTERN_VALIDATION_WINDOW_HOURS 24 How many hours of later, non-overlapping data a candidate must be re-checked against before it can be promoted to confirmed
TEMPORAL_PATTERN_STALE_GRACE_MULTIPLIER 3 Same role as SNAPSHOT_STALE_GRACE_MULTIPLIER — how many missed mining intervals before /health flags this job stale
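
The lag window, the occurrence floor, and the Bonferroni correction can each be sketched in a few lines. The function names are hypothetical, and counting at most one occurrence per A event is an assumption about how pairs are tallied.

```python
from datetime import datetime, timedelta
from typing import Sequence


def count_pair_occurrences(
    a_times: Sequence[datetime],
    b_times: Sequence[datetime],
    lag_window_minutes: int = 30,
) -> int:
    """Count A events that are followed by some B event within the window."""
    window = timedelta(minutes=lag_window_minutes)
    return sum(
        1
        for a in a_times
        if any(timedelta(0) < b - a <= window for b in b_times)
    )


def eligible_for_testing(occurrences: int, min_occurrences: int = 5) -> bool:
    """Pairs below the hard floor are never significance-tested."""
    return occurrences >= min_occurrences


def corrected_alpha(significance_level: float, pairs_tested: int) -> float:
    """Bonferroni: divide the per-comparison level by the number of tests."""
    return significance_level / max(pairs_tested, 1)
```

With the default level of 0.05 and 10 pairs tested in a pass, each pair has to clear a corrected level of about 0.005.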

Security

Variable Default Notes
API_KEYS (blank — auth disabled) Comma-separated list of valid keys. Protects POST /search and GET /changes specifically — every other endpoint (/health, /areas, /backup, /cache, etc.) stays unauthenticated regardless of this setting, so monitoring tools and discovery requests aren't blocked. Clients send the key via the X-API-Key header. Leaving this blank matches the trust model of a homelab sitting behind your own firewall
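
As a rough illustration of the rule above, the check below parses the comma-separated list and guards only the two protected endpoints. It is a sketch, not the project's middleware: the function names are hypothetical, headers are assumed to arrive as a plain dictionary with the exact `X-API-Key` spelling, and the constant-time comparison is an added precaution rather than documented behavior.

```python
import hmac
from typing import List, Mapping

PROTECTED = {("POST", "/search"), ("GET", "/changes")}


def parse_api_keys(raw: str) -> List[str]:
    """Split the comma-separated setting, ignoring blanks and whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorized(
    method: str, path: str, headers: Mapping[str, str], valid_keys: List[str]
) -> bool:
    """Allow the request unless it targets a protected endpoint without a key."""
    if not valid_keys:
        return True  # blank API_KEYS: auth disabled
    if (method.upper(), path) not in PROTECTED:
        return True  # e.g. /health stays open for monitoring tools
    supplied = headers.get("X-API-Key", "").encode()
    return any(hmac.compare_digest(supplied, key.encode()) for key in valid_keys)
```

A request to /health passes regardless of the header, while POST /search needs a matching key once API_KEYS is non-blank.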

Operational

Variable Default Notes
LOG_LEVEL INFO INFO is what actually shows the interesting decisions — decomposition splits, disambiguation candidates, article selection. This wasn't always true: application logging was silently disabled project-wide for a real stretch of this project's history (the root logger defaulted to WARNING with no handler configured), meaning every _LOGGER.info() call across the entire codebase was being swallowed before anyone could see it. Fixed once, and worth knowing about if you're ever debugging on a build old enough to predate that fix
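
The failure mode described above comes from standard Python logging behavior: the root logger defaults to WARNING, so INFO records are discarded unless logging is configured explicitly. A minimal sketch of such a configuration follows; the `configure_logging` name and the format string are hypothetical and not necessarily what the project's fix looks like.

```python
import logging


def configure_logging(level_name: str = "INFO") -> None:
    """Attach a handler to the root logger at the requested level."""
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on an unrecognized name
    # force=True replaces any handlers already attached to the root logger.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
```

After `configure_logging("INFO")`, a call such as `logging.getLogger("example").info("visible")` is emitted rather than swallowed.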

Where to go from setting a value to understanding what it actually does

Most of the notes above link to the wiki page that covers the real mechanism a setting controls — Routing, Caching, Fusion, Kiwix Scoring, and so on. The numeric default itself is rarely the interesting part; the page it links to explains why that number, specifically, was chosen.
