
default indexing on and size backpressure to 20% of RAM #1202

Merged: bplatz merged 2 commits into main from fix/sane-defaults on Apr 27, 2026

bplatz commented on Apr 24, 2026

The out-of-the-box Docker experience in v4.0.1 was biased toward "tiny ephemeral test ledger" rather than "production database":

  • Indexing was disabled by default. fluree init (now run automatically by the v4.0.1 Docker entrypoint) generated a config with [server.indexing] commented out, falling through to DEFAULT_INDEXING_ENABLED = false.
  • Hard novelty ceiling was 1 MB. DEFAULT_REINDEX_MAX_BYTES = 1_000_000 — a test-sized value.

This PR flips both defaults and wires the programmatic FlureeBuilder API consistently with the server/CLI paths.

What changed

Indexing on by default

  • DEFAULT_INDEXING_ENABLED is now true (const in fluree-db-api::server_defaults). It is used by the server's clap default_value_t, the fluree init TOML and JSON-LD templates, and the Docker entrypoint.
  • is_indexing_enabled() (JSON-LD config path) now defaults to true, resolving the prior mismatch with defaults_indexing_enabled() which was already true.
  • FlureeBuilder::file(), ::s3(), ::ipfs() now default indexing_config to Some(default_indexing_builder_config()). The programmatic API path now matches the server path — indexing is the default for any persistent builder.
  • New FlureeBuilder::without_indexing() opt-out method. The only legitimate production reason to disable indexing is a peer / external-indexer setup where a separate process owns index maintenance; the method's doc comment says so explicitly.
  • build_memory() is left as hardcoded IndexingMode::Disabled — it's a sync test helper that would need a tokio runtime to spawn the background worker.
  • Template profile example in fluree init flipped from "enable indexing for prod" to "disable indexing for a dedicated-indexer peer setup".
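
The default-on, explicit-opt-out shape described above can be sketched with a toy builder. This is a simplified stand-in, not the real FlureeBuilder: the type names BuilderSketch and IndexingSketchConfig, the indexing_enabled() helper, and the 256 MB placeholder value are all hypothetical and exist only for illustration.

```rust
/// Hypothetical stand-in for an indexing configuration (not the real fluree type).
#[derive(Debug, Clone, PartialEq)]
struct IndexingSketchConfig {
    reindex_min_bytes: u64,
    reindex_max_bytes: u64,
}

/// Hypothetical stand-in for a persistent-storage builder.
#[derive(Debug)]
struct BuilderSketch {
    storage_path: String,
    indexing_config: Option<IndexingSketchConfig>,
}

impl BuilderSketch {
    /// Persistent builders start with indexing configured (default on).
    fn file(path: &str) -> Self {
        BuilderSketch {
            storage_path: path.to_string(),
            indexing_config: Some(IndexingSketchConfig {
                reindex_min_bytes: 100_000,     // 100 KB, as quoted in the PR
                reindex_max_bytes: 256_000_000, // placeholder value for the sketch
            }),
        }
    }

    /// Explicit opt-out, intended for peer / external-indexer setups.
    fn without_indexing(mut self) -> Self {
        self.indexing_config = None;
        self
    }

    /// Illustrative helper: true when an indexing config is present.
    fn indexing_enabled(&self) -> bool {
        self.indexing_config.is_some()
    }
}
```

Under these assumptions, BuilderSketch::file("/data") has indexing enabled, and chaining .without_indexing() is the only way to turn it off, so disabling indexing is always a visible decision at the call site.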

reindex_max_bytes → 20% of RAM, single source of truth at the API layer

Configuration policy (RAM detection, tiered sizing, env/flag resolution) belongs at the API layer, not in leaf crates. A leaf crate that hands out a hidden default invites bugs where upstream forgets to wire something and the system silently runs on the leaf's value.

  • fluree-db-ledger::IndexConfig no longer impls Default. Callers must construct it explicitly. Missing upstream wiring now fails to compile rather than running on a stale hardcoded value. Doc comment on the struct says so.
  • fluree-db-api::server_defaults::default_index_config() (new) is the canonical default for any API-layer caller. It composes DEFAULT_REINDEX_MIN_BYTES (100 KB) with default_reindex_max_bytes().
  • fluree-db-api::server_defaults::default_reindex_max_bytes() (new) returns 20% of detected system RAM, floored at 64 MB, with a 256 MB fallback when sysinfo is unavailable. Mirrors the existing CacheConfig::default RAM-tiered pattern.
  • All production sites that previously called IndexConfig::default() or Option<IndexConfig>::unwrap_or_default() now route through default_index_config(): FlureeBuilder::file()/::s3(), derive_indexing(), the JSON-LD derive_index_config() fallback, the server's commit-push handler, the transient in-memory Fluree used by nameservice_query, etc.
  • Server's reindex_max_bytes: usize field → Option<usize> (same shape as cache_max_mb). Resolves lazily via default_reindex_max_bytes() when unset, so CLI args, env vars, config file, and built-in defaults all flow through one place.
  • Tests use explicit IndexConfig { reindex_min_bytes: …, reindex_max_bytes: … } literals — no more hidden defaults in test fixtures.
  • The reindex_min_bytes constant is unchanged at 100 KB: indexing is fast, and queued jobs drain under load.
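
The sizing rule and the lazy Option resolution described in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in fluree-db-api: the function names, the decimal interpretation of "MB", and passing detected RAM in as an Option (instead of querying sysinfo directly) are choices made for this sketch.

```rust
// Assumed decimal megabytes; the PR does not say whether MB means 10^6 or 2^20 bytes.
const FLOOR_BYTES: u64 = 64_000_000; // 64 MB floor
const FALLBACK_BYTES: u64 = 256_000_000; // 256 MB fallback when RAM is unknown

/// Hypothetical sizing function: 20% of total RAM, floored at 64 MB.
/// `total_ram_bytes` is `None` when RAM detection is unavailable.
fn reindex_max_bytes_for(total_ram_bytes: Option<u64>) -> u64 {
    match total_ram_bytes {
        // Dividing by 5 yields 20% without floating-point arithmetic.
        Some(total) => (total / 5).max(FLOOR_BYTES),
        None => FALLBACK_BYTES,
    }
}

/// Hypothetical resolution step: an explicitly configured value (CLI flag,
/// env var, or config file) wins; otherwise the RAM-derived default is
/// computed only when it is actually needed.
fn resolve_reindex_max_bytes(configured: Option<u64>, total_ram_bytes: Option<u64>) -> u64 {
    configured.unwrap_or_else(|| reindex_max_bytes_for(total_ram_bytes))
}
```

Keeping the setting as an Option until the last moment is what lets every input source flow through one resolution function, so no layer needs its own hardcoded default.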

Peer / external-indexer backpressure

  • When indexing_enabled = false on the server, the startup path now calls .without_indexing().with_novelty_thresholds(...) rather than leaving the builder with no thresholds. Without this, a peer transactor whose external indexer falls behind would accumulate novelty indefinitely and eventually OOM.
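
To illustrate why a transactor needs thresholds even when it does not index locally, here is a minimal sketch of a two-threshold novelty check. The semantics are an assumption inferred from the description above (reindex_min_bytes as a soft trigger, reindex_max_bytes as a hard ceiling that applies backpressure); the enum and function names are hypothetical and do not come from the fluree codebase.

```rust
/// Hypothetical outcome of checking accumulated novelty against the thresholds.
#[derive(Debug, PartialEq)]
enum NoveltyAction {
    /// Below the soft threshold: accept the write, nothing else to do.
    Accept,
    /// At or above the soft threshold: accept the write and request indexing.
    AcceptAndRequestIndex,
    /// At or above the hard ceiling: push back until the index catches up.
    Reject,
}

/// Assumed semantics: `reindex_min_bytes` is a soft trigger and
/// `reindex_max_bytes` is the hard novelty ceiling.
fn classify_novelty(
    novelty_bytes: u64,
    reindex_min_bytes: u64,
    reindex_max_bytes: u64,
) -> NoveltyAction {
    if novelty_bytes >= reindex_max_bytes {
        NoveltyAction::Reject
    } else if novelty_bytes >= reindex_min_bytes {
        NoveltyAction::AcceptAndRequestIndex
    } else {
        NoveltyAction::Accept
    }
}
```

Without a ceiling, the Reject branch is never reached, which is the unbounded-growth case the bullet above describes for a peer whose external indexer falls behind.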

Effective defaults on typical Docker hosts

Host RAM    reindex_max_bytes
--------    -----------------
2 GB        400 MB
4 GB        800 MB
8 GB        1.6 GB
16 GB       3.2 GB
32 GB       6.4 GB

Docs

  • docs/operations/configuration.md — flag table, env-var table, TOML example, and JSON-LD example updated. 20% of system RAM (256 MB fallback) replaces the old 1000000 default in reference tables.
  • docs/indexing-and-search/background-indexing.md — "enabled at the server level" → "on by default".
  • docs/cli/server.md, docs/getting-started/quickstart-server.md — reference examples updated; the redundant FLUREE_INDEXING_ENABLED: "true" line removed from the docker-compose snippet (keeping it would misleadingly imply users need to set it).

What this does NOT change

  • FlureeBuilder::build_memory() stays as hardcoded IndexingMode::Disabled (test helper; flipping would require tokio for every call-site and break many tests).
  • Integration tests that construct ServerConfig { indexing_enabled: false, .. } explicitly are unchanged. They are explicit opt-outs, which still work.

bplatz requested review from aaj3f and zonotope on April 24, 2026 22:33

zonotope left a comment

🏵️

bplatz merged commit 30151c6 into main on Apr 27, 2026
12 checks passed
bplatz deleted the fix/sane-defaults branch on April 27, 2026 22:01