default indexing on and size backpressure to 20% of RAM #1202
Merged
The out-of-the-box Docker experience in v4.0.1 was biased toward "tiny ephemeral test ledger" rather than "production database":
- `fluree init` (now run automatically by the v4.0.1 Docker entrypoint) generated a config with `[server.indexing]` commented out, falling through to `DEFAULT_INDEXING_ENABLED = false`.
- `DEFAULT_REINDEX_MAX_BYTES = 1_000_000`, a test-sized value.

This PR flips both defaults and wires the programmatic `FlureeBuilder` API consistently with the server/CLI paths.

What changed
Indexing on by default
- `DEFAULT_INDEXING_ENABLED: true` (const in `fluree-db-api::server_defaults`). Used by the server clap `default_value_t`, the `fluree init` TOML + JSON-LD templates, and the Docker entrypoint.
- `is_indexing_enabled()` (JSON-LD config path) now defaults to `true`, resolving the prior mismatch with `defaults_indexing_enabled()`, which was already `true`.
- `FlureeBuilder::file()`, `::s3()`, `::ipfs()` now default `indexing_config` to `Some(default_indexing_builder_config())`. The programmatic API path now matches the server path: indexing is the default for any persistent builder.
- `FlureeBuilder::without_indexing()` opt-out method. The only legitimate production reason to disable indexing is a peer / external-indexer setup where a separate process owns index maintenance; the method's doc comment says so explicitly.
- `build_memory()` is left as hardcoded `IndexingMode::Disabled`: it's a sync test helper that would need a tokio runtime to spawn the background worker.
- `fluree init` flipped from "enable indexing for prod" to "disable indexing for a dedicated-indexer peer setup".

`reindex_max_bytes` → 20% of RAM, single source of truth at the API layer

Configuration policy (RAM detection, tiered sizing, env/flag resolution) belongs at the API layer, not in leaf crates. A leaf crate that hands out a hidden default invites bugs where upstream forgets to wire something and the system silently runs on the leaf's value.
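To make that policy concrete, here is a minimal sketch of the pattern, not the actual Fluree code. The names `IndexConfig` and `DEFAULT_REINDEX_MIN_BYTES` are borrowed from this PR, but the field layout, the 100 KB = 100_000 bytes reading, and the `resolve_index_config` helper with its two parameters are assumptions made for illustration.

```rust
/// Stand-in for the leaf-crate type. It deliberately does NOT derive or
/// implement `Default`, so `IndexConfig::default()` is a compile error and
/// every caller has to state where its thresholds come from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IndexConfig {
    pub reindex_min_bytes: usize,
    pub reindex_max_bytes: usize,
}

/// API-layer constant. Assumption: "100 KB" means 100_000 bytes.
pub const DEFAULT_REINDEX_MIN_BYTES: usize = 100_000;

/// Hypothetical API-layer resolver: the single place that decides what
/// "default" means.
///
/// * `max_override` stands in for a CLI flag, env var, or config-file value.
/// * `detected_default` stands in for the RAM-derived default.
pub fn resolve_index_config(
    max_override: Option<usize>,
    detected_default: usize,
) -> IndexConfig {
    IndexConfig {
        reindex_min_bytes: DEFAULT_REINDEX_MIN_BYTES,
        // An explicit setting wins; otherwise fall back to the detected value.
        reindex_max_bytes: max_override.unwrap_or(detected_default),
    }
}
```

With this shape, a caller that forgets to wire a value through cannot fall back silently: the only way to obtain an `IndexConfig` is to build one explicitly or go through the API-layer function.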
- `fluree-db-ledger::IndexConfig` no longer impls `Default`. Callers must construct it explicitly. Missing upstream wiring now fails to compile rather than running on a stale hardcoded value. Doc comment on the struct says so.
- `fluree-db-api::server_defaults::default_index_config()` (new) is the canonical default for any API-layer caller. It composes `DEFAULT_REINDEX_MIN_BYTES` (100 KB) with `default_reindex_max_bytes()`.
- `fluree-db-api::server_defaults::default_reindex_max_bytes()` (new) returns 20% of detected system RAM, floored at 64 MB, with a 256 MB fallback when `sysinfo` is unavailable. Mirrors the existing `CacheConfig::default` RAM-tiered pattern.
- Call sites that used `IndexConfig::default()` or `Option<IndexConfig>::unwrap_or_default()` now route through `default_index_config()`: `FlureeBuilder::file()` / `::s3()`, `derive_indexing()`, the JSON-LD `derive_index_config()` fallback, the server's commit-push handler, the transient in-memory Fluree used by `nameservice_query`, etc.
- `reindex_max_bytes: usize` field → `Option<usize>` (same shape as `cache_max_mb`). Resolves lazily via `default_reindex_max_bytes()` when unset, so CLI args, env vars, config file, and built-in defaults all flow through one place.
- Explicit `IndexConfig { reindex_min_bytes: …, reindex_max_bytes: … }` literals: no more hidden defaults in test fixtures.
- `reindex_min_bytes` constant unchanged at 100 KB: indexing is fast, jobs queue and drain under load.

Peer / external-indexer backpressure
When `indexing_enabled = false` on the server, the startup path now calls `.without_indexing().with_novelty_thresholds(...)` rather than leaving the builder with no thresholds. Without this, a peer transactor whose external indexer falls behind would accumulate novelty indefinitely and eventually OOM.

Effective defaults on typical Docker hosts
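As a rough illustration of how the sizing rule described in this PR (20% of RAM, 64 MB floor, 256 MB fallback) plays out, the sketch below applies it to a given amount of RAM. It is not the real `default_reindex_max_bytes()`: the function name, the `Option<u64>` input standing in for RAM detection, and the reading of "MB" as MiB are all assumptions.

```rust
/// Assumption: "MB" in the PR text is treated as MiB (1024 * 1024 bytes).
pub const MIB: u64 = 1024 * 1024;
/// Floor applied after taking 20% of RAM.
pub const FLOOR_BYTES: u64 = 64 * MIB;
/// Value used when total RAM cannot be detected.
pub const FALLBACK_BYTES: u64 = 256 * MIB;

/// Hypothetical sizing helper. `total_ram_bytes` is `None` when RAM
/// detection is unavailable.
pub fn sketch_reindex_max_bytes(total_ram_bytes: Option<u64>) -> u64 {
    match total_ram_bytes {
        // 20% of RAM (integer division by 5), never below the floor.
        Some(total) => (total / 5).max(FLOOR_BYTES),
        // No RAM information: use the fixed fallback.
        None => FALLBACK_BYTES,
    }
}

/// Convenience wrapper that reports the result in whole MiB.
pub fn sketch_reindex_max_mib(total_ram_bytes: Option<u64>) -> u64 {
    sketch_reindex_max_bytes(total_ram_bytes) / MIB
}
```

Under these assumptions, a 256 MiB container is raised to the 64 MiB floor, a 2 GiB host gets about 409 MiB, and an 8 GiB host gets about 1638 MiB. The host sizes are illustrative choices, not figures from the PR.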
[Table with a `reindex_max_bytes` column]

Docs
- `docs/operations/configuration.md`: flag table, env-var table, TOML example, and JSON-LD example updated. `20% of system RAM (256 MB fallback)` replaces the old `1000000` default in reference tables.
- `docs/indexing-and-search/background-indexing.md`: "enabled at the server level" → "on by default".
- `docs/cli/server.md`, `docs/getting-started/quickstart-server.md`: reference examples updated; the redundant `FLUREE_INDEXING_ENABLED: "true"` line removed from the docker-compose snippet (keeping it would misleadingly imply users need to set it).

What this does NOT change
- `FlureeBuilder::build_memory()` stays as hardcoded `IndexingMode::Disabled` (test helper; flipping would require tokio for every call-site and break many tests).
- Callers that set `ServerConfig { indexing_enabled: false, .. }` explicitly are unchanged; they're explicit opt-outs, which still work.