feat: Remote mounts, serving tiers, and peer-mode CLI by bplatz · Pull Request #1428 · fluree/db

bplatz · 2026-07-03T22:00:50Z

Serve ledgers to downstream consumers in two tiers and let a consumer mount a remote Fluree's ledgers as read-only, locally-queryable sources. Design doc: docs/design/remote-mounts.md; user guide: docs/guides/sharing-data.md.

The model

Query serving — the origin executes queries (its compute, row-level policy per identity). The only tier with fine-grained permissioning.
Block serving — the origin serves canonical, CID-verified index content and the consumer executes queries locally (their compute, Iceberg-style). Strictly all-or-nothing per (token, ledger); fine-grained access stays on the query tier.

What's included

Raw block serving. ProxyStorage gains explicit read modes: Raw fetches canonical CAS bytes via GET /storage/objects/{cid} with client-side CID verification; Filtered keeps the FLKB negotiation. Peer proxy mode now uses Raw — which is what makes binary-indexed ledgers actually readable over the proxy (the FLKB tier has no leaf decoder on the read path).
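A minimal sketch of the Raw-mode integrity check, showing where client-side CID verification sits in the read path. It assumes the expected digest has already been decoded from the CID. `fetch_raw`, `object_path`, and `RawReadError` are hypothetical names, and the injected `fetch` and `digest` closures stand in for the HTTP GET and the CID's hash function (SHA-256 in practice). This is not the PR's ProxyStorage code.

```rust
/// Errors the hypothetical raw-read helper can return.
#[derive(Debug, PartialEq)]
pub enum RawReadError {
    /// The transport failed (non-2xx status, network error, ...).
    Fetch(String),
    /// The bytes did not hash to the digest carried by the CID.
    DigestMismatch,
}

/// Raw-tier object path for a CID string (hypothetical helper).
pub fn object_path(cid: &str) -> String {
    format!("/storage/objects/{cid}")
}

/// Fetch an object in Raw mode and verify it before returning it.
///
/// `fetch` stands in for the HTTP GET; `digest` stands in for the hash
/// function named by the CID's multihash.
pub fn fetch_raw<F, D>(
    cid: &str,
    expected_digest: &[u8],
    fetch: F,
    digest: D,
) -> Result<Vec<u8>, RawReadError>
where
    F: Fn(&str) -> Result<Vec<u8>, String>,
    D: Fn(&[u8]) -> Vec<u8>,
{
    let bytes = fetch(&object_path(cid)).map_err(RawReadError::Fetch)?;
    // Never hand unverified bytes to the index decoder: neither the origin nor
    // anything in between is trusted to return the content the CID names.
    if digest(&bytes) != expected_digest {
        return Err(RawReadError::DigestMismatch);
    }
    Ok(bytes)
}
```

Injecting the digest keeps the sketch std-only; the property it illustrates is that mismatched bytes are rejected before any decoder or cache sees them.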

Per-ledger serving posture. New f:servingDefaults setting group (f:serveQuery / f:serveBlocks / f:publicVisibility) in the ledger config graph, enforced on transaction-role servers only (query gate 403; blocks gate 404 on /storage/block, /storage/objects, /commits, /pack) and advertised per-caller on NS record responses plus a coarse block in /.well-known/fluree.json.
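The gate semantics above (transaction-role only, 403 on the query surface, 404 on the block surfaces, and a tier list in the advertisement) can be sketched as a pure function. `ServerRole`, `EffectiveServing`, and `Surface` here are simplified stand-ins, not the PR's types, and the advertisement ignores the caller's token scope, which the real per-caller advertisement also considers.

```rust
/// Simplified stand-in: only transaction-role servers enforce posture.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ServerRole {
    Transaction,
    Peer,
}

/// Simplified stand-in for the resolved f:servingDefaults posture.
#[derive(Clone, Copy, Debug)]
pub struct EffectiveServing {
    pub serve_query: bool,
    pub serve_blocks: bool,
}

/// Which serving surface a request targets.
#[derive(Clone, Copy, Debug)]
pub enum Surface {
    Query,
    /// /storage/block, /storage/objects, /commits, /pack
    Blocks,
}

/// `None` lets the request through; `Some(status)` rejects it.
pub fn serving_gate(role: ServerRole, posture: EffectiveServing, surface: Surface) -> Option<u16> {
    if role != ServerRole::Transaction {
        return None; // peers skip the gate
    }
    match surface {
        Surface::Query if !posture.serve_query => Some(403),
        // 404 rather than 403, so a block-tier caller cannot probe what exists.
        Surface::Blocks if !posture.serve_blocks => Some(404),
        _ => None,
    }
}

/// Tier list in the shape advertised on NS records, e.g. ["query", "blocks"].
pub fn advertised_tiers(posture: EffectiveServing) -> Vec<&'static str> {
    let mut tiers = Vec::new();
    if posture.serve_query {
        tiers.push("query");
    }
    if posture.serve_blocks {
        tiers.push("blocks");
    }
    tiers
}
```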

Remote mounts. FlureeBuilder::with_remote_mount composes a CompositeNameService (prefix-routed reads with record localization — remote inventory:main appears as acme/inventory:main; writes to mounts rejected) with a new StorageBackend::Routed variant (namespace-prefix store selection at the single content_store seam). Mounted ledgers keep full native semantics including mixed local+mounted dataset queries. ProxyStorage/ProxyNameService moved to fluree-db-nameservice-sync (server re-exports preserve the old paths).
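A rough sketch of the prefix routing and record localization described above, assuming a mount prefix is a single path segment (the text before the first `/`). `CompositeRouter` and `Route` are hypothetical and are not the PR's CompositeNameService API; the sketch only shows the routing rule, localization, and write rejection.

```rust
use std::collections::HashMap;

/// Where an alias resolves after prefix routing.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    Local(&'a str),
    /// (mount prefix, alias as the remote knows it)
    Remote(&'a str, &'a str),
}

/// Hypothetical stand-in for prefix-routed reads over local + mounted nameservices.
#[derive(Default)]
pub struct CompositeRouter {
    /// mount prefix (e.g. "acme") -> remote base URL
    mounts: HashMap<String, String>,
}

impl CompositeRouter {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_mount(mut self, prefix: &str, base_url: &str) -> Self {
        self.mounts.insert(prefix.to_string(), base_url.to_string());
        self
    }

    /// "acme/inventory:main" routes to the "acme" mount as "inventory:main".
    pub fn route<'a>(&self, alias: &'a str) -> Route<'a> {
        if let Some((prefix, rest)) = alias.split_once('/') {
            if self.mounts.contains_key(prefix) {
                return Route::Remote(prefix, rest);
            }
        }
        Route::Local(alias)
    }

    /// Re-key a record returned by the remote under the mount prefix.
    pub fn localize(prefix: &str, remote_alias: &str) -> String {
        format!("{prefix}/{remote_alias}")
    }

    /// Mounts are read-only: any write aimed at a mounted alias is rejected.
    pub fn check_write(&self, alias: &str) -> Result<(), String> {
        match self.route(alias) {
            Route::Remote(prefix, _) => Err(format!("'{alias}' is on read-only mount '{prefix}'")),
            Route::Local(_) => Ok(()),
        }
    }
}
```

Note that a local ledger whose name merely contains `/` still routes locally unless its first segment is a registered mount.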

CLI peer mode. fluree track add --mode peer runs queries locally over blocks fetched on demand, CID-verified and cached in a persistent per-remote disk cache; writes still forward over HTTP. Plus fluree remote ledgers (the token's auth-filtered catalog with serving tiers) and fluree cache status|clear.
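The peer-mode split (queries local, everything else forwarded) reduces to a small decision table. This is a hypothetical sketch; the PR's actual function is `resolve_ledger_mode`, whose signature is not shown here, so `resolve_target`, `TrackMode`, `Command`, and `Target` are illustrative names only.

```rust
/// Hypothetical mirror of the `mode` stored in [[tracked_ledgers]].
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode {
    /// Every command forwards to the remote over HTTP.
    Tracked,
    /// Queries run locally over fetched, CID-verified blocks.
    Peer,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Command {
    Query,
    Write,
    Admin,
}

/// Where a command actually executes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Target {
    LocalOverBlocks,
    ForwardHttp,
}

/// A peer-tracked ledger keeps the peer target only for queries; writes and
/// admin commands are downgraded to plain tracked (HTTP-forwarding) behavior.
pub fn resolve_target(mode: TrackMode, cmd: Command) -> Target {
    match (mode, cmd) {
        (TrackMode::Peer, Command::Query) => Target::LocalOverBlocks,
        _ => Target::ForwardHttp,
    }
}
```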

HTTP Range. Single-range requests on /storage/objects (206 + Content-Range, full-object CID verification before slicing) and native ranged reads in ProxyStorage.
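A sketch of single-range handling under the stated rule (verify the whole object, then slice). It accepts the three `bytes=` forms from HTTP semantics (`a-b`, `a-`, `-n`) and, for simplicity, falls back to a full `200` for anything it does not parse. A production server would answer `416` for an unsatisfiable range. `parse_single_range` and `serve_range` are hypothetical names, and `verify` stands in for the CID check.

```rust
/// Parse a single `bytes=` range against an object of `len` bytes.
/// Returns the inclusive span, or `None` if malformed, multi-range, or unsatisfiable.
pub fn parse_single_range(header: &str, len: u64) -> Option<(u64, u64)> {
    let spec = header.trim().strip_prefix("bytes=")?;
    if spec.contains(',') || len == 0 {
        return None; // single-range only
    }
    let (first, last) = spec.split_once('-')?;
    let (start, end) = if first.is_empty() {
        // Suffix form "bytes=-N": the last N bytes.
        let n: u64 = last.parse().ok()?;
        if n == 0 {
            return None;
        }
        (len.saturating_sub(n), len - 1)
    } else {
        let start: u64 = first.parse().ok()?;
        let end = if last.is_empty() {
            len - 1
        } else {
            last.parse::<u64>().ok()?.min(len - 1)
        };
        (start, end)
    };
    if start > end || start >= len {
        return None;
    }
    Some((start, end))
}

/// Returns (status, Content-Range value if ranged, body).
pub fn serve_range(
    object: &[u8],
    verify: impl Fn(&[u8]) -> bool,
    range: Option<&str>,
) -> Result<(u16, Option<String>, Vec<u8>), &'static str> {
    // Verify the full object first: a slice alone cannot be checked against the CID.
    if !verify(object) {
        return Err("CID verification failed");
    }
    let len = object.len() as u64;
    match range.and_then(|h| parse_single_range(h, len)) {
        Some((start, end)) => Ok((
            206,
            Some(format!("bytes {start}-{end}/{len}")),
            object[start as usize..=end as usize].to_vec(),
        )),
        None => Ok((200, None, object.to_vec())),
    }
}
```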

Vended S3 credentials. For S3-backed origins, GET /storage/credentials?ledger= mints STS AssumeRole grants narrowed by session policy to the ledger's name-level prefix (covers all branches + the @shared dict namespace). Consumers auto-refresh grants inside an expiry margin; the CLI probes and prefers S3-direct automatically, falling back to proxied reads on 404. Single-bucket S3 only in this iteration; short TTLs since grants outlive revocation until expiry.
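The consumer-side refresh rule ("auto-refresh grants inside an expiry margin", single-flight) can be sketched synchronously with std only. The PR implements this as an async AWS SDK credentials provider; `Grant` and `GrantCache` below are hypothetical stand-ins that show only the margin check and the single-flight behavior.

```rust
use std::sync::Mutex;
use std::time::{Duration, SystemTime};

/// Hypothetical shape of a vended grant.
#[derive(Clone, Debug)]
pub struct Grant {
    pub access_key_id: String,
    pub expires_at: SystemTime,
}

/// Caches one grant and re-mints it once it falls within `margin` of expiry.
pub struct GrantCache<F> {
    current: Mutex<Option<Grant>>,
    margin: Duration,
    mint: F,
}

impl<F: Fn() -> Result<Grant, String>> GrantCache<F> {
    pub fn new(margin: Duration, mint: F) -> Self {
        Self { current: Mutex::new(None), margin, mint }
    }

    /// Holding the lock across `mint` makes refresh single-flight: concurrent
    /// callers wait for one fetch instead of each minting their own grant.
    pub fn get(&self, now: SystemTime) -> Result<Grant, String> {
        let mut slot = self.current.lock().unwrap();
        if let Some(g) = slot.as_ref() {
            if now + self.margin < g.expires_at {
                return Ok(g.clone());
            }
        }
        let fresh = (self.mint)()?;
        *slot = Some(fresh.clone());
        Ok(fresh)
    }
}
```

With a 900 s grant and a 60 s margin, a grant is reused until roughly 840 s after minting and then replaced before any in-flight read can see it expire.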

Fix along the way: @shared dict-blob addresses carry only the ledger name, so proxy clients derive a default-branch alias — the block/object endpoints now branch-resolve dict-blob requests (previously peers tracking non-main branches 404'd on dictionary fetches). The legacy per-branch dict layout also parses now.

Docs

New: design/remote-mounts.md, guides/sharing-data.md, cli/cache.md. Updated: operations/query-peers.md (two read tiers; corrected /storage/block leaf semantics), ledger-config/setting-groups.md, design/auth-contract.md, cli/track.md, cli/remote.md, api/endpoints.md, and cli/server-integration.md (the contract an embedding server must expose for peer-mode consumers, including the dict-blob branch-resolution and 404-fallback requirements).

Testing

proxy_integration.rs: 28 tests including end-to-end peer query over HTTP against an indexed ledger, remote-mount mixed-dataset query + write rejection, serving-posture gate/advertisement flips, raw-mode byte identity + ranged reads, vended-credentials 404 gate.
LocalStack round trip (it_vended_credentials_testcontainers): mint against real STS → build reader from the grant → read a CAS object through the fluree address layer.
Wiremock coverage for grant fetch 404-fallback and refresh-on-expiry; unit tests for the session-policy shape, mount routing/localization, and CLI target resolution (peer → local queries, downgrade to HTTP for writes).
CI parity: clippy (all features/targets, -D warnings) and cargo check --workspace --all-features --all-targets clean.

Notes for review

Base is feature/rdfs-enforcement-entailment (this branch was cut from its tip).
Serving posture binds only the origin's serving surface by design — a consumer holding the blocks always queries its own copy; rationale in the design doc.
LocalStack community doesn't enforce IAM session policies, so prefix-scoping enforcement rests on the policy-shape unit tests + AWS semantics; worth one manual verification against real AWS before production reliance.
Follow-ups deliberately out of scope: server-level mount config flags, --mode auto negotiation, the f:publicVisibility anonymous tier, split-bucket vend grants.

Serve ledgers to other Fluree instances in two tiers and let a consumer mount a remote's ledgers as read-only, locally-queryable sources.

- ProxyStorage read modes: Raw fetches canonical CAS bytes via GET /storage/objects/{cid} with client-side CID verification (this is what makes indexed ledgers readable over the proxy, since the FLKB tier has no leaf decoder on the read path); Filtered keeps the FLKB negotiation. Peer proxy mode now uses Raw.
- Per-ledger serving posture: new f:servingDefaults setting group (f:serveQuery / f:serveBlocks / f:publicVisibility) in the ledger config graph, enforced on transaction-role servers only (query gate 403, blocks gate 404 on /storage/block, /storage/objects, /commits, /pack) and advertised per-caller as serving: ["query","blocks"] on NS record responses plus a coarse block in /.well-known/fluree.json.
- Remote mounts: FlureeBuilder::with_remote_mount composes a CompositeNameService (prefix-routed reads with record localization, writes to mounted aliases rejected) with StorageBackend::Routed (namespace-prefix store selection at the content_store seam), so mounted ledgers get full native semantics including mixed datasets.
- ProxyStorage/ProxyNameService moved to fluree-db-nameservice-sync (server re-exports keep the peer paths); from_api_base constructors for non-default API mounts; mount-prefix stripping on derived aliases.
- HTTP Range on /storage/objects (206 + Content-Range, full-object CID verification before slicing) and native ranged reads in ProxyStorage.
- Fix: dict-blob requests are branch-resolved server-side. @shared addresses carry only the ledger name, so non-main-branch peers previously 404'd on dict fetches; the legacy per-branch dict layout now parses too.

- fluree track add --mode peer: queries execute locally against index blocks fetched on demand from the remote's raw storage tier (CID-verified), while writes and admin commands keep forwarding over HTTP (resolve_ledger_mode downgrades the peer target to Tracked).
- Per-remote persistent artifact cache under the OS cache dir; entries are content-addressed and immutable, so clearing is always safe. verify_freshness_on_cache_hit keeps heads current against the remote.
- fluree remote ledgers <name>: the remote's auth-filtered catalog with the serving tiers each ledger offers (query / blocks).
- fluree cache status|clear for the peer cache.
- track list shows the mode column; peer entries persist as mode = "peer" in [[tracked_ledgers]].
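Why clearing the peer cache is always safe can be seen from a content-addressed get-or-fetch. This sketch assumes a hypothetical `<root>/<remote>/<cid>` layout and takes bytes that have already passed CID verification; `entry_path` and `get_or_fetch` are illustrative names, not the CLI's actual cache code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical on-disk layout: <root>/<remote>/<cid>.
pub fn entry_path(root: &Path, remote: &str, cid: &str) -> PathBuf {
    root.join(remote).join(cid)
}

/// Return cached bytes if present; otherwise fetch (already CID-verified) bytes,
/// store them, and return them. The key is the content's own CID, so an entry
/// can never go stale: there is nothing to invalidate, and deleting the whole
/// directory only costs refetches.
pub fn get_or_fetch(
    root: &Path,
    remote: &str,
    cid: &str,
    fetch_verified: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<Vec<u8>> {
    let path = entry_path(root, remote, cid);
    if let Ok(bytes) = fs::read(&path) {
        return Ok(bytes);
    }
    let bytes = fetch_verified()?;
    fs::create_dir_all(path.parent().expect("entry path has a parent"))?;
    // Write to a temp file, then rename, so a crash never leaves a truncated entry.
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, &path)?;
    Ok(bytes)
}
```

Mutable state such as ledger heads does not fit this model, which is why heads get a separate freshness check on cache hits.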

- New docs/design/remote-mounts.md: the serving-tier model (query / blocks / reserved filtered tier), per-caller resolution, mount architecture (CompositeNameService + StorageBackend::Routed + ProxyStorage modes), and the CID-verified cache-forever integrity semantics, with the fine-grained and vended-origin extension points.
- setting-groups.md: f:servingDefaults as a ledger-scoped group.
- query-peers.md: the two read tiers, corrected /storage/block leaf semantics, Range behavior, raw-tier access model.
- auth-contract.md: discovery serving capability block.
- CLI docs: track --mode peer, remote ledgers, new cache page.

For S3-backed origins, hand authorized peers short-lived STS credentials scoped to a ledger's prefix so they read index content directly from S3 (native ranged reads, no origin bandwidth) instead of proxying every object through the origin's HTTP server.

- fluree-db-api::vended_credentials: STS AssumeRole minting with a session policy narrowed to the ledger's name-level prefix (covers all branches + the @shared dict namespace, matching the all-or-nothing raw tier); s3:ListBucket is prefix-conditioned so missing keys stay 404 (the reader's legacy dict fallback depends on 404-vs-403); S3VendScope extraction from the connection config (single-bucket S3 only; split commit/index layouts are refused).
- Server: GET /storage/credentials?ledger= behind the same guards as raw object serving (bearer scope, namespace guard, f:serveBlocks posture; 404 anti-leak). Config: --storage-vend-enabled, --storage-vend-role-arn, --storage-vend-ttl-secs (default 900, the STS minimum; grants outlive revocations until expiry, hence the short TTLs).
- fluree-db-nameservice-sync (feature aws): grant fetch client, a ProvideCredentials impl that refreshes grants inside a 60s expiry margin (single-flight), and build_vended_s3_storage composing an S3Storage whose credentials auto-refresh; 404 means fall back to proxied reads.
- CLI peer mode probes the endpoint and prefers direct S3 automatically, falling back to ProxyStorage.
- Tests: LocalStack round trip (mint -> grant -> S3 reader -> CAS object read through the fluree address layer), wiremock refresh/404 paths, policy-shape unit tests, server 404 gate test.
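A sketch of the session-policy shape described above, using standard IAM policy syntax. The bucket and prefix values (`acme-data`, `ledgers/inventory/`) are hypothetical, since the actual storage layout is not given here, and `session_policy` is an illustrative name rather than the PR's function. The trailing `/` requirement keeps a ledger's grant from also matching a sibling such as `inventory-archive`. The conditioned `s3:ListBucket` statement matters because, without list permission, S3 reports a missing key as 403 rather than 404.

```rust
/// Build an IAM session policy narrowing an AssumeRole grant to one prefix.
/// Hypothetical helper; rejects inputs that would need JSON escaping.
pub fn session_policy(bucket: &str, prefix: &str) -> Result<String, String> {
    if !prefix.ends_with('/') {
        return Err("prefix must end with '/'".into());
    }
    if [bucket, prefix].iter().any(|s| s.contains('"') || s.contains('\\')) {
        return Err("bucket/prefix would need JSON escaping".into());
    }
    const TEMPLATE: &str = r#"{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<bucket>/<prefix>*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket>",
      "Condition": { "StringLike": { "s3:prefix": "<prefix>*" } }
    }
  ]
}"#;
    Ok(TEMPLATE.replace("<bucket>", bucket).replace("<prefix>", prefix))
}
```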

End-to-end guide covering the sharing patterns (query serving vs peer/block serving vs replication) with a decision table, provider setup (trusted issuers, token minting per tier, per-ledger f:servingDefaults participation, identity-bound row-level permissioning, vended S3 credentials), the consumer-side CLI workflow (remote add / auth login / remote ledgers / track modes / clone / cache), programmatic mounts, and the revocation/integrity/freshness semantics. Indexed in SUMMARY and the guides README.

Document the endpoints an embedding server must expose for CLI peer mode (fluree track --mode peer) and fluree remote ledgers: the NsRecord lookup and CAS object endpoints with their required semantics (all-or-nothing authorization with 404 anti-leak, dict-blob branch resolution for name-scoped @shared artifacts, exact CID-verifiable bytes), recommended Range support, and the optional vended-credentials endpoint with its 404-fallback contract.

aaj3f

This is a nice feature addition -- glad to have it and eager to use it

aaj3f · 2026-07-05T01:16:29Z

    if state.config.server_role != ServerRole::Peer {
-        return Ok(handle.snapshot().await.to_ledger_state());
+        let ledger_state = handle.snapshot().await.to_ledger_state();
+        let serving = crate::routes::serving::effective_serving_from_state(&ledger_state).await?;


The query serving gate calls effective_serving_from_state → config_resolver::resolve_ledger_config, which resolves the entire ledger config graph (policy, shacl, reasoning, datalog, transact, full_text, serving, graph_overrides — 8 group reads plus a find_instances_of_type scan) on every query to a transaction-role server, purely to read f:serveQuery. It duplicates the config resolution the db-view build already performs per query (fluree-db-api/src/view/fluree_ext.rs:120). Bounding factors (why this is minor, not major): resolve_ledger_config early-returns cheaply when the config graph is empty (guard at config_resolver.rs:67-84), so unconfigured ledgers pay almost nothing; the gate runs only on transaction/origin servers (peers skip it); and the in-memory config graph is dwarfed by the query it precedes. Still worth fixing: resolve only the serving group (see the serving.rs suggestion), thread the already-resolved config into the gate, or memoize the posture per (ledger_id, t).

    // Prefer a targeted resolver (see serving.rs suggestion) so the gate reads
    // only f:servingDefaults instead of the full config graph, or reuse the
    // ResolvedConfig the view build already computes for this same snapshot/t.
    let serving = crate::routes::serving::effective_serving_from_state(&ledger_state).await?;

aaj3f · 2026-07-05T01:17:32Z

+    // 3c. Serving gate: the ledger's f:serveBlocks posture must allow raw
+    //     content serving.
+    let serving =
+        crate::routes::serving::effective_serving(&state.fluree, &effective_ledger).await?;


get_object_by_cid is the peer-mode raw-read hot path: a cold peer sync fetches every index leaf, branch, root, dict blob, and commit through this endpoint. This PR adds, per object, both a nameservice lookup (via resolve_block_ledger, line 708) and a full resolve_ledger_config (via effective_serving, line 720). Previously this handler only checked token scope and read bytes — zero ledger loads, zero config resolution. Bounding factors (why this is minor, not major): the ledger handle is cached (no per-object reload — effective_serving reuses ledger_cached); resolve_ledger_config early-returns for unconfigured ledgers; and per-request cost is dominated by JWT verification + storage IO + full-object SHA-256, so the aggregate config-resolution overhead on a cold sync is negligible next to the byte transfer. The ns.lookup namespace guard is a legitimate correctness addition (namespace guard + dict-blob branch resolution), not waste — only the repeated full-config resolution is avoidable. Memoize EffectiveServing per (effective_ledger, t) on AppState and/or resolve only the serving group.

    // e.g. a DashMap<(String, i64), EffectiveServing> on AppState keyed by
    // (effective_ledger, snapshot.t), populated on miss; a config change bumps t.
    let serving = state.effective_serving_cached(&effective_ledger).await?;
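A std-only sketch of the memoization suggested above, using `Mutex<HashMap>` in place of DashMap. As a hypothetical variation on keying by `(ledger, t)`, it keeps only the latest `t` per ledger so the map stays bounded. `ServingCache` and `get_or_resolve` are illustrative names, and `EffectiveServing` is a simplified stand-in.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified stand-in for the PR's EffectiveServing.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct EffectiveServing {
    pub serve_query: bool,
    pub serve_blocks: bool,
}

/// Hypothetical AppState memo: ledger id -> (t it was resolved at, posture).
#[derive(Default)]
pub struct ServingCache {
    latest: Mutex<HashMap<String, (i64, EffectiveServing)>>,
}

impl ServingCache {
    /// Call `resolve` only when there is no entry for this ledger at this `t`.
    /// A config change is a commit, so it bumps `t` and naturally misses.
    pub fn get_or_resolve<E>(
        &self,
        ledger: &str,
        t: i64,
        resolve: impl FnOnce() -> Result<EffectiveServing, E>,
    ) -> Result<EffectiveServing, E> {
        // The temporary guard is dropped at the end of this statement.
        let cached = self.latest.lock().unwrap().get(ledger).copied();
        if let Some((cached_t, serving)) = cached {
            if cached_t == t {
                return Ok(serving);
            }
        }
        // Resolve outside the lock so a slow resolution doesn't block other ledgers.
        let serving = resolve()?;
        let mut map = self.latest.lock().unwrap();
        // Don't let a request still reading an older t overwrite a newer entry.
        let newer_cached = map.get(ledger).is_some_and(|(cached_t, _)| *cached_t > t);
        if !newer_cached {
            map.insert(ledger.to_string(), (t, serving));
        }
        Ok(serving)
    }
}
```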

aaj3f · 2026-07-05T01:24:12Z

+/// Resolve the serving posture from an already-loaded ledger state.
+///
+/// Reads the config graph as-of `state.t()` (novelty-inclusive), so a
+/// committed-but-unindexed config change takes effect immediately.
+pub(crate) async fn effective_serving_from_state(
+    state: &LedgerState,
+) -> Result<EffectiveServing, ServerError> {
+    let overlay: &dyn OverlayProvider = &*state.novelty;
+    let config = config_resolver::resolve_ledger_config(&state.snapshot, overlay, state.t())
+        .await
+        .map_err(|e| ServerError::internal(format!("Serving config resolution failed: {e}")))?;
+    Ok(EffectiveServing::from_config(
+        config.as_ref().and_then(|c| c.serving.as_ref()),
+    ))
+}


effective_serving_from_state resolves the whole LedgerConfig (all 8 setting groups) but uses only config.serving. Every gate check (query, object, block, commits, pack, credentials, NS advertisement) pays for 7 unused group reads. Add a targeted resolver that reads only f:servingDefaults.

    // In config_resolver: expose a resolve_serving_only(snapshot, overlay, to_t)
    // that runs find_instances_of_type + read_serving_defaults and nothing else.
    let serving = config_resolver::resolve_serving_only(&state.snapshot, overlay, state.t())
        .await
        .map_err(|e| ServerError::internal(format!("Serving config resolution failed: {e}")))?;
    Ok(EffectiveServing::from_config(serving.as_ref()))

bplatz requested review from aaj3f and zonotope July 3, 2026 22:00

bplatz mentioned this pull request Jul 3, 2026

feat: event-time commits with @recorded: audit-axis time travel #1429

Open

bplatz added 7 commits July 4, 2026 10:07

memory

cfd1f90

bplatz force-pushed the feature/remote-mounts branch from 078af0b to 929b024 July 4, 2026 14:07

aaj3f approved these changes Jul 5, 2026

bplatz wants to merge 7 commits into feature/rdfs-enforcement-entailment from feature/remote-mounts
