
fix: Complete incremental indexing on Standard S3 #1245

Merged
bplatz merged 6 commits into main from debug/s3-indexing on May 21, 2026

@bplatz bplatz commented May 17, 2026

  • Fixes a Phase 3b incremental indexing hang on Standard S3 where, after a successful initial full-build, the indexer would deterministically stall on the first per-class property-attribution merge entry and never progress past the first indexed t. Under pre-fix builds, the indexer Lambda burned its full 15 min execution budget and fell back to a backpressure-triggered full rebuild on every subsequent commit.
  • Keeps incremental class/property stats merging storage-free by removing BinaryIndexStore lookups from the per-property ref-class merge loop — the hot path that was triggering on-demand dictionary/index artifact loads on Standard S3.
  • Adds operations guidance comparing Standard S3 vs S3 Express One Zone for index storage in serverless/Lambda deployments, including benchmark-backed transaction, query, and indexing latency expectations.
  • The fix also improves indexing speed on S3 Express One Zone; it is not just a Standard S3 fix. Stats show a 30%+ improvement at the 10 MB ledger range, and because the speedup is proportional to ledger size, GB+ ledgers should see very meaningful improvements.

Details

Root cause

Hidden store-backed class Sid resolution inside incremental stats merging. The per-property ref-class merge loop was reaching into BinaryIndexStore to resolve class IDs, which could demand-load dictionary and index artifacts mid-loop. On S3 Express that cost was masked by ~1–5 ms per-object latency; on Standard S3, with ~50–200 ms per-object latency and many small reads per merge entry, the loop effectively never made forward progress on larger ledgers.
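To make the latency argument concrete, here is a rough back-of-the-envelope sketch. It assumes the reads are issued strictly one after another, and the read count of 2,000 used in the note below is purely hypothetical (the actual number of reads per merge was not measured here). Only the per-object latency ranges and the 15 min Lambda budget come from this description.

```rust
use std::time::Duration;

/// Lambda execution budget quoted in the description above.
pub const LAMBDA_BUDGET: Duration = Duration::from_secs(15 * 60);

/// Wall time spent waiting on storage if `reads` object fetches are
/// issued one after another (no overlap, no cache hits).
pub fn sequential_read_cost(reads: u32, per_object_latency: Duration) -> Duration {
    per_object_latency * reads
}

/// Largest number of sequential reads that still fits in `budget`.
/// Assumes `per_object_latency` is at least 1 ms (otherwise this divides by zero).
pub fn max_reads_within(budget: Duration, per_object_latency: Duration) -> u128 {
    budget.as_millis() / per_object_latency.as_millis()
}
```

Under these assumptions, a hypothetical 2,000 sequential reads cost about 10 s at 5 ms per object but about 400 s at 200 ms per object, and at 200 ms the whole 15 min budget is exhausted after 4,500 reads.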

The fix moves all required resolution out of the inner merge loop so the per-property attribution merge is fully in-memory.
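The general shape of that change can be sketched as follows. This is an illustration of hoisting resolution out of a hot loop, not the code in this PR: every type and function name below (`RefDelta`, `resolve_classes`, `merge_ref_class_stats`, and the `lookup` closure standing in for any store-backed resolution) is hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical identifier types; the real codebase's types may differ.
type ClassSid = u64;
type PropertySid = u64;
type SubjectId = u64;

/// Per-property counts of how often each referenced class appears.
pub type RefClassStats = HashMap<PropertySid, HashMap<ClassSid, u64>>;

/// One pending stats update: `property` points at `object` `delta` more times.
pub struct RefDelta {
    pub property: PropertySid,
    pub object: SubjectId,
    pub delta: u64,
}

/// Step 1, outside the hot loop: resolve each distinct object's class once.
/// `lookup` may be slow (for example, backed by remote storage), so its
/// result is memoized, including the "not found" case.
pub fn resolve_classes<F>(deltas: &[RefDelta], mut lookup: F) -> HashMap<SubjectId, Option<ClassSid>>
where
    F: FnMut(SubjectId) -> Option<ClassSid>,
{
    let mut resolved = HashMap::new();
    for d in deltas {
        resolved.entry(d.object).or_insert_with(|| lookup(d.object));
    }
    resolved
}

/// Step 2, the hot loop: a purely in-memory merge with no storage access.
/// Objects whose class could not be resolved are skipped.
pub fn merge_ref_class_stats(
    stats: &mut RefClassStats,
    deltas: &[RefDelta],
    resolved: &HashMap<SubjectId, Option<ClassSid>>,
) {
    for d in deltas {
        if let Some(Some(class)) = resolved.get(&d.object).copied() {
            *stats.entry(d.property).or_default().entry(class).or_insert(0) += d.delta;
        }
    }
}
```

With this split, the number of potentially slow lookups is bounded by the number of distinct referenced objects rather than by the number of merge entries, and the merge loop itself never waits on storage.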

Before / after on a medium staged workload

Workload: ~8.5 MB JSON-LD, ~10K subjects, 5 commits (~1.5 MB chunk each). Indexer wall time measured from the indexer Lambda's own processing_time_ms (CloudWatch), not client-side polling. Identical compiled Lambdas on both stacks; only the index bucket type differs.

Pre-fix indexing on Standard S3

| Chunk | Express | Standard | Ratio | Notes |
|---|---|---|---|---|
| 1 (full rebuild) | 561 ms | 1,383 ms | 2.5× | initial build; fine |
| 2 (first incremental) | 755 ms | 31,971 ms | 42× | falls off a cliff |
| 3 | 912 ms | TIMEOUT (> 10 min) | | never completes |
| 4 | 1,497 ms | TIMEOUT | | |

Post-fix indexing on Standard S3

| Chunk | Express | Standard | Standard / Express |
|---|---|---|---|
| 1 (full rebuild) | 482 ms | 1,197 ms | 2.5× |
| 2 (first incremental) | 706 ms | 1,768 ms | 2.5× |
| 3 | 878 ms | 1,472 ms | 1.7× |
| 4 | 969 ms | 1,629 ms | 1.7× |
| 5 | 1,064 ms | 1,901 ms | 1.8× |

Standard S3 indexing is now a clean ~1.7–2.5× slower than Express across all chunks, with no degradation over time (where pre-fix it was unusable past the first incremental).
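For anyone re-running the benchmark, the ratio column can be reproduced from the raw timings with a small helper like the one below. The function and constant names are illustrative only; the timings are the post-fix numbers from the table above, and ratios are rounded to one decimal place.

```rust
/// Post-fix indexer wall times in ms as (express, standard), chunks 1 through 5.
pub const POST_FIX_MS: [(u32, u32); 5] = [
    (482, 1197),
    (706, 1768),
    (878, 1472),
    (969, 1629),
    (1064, 1901),
];

/// Standard / Express slowdown, rounded to one decimal place.
pub fn slowdown(express_ms: u32, standard_ms: u32) -> f64 {
    let ratio = f64::from(standard_ms) / f64::from(express_ms);
    (ratio * 10.0).round() / 10.0
}

/// (min, max) slowdown across all rows.
pub fn slowdown_range(rows: &[(u32, u32)]) -> (f64, f64) {
    rows.iter()
        .map(|&(express, standard)| slowdown(express, standard))
        .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), r| {
            (lo.min(r), hi.max(r))
        })
}
```

`slowdown_range(&POST_FIX_MS)` yields `(1.7, 2.5)`, matching the range quoted above, and `slowdown(755, 31971)` gives the pre-fix first-incremental ratio of roughly 42×.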

Transactions

No meaningful difference between backends. Transaction wall time on both stacks lands in a ~4–7 s/commit range, dominated by the synchronous commit path through the transactor / SQS FIFO queue — index storage isn't on that critical path.

Queries

With a caught-up indexer, hot/warm query latency on Standard S3 is statistically indistinguishable from Express. Server-side median across 5 iterations after warmup, sampled after each of the 5 chunks:

| Query | Express median range | Standard median range |
|---|---|---|
| Single-subject lookup | 152–171 ms | 157–176 ms |
| Group aggregate | 178–247 ms | 171–228 ms |
| Multi-hop join + filter | 144–184 ms | 144–174 ms |

Standard is occasionally faster, sometimes slower — all within normal runtime noise. The 8 GB Lambda /tmp disk artifact cache absorbs per-request S3 latency once warm.

Cold/simple query latency on Standard generally ranges from no measurable difference up to ~30% slower. On a tiny synthetic dataset (10 iters after warmup), Standard medians were 11–33% slower than Express — the penalty scales with the number of index segments touched per query.

Indexing gap. Pre-fix on Standard, a multi-hop join median climbed from 160 ms → 578 ms (3.6×) across 4 chunks as the indexer fell further behind. Post-fix, with the indexer keeping up, the same query stays flat at 144–174 ms across all 5 chunks.

When S3 Express One Zone still matters

For larger ledgers and sustained indexing throughput, S3 Express One Zone remains a meaningful optimization — observed indexing speedups of 30%+ versus Standard S3, and the gap is expected to widen with:

  • larger working sets (more cache-miss reads per commit)
  • wider class/property statistics (more attribution merge entries)
  • many small index blobs touched per query before the local disk cache is warm

For workloads that are mostly hot queries, have modest indexing volume, or are cost-sensitive, Standard S3 is a viable index backend.

Docs

  • New: docs/operations/serverless-storage.md — Standard S3 vs S3 Express One Zone guidance, expected ranges, tuning notes.
  • Updated: docs/operations/storage.md, docs/operations/README.md, docs/reference/connection-config-jsonld.md, docs/getting-started/rust-api.md, docs/SUMMARY.md (cross-links + new s3MaxConcurrentRequests field).

@bplatz bplatz requested review from aaj3f and zonotope May 17, 2026 12:56

@zonotope zonotope left a comment

🍙

@bplatz bplatz merged commit ddd5964 into main May 21, 2026
14 checks passed
@bplatz bplatz deleted the debug/s3-indexing branch May 21, 2026 09:37
@bplatz bplatz mentioned this pull request May 21, 2026