Rework backfill design doc: UPPER_SNAKE_CASE, [LOGGING], cobra invocation #700
Merged
karthikiyer56 merged 1 commit into feature/full-history on Apr 20, 2026
…ocation

- Migrate all TOML sections + keys to UPPER_SNAKE_CASE; prose references, placeholder forms, directory-tree annotations, and the example config block are all updated. CLI flags stay kebab-case; pseudocode and filesystem dir names stay lowercase.
- Add sectioned [LOGGING] with LEVEL / FORMAT keys; CLI --log-level / --log-format override TOML (specifying both is not an error). Deliberate divergence from legacy flat LOG_LEVEL / LOG_FORMAT layout.
- Fix invocation examples to cobra subcommand form (stellar-rpc full-history-backfill, not --mode=...).
- README scope blurb: drop (this PR) / (future PR) / "separate PR" PM framing; keep specific PR-number cross-references.

Addresses #683. PRD: #678.
karthikiyer56 added a commit that referenced this pull request on Apr 27, 2026
Reframes the backfill doc as an internal subroutine invoked by Phase 1 of the streaming daemon, matching the post-#700 unified design. All operator-CLI ceremony is removed; naming + style now match 02-streaming-workflow.md.

Alignment edits (source-of-truth §9):
- Dropped the stellar-rpc full-history-backfill cobra subcommand and all per-run CLI flags (--start-ledger, --end-ledger, --workers, --verify-recsplit, --max-retries). Removed the getStatus endpoint (daemon-level getHealth covers status under unified design).
- run_backfill signature is now run_backfill(config, range_start_chunk_id, range_end_chunk_id, source) where source is a LedgerSource (BSB or captive core), matching the streaming doc's ledger source abstraction.
- process_chunk takes source= and reads via source.get_range(...) instead of constructing its own BSB connection.
- DAG worker cap honors source.max_parallelism() (GOMAXPROCS for BSB, 1 for captive core).
- Retention handling removed entirely from backfill. The daemon's validate_config owns retention + immutable-key enforcement.
- Partial Tx Index Ranges section describes how trailing partial-tx-index chunks (.bin files + :txhash flags) persist until Phase 2 hydrates them on the next daemon start.

Naming + style alignment with 02-streaming-workflow.md:
- snake_case Python pseudocode throughout.
- GENESIS_LEDGER, LEDGERS_PER_CHUNK, LEDGERS_PER_INDEX, and CHUNKS_PER_TXHASH_INDEX defined as SCREAMING_SNAKE constants.
- tx_index_id replaces index_id; chunk_id, tx_index_id are the canonical long names (no bare C / N / L / T / R placeholders).
- Geometry functions: chunk_id_of_ledger, first_ledger_in_chunk, last_ledger_in_chunk, tx_index_id_of_chunk, first_ledger_in_tx_index, last_ledger_in_tx_index, matching the streaming doc exactly.
- Meta-store key templates use {chunk_id:08d} and {tx_index_id:08d}.
- UPPER_SNAKE_CASE TOML keys and section headers throughout.
- build_txhash_index's internal pipeline renamed from Phase 1 / 2 / 3 / 4 to Stage 1 / 2 / 3 / 4, preserving the rule that "Phase" refers only to the daemon's startup phases.

Grill-me Pass A (correctness) fixes:
- cleanup_txhash now uses delete_if_exists for .bin files: a crash between the .bin delete and the :txhash flag delete is safe to retry.
- build_dag schedules build_txhash_index for tx indexes whose LAST chunk falls in the current range, and filters process_chunk scheduling to in-range chunks only. Covers the cross-iteration tx-index-completion case where iteration N ingests a tx index's first chunks and iteration N+1 ingests its last chunks: build runs in iteration N+1 using .bin files from both iterations.
- validate asserts source.tip() >= last_ledger_in_chunk(range_end_chunk_id) rather than calling source.covers() (not on the LedgerSource interface).
- build_txhash_index: added an invariant comment noting that every chunk's .bin is on disk when the task runs (DAG ordering guarantees cleanup can only run after build succeeds).

Grill-me Pass B (ambiguity) fixes:
- cpi defined inline in Geometry as shorthand for CHUNKS_PER_TXHASH_INDEX.
- Stray "txhash index" in prose replaced with "tx index" for consistency with the streaming doc's dominant form.
- "LFS" in process_chunk's key-properties bullet spelled as "ledger pack file (:lfs flag)".

File now 664 lines (down from 698 at HEAD). ID-leak grep (CC/D/B-INV) and stale-term grep both zero-hit.
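The geometry helpers and the build_dag scheduling rule named in this commit message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the doc's actual pseudocode: the constant values are placeholders, chunk 0 is assumed to start at GENESIS_LEDGER, the chunk range is assumed to be inclusive on both ends, LEDGERS_PER_INDEX is assumed to equal LEDGERS_PER_CHUNK * CHUNKS_PER_TXHASH_INDEX, and last_chunk_in_tx_index and tx_indexes_to_build are hypothetical helper names introduced only for this example.

```python
# Placeholder values for illustration only; the design doc defines the real ones.
GENESIS_LEDGER = 2
LEDGERS_PER_CHUNK = 10_000
CHUNKS_PER_TXHASH_INDEX = 1_000
# Assumed relationship between the constants.
LEDGERS_PER_INDEX = LEDGERS_PER_CHUNK * CHUNKS_PER_TXHASH_INDEX


def chunk_id_of_ledger(ledger_seq: int) -> int:
    """Chunk that contains ledger_seq (assumes chunk 0 starts at GENESIS_LEDGER)."""
    return (ledger_seq - GENESIS_LEDGER) // LEDGERS_PER_CHUNK


def first_ledger_in_chunk(chunk_id: int) -> int:
    return GENESIS_LEDGER + chunk_id * LEDGERS_PER_CHUNK


def last_ledger_in_chunk(chunk_id: int) -> int:
    return first_ledger_in_chunk(chunk_id + 1) - 1


def tx_index_id_of_chunk(chunk_id: int) -> int:
    return chunk_id // CHUNKS_PER_TXHASH_INDEX


def last_chunk_in_tx_index(tx_index_id: int) -> int:
    """Hypothetical helper: id of the final chunk belonging to a tx index."""
    return (tx_index_id + 1) * CHUNKS_PER_TXHASH_INDEX - 1


def first_ledger_in_tx_index(tx_index_id: int) -> int:
    return first_ledger_in_chunk(tx_index_id * CHUNKS_PER_TXHASH_INDEX)


def last_ledger_in_tx_index(tx_index_id: int) -> int:
    return last_ledger_in_chunk(last_chunk_in_tx_index(tx_index_id))


def tx_indexes_to_build(range_start_chunk_id: int, range_end_chunk_id: int) -> list[int]:
    """Hypothetical helper: tx indexes whose LAST chunk lies in the inclusive range.

    Only these are candidates for build_txhash_index in the current iteration;
    an index whose last chunk lies beyond the range waits for a later one.
    """
    first = tx_index_id_of_chunk(range_start_chunk_id)
    last = tx_index_id_of_chunk(range_end_chunk_id)
    return [
        tx_index_id
        for tx_index_id in range(first, last + 1)
        if range_start_chunk_id <= last_chunk_in_tx_index(tx_index_id) <= range_end_chunk_id
    ]
```

With these placeholder values, an iteration over chunks 0 to 499 schedules no index build, while the next iteration over chunks 500 to 1499 schedules the build for tx index 0, whose last chunk (999) falls in that range. This is the cross-iteration completion case described in Pass A.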
Summary
- Added a sectioned `[LOGGING]` with `LEVEL` / `FORMAT` keys; CLI `--log-level` / `--log-format` override TOML (specifying both is not an error).
- Invocation examples use the cobra subcommand form (`stellar-rpc full-history-backfill`, dropping `--mode=...`).
- `design-docs/README.md` scope blurb: dropped `(this PR)` / `(future PR)` / "separate PR" PM framing; specific PR-number cross-references (#617 "Checking in full-history design docs into stellar-rpc for review", #633 "Add packfile library design doc") retained.

Reviewer call-outs
`[LOGGING]` diverges from legacy stellar-rpc's flat `LOG_LEVEL` / `LOG_FORMAT` TOML layout. This is intentional: the legacy code is slated for removal. A dedicated `LOGGING` section also leaves room for logging parameters that may be added later. The flat layout does not prevent that, but it does not nest cleanly.
Closes #683. PRD: #678.