
Rework backfill design doc: UPPER_SNAKE_CASE, [LOGGING], cobra invocation #700

Merged
karthikiyer56 merged 1 commit into feature/full-history from karthik/backfill-impl/doc-rework
Apr 20, 2026

Conversation

@karthikiyer56 karthikiyer56 commented Apr 18, 2026

Summary

  • TOML sections + keys migrated to UPPER_SNAKE_CASE throughout the backfill design doc (prose refs, placeholders, example config, directory-tree annotations).
  • New sectioned [LOGGING] with LEVEL / FORMAT keys; CLI --log-level / --log-format override TOML (specifying both is not an error).
  • Invocation examples switched to cobra subcommand form (stellar-rpc full-history-backfill, dropping --mode=...).
  • design-docs/README.md scope blurb: dropped (this PR) / (future PR) / "separate PR" PM framing; specific PR-number cross-references (Checking in full-history design docs into stellar-rpc for review #617, Add packfile library design doc #633) retained.

Reviewer call-outs

  • Sectioned [LOGGING] diverges from legacy stellar-rpc's flat LOG_LEVEL / LOG_FORMAT TOML layout. This is intentional: the legacy code is slated for removal, and a dedicated LOGGING section in the TOML file gives any logging parameters added later a natural home.
    The flat layout is not restrictive, but it does not nest cleanly.

Closes #683. PRD: #678.

…ocation

- Migrate all TOML sections + keys to UPPER_SNAKE_CASE; prose references,
  placeholder forms, directory-tree annotations, example config block all
  updated. CLI flags stay kebab-case; pseudocode and filesystem dir names
  stay lowercase.
- Add sectioned [LOGGING] with LEVEL / FORMAT keys; CLI --log-level /
  --log-format override TOML (specifying both is not an error). Deliberate
  divergence from legacy flat LOG_LEVEL / LOG_FORMAT layout.
- Fix invocation examples to cobra subcommand form
  (stellar-rpc full-history-backfill, not --mode=...).
- README scope blurb: drop (this PR) / (future PR) / "separate PR" PM
  framing; keep specific PR-number cross-references.

Addresses #683. PRD: #678.
@cjonas9 cjonas9 left a comment

lgtm!

@karthikiyer56 karthikiyer56 merged commit eba167d into feature/full-history Apr 20, 2026
15 checks passed
@karthikiyer56 karthikiyer56 deleted the karthik/backfill-impl/doc-rework branch April 20, 2026 15:30
karthikiyer56 added a commit that referenced this pull request Apr 27, 2026
Reframes the backfill doc as an internal subroutine invoked by Phase 1
of the streaming daemon, matching the post-#700 unified design. All
operator-CLI ceremony is removed; naming + style now match
02-streaming-workflow.md.

Alignment edits (source-of-truth §9):

- Dropped the stellar-rpc full-history-backfill cobra subcommand and
  all per-run CLI flags (--start-ledger, --end-ledger, --workers,
  --verify-recsplit, --max-retries). Removed the getStatus endpoint
  (daemon-level getHealth covers status under unified design).
- run_backfill signature is now
    run_backfill(config, range_start_chunk_id, range_end_chunk_id, source)
  where source is a LedgerSource (BSB or captive core), matching the
  streaming doc's ledger source abstraction.
- process_chunk takes source= and reads via source.get_range(...)
  instead of constructing its own BSB connection.
- DAG worker cap honors source.max_parallelism() (GOMAXPROCS for BSB,
  1 for captive core).
- Retention handling removed entirely from backfill. The daemon's
  validate_config owns retention + immutable-key enforcement.
- Partial Tx Index Ranges section describes how trailing partial-tx-
  index chunks (.bin files + :txhash flags) persist until Phase 2
  hydrates them on the next daemon start.
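
The ledger source abstraction and the worker cap described in the list above might look roughly like this. It is a hedged Python sketch only: the method names `get_range`, `max_parallelism`, and `tip` come from the commit message, but their argument lists and return types, the two fake source classes, the `dag_worker_cap` helper, and its `configured_workers` parameter are all hypothetical. `os.cpu_count()` stands in for Go's GOMAXPROCS.

```python
import os
from typing import Iterable, Protocol


class LedgerSource(Protocol):
    """Assumed shape of the interface; argument and return types are guesses."""

    def get_range(self, first_ledger: int, last_ledger: int) -> Iterable[bytes]: ...
    def max_parallelism(self) -> int: ...
    def tip(self) -> int: ...


class FakeBsbSource:
    """Hypothetical stand-in for a BSB-backed source (parallel reads allowed)."""

    def __init__(self, tip_ledger: int) -> None:
        self._tip = tip_ledger

    def get_range(self, first_ledger: int, last_ledger: int) -> Iterable[bytes]:
        # Yield one placeholder payload per ledger in the inclusive range.
        for seq in range(first_ledger, last_ledger + 1):
            yield seq.to_bytes(4, "big")

    def max_parallelism(self) -> int:
        # Python analogue of GOMAXPROCS for illustration.
        return os.cpu_count() or 1

    def tip(self) -> int:
        return self._tip


class FakeCaptiveCoreSource(FakeBsbSource):
    """Hypothetical stand-in for captive core: strictly sequential."""

    def max_parallelism(self) -> int:
        return 1


def dag_worker_cap(configured_workers: int, source: LedgerSource) -> int:
    """Never run more DAG workers than the source says it can feed."""
    return max(1, min(configured_workers, source.max_parallelism()))
```

With this shape, `dag_worker_cap(8, FakeCaptiveCoreSource(tip_ledger=100))` yields a single worker, while a BSB-style source is bounded by the CPU count.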

Naming + style alignment with 02-streaming-workflow.md:

- snake_case Python pseudocode throughout.
- GENESIS_LEDGER, LEDGERS_PER_CHUNK, LEDGERS_PER_INDEX, and
  CHUNKS_PER_TXHASH_INDEX defined as SCREAMING_SNAKE constants.
- tx_index_id replaces index_id; chunk_id, tx_index_id are the canonical
  long names (no bare C / N / L / T / R placeholders).
- Geometry functions: chunk_id_of_ledger, first_ledger_in_chunk,
  last_ledger_in_chunk, tx_index_id_of_chunk, first_ledger_in_tx_index,
  last_ledger_in_tx_index — matching streaming doc exactly.
- Meta-store key templates use {chunk_id:08d} and {tx_index_id:08d}.
- UPPER_SNAKE_CASE TOML keys and section headers throughout.
- build_txhash_index's internal pipeline renamed from
  Phase 1 / 2 / 3 / 4 to Stage 1 / 2 / 3 / 4, preserving the rule
  that "Phase" refers only to the daemon's startup phases.
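
To make the geometry naming concrete, here is a minimal Python sketch of the functions listed above. The function and constant names come from the commit message; every numeric value, the relation `LEDGERS_PER_INDEX = LEDGERS_PER_CHUNK * CHUNKS_PER_TXHASH_INDEX`, the inclusive-range convention, and the `chunk:` key prefix in `chunk_flag_key` are assumptions made for illustration, not values taken from the design doc.

```python
# All numeric values below are illustrative assumptions.
GENESIS_LEDGER = 2
LEDGERS_PER_CHUNK = 10_000
CHUNKS_PER_TXHASH_INDEX = 1_000
# Assumed relation between the constants.
LEDGERS_PER_INDEX = LEDGERS_PER_CHUNK * CHUNKS_PER_TXHASH_INDEX


def chunk_id_of_ledger(ledger_seq: int) -> int:
    return (ledger_seq - GENESIS_LEDGER) // LEDGERS_PER_CHUNK


def first_ledger_in_chunk(chunk_id: int) -> int:
    return GENESIS_LEDGER + chunk_id * LEDGERS_PER_CHUNK


def last_ledger_in_chunk(chunk_id: int) -> int:
    # Inclusive upper bound: one before the next chunk's first ledger.
    return first_ledger_in_chunk(chunk_id + 1) - 1


def tx_index_id_of_chunk(chunk_id: int) -> int:
    return chunk_id // CHUNKS_PER_TXHASH_INDEX


def first_ledger_in_tx_index(tx_index_id: int) -> int:
    return first_ledger_in_chunk(tx_index_id * CHUNKS_PER_TXHASH_INDEX)


def last_ledger_in_tx_index(tx_index_id: int) -> int:
    last_chunk = (tx_index_id + 1) * CHUNKS_PER_TXHASH_INDEX - 1
    return last_ledger_in_chunk(last_chunk)


def chunk_flag_key(chunk_id: int, flag: str) -> str:
    # Only the {chunk_id:08d} zero-padding is from the source;
    # the "chunk:" prefix is a hypothetical placeholder.
    return f"chunk:{chunk_id:08d}:{flag}"
```

Under these assumed constants, chunk 0 spans ledgers 2 through 10001, and `chunk_flag_key(42, "lfs")` formats the ID as an eight-digit, zero-padded field.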

Grill-me Pass A (correctness) fixes:

- cleanup_txhash now uses delete_if_exists for .bin files — crash
  between .bin delete and :txhash flag delete is safe to retry.
- build_dag schedules build_txhash_index for tx indexes whose LAST
  chunk falls in the current range, and filters process_chunk
  scheduling to in-range chunks only. Covers the cross-iteration
  tx-index-completion case where iteration N ingests a tx index's
  first chunks and iteration N+1 ingests its last chunks: build runs
  in iteration N+1 using .bin files from both iterations.
- validate asserts source.tip() >= last_ledger_in_chunk(range_end_chunk_id)
  rather than calling source.covers() (not on the LedgerSource interface).
- build_txhash_index: added an invariant comment noting that every
  chunk's .bin is on disk when the task runs (DAG ordering guarantees
  cleanup can only run after build succeeds).
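
Two of the fixes above, the build_dag scheduling rule and the retry-safe delete, can be illustrated with a small Python sketch. It assumes an inclusive chunk range and a deliberately tiny `CHUNKS_PER_TXHASH_INDEX` of 4; `plan_tasks` and `last_chunk_in_tx_index` are hypothetical helper names, and the real build_dag presumably constructs task dependencies rather than returning two lists.

```python
import os

CHUNKS_PER_TXHASH_INDEX = 4  # assumed small value for illustration


def tx_index_id_of_chunk(chunk_id: int) -> int:
    return chunk_id // CHUNKS_PER_TXHASH_INDEX


def last_chunk_in_tx_index(tx_index_id: int) -> int:
    # Hypothetical helper: the final chunk belonging to a tx index.
    return (tx_index_id + 1) * CHUNKS_PER_TXHASH_INDEX - 1


def plan_tasks(range_start_chunk_id: int, range_end_chunk_id: int):
    """Return (chunks_to_process, tx_indexes_to_build) for one iteration.

    process_chunk is scheduled only for in-range chunks; build_txhash_index
    is scheduled only for tx indexes whose LAST chunk is in the range.
    """
    chunks_to_process = list(range(range_start_chunk_id, range_end_chunk_id + 1))
    first_tx = tx_index_id_of_chunk(range_start_chunk_id)
    last_tx = tx_index_id_of_chunk(range_end_chunk_id)
    tx_indexes_to_build = [
        tx_index_id
        for tx_index_id in range(first_tx, last_tx + 1)
        if range_start_chunk_id
        <= last_chunk_in_tx_index(tx_index_id)
        <= range_end_chunk_id
    ]
    return chunks_to_process, tx_indexes_to_build


def delete_if_exists(path: str) -> None:
    """Idempotent delete: a missing file is treated as already cleaned up."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

In the cross-iteration case, `plan_tasks(0, 1)` schedules chunks 0 and 1 but no build, and the following `plan_tasks(2, 5)` schedules chunks 2 through 5 plus the build for tx index 0, whose last chunk (3) is now in range. Tx index 1 waits until a later range includes chunk 7.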

Grill-me Pass B (ambiguity) fixes:

- cpi defined inline in Geometry as shorthand for CHUNKS_PER_TXHASH_INDEX.
- Stray "txhash index" in prose replaced with "tx index" for
  consistency with the streaming doc's dominant form.
- "LFS" in process_chunk's key-properties bullet spelled as
  "ledger pack file (:lfs flag)".

File now 664 lines (down from 698 at HEAD). ID-leak grep (CC/D/B-INV)
and stale-term grep both zero-hit.
Development

Successfully merging this pull request may close these issues.

Backfill 18: Rework backfill design doc (UPPER_SNAKE_CASE, [LOGGING], subcommand)
