Dont lose progress on stream failure #17

Merged: dwerner merged 5 commits into main from dont-lose-progress-on-stream-failure on Oct 28, 2025

@dwerner dwerner commented Oct 28, 2025

No description provided.

…eaming files

Added is_live flag to PhaserMetadata (version 2) to distinguish between:
- Historical sync workers: Fetch specific block ranges (is_live = false)
- Live streaming workers: Subscribe to current head (is_live = true)

Changes:
- Bump PhaserMetadata version from 1 to 2
- Add is_live boolean field with #[serde(default)] for backward compatibility
- Version 1 files remain readable (is_live defaults to false)
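
The backward-compatibility rule above can be sketched without any external crates. This is a minimal illustration, not the project's code: the real struct uses serde, while the `metadata_from_kv` helper, the string key-value input, and the two-field struct here are assumptions made only to show how a missing `is_live` falls back to `false`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for PhaserMetadata; only the two fields named in
/// the commit message are modeled.
#[derive(Debug, PartialEq)]
pub struct PhaserMetadata {
    pub version: u32,
    pub is_live: bool,
}

/// Build metadata from string key-value pairs (an assumed storage format).
/// A missing `is_live` key falls back to `false`, which mirrors what
/// `#[serde(default)]` does for a `bool` field.
pub fn metadata_from_kv(kv: &HashMap<String, String>) -> Option<PhaserMetadata> {
    let version = kv.get("version")?.parse::<u32>().ok()?;
    let is_live = match kv.get("is_live") {
        // Version 2 files carry the flag explicitly.
        Some(raw) => raw.parse::<bool>().ok()?,
        // Version 1 files predate the flag and are treated as historical.
        None => false,
    };
    Some(PhaserMetadata { version, is_live })
}
```

Under these assumptions, a version 1 record with no `is_live` entry still parses and is classified as written by a historical sync worker.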
…andling

ParquetWriter now tracks two distinct ranges:
- Segment range: Logical 500K segment boundaries (used for filename)
- Responsibility range: Actual range this worker is responsible for (used for metadata)

This allows files to be grouped by segment in filenames while metadata accurately
reflects the capped responsibility range when workers are stopped at the live boundary.

Changes:
- Split requested_range into segment_range and responsibility_range
- Added set_ranges() method to set both independently
- Added validation to ensure end >= start for both ranges
- Filenames use segment range, metadata uses responsibility range
- Logs show both ranges for debugging
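
A rough sketch of the two-range split follows. The `BlockRange` and `RangeState` types, the error type, and the `file_name` and `metadata_range` helpers are hypothetical; only the `set_ranges` name, the `end >= start` validation, and the filename shape come from this PR. The trailing number in the filename is assumed to be a file index.

```rust
/// Inclusive block range.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlockRange {
    pub start: u64,
    pub end: u64,
}

/// Hypothetical slice of the writer state; only range tracking is modeled.
#[derive(Debug, Default)]
pub struct RangeState {
    segment_range: Option<BlockRange>,
    responsibility_range: Option<BlockRange>,
}

impl RangeState {
    /// Set both ranges independently, rejecting any range whose end
    /// precedes its start. Nothing is stored if either range is invalid.
    pub fn set_ranges(
        &mut self,
        segment: BlockRange,
        responsibility: BlockRange,
    ) -> Result<(), String> {
        for (label, range) in [("segment", segment), ("responsibility", responsibility)] {
            if range.end < range.start {
                return Err(format!(
                    "{} range end {} is before start {}",
                    label, range.end, range.start
                ));
            }
        }
        self.segment_range = Some(segment);
        self.responsibility_range = Some(responsibility);
        Ok(())
    }

    /// Filenames are derived from the segment range only.
    pub fn file_name(&self, index: u32) -> Option<String> {
        let seg = self.segment_range?;
        Some(format!("blocks_from_{}_to_{}_{}.parquet", seg.start, seg.end, index))
    }

    /// Metadata reports the responsibility range only.
    pub fn metadata_range(&self) -> Option<BlockRange> {
        self.responsibility_range
    }
}
```

Keeping the two ranges in separate fields means a capped worker can still write into the file named for its full segment while its metadata records only the blocks it actually covers.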
…nges

Workers now cap their sync range at the historical/live boundary and correctly
report their responsibility range in metadata.

Example: Worker syncing segment 23500000-23999999 capped at boundary 23550000:
- Filename: blocks_from_23500000_to_23999999_0.parquet (segment grouping)
- Metadata segment: 23500000-23999999 (logical segment)
- Metadata responsibility: 23500000-23549999 (actual coverage)

Changes:
- SyncWorker caps to_block at boundary-1 when boundary is present
- Workers calculate segment boundaries independently of capping
- Workers pass both segment range and responsibility range to ParquetWriter
- SyncService passes historical_boundary to workers
- DataScanner handles both historical and live_ prefixed files
- Removed unused _historical_boundary storage field
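
The capping arithmetic in the example above can be sketched as two small functions. The function names are hypothetical, and the sketch assumes segments are aligned at multiples of 500,000 blocks, which matches the 23500000-23999999 example but is not stated outright in the PR.

```rust
/// Segment size taken from the "500K segment" wording; alignment at
/// multiples of this size is an assumption.
pub const SEGMENT_SIZE: u64 = 500_000;

/// Inclusive bounds of the segment containing `block`.
pub fn segment_bounds(block: u64) -> (u64, u64) {
    let start = (block / SEGMENT_SIZE) * SEGMENT_SIZE;
    (start, start + SEGMENT_SIZE - 1)
}

/// Cap an inclusive range at `boundary - 1` when a boundary is present.
/// Returns `None` when no historical blocks remain below the boundary.
pub fn cap_at_boundary(from: u64, to: u64, boundary: Option<u64>) -> Option<(u64, u64)> {
    let capped_to = match boundary {
        // A boundary of 0 leaves nothing historical, hence checked_sub.
        Some(b) => to.min(b.checked_sub(1)?),
        None => to,
    };
    if capped_to < from {
        None
    } else {
        Some((from, capped_to))
    }
}
```

With the numbers from the example, `segment_bounds(23_500_000)` gives the segment range used for the filename, and `cap_at_boundary(23_500_000, 23_999_999, Some(23_550_000))` gives the responsibility range 23500000-23549999 recorded in metadata.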

Updated parquet-files.md to explain:
- live_ filename prefix indicates file written by live streaming worker
- is_live metadata flag tracks worker type (historical vs live)
- PhaserMetadata version 2 adds is_live field
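
As an illustration of how a scanner might tell the two file kinds apart by name, here is a hedged sketch. The `ParsedName` type and `parse_file_name` function are hypothetical, and the sketch assumes live files reuse the historical `blocks_from_<start>_to_<end>_<index>.parquet` stem with `live_` prepended; the PR does not show a live filename, so that layout is a guess.

```rust
#[derive(Debug, PartialEq)]
pub struct ParsedName {
    pub is_live: bool,
    pub from: u64,
    pub to: u64,
    pub index: u32,
}

/// Parse a block file name, accepting an optional `live_` prefix.
/// Returns `None` for names that do not match the assumed layout.
pub fn parse_file_name(name: &str) -> Option<ParsedName> {
    // The prefix alone decides the worker type at the filename level.
    let (is_live, rest) = match name.strip_prefix("live_") {
        Some(rest) => (true, rest),
        None => (false, name),
    };
    let body = rest.strip_prefix("blocks_from_")?.strip_suffix(".parquet")?;
    let (from, tail) = body.split_once("_to_")?;
    let (to, index) = tail.split_once('_')?;
    Some(ParsedName {
        is_live,
        from: from.parse().ok()?,
        to: to.parse().ok()?,
        index: index.parse().ok()?,
    })
}
```

The filename prefix and the `is_live` metadata flag carry the same information at two levels: the prefix is visible when listing a directory, while the flag travels inside the file.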
@dwerner dwerner merged commit d3acb50 into main Oct 28, 2025