Merged
Previously, empty parquet files (0 rows but valid metadata) were treated as valid data coverage. This caused the scanner to report ranges as complete when they actually contained no data.
Changes:
- Check total row count in read_block_range_from_parquet
- Return None for parquet files with 0 rows
- Add support for .empty marker files in find_missing_ranges
- Add support for .empty marker files in has_completed_segment
- Simplify filename parsing to only support new format
This allows distinguishing between "checked but empty" ranges (using 0-byte .empty files) and actual data files.
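The coverage rule described above could look roughly like the sketch below. It is not the code from this PR: `Coverage` and `classify_segment` are hypothetical names, and the row count is passed in as a plain number (in practice it would come from the parquet file metadata) so the example needs no parquet dependency.

```rust
use std::path::Path;

/// How a file on disk counts toward range coverage (hypothetical type).
#[derive(Debug, PartialEq)]
enum Coverage {
    /// A parquet file with at least one row.
    Data,
    /// A `.empty` marker: the range was checked and held no data.
    CheckedEmpty,
    /// Anything else, including a 0-row parquet file or a temp file.
    NotCovered,
}

/// Classify a file by its extension and total row count.
/// `total_rows` is only meaningful for parquet files.
fn classify_segment(path: &Path, total_rows: u64) -> Coverage {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("empty") => Coverage::CheckedEmpty,
        Some("parquet") if total_rows > 0 => Coverage::Data,
        // A 0-row parquet file no longer counts as coverage.
        _ => Coverage::NotCovered,
    }
}
```

With this rule a scanner can skip re-syncing a range that has either a data file or a marker, while a 0-row parquet file still shows up as missing.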
When a sync range contains no data, write a 0-byte .empty marker file instead of a 3.2KB empty parquet file. This makes it clearer that the range was checked but contained no data.
Changes:
- Add write_empty_marker() function to create .empty files
- Replace all write_empty_range() calls with write_empty_marker()
- Update all three sync functions (blocks, transactions, logs)
The .empty files are recognized by the data scanner and prevent re-syncing of empty ranges.
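A minimal sketch of writing such a marker, using only the standard library. The signature and the `{type}_from_{start}_to_{end}.empty` file name are assumptions made for illustration; the PR text does not state the actual name pattern or arguments of write_empty_marker().

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Create a 0-byte `.empty` marker for a checked-but-empty block range.
/// The file name layout here is hypothetical.
fn write_empty_marker(
    dir: &Path,
    data_type: &str,
    start_block: u64,
    end_block: u64,
) -> io::Result<PathBuf> {
    let name = format!("{}_from_{}_to_{}.empty", data_type, start_block, end_block);
    let path = dir.join(name);
    // File::create makes a new file or truncates an existing one, so the
    // marker is always 0 bytes. Nothing is written to it.
    File::create(&path)?;
    Ok(path)
}
```

Because the marker carries no content, its presence and name are the whole signal; a scanner only needs to list the directory to see that the range was checked.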
Add detailed logging to track data flow from Erigon's BlockDataBackend through the bridge to phaser-query.
Changes:
- Log each batch received from BlockDataBackend with count
- Log stream completion with total batch count
- Change log level from Debug to Info for visibility
- Add batch counting for blocks, transactions, and logs streams
This helps diagnose issues where streams complete without sending data.
Distinguish between live streaming and historical sync with separate
filename patterns and an is_live flag.
Changes:
- Add is_live flag to ParquetWriter
- Add with_config_and_mode constructor
- Use 'live_{type}_from_{start}_{timestamp}.tmp' pattern for live files
- Use '{type}_from_{start}_{timestamp}.parquet.tmp' for historical files
- Add write_empty_range method for marking empty ranges
This allows different handling of live vs historical data and makes
it easier to identify the source of parquet files.
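The two temp-file patterns listed above can be sketched as a single naming helper. `temp_file_name` is a hypothetical function, not the ParquetWriter API, and the timestamp is assumed to be an integer (for example Unix seconds), which the commit message does not specify.

```rust
/// Build a temp file name from the patterns in the commit message.
/// `is_live` selects the live-streaming pattern over the historical one.
fn temp_file_name(data_type: &str, start_block: u64, timestamp: u64, is_live: bool) -> String {
    if is_live {
        // live_{type}_from_{start}_{timestamp}.tmp
        format!("live_{}_from_{}_{}.tmp", data_type, start_block, timestamp)
    } else {
        // {type}_from_{start}_{timestamp}.parquet.tmp
        format!("{}_from_{}_{}.parquet.tmp", data_type, start_block, timestamp)
    }
}
```

For example, `temp_file_name("blocks", 1000, 1700000000, true)` yields `live_blocks_from_1000_1700000000.tmp`, so the `live_` prefix alone identifies where a file came from.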
Previously, progress tracking relied on in-memory worker state which could become stale or inaccurate. Now we scan the actual parquet files on disk to determine what's truly completed.
Changes:
- Use DataScanner to analyze sync range progress
- Calculate blocks_synced from complete segments on disk
- Find max_completed_block from actual complete segments
- Remove in-memory aggregation of worker progress
This provides accurate progress even after restarts and handles cases where workers report completion but files aren't written.
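As an illustration of deriving progress from disk rather than from worker state, the sketch below computes both figures from a list of completed segments. It is not DataScanner's actual logic: `Progress` and `progress_from_segments` are hypothetical, segments are assumed to be inclusive `(start, end)` block ranges, and max_completed_block is assumed to mean the highest block covered by any complete segment.

```rust
/// Progress for one sync range, derived from segments found on disk.
#[derive(Debug, PartialEq)]
struct Progress {
    blocks_synced: u64,
    max_completed_block: Option<u64>,
}

/// Compute progress for the inclusive range `range_start..=range_end`
/// from inclusive `(start, end)` segments. Overlaps are counted once.
fn progress_from_segments(range_start: u64, range_end: u64, segments: &[(u64, u64)]) -> Progress {
    // Clip each segment to the sync range and drop those outside it.
    let mut clipped: Vec<(u64, u64)> = segments
        .iter()
        .filter(|&&(s, e)| s <= e && e >= range_start && s <= range_end)
        .map(|&(s, e)| (s.max(range_start), e.min(range_end)))
        .collect();
    clipped.sort();

    let mut blocks_synced = 0u64;
    let mut max_completed: Option<u64> = None;
    for (s, e) in clipped {
        // Skip the part already counted by an earlier overlapping segment.
        let start = match max_completed {
            Some(m) if m >= e => continue,
            Some(m) if m >= s => m + 1,
            _ => s,
        };
        blocks_synced += e - start + 1;
        max_completed = Some(e);
    }
    Progress { blocks_synced, max_completed_block: max_completed }
}
```

Since the input is whatever is on disk at scan time, the result stays correct after a restart and ignores segments a worker claimed to finish but never wrote.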