Various bugfixes #7

Merged
dwerner merged 7 commits into main from various-bugfixes on Oct 13, 2025
@dwerner commented Oct 13, 2025

No description provided.

Previously, empty parquet files (0 rows but valid metadata) were treated
as valid data coverage. This caused the scanner to report ranges as complete
when they actually contained no data.

Changes:
- Check total row count in read_block_range_from_parquet
- Return None for parquet files with 0 rows
- Add support for .empty marker files in find_missing_ranges
- Add support for .empty marker files in has_completed_segment
- Simplify filename parsing to support only the new format

This allows distinguishing between "checked but empty" ranges (using
0-byte .empty files) and actual data files.
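
The coverage rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Segment` enum, the `coverage` helper, and the signature of `find_missing_ranges` are assumptions made for the example, and the real scanner reads row counts from parquet metadata rather than taking them as input.

```rust
/// Hypothetical description of one file found on disk for a block range.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Segment {
    /// A parquet file covering `start..=end` that holds `rows` rows.
    Parquet { start: u64, end: u64, rows: u64 },
    /// A 0-byte `.empty` marker: the range was checked and had no data.
    EmptyMarker { start: u64, end: u64 },
}

/// Returns the range a segment covers, or `None` for a 0-row parquet file.
fn coverage(seg: &Segment) -> Option<(u64, u64)> {
    match *seg {
        Segment::Parquet { rows: 0, .. } => None,
        Segment::Parquet { start, end, .. } => Some((start, end)),
        Segment::EmptyMarker { start, end } => Some((start, end)),
    }
}

/// Lists the inclusive sub-ranges of `from..=to` that no segment covers.
fn find_missing_ranges(from: u64, to: u64, segments: &[Segment]) -> Vec<(u64, u64)> {
    if from > to {
        return Vec::new();
    }
    let mut covered: Vec<(u64, u64)> = segments.iter().filter_map(coverage).collect();
    covered.sort();

    let mut missing = Vec::new();
    let mut next = from; // first block not yet known to be covered
    for (start, end) in covered {
        if end < next {
            continue; // segment lies entirely before the uncovered frontier
        }
        if start > to {
            break; // segment lies entirely after the requested range
        }
        if start > next {
            missing.push((next, start - 1));
        }
        if end >= to {
            return missing; // the rest of the range is covered
        }
        next = end + 1;
    }
    missing.push((next, to));
    missing
}
```

With a data file for 0..=99, a 0-row parquet file for 100..=199, and an `.empty` marker for 200..=299, a scan of 0..=399 under these assumptions reports 100..=199 and 300..=399 as missing: the 0-row file no longer counts as coverage, while the marker does.
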

When a sync range contains no data, write a 0-byte .empty marker file
instead of a 3.2 KB empty parquet file. This makes it clearer that the
range was checked but contained no data.

Changes:
- Add write_empty_marker() function to create .empty files
- Replace all write_empty_range() calls with write_empty_marker()
- Update all three sync functions (blocks, transactions, logs)

The .empty files are recognized by the data scanner and prevent
re-syncing of empty ranges.
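
A marker writer along these lines might look like the sketch below. The function name matches the commit message, but the signature and the `{kind}_from_{start}_to_{end}.empty` file name are assumptions for illustration; the PR does not show the actual naming scheme.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Creates a 0-byte marker recording that `start..=end` was checked and held no data.
/// The file name layout used here is hypothetical.
fn write_empty_marker(dir: &Path, kind: &str, start: u64, end: u64) -> io::Result<PathBuf> {
    let path = dir.join(format!("{kind}_from_{start}_to_{end}.empty"));
    // `File::create` creates the file (or truncates an existing one); nothing is written.
    File::create(&path)?;
    Ok(path)
}
```

Because the marker carries no bytes, the scanner only needs the file name to learn which range was checked, and no parquet metadata has to be parsed for empty ranges.
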

Add detailed logging to track data flow from Erigon's BlockDataBackend
through the bridge to phaser-query.

Changes:
- Log each batch received from BlockDataBackend with count
- Log stream completion with total batch count
- Change log level from Debug to Info for visibility
- Add batch counting for blocks, transactions, and logs streams

This helps diagnose issues where streams complete without sending data.
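
The batch counting described above can be illustrated with a small stand-in. This sketch assumes a plain iterator of batches and uses `eprintln!` in place of a logging framework; the bridge itself presumably consumes async streams, and `drain_with_logging` is a hypothetical name.

```rust
/// Drains a stream of batches, logging each batch's size and a final summary.
/// Returns `(batch_count, item_count)` so callers can detect empty streams.
fn drain_with_logging<T>(
    stream_name: &str,
    batches: impl IntoIterator<Item = Vec<T>>,
) -> (usize, usize) {
    let mut batch_count = 0;
    let mut item_count = 0;
    for batch in batches {
        batch_count += 1;
        item_count += batch.len();
        eprintln!(
            "[INFO] {stream_name}: received batch {batch_count} with {} items",
            batch.len()
        );
    }
    eprintln!("[INFO] {stream_name}: stream completed after {batch_count} batches ({item_count} items)");
    (batch_count, item_count)
}
```

A completion line reporting 0 batches is the signal the commit is after: the stream ended cleanly but never delivered data.
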

Distinguish between live streaming and historical sync with separate
filename patterns and an is_live flag.

Changes:
- Add is_live flag to ParquetWriter
- Add with_config_and_mode constructor
- Use 'live_{type}_from_{start}_{timestamp}.tmp' pattern for live files
- Use '{type}_from_{start}_{timestamp}.parquet.tmp' for historical files
- Add write_empty_range method for marking empty ranges

This allows different handling of live vs historical data and makes
it easier to identify the source of parquet files.
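
The two filename patterns listed above can be expressed as a small helper. This is only a sketch: `temp_file_name` and its parameters are hypothetical, and the real ParquetWriter may derive the timestamp and type name differently.

```rust
/// Builds the temporary file name for a writer, following the two patterns
/// from the commit message. `timestamp` is assumed to be a Unix timestamp.
fn temp_file_name(data_type: &str, start: u64, timestamp: u64, is_live: bool) -> String {
    if is_live {
        format!("live_{data_type}_from_{start}_{timestamp}.tmp")
    } else {
        format!("{data_type}_from_{start}_{timestamp}.parquet.tmp")
    }
}
```

For example, `temp_file_name("blocks", 1000, 1760380000, true)` yields `live_blocks_from_1000_1760380000.tmp`, while the same call with `is_live` set to `false` yields `blocks_from_1000_1760380000.parquet.tmp`.
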

Previously, progress tracking relied on in-memory worker state, which
could become stale or inaccurate. Now we scan the actual parquet files
on disk to determine what's truly completed.

Changes:
- Use DataScanner to analyze sync range progress
- Calculate blocks_synced from complete segments on disk
- Find max_completed_block from actual complete segments
- Remove in-memory aggregation of worker progress

This provides accurate progress even after restarts and handles
cases where workers report completion but files aren't written.
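
Deriving progress from disk rather than from worker state could look roughly like this. The `Progress` struct and `progress_from_segments` function are assumptions for the example; the sketch also assumes the complete segments are inclusive block ranges that do not overlap, and it takes `max_completed_block` to be the highest end block of any complete segment.

```rust
/// Hypothetical summary of progress derived from files on disk.
#[derive(Debug, PartialEq)]
struct Progress {
    blocks_synced: u64,
    max_completed_block: Option<u64>,
}

/// Computes progress for the sync range `from..=to` from complete segments.
/// Each segment is an inclusive `(start, end)` block range; overlaps are not handled.
fn progress_from_segments(from: u64, to: u64, complete: &[(u64, u64)]) -> Progress {
    let mut blocks_synced = 0;
    let mut max_completed_block: Option<u64> = None;
    for &(start, end) in complete {
        // Clip each segment to the requested sync range.
        let s = start.max(from);
        let e = end.min(to);
        if s > e {
            continue; // segment lies outside the range
        }
        blocks_synced += e - s + 1;
        // `None` orders below every `Some`, so this keeps the highest end block.
        max_completed_block = max_completed_block.max(Some(e));
    }
    Progress { blocks_synced, max_completed_block }
}
```

Since the inputs come from a directory scan, the result is the same before and after a restart, and a worker that reported completion without producing a file contributes nothing.
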

@dwerner merged commit 6b44e93 into main on Oct 13, 2025
@dwerner deleted the various-bugfixes branch on October 13, 2025 at 18:51