Conversation
- Add grpc_request_duration_blocks histogram metric - Add grpc_request_duration_transactions histogram metric - Add grpc_request_duration_logs histogram metric - Track per-segment request latency for blocks, transactions, logs - Histogram buckets: 1ms to 1 hour for comprehensive latency tracking Enables detailed performance analysis of bridge request patterns and helps identify slow segments or data types during sync operations.
- Track start/end block in SegmentWorkerState - Encode responsibility range as (u64, u64) in FlightData.app_metadata - Include range even when batch has 0 rows (empty blocks) - Wire up metrics for block/transaction/log request durations Bridge now tells phaser-query which blocks were checked, not just which blocks had data. This enables gap detection to distinguish between 'unchecked blocks' and 'checked but empty blocks'.
- Replace StreamReader with proper FlightData decoding - Use arrow_ipc::root_as_message to extract schema from first message - Use arrow_flight::utils::flight_data_to_arrow_batch per message - Preserve app_metadata access for responsibility range tracking - Remove unused subscribe() method (replaced by subscribe_with_metadata) - Increase max message size to 256MB for large batches Bug: Code incorrectly used StreamReader::try_new() which expected complete IPC streams, but FlightData contains single RecordBatch messages. This caused RangeOutOfBounds errors that blocked all syncing. Fix: Decode each FlightData as a single message using arrow_flight::utils, matching the FlightRecordBatchStream implementation pattern.
- Add update_responsibility_end() to accumulate max responsibility - Simplify finalize_current_file() to use full responsibility range - Remove per-file responsibility_start (tracked at writer level) - Always write responsibility_end from bridge, not data_end Parquet files now claim responsibility for checked-but-empty blocks, enabling gap scanner to correctly report completion status.
- Use subscribe_with_metadata() for blocks, transactions, logs - Extract responsibility range from each batch - Call writer.update_responsibility_end() as batches arrive - Add StreamingNotInitialized error for timeout diagnostics - Add is_streaming_enabled check in sync service Workers now accumulate responsibility ranges from bridge metadata and update parquet writers accordingly. Gap scanner can now distinguish between missing data and legitimately empty blocks.
- Use subscribe_with_metadata() in spawn_stream_processor - Extract responsibility ranges from batches (currently unused) - Add TODO for future responsibility tracking in live mode Maintains compatibility with new subscription API. Responsibility tracking in live mode will be implemented in future work.
- Add BatchMetadata struct with ResponsibilityRange field - Document extensibility pattern for future metadata fields - Add encode/decode methods with proper error handling - Update subscribe_with_metadata() to return BatchMetadata (not Optional) - Update all workers to use BatchMetadata instead of Option<(u64, u64)> - Update bridge to use encode_metadata() - Deprecate old tuple-based encode/decode methods Benefits: - Type safety: ResponsibilityRange is now required, not optional - Extensibility: Can add new fields (compression_ratio, split_index, etc.) - Clear contract: subscribe_with_metadata() guarantees metadata or fails - Better documentation: Comments explain usage and evolution The BatchMetadata struct is designed to grow over time. See lib.rs for guidelines on adding new fields while maintaining compatibility.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.