feat: Add streaming Conway snapshot parser with callback interface #258
Conversation
Implement a complete streaming parser for Cardano Conway-era snapshots that parses the NewEpochState CBOR structure without loading everything into memory.
Key features:
- Callback-based streaming architecture for UTXOs, pools, accounts, and DReps
- Parses 11.2M UTXOs at ~2M UTXOs/second
- Extracts 3,095 stake pools with full metadata (pledge, cost, margin, relays)
- Parses 1.41M stake accounts with rewards and delegations (SPO & DRep)
- Extracts 278 DReps with deposits and anchor metadata
- Uses minicbor Decode trait for type-safe CBOR parsing
Components added:
- streaming_snapshot.rs: Main streaming parser with callback traits
- pool_params.rs: Stake pool parameter types with CBOR decoding
- account.rs: Stake account types (rewards, delegations)
- hash.rs: Generic hash types for pool IDs, VRF keys, addresses
- AccountState/StakeAddressState: Shared types for account state
- test_streaming_parser.rs: Example showing callback usage
- Makefile: snap-test-streaming target for testing
The parser navigates the Conway ledger structure:
NewEpochState -> EpochState -> LedgerState -> CertState/UTxOState
- VState (DReps), PState (pools), DState (accounts), UTxOs
Performance tested on epoch 507 mainnet snapshot (506MB CBOR).
Refs: #snapshot-parser
Pull Request Overview
This PR adds a streaming parser with a callback interface for Cardano Conway-era snapshots. The callback-based architecture processes large snapshot files without holding them in memory, and is designed for the bootstrap process, which distributes state via the message bus.
- Implements streaming parser with per-UTXO callbacks and bulk callbacks for pools, accounts, and DReps
- Adds CBOR decoding types and validation utilities for snapshot manifests
- Provides example implementation showing callback usage patterns
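The split between per-UTXO callbacks and bulk callbacks could look roughly like the sketch below. All names here (`SnapshotCallbacks`, `UtxoEntry`, `stream_snapshot`, `Totals`) are hypothetical and are not taken from the PR's `streaming_snapshot.rs`; the driver walks an in-memory iterator where the real parser would decode CBOR incrementally.

```rust
/// Hypothetical UTXO record; the PR's real type is not shown in this thread.
#[derive(Debug, Clone, PartialEq)]
pub struct UtxoEntry {
    pub tx_hash: String,
    pub output_index: u64,
    pub lovelace: u64,
}

/// Hypothetical callback interface: UTXOs arrive one at a time,
/// while smaller collections (here, pools) arrive in one bulk call.
pub trait SnapshotCallbacks {
    fn on_utxo(&mut self, utxo: UtxoEntry);
    fn on_pools(&mut self, pool_ids: Vec<String>);
}

/// Stand-in driver: hands each item to the callbacks and never
/// collects the UTXOs itself. Returns how many UTXOs were streamed.
pub fn stream_snapshot<C: SnapshotCallbacks>(
    utxos: impl Iterator<Item = UtxoEntry>,
    pool_ids: Vec<String>,
    callbacks: &mut C,
) -> u64 {
    let mut count: u64 = 0;
    for utxo in utxos {
        callbacks.on_utxo(utxo);
        count += 1;
    }
    callbacks.on_pools(pool_ids);
    count
}

/// Example consumer that keeps running totals instead of storing UTXOs.
#[derive(Default)]
pub struct Totals {
    pub utxo_count: u64,
    pub total_lovelace: u64,
    pub pool_count: usize,
}

impl SnapshotCallbacks for Totals {
    fn on_utxo(&mut self, utxo: UtxoEntry) {
        self.utxo_count += 1;
        self.total_lovelace += utxo.lovelace;
    }

    fn on_pools(&mut self, pool_ids: Vec<String>) {
        self.pool_count = pool_ids.len();
    }
}

fn main() {
    // Three synthetic UTXOs worth 1, 2 and 3 million lovelace.
    let utxos = (0..3u64).map(|i| UtxoEntry {
        tx_hash: format!("tx{}", i),
        output_index: i,
        lovelace: 1_000_000 * (i + 1),
    });
    let mut totals = Totals::default();
    let seen = stream_snapshot(utxos, vec!["pool_a".to_string()], &mut totals);
    println!(
        "{} UTXOs, {} lovelace, {} pools",
        seen, totals.total_lovelace, totals.pool_count
    );
}
```

Because the consumer only sees one `UtxoEntry` at a time, its memory use depends on what it chooses to keep, not on the size of the snapshot.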
Reviewed Changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| common/src/snapshot/streaming_snapshot.rs | Main streaming parser with callback traits and CBOR type definitions |
| common/src/snapshot/pool_params.rs | Stake pool parameter types with CBOR encoding/decoding |
| common/src/snapshot/parser.rs | Snapshot manifest parsing and validation utilities |
| common/src/snapshot/mod.rs | Module organization and public API exports |
| common/src/snapshot/error.rs | Error types for snapshot parsing failures |
| common/src/hash.rs | Generic hash types with CBOR support for pool IDs and addresses |
| common/src/account.rs | Account and DRep types for ledger state representation |
| tests/fixtures/ | Test manifest and snapshot files for validation |
| docs/streaming-snapshot-parser.md | Comprehensive documentation of parser architecture |
| common/examples/test_streaming_parser.rs | Example showing callback implementation patterns |
| Makefile | Build targets for testing snapshot parsing functionality |
| use std::collections::{HashMap, HashSet}; | ||
| use std::fmt::{Display, Formatter}; | ||
| use std::ops::{AddAssign, Neg}; | ||
| use std::ops::Neg; |
Copilot
AI
Oct 17, 2025
The removal of AddAssign import suggests that Value::add_assign implementation was removed, but this could break existing code that depends on the += operator for Value types.
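To make the concern concrete: `+=` on a user-defined type compiles only if that type implements `std::ops::AddAssign`. The `Value` below is a hypothetical single-field stand-in, not the project's actual type, and the saturating overflow policy is an assumption made for the sketch.

```rust
use std::ops::AddAssign;

/// Hypothetical stand-in for the project's `Value` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Value {
    pub lovelace: u64,
}

impl AddAssign for Value {
    fn add_assign(&mut self, rhs: Self) {
        // Saturate instead of panicking on overflow (assumed policy).
        self.lovelace = self.lovelace.saturating_add(rhs.lovelace);
    }
}

fn main() {
    let mut total = Value { lovelace: 5 };
    // This line stops compiling if the `AddAssign` impl is removed.
    total += Value { lovelace: 7 };
    println!("{:?}", total);
}
```

An impl written against the full path `std::ops::AddAssign` needs no `use` line, so the dropped import on its own does not prove the impl is gone; checking whether any `+=` call sites on `Value` remain would settle it.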
| #[derive(Debug, Copy, Clone, Eq, PartialEq, Hash, serde::Serialize, serde::Deserialize)] | ||
| pub struct AssetName { | ||
| #[n(0)] | ||
| len: u8, | ||
| #[n(1)] | ||
| bytes: [u8; 32], |
Copilot
AI
Oct 17, 2025
Removal of minicbor traits from AssetName could break CBOR serialization/deserialization functionality that may be used elsewhere in the codebase.
| #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] | ||
| pub struct NativeAsset { | ||
| #[n(0)] | ||
| pub name: AssetName, | ||
| #[n(1)] | ||
| pub amount: u64, |
Copilot
AI
Oct 17, 2025
Removal of minicbor encoding/decoding traits from NativeAsset could break CBOR functionality that may be required for blockchain data processing.
| Debug, Default, Clone, Copy, PartialEq, Eq, Hash, serde::Serialize, serde::Deserialize, | ||
| )] | ||
| pub struct TxIdentifier(#[n(0)] [u8; 6]); | ||
| pub struct TxIdentifier([u8; 6]); |
Copilot
AI
Oct 17, 2025
Removal of minicbor traits from TxIdentifier could break CBOR serialization needed for transaction processing.
| #[inline] | ||
| pub fn entry(&mut self, stake_key: KeyHash) -> Entry<KeyHash, StakeAddressState> { | ||
| pub fn entry(&'_ mut self, stake_key: KeyHash) -> Entry<'_, KeyHash, StakeAddressState> { |
Copilot
AI
Oct 17, 2025
[nitpick] Explicit lifetime annotations are unnecessary here as the compiler can infer them. The &'_ syntax adds visual noise without providing clarity.
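For context, the two annotations in the new signature do different jobs. `&'_ mut self` says nothing that `&mut self` does not, whereas `Entry<'_, ...>` makes it visible that the returned entry borrows from `self`, which is the form the `elided_lifetimes_in_paths` lint (part of the `rust_2018_idioms` group) asks for. A minimal sketch, using a hypothetical `StakeMap` wrapper with `u64` standing in for `StakeAddressState`:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical wrapper; `[u8; 28]` stands in for `KeyHash`
/// and `u64` for `StakeAddressState`.
#[derive(Default)]
pub struct StakeMap {
    inner: HashMap<[u8; 28], u64>,
}

impl StakeMap {
    /// `'_` on `Entry` marks that the result borrows from `self`;
    /// plain `&mut self` is enough on the receiver.
    #[inline]
    pub fn entry(&mut self, stake_key: [u8; 28]) -> Entry<'_, [u8; 28], u64> {
        self.inner.entry(stake_key)
    }

    /// Read-only lookup, added only so the example can be checked.
    pub fn get(&self, stake_key: &[u8; 28]) -> Option<u64> {
        self.inner.get(stake_key).copied()
    }
}

fn main() {
    let key = [0u8; 28];
    let mut map = StakeMap::default();
    *map.entry(key).or_insert(0) += 10;
    *map.entry(key).or_insert(0) += 5;
    println!("{:?}", map.get(&key));
}
```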
| StakeCredential::AddrKeyhash(hash) => format!("drep_{}", hash), | ||
| StakeCredential::ScriptHash(hash) => format!("drep_script_{}", hash), |
Copilot
AI
Oct 17, 2025
DRep IDs should use proper Bech32 encoding with 'drep' prefix instead of custom hex string formats to maintain compatibility with Cardano standards.
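As a rough illustration of what the suggestion involves, here is a dependency-free sketch of Bech32 encoding as specified in BIP-173. The function names are hypothetical, and only the `drep` prefix comes from the comment above. Cardano's governance-identifier CIPs define the exact payload (for example, whether a header byte precedes the hash); that layout is not covered here, and production code would more likely use an existing Bech32 crate.

```rust
const CHARSET: &[u8; 32] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";
const GEN: [u32; 5] = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];

/// BCH checksum polynomial from BIP-173.
fn polymod(values: &[u8]) -> u32 {
    let mut chk: u32 = 1;
    for &v in values {
        let top = chk >> 25;
        chk = ((chk & 0x01ff_ffff) << 5) ^ u32::from(v);
        for (i, g) in GEN.iter().enumerate() {
            if (top >> i) & 1 == 1 {
                chk ^= *g;
            }
        }
    }
    chk
}

/// Regroup 8-bit bytes into 5-bit groups, zero-padding the last group.
fn to_base32(data: &[u8]) -> Vec<u8> {
    let (mut acc, mut bits) = (0u32, 0u32);
    let mut out = Vec::new();
    for &b in data {
        // Keep only the bits that can still matter (at most 4 leftover + 8 new).
        acc = ((acc << 8) | u32::from(b)) & 0x0fff;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(((acc >> bits) & 31) as u8);
        }
    }
    if bits > 0 {
        out.push(((acc << (5 - bits)) & 31) as u8);
    }
    out
}

/// Encode `payload` under the human-readable prefix `hrp`.
/// Assumes `hrp` is lowercase ASCII; no validation is done in this sketch.
pub fn bech32_encode(hrp: &str, payload: &[u8]) -> String {
    let data = to_base32(payload);
    // Checksum input: expanded prefix, data, then six zero placeholders.
    let mut values: Vec<u8> = hrp.bytes().map(|c| c >> 5).collect();
    values.push(0);
    values.extend(hrp.bytes().map(|c| c & 31));
    values.extend_from_slice(&data);
    values.extend_from_slice(&[0u8; 6]);
    let pm = polymod(&values) ^ 1;

    let mut out = String::from(hrp);
    out.push('1'); // separator between prefix and data
    for d in data.iter() {
        out.push(CHARSET[*d as usize] as char);
    }
    for i in 0..6 {
        out.push(CHARSET[((pm >> (5 * (5 - i))) & 31) as usize] as char);
    }
    out
}

fn main() {
    // Placeholder 28-byte credential hash; real input would come from the snapshot.
    let credential = [0u8; 28];
    println!("{}", bech32_encode("drep", &credential));
}
```

The same routine with a different prefix would apply to the address comment further down; only the prefix and payload differ.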
| // Element 0: Address (bytes) | ||
| let address_bytes = dec.bytes().context("Failed to parse address bytes")?; | ||
| let address = hex::encode(address_bytes); |
Copilot
AI
Oct 17, 2025
Addresses should be Bech32-encoded instead of hex-encoded to maintain compatibility with standard Cardano address formats.
| // Serialize implementation requires pallas_addresses which is not currently a dependency | ||
| // TODO: Add pallas_addresses or implement Bech32 encoding differently | ||
| /* |
Copilot
AI
Oct 17, 2025
Large commented-out code block (103 lines) should be removed. If the implementation is needed later, it can be recovered from version control.
| pub struct NextEpochs {} | ||
|
|
||
| #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] | ||
| pub struct PreviousEpochs { | ||
| pub epochs: Vec<EpochActivityMessage>, | ||
| } | ||
| pub struct PreviousEpochs {} |
Copilot
AI
Oct 17, 2025
Empty structs replace the previous implementation, which had an epochs field; this removes functionality without clear justification.
| pub fn keyhash(key: &[u8]) -> KeyHash { | ||
| let mut hasher = Blake2b::<U32>::new(); |
Copilot
AI
Oct 17, 2025
Blake2b with 32-byte output (U32) differs from the original Blake2b-224 (28 bytes) implementation, which could break hash compatibility with existing systems.
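One way to keep the two digest lengths from being confused is to encode the length in the type. The sketch below uses a hypothetical const-generic `Digest<N>` (it is not the PR's `hash.rs`) so that a 32-byte value cannot be converted into a 28-byte key hash. It does not compute Blake2b itself, which would need an external crate.

```rust
use std::convert::TryFrom;

/// Hypothetical fixed-length digest; `N` is the length in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Digest<const N: usize>([u8; N]);

/// 28 bytes: the Blake2b-224 length the comment cites for key hashes.
pub type KeyHash = Digest<28>;
/// 32 bytes: the Blake2b-256 length.
pub type Digest32 = Digest<32>;

impl<const N: usize> TryFrom<&[u8]> for Digest<N> {
    type Error = String;

    fn try_from(bytes: &[u8]) -> Result<Self, Self::Error> {
        // Slice-to-array conversion fails unless the length is exactly N.
        <[u8; N]>::try_from(bytes)
            .map(Digest)
            .map_err(|_| format!("expected {} bytes, got {}", N, bytes.len()))
    }
}

fn main() {
    let out_256 = [0u8; 32]; // placeholder for a Blake2b-256 output
    let out_224 = [0u8; 28]; // placeholder for a Blake2b-224 output

    println!("{:?}", KeyHash::try_from(&out_256[..]).is_err());
    println!("{:?}", KeyHash::try_from(&out_224[..]).is_ok());
}
```

Truncating a 32-byte digest is not a substitute for the shorter hash: Blake2b includes the output length in its parameter block, so Blake2b-224 is not a prefix of Blake2b-256.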