@buddhisthead
Collaborator

Implement a complete streaming parser for Cardano Conway-era snapshots that parses the NewEpochState CBOR structure without loading everything into memory.

Key features:

  • Callback-based streaming architecture for UTXOs, pools, accounts, and DReps
  • Parses 11.2M UTXOs at ~2M UTXOs/second
  • Extracts 3,095 stake pools with full metadata (pledge, cost, margin, relays)
  • Parses 1.41M stake accounts with rewards and delegations (SPO & DRep)
  • Extracts 278 DReps with deposits and anchor metadata
  • Uses minicbor Decode trait for type-safe CBOR parsing

Components added:

  • streaming_snapshot.rs: Main streaming parser with callback traits
  • pool_params.rs: Stake pool parameter types with CBOR decoding
  • account.rs: Stake account types (rewards, delegations)
  • hash.rs: Generic hash types for pool IDs, VRF keys, addresses
  • AccountState/StakeAddressState: Shared types for account state
  • test_streaming_parser.rs: Example showing callback usage
  • Makefile: snap-test-streaming target for testing

The parser navigates the Conway ledger structure:
NewEpochState -> EpochState -> LedgerState -> CertState/UTxOState

  • VState (DReps), PState (pools), DState (accounts), UTxOs

Performance was tested on an epoch 507 mainnet snapshot (506 MB of CBOR).

Refs: #snapshot-parser
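
A minimal sketch of the callback-based streaming idea described above. Every name here (`UtxoEntry`, `UtxoCallback`, `UtxoTotals`, `stream_utxos`) is hypothetical and chosen for illustration; the actual traits live in streaming_snapshot.rs and may look different. The point is only that the consumer sees one entry at a time, so memory use does not grow with the number of UTXOs.

```rust
// Hypothetical sketch: these names are illustrative, not the PR's actual API.

/// One UTXO entry as a parser might hand it to a callback.
#[derive(Debug, Clone, PartialEq)]
pub struct UtxoEntry {
    pub tx_hash: String,
    pub output_index: u64,
    pub lovelace: u64,
}

/// Per-item callback: invoked once per UTXO so nothing has to be buffered.
pub trait UtxoCallback {
    fn on_utxo(&mut self, utxo: UtxoEntry) -> Result<(), String>;
}

/// A consumer that keeps only running totals, not the entries themselves.
#[derive(Debug, Default)]
pub struct UtxoTotals {
    pub count: u64,
    pub lovelace: u64,
}

impl UtxoCallback for UtxoTotals {
    fn on_utxo(&mut self, utxo: UtxoEntry) -> Result<(), String> {
        self.count += 1;
        self.lovelace += utxo.lovelace;
        Ok(())
    }
}

/// Stand-in for the parser loop: drives the callback from any source of
/// entries and stops at the first error the callback reports.
pub fn stream_utxos<I, C>(entries: I, callback: &mut C) -> Result<(), String>
where
    I: IntoIterator<Item = UtxoEntry>,
    C: UtxoCallback,
{
    for entry in entries {
        callback.on_utxo(entry)?;
    }
    Ok(())
}
```

In this sketch the real CBOR decoding is replaced by a plain iterator; a parser would call `on_utxo` from inside its decode loop instead.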

Copilot AI review requested due to automatic review settings October 17, 2025 17:55
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a complete streaming parser for Cardano Conway-era snapshots, exposed through a callback interface. Because entries are delivered to callbacks as they are decoded, large snapshot files can be processed without holding them in memory. The design targets the bootstrap process, which distributes state over the message bus.

  • Implements streaming parser with per-UTXO callbacks and bulk callbacks for pools, accounts, and DReps
  • Adds CBOR decoding types and validation utilities for snapshot manifests
  • Provides example implementation showing callback usage patterns

Reviewed Changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| common/src/snapshot/streaming_snapshot.rs | Main streaming parser with callback traits and CBOR type definitions |
| common/src/snapshot/pool_params.rs | Stake pool parameter types with CBOR encoding/decoding |
| common/src/snapshot/parser.rs | Snapshot manifest parsing and validation utilities |
| common/src/snapshot/mod.rs | Module organization and public API exports |
| common/src/snapshot/error.rs | Error types for snapshot parsing failures |
| common/src/hash.rs | Generic hash types with CBOR support for pool IDs and addresses |
| common/src/account.rs | Account and DRep types for ledger state representation |
| tests/fixtures/ | Test manifest and snapshot files for validation |
| docs/streaming-snapshot-parser.md | Comprehensive documentation of parser architecture |
| common/examples/test_streaming_parser.rs | Example showing callback implementation patterns |
| Makefile | Build targets for testing snapshot parsing functionality |


 use std::collections::{HashMap, HashSet};
 use std::fmt::{Display, Formatter};
-use std::ops::{AddAssign, Neg};
+use std::ops::Neg;
Copilot AI Oct 17, 2025

The removal of AddAssign import suggests that Value::add_assign implementation was removed, but this could break existing code that depends on the += operator for Value types.
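
For context, a small sketch of why this matters: `+=` on a custom type compiles only while an `AddAssign` impl exists, so a removed impl would surface as a build error at every call site rather than as a silent change. The `Value` struct below is a hypothetical, simplified stand-in, not the crate's real type. Dropping the import alone does not prove the impl is gone, since an impl can also name the trait by its full path.

```rust
use std::ops::AddAssign;

// Hypothetical, simplified stand-in for the crate's `Value` type.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Value {
    pub lovelace: u64,
}

// Without this impl, `a += b` on `Value` does not compile.
impl AddAssign for Value {
    fn add_assign(&mut self, rhs: Self) {
        self.lovelace += rhs.lovelace;
    }
}

/// Sums a list of values using the `+=` operator.
pub fn total(values: Vec<Value>) -> Value {
    let mut sum = Value::default();
    for v in values {
        sum += v;
    }
    sum
}
```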

Comment on lines +170 to 173
+#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash, serde::Serialize, serde::Deserialize)]
 pub struct AssetName {
-    #[n(0)]
     len: u8,
-    #[n(1)]
     bytes: [u8; 32],
Copilot AI Oct 17, 2025

Removal of minicbor traits from AssetName could break CBOR serialization/deserialization functionality that may be used elsewhere in the codebase.

Comment on lines +198 to 201
+#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
 pub struct NativeAsset {
-    #[n(0)]
     pub name: AssetName,
-    #[n(1)]
     pub amount: u64,
Copilot AI Oct 17, 2025

Removal of minicbor encoding/decoding traits from NativeAsset could break CBOR functionality that may be required for blockchain data processing.

     Debug, Default, Clone, Copy, PartialEq, Eq, Hash, serde::Serialize, serde::Deserialize,
 )]
-pub struct TxIdentifier(#[n(0)] [u8; 6]);
+pub struct TxIdentifier([u8; 6]);
Copilot AI Oct 17, 2025

Removal of minicbor traits from TxIdentifier could break CBOR serialization needed for transaction processing.


 #[inline]
-pub fn entry(&mut self, stake_key: KeyHash) -> Entry<KeyHash, StakeAddressState> {
+pub fn entry(&'_ mut self, stake_key: KeyHash) -> Entry<'_, KeyHash, StakeAddressState> {
Copilot AI Oct 17, 2025

[nitpick] Explicit lifetime annotations are unnecessary here as the compiler can infer them. The &'_ syntax adds visual noise without providing clarity.
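
One possible middle ground, sketched with hypothetical stand-in types (`KeyHash`, `StakeAddressState`, and `StakeAddressMap` below are simplified, not the crate's definitions): the `'_` on `&mut self` can go, while the `'_` inside `Entry<'_, ..>` can stay. The latter is what the `elided_lifetimes_in_paths` lint (part of the `rust_2018_idioms` group) asks for, because it makes the borrow held by the returned `Entry` visible in the signature.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical stand-ins for the crate's `KeyHash` and `StakeAddressState`.
pub type KeyHash = [u8; 28];

#[derive(Debug, Default, PartialEq)]
pub struct StakeAddressState {
    pub rewards: u64,
}

#[derive(Debug, Default)]
pub struct StakeAddressMap {
    inner: HashMap<KeyHash, StakeAddressState>,
}

impl StakeAddressMap {
    /// `&mut self` needs no explicit `'_`; the `'_` in the return type
    /// signals that the returned `Entry` borrows from `self`.
    #[inline]
    pub fn entry(&mut self, stake_key: KeyHash) -> Entry<'_, KeyHash, StakeAddressState> {
        self.inner.entry(stake_key)
    }
}
```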

Comment on lines +1038 to +1039
StakeCredential::AddrKeyhash(hash) => format!("drep_{}", hash),
StakeCredential::ScriptHash(hash) => format!("drep_script_{}", hash),
Copilot AI Oct 17, 2025

DRep IDs should use proper Bech32 encoding with 'drep' prefix instead of custom hex string formats to maintain compatibility with Cardano standards.
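
As a rough illustration of what the Bech32 step involves, here is a dependency-free sketch of the BIP-173 encoder (the original Bech32 checksum, not Bech32m). The function names are hypothetical, and this is not how the PR or any particular crate implements it. Which bytes belong in a DRep ID payload (for example, whether a header byte precedes the credential hash) is not shown in the source and is outside this sketch; only the encoding of `hrp` plus raw bytes is covered.

```rust
// Illustrative BIP-173 Bech32 encoder; function names are hypothetical.
const CHARSET: &[u8; 32] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";
const GEN: [u32; 5] = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];

/// BCH checksum over a sequence of 5-bit values.
fn polymod(values: &[u8]) -> u32 {
    let mut chk: u32 = 1;
    for &v in values {
        let top = chk >> 25;
        chk = ((chk & 0x01ff_ffff) << 5) ^ u32::from(v);
        for (i, g) in GEN.iter().enumerate() {
            if (top >> i) & 1 == 1 {
                chk ^= *g;
            }
        }
    }
    chk
}

/// Regroups 8-bit bytes into 5-bit values, zero-padding the final group.
fn to_5bit(data: &[u8]) -> Vec<u8> {
    let mut acc: u32 = 0;
    let mut bits: u32 = 0;
    let mut out = Vec::with_capacity((data.len() * 8 + 4) / 5);
    for &b in data {
        acc = ((acc << 8) | u32::from(b)) & 0x0fff;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(((acc >> bits) & 31) as u8);
        }
    }
    if bits > 0 {
        out.push(((acc << (5 - bits)) & 31) as u8);
    }
    out
}

/// Encodes `data` under the human-readable prefix `hrp`.
/// Assumes `hrp` is lowercase ASCII; no length or character validation is done.
pub fn bech32_encode(hrp: &str, data: &[u8]) -> String {
    let payload = to_5bit(data);

    // Checksum input: expanded prefix, payload, then six zero placeholders.
    let mut values: Vec<u8> = hrp.bytes().map(|c| c >> 5).collect();
    values.push(0);
    values.extend(hrp.bytes().map(|c| c & 31));
    values.extend_from_slice(&payload);
    values.extend_from_slice(&[0u8; 6]);
    let checksum = polymod(&values) ^ 1;

    let mut out = String::with_capacity(hrp.len() + 1 + payload.len() + 6);
    out.push_str(hrp);
    out.push('1');
    for &v in &payload {
        out.push(CHARSET[v as usize] as char);
    }
    for i in 0..6u32 {
        out.push(CHARSET[((checksum >> (5 * (5 - i))) & 31) as usize] as char);
    }
    out
}
```

With something like this in place, the two `format!` arms above could call `bech32_encode("drep", ...)` on the credential bytes instead of emitting a custom hex string, assuming the payload layout is settled separately.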


// Element 0: Address (bytes)
let address_bytes = dec.bytes().context("Failed to parse address bytes")?;
let address = hex::encode(address_bytes);
Copilot AI Oct 17, 2025

Addresses should be Bech32-encoded instead of hex-encoded to maintain compatibility with standard Cardano address formats.

Comment on lines +70 to +72
// Serialize implementation requires pallas_addresses which is not currently a dependency
// TODO: Add pallas_addresses or implement Bech32 encoding differently
/*
Copilot AI Oct 17, 2025

Large commented-out code block (103 lines) should be removed. If the implementation is needed later, it can be recovered from version control.

Comment on lines +51 to +54
+pub struct NextEpochs {}
 
 #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
-pub struct PreviousEpochs {
-    pub epochs: Vec<EpochActivityMessage>,
-}
+pub struct PreviousEpochs {}
Copilot AI Oct 17, 2025

Empty structs replaced the previous implementation with epochs field, which removes functionality without clear justification.

Comment on lines +7 to +8
pub fn keyhash(key: &[u8]) -> KeyHash {
let mut hasher = Blake2b::<U32>::new();
Copilot AI Oct 17, 2025

Blake2b with 32-byte output (U32) differs from the original Blake2b-224 (28 bytes) implementation, which could break hash compatibility with existing systems.
