-
Notifications
You must be signed in to change notification settings - Fork 5
feat: snapshot downloader for bootstrapper #396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
# Conflicts: # modules/snapshot_bootstrapper/src/snapshot_bootstrapper.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a snapshot bootstrapper module that enables Acropolis to initialize from NewEpochState CBOR snapshot files. The implementation adds an alternative bootstrap path alongside the existing genesis bootstrapper, allowing users to choose between starting from genesis or from a pre-built snapshot. The snapshot bootstrapper downloads gzip-compressed CBOR files from configured URLs, decompresses them on-the-fly, and parses them to reconstruct blockchain state.
Key changes:
- New
SnapshotBootstrappermodule with download orchestration and gzip decompression - Dynamic bootstrap module selection based on
startup.methodconfig ("genesis" or "snapshot") - Configuration structure for network-specific snapshots with metadata in JSON files
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| processes/omnibus/src/main.rs | Adds dynamic bootstrap module registration based on startup.method config |
| processes/omnibus/omnibus.toml | Reorganizes config structure and adds snapshot bootstrapper configuration section |
| processes/omnibus/Cargo.toml | Adds dependency on the new snapshot_bootstrapper module |
| modules/snapshot_bootstrapper/src/snapshot_bootstrapper.rs | Implements core snapshot download, decompression, and parsing logic with error handling |
| modules/snapshot_bootstrapper/data/mainnet/snapshots.json | Defines metadata for available mainnet snapshots (epochs 507-509) |
| modules/snapshot_bootstrapper/data/mainnet/config.json | Specifies which snapshots to load and associated block points |
| modules/snapshot_bootstrapper/Cargo.toml | Declares dependencies for HTTP client, compression, and JSON parsing |
| common/src/snapshot/streaming_snapshot.rs | Adjusts logging format for consistency (removes emojis and extra indentation) |
| Cargo.lock | Updates lock file with new dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…d streamlined snapshot processing
…mproved error handling
…quence improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ation in snapshot bootstrapper
…ion' into lowhung/395-bootstrap-orchestration # Conflicts: # modules/snapshot_bootstrapper/src/downloader.rs
…ion' into lowhung/395-bootstrap-orchestration # Conflicts: # modules/snapshot_bootstrapper/src/downloader.rs
…r and configuration
…ion' into lowhung/395-bootstrap-orchestration
…ion' into lowhung/395-bootstrap-orchestration
…ion' into lowhung/395-bootstrap-orchestration
…ion' into lowhung/395-bootstrap-orchestration
…ion' into lowhung/395-bootstrap-orchestration
| slot: 0, | ||
| number: 0, | ||
| hash: BlockHash::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not know the latest block produced from that restored state? I know the chainsync code needs that to work right, and I suspect other components would as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I've kept this as a placeholder function for now, but it won't be difficult to implement this properly.
For era we could hardcode it within our pre-existing json config files, or use an era history summary, which keeps track of stability window, each era's start/end bounds and epoch parameters (epoch_size_slots, slot_length). With that you can derive the timestamp, epoch_slot_no etc. We have slot and block_hash as we use those to name our snapshot files and are specified in our config json files.
Amaru constructs an era history summary as a history.{slot}.{hash}.json file, and uses this during their bootstrap process. They actually have some nifty commands, one of which converts ledger state from a DebugEpochState cbor dump (from a cardano-node) into said history file.
But yeh, given a slot, hash, and era history, we can derive everything except block number. Unless I'm missing something 🤔
For mainnet/preprod/preview, era history is fairly static, and can be hardcoded if we wanted to. For testnets, we'd need to include the history file alongside snapshots.
| pub fn try_load(config: &Config) -> Result<Self> { | ||
| Ok(Self { | ||
| network: config.get_string("network").unwrap_or_else(|_| "mainnet".to_string()), | ||
| data_dir: config.get_string("data-dir").unwrap_or_else(|_| "./data".to_string()), | ||
| startup_topic: config | ||
| .get_string("startup-topic") | ||
| .unwrap_or(DEFAULT_STARTUP_TOPIC.to_string()), | ||
| snapshot_topic: config | ||
| .get_string("snapshot-topic") | ||
| .unwrap_or(DEFAULT_SNAPSHOT_TOPIC.to_string()), | ||
| bootstrapped_topic: config | ||
| .get_string("bootstrapped-subscribe-topic") | ||
| .unwrap_or(DEFAULT_BOOTSTRAPPED_TOPIC.to_string()), | ||
| completion_topic: config | ||
| .get_string("completion-topic") | ||
| .unwrap_or(DEFAULT_COMPLETION_TOPIC.to_string()), | ||
| }) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we follow the pattern from PeerNetworkInterface, where default config values are stored in a config.default.toml file? Sorry to keep bringing it up, but I think it's really helpful to have a conventional place where you can see a module's default configuration (plus comments about what that configuration does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion, done here 15e382f
| for snapshot_meta in snapshots { | ||
| let file_path = snapshot_meta.file_path(&self.network_dir); | ||
| self.download_single(&snapshot_meta.url, &file_path).await?; | ||
| } | ||
| Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big are these files? Would it make sense to parallelize the downloads with try_join_all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For epochs 507, 508 and 509 they are 1638 MB, 1644 MB, 1645 MB respectively, and it takes ~2-2.5 mins total to download all 3 of them. With that in mind though, I'm fairly confident we will be able to use a single snapshot (based on our conversation with the ledger team) -- it doesn't hurt to still support an array of snapshots to process IMO.
…ion' into lowhung/395-bootstrap-orchestration # Conflicts: # modules/snapshot_bootstrapper/config.default.toml
| [dev-dependencies] | ||
| tempfile = "3.23" | ||
| config = { workspace = true } | ||
| caryatid_process = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cargo shear chore, these are only being used in tests within common/src/resolver.rs
| use tracing_subscriber::prelude::*; | ||
| use tracing_subscriber::{filter, fmt, EnvFilter, Registry}; | ||
|
|
||
| const STARTUP_METHOD_MITHRIL: &str = "mithril"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this. I could use a similiar config.default.toml file like we do in the snapshot_bootstrapper/src/configuration.
# Conflicts: # common/Cargo.toml
Description
Implements downloading of snapshots from configured URLs within Snapshot Boostrapper, along with conditional registration of either Snapshot Bootstrapper or Mithril Snapshot Fetcher.
Related Issue(s)
#395
How was this tested?
Checklist
Impact / Side effects
No breaking changes. This is an additive change.
New optional config:
Dependencies added:
async-compression,reqwest,tokio-util,futures-utilReviewer notes / Areas to focus
modules/snapshot_bootstrapper/bootstrapper.rs: Core bootstrap logic / orchestrationmodules/snapshot_bootstrapper/downloader.rs: Download logicmodules/snapshot_bootstrapper/publisher.rs: Publishing logicprocesses/omnibus/main.rs: Conditional registration of Mithril and SnapshotHow to run this
processes/omnibus/omnibus.tomlSetstartup.method = 'snapshot'make runat top-levelYou can see the bootstrap logs here