feat(dataset-store): support loading datasets by name and version #796
Merged
This was referenced Sep 24, 2025
Refactored the dataset store to load datasets by name with optional version, making version management easier and cleaning up the manifest handling.

- Introduced `ManifestsStore` module for centralized manifest management with `MetadataDb` integration
- Added versioned dataset loading with automatic latest version resolution when version not specified
- Removed `ManifestValue` abstraction in favor of direct strongly-typed manifest deserialization
- Extracted provider configurations into dedicated typed structs (`ProviderConfig`) for each extractor
- Added `init()` method to sync object store manifests with metadata database on startup

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
fubhy (Contributor) approved these changes on Sep 30, 2025 and left a comment:
Can confirm that the TypeScript client side works with this change.
This was referenced Sep 30, 2025
13 tasks
Theodus approved these changes on Sep 30, 2025
Context and Rationale
This is an important rewrite of the `DatasetStore` struct.

- New `ManifestsStore` Module: Created a dedicated module to handle all manifest operations. It communicates with both the object store and the metadata database to keep them synchronized.
- Dataset Store Initialization: When the `init()` method is called, the dataset store lists all manifest files in the manifest store and registers them in the metadata database. The dataset name and version are extracted from the filename. If the filename doesn't include a version suffix, the version defaults to `v0.0.0`. If not explicitly initialized, the dataset store and manifest store are lazily initialized on first use.
- Loading Datasets by Version: Datasets can now be loaded with or without specifying a version. If no version is provided, the store resolves the latest version from all the versions of that dataset registered in the metadata DB.
- Direct Type Deserialization: Got rid of the `ManifestValue` wrapper that used to sit between TOML/JSON and our actual types. Now we deserialize manifests directly into the structs we need, which is cleaner and less prone to errors.
- Typed Provider Configs: Provider configurations are no longer generic TOML blobs. Each extractor now has its own config struct (`EvmRpcProviderConfig`, `FirehoseProviderConfig`, `SubstreamsProviderConfig`, `EthBeaconProviderConfig`) with clearly defined fields.
- Better Error Messages: Instead of generic errors, we now have specific error types for each operation (`RegisterManifestError`, `GetDatasetError`, `GetLatestVersionError`, `IsRegisteredError`). This makes debugging easier.

Commit message
feat(dataset-store): support loading datasets by name and version
Refactored the dataset store to load datasets by name with optional version, making version management easier and cleaning up the manifest handling.
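- Introduced `ManifestsStore` module for centralized manifest management with `MetadataDb` integration
- Added versioned dataset loading with automatic latest version resolution when version not specified
- Removed `ManifestValue` abstraction in favor of direct strongly-typed manifest deserialization
- Extracted provider configurations into dedicated typed structs (`ProviderConfig`) for each extractor
- Added `init()` method to sync object store manifests with metadata database on startup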
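
For readers who want a concrete picture of the filename rule and the latest-version resolution described under Context and Rationale, here is a minimal, standard-library-only Rust sketch. It is not the code from this PR. The filename shape `<name>__<major>_<minor>_<patch>` (extension already stripped), the `Version` type, and the functions `parse_manifest_stem` and `resolve_version` are all hypothetical names and assumptions chosen for illustration; the in-memory slice stands in for whatever the metadata DB returns.

```rust
/// Hypothetical version triple. The derived `Ord` compares fields in
/// declaration order (major, then minor, then patch), so 0.10.0 > 0.9.0.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    /// Fallback used when a filename carries no version suffix (v0.0.0).
    const DEFAULT: Version = Version { major: 0, minor: 0, patch: 0 };

    /// Parses an assumed `<major>_<minor>_<patch>` suffix.
    fn parse(s: &str) -> Option<Version> {
        let mut parts = s.split('_');
        let major = parts.next()?.parse().ok()?;
        let minor = parts.next()?.parse().ok()?;
        let patch = parts.next()?.parse().ok()?;
        // Reject trailing components such as "1_2_3_4".
        if parts.next().is_some() {
            return None;
        }
        Some(Version { major, minor, patch })
    }
}

/// Splits an assumed manifest file stem into (dataset name, version).
/// A missing or unparsable suffix falls back to `Version::DEFAULT`.
fn parse_manifest_stem(stem: &str) -> (String, Version) {
    if let Some((name, suffix)) = stem.rsplit_once("__") {
        if let Some(version) = Version::parse(suffix) {
            return (name.to_string(), version);
        }
    }
    (stem.to_string(), Version::DEFAULT)
}

/// Resolves the version to load: the requested one if it is registered,
/// otherwise (when none is requested) the highest registered version.
fn resolve_version(
    registered: &[(String, Version)],
    name: &str,
    requested: Option<Version>,
) -> Option<Version> {
    let mut versions = registered
        .iter()
        .filter(|(n, _)| n.as_str() == name)
        .map(|(_, v)| *v);
    match requested {
        Some(wanted) => versions.find(|v| *v == wanted),
        None => versions.max(),
    }
}
```

Comparing versions as numeric triples rather than as strings is the design point worth noting: a string comparison would rank `0.9.0` above `0.10.0`, which would make "latest" resolve incorrectly.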