
feat(dataset-store): support loading datasets by name and version #796

Merged
fubhy merged 2 commits into main from lnsd/feat-dataset-store-by-version
Sep 30, 2025

LNSD (Contributor) commented on Sep 24, 2025

Context and Rationale

This PR is a substantial rewrite of the DatasetStore struct.

  • New ManifestsStore Module: A dedicated module now handles all manifest operations. It talks to both the object store and the metadata database and keeps the two in sync.

  • Dataset Store Initialization: Calling init() lists every manifest file in the manifest store and registers each one in the metadata database, extracting the dataset name and version from the filename. A filename without a version suffix defaults to v0.0.0. If init() is never called explicitly, the dataset store and manifest store are initialized lazily on first use.

  • Loading Datasets by Version: Datasets can now be loaded with or without an explicit version. When no version is given, the store resolves the latest one among all versions of that dataset registered in the metadata DB.

  • Direct Type Deserialization: Removed the ManifestValue wrapper that sat between the raw TOML/JSON and our actual types. Manifests are now deserialized directly into the structs we need, which is cleaner and less error-prone.

  • Typed Provider Configs: Provider configurations are no longer generic TOML blobs. Each extractor now has its own config struct (EvmRpcProviderConfig, FirehoseProviderConfig, SubstreamsProviderConfig, EthBeaconProviderConfig) with clearly defined fields.

  • Better Error Messages: Each operation now returns its own error type (RegisterManifestError, GetDatasetError, GetLatestVersionError, IsRegisteredError) instead of a generic error, which makes failures easier to diagnose.
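The initialization flow described above (list the manifest files, derive a name and version from each filename, register them, default to v0.0.0) can be sketched in std-only Rust. This is a minimal illustration under stated assumptions, not the PR's code: the `<name>__<version>.json` filename convention, the `Version` type, `parse_manifest_filename`, and the in-memory `Registry` standing in for the metadata DB are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Semantic version. The derived `Ord` compares fields in declaration order,
/// i.e. major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    /// Default used when a manifest filename carries no version suffix.
    pub const ZERO: Version = Version { major: 0, minor: 0, patch: 0 };

    /// Parses "1.2.3" or "v1.2.3"; anything else yields `None`.
    pub fn parse(s: &str) -> Option<Version> {
        let s = s.strip_prefix('v').unwrap_or(s);
        let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
        // The first `?` handles a missing component, the second a non-numeric one.
        let version = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject extra components such as "1.2.3.4".
        if parts.next().is_some() {
            return None;
        }
        Some(version)
    }
}

/// HYPOTHETICAL naming convention: "<name>__<version>.json".
/// "eth__1.2.0.json" -> ("eth", v1.2.0); "eth.json" -> ("eth", v0.0.0).
pub fn parse_manifest_filename(file: &str) -> Option<(String, Version)> {
    let stem = file.strip_suffix(".json")?;
    let (name, version) = match stem.split_once("__") {
        Some((name, ver)) => (name, Version::parse(ver)?),
        None => (stem, Version::ZERO),
    };
    if name.is_empty() {
        return None;
    }
    Some((name.to_string(), version))
}

/// HYPOTHETICAL in-memory stand-in for the metadata DB:
/// dataset name -> set of registered versions.
pub type Registry = BTreeMap<String, BTreeSet<Version>>;

/// Sketch of init(): register every manifest file whose name parses.
/// Unparseable files are simply skipped in this sketch.
pub fn init(files: &[&str]) -> Registry {
    let mut registry = Registry::new();
    for file in files {
        if let Some((name, version)) = parse_manifest_filename(file) {
            registry.entry(name).or_default().insert(version);
        }
    }
    registry
}
```

Registering into a set makes re-running the sketch's init() idempotent: a manifest that is already registered is simply inserted again as a no-op.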
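Loading a dataset with an optional version, together with a dedicated error type, might look like the following std-only sketch. The `resolve_version` function, the in-memory registry standing in for the metadata DB, and the variants of this `GetDatasetError` are hypothetical; only the error type's name comes from the PR description.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::fmt;

/// Semantic version; the derived `Ord` compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "v{}.{}.{}", self.major, self.minor, self.patch)
    }
}

/// HYPOTHETICAL variants; only the type name comes from the PR description.
#[derive(Debug, PartialEq)]
pub enum GetDatasetError {
    DatasetNotFound { name: String },
    VersionNotFound { name: String, version: Version },
}

impl fmt::Display for GetDatasetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GetDatasetError::DatasetNotFound { name } => {
                write!(f, "dataset {} is not registered", name)
            }
            GetDatasetError::VersionNotFound { name, version } => {
                write!(f, "dataset {} has no version {}", name, version)
            }
        }
    }
}

impl std::error::Error for GetDatasetError {}

/// Returns the requested version if it is registered; with `None`, returns
/// the latest registered version. `registry` stands in for the metadata DB.
pub fn resolve_version(
    registry: &BTreeMap<String, BTreeSet<Version>>,
    name: &str,
    version: Option<Version>,
) -> Result<Version, GetDatasetError> {
    let versions = registry
        .get(name)
        .filter(|set| !set.is_empty())
        .ok_or_else(|| GetDatasetError::DatasetNotFound { name: name.to_string() })?;
    match version {
        Some(v) if versions.contains(&v) => Ok(v),
        Some(v) => Err(GetDatasetError::VersionNotFound { name: name.to_string(), version: v }),
        // A BTreeSet iterates in ascending order, so the last entry is the latest.
        None => Ok(*versions.iter().next_back().expect("set is non-empty")),
    }
}
```

Comparing parsed versions rather than raw strings matters here: as strings, "1.10.0" sorts before "1.2.0", while the derived ordering correctly picks v1.10.0 as the latest.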
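The benefit of typed provider configs over a generic TOML blob is that a missing or unknown field fails when the config is built, not deep inside an extractor. The std-only sketch below reuses the four struct names from the PR description, but their fields, the `ProviderConfig` enum wrapping them, the kind strings, and `from_fields` are hypothetical.

```rust
use std::collections::HashMap;

// HYPOTHETICAL fields: only the struct names come from the PR description.
#[derive(Debug, Clone, PartialEq)]
pub struct EvmRpcProviderConfig {
    pub url: String,
    pub network: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FirehoseProviderConfig {
    pub url: String,
    pub token: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SubstreamsProviderConfig {
    pub url: String,
    pub module: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct EthBeaconProviderConfig {
    pub url: String,
}

/// HYPOTHETICAL wrapper enum: one variant per extractor, so every `match`
/// must handle each provider kind explicitly.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderConfig {
    EvmRpc(EvmRpcProviderConfig),
    Firehose(FirehoseProviderConfig),
    Substreams(SubstreamsProviderConfig),
    EthBeacon(EthBeaconProviderConfig),
}

/// Looks up a required field, producing a descriptive error if it is absent.
fn required(fields: &HashMap<&str, &str>, key: &str) -> Result<String, String> {
    fields
        .get(key)
        .map(|v| v.to_string())
        .ok_or_else(|| format!("missing field `{}`", key))
}

impl ProviderConfig {
    /// Builds a typed config from raw key/value pairs (e.g. a parsed TOML table).
    pub fn from_fields(kind: &str, f: &HashMap<&str, &str>) -> Result<Self, String> {
        Ok(match kind {
            "evm-rpc" => ProviderConfig::EvmRpc(EvmRpcProviderConfig {
                url: required(f, "url")?,
                network: required(f, "network")?,
            }),
            "firehose" => ProviderConfig::Firehose(FirehoseProviderConfig {
                url: required(f, "url")?,
                token: f.get("token").map(|t| t.to_string()),
            }),
            "substreams" => ProviderConfig::Substreams(SubstreamsProviderConfig {
                url: required(f, "url")?,
                module: required(f, "module")?,
            }),
            "eth-beacon" => ProviderConfig::EthBeacon(EthBeaconProviderConfig {
                url: required(f, "url")?,
            }),
            other => return Err(format!("unknown provider kind `{}`", other)),
        })
    }

    /// Every variant in this sketch has a URL, so this accessor is total.
    pub fn url(&self) -> &str {
        match self {
            ProviderConfig::EvmRpc(c) => c.url.as_str(),
            ProviderConfig::Firehose(c) => c.url.as_str(),
            ProviderConfig::Substreams(c) => c.url.as_str(),
            ProviderConfig::EthBeacon(c) => c.url.as_str(),
        }
    }
}
```

With this shape, a Substreams config that lacks `module` is rejected at load time with a message naming the field, rather than surfacing later as a failed lookup on an untyped table.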

Commit message

feat(dataset-store): support loading datasets by name and version

Refactored the dataset store to load datasets by name with optional version, making version management easier and cleaning up the manifest handling.

  • Introduced ManifestsStore module for centralized manifest management with MetadataDb integration
  • Added versioned dataset loading with automatic latest version resolution when version not specified
  • Removed ManifestValue abstraction in favor of direct strongly-typed manifest deserialization
  • Extracted provider configurations into dedicated typed structs (ProviderConfig) for each extractor
  • Added init() method to sync object store manifests with metadata database on startup

LNSD self-assigned this Sep 24, 2025
LNSD marked this pull request as draft September 24, 2025 17:00
LNSD force-pushed the lnsd/feat-dataset-store-by-version branch 25 times, most recently from 6631ee2 to 811c0f2 September 27, 2025 09:36
LNSD force-pushed the lnsd/feat-dataset-store-by-version branch 18 times, most recently from 3b5fb8a to e0d5c02 September 29, 2025 21:20
Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
LNSD force-pushed the lnsd/feat-dataset-store-by-version branch from e0d5c02 to de2ec76 September 29, 2025 21:26
LNSD requested a review from leoyvens September 29, 2025 21:28
LNSD marked this pull request as ready for review September 29, 2025 21:29
fubhy (Contributor) left a comment


Can confirm that the TypeScript client side works with this change.

fubhy merged commit d25b061 into main Sep 30, 2025
7 checks passed
fubhy deleted the lnsd/feat-dataset-store-by-version branch September 30, 2025 16:17
LNSD mentioned this pull request Sep 30, 2025

3 participants