
Implement parse and fetch logic for local db manifest#2273

Merged
findolor merged 8 commits into main from local-db-manifest-new
Nov 3, 2025

Conversation

@findolor
Collaborator

@findolor findolor commented Oct 27, 2025

Motivation

See issue:

Solution

  • Create utils.rs in the settings crate for shared parsing logic
  • Add parsing logic for the local DB manifest
  • Add fetch logic for the manifest file
  • Update tests

Checks

By submitting this for review, I'm confirming I've done the following:

  • made this PR as small as possible
  • unit-tested any new functionality
  • linked any relevant issues or PRs
  • included screenshots (if this involves a front-end change)

Summary by CodeRabbit

  • New Features

    • Parse and validate YAML-based local database manifests, including lookup helpers.
    • Fetch and parse remote manifests over HTTP with clear error reporting.
  • Refactor

    • Introduced shared parsing utilities to centralize numeric and URL validation used across manifest parsing.
  • Tests

    • Added extensive unit tests covering parsing, validation, error cases, and remote fetch behavior.

@findolor findolor added this to the LocalDB Implementation milestone Oct 27, 2025
@findolor findolor requested review from 0xgleb and hardyjosh October 27, 2025 10:48
@findolor findolor self-assigned this Oct 27, 2025
@findolor findolor linked an issue Oct 27, 2025 that may be closed by this pull request
@findolor
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Oct 27, 2025

Walkthrough

Adds YAML manifest parsing and related data types (local manifest), remote manifest fetching over HTTP, shared parsing utilities, and exposes these modules from the settings crate. Refactors local_db_sync to use the new utilities.

Changes

Module Declarations
  Files: crates/settings/src/lib.rs, crates/settings/src/remote/mod.rs
  Summary: Exports the new public modules local_db_manifest and utils from lib.rs, and manifest from remote/mod.rs.
Local Manifest Parsing
  Files: crates/settings/src/local_db_manifest.rs
  Summary: New module implementing the YAML-driven LocalDbManifest, ManifestNetwork, and ManifestOrderbook types; parsing/validation helpers; a find() method; MANIFEST_VERSION and version helpers; comprehensive unit tests; and detailed YamlError reporting.
Remote Manifest Fetching
  Files: crates/settings/src/remote/manifest.rs
  Summary: New async fetch(url: Url) -> Result<LocalDbManifest, FetchManifestError> that GETs the URL with reqwest, parses the YAML response, and converts it to a LocalDbManifest; adds the FetchManifestError enum and tests (including mocked HTTP).
Parsing Utilities
  Files: crates/settings/src/utils.rs
  Summary: New helpers parse_positive_u32, parse_positive_u64, and parse_url that validate inputs and return a structured YamlError on failure; includes unit tests.
Local DB Sync Refactor
  Files: crates/settings/src/local_db_sync.rs
  Summary: Replaced the internal parse_positive_u32/parse_positive_u64 implementations with imports from crate::utils and updated the call sites accordingly.
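For illustration, the kind of check the new utils.rs helpers perform can be sketched as below. This is a minimal, std-only sketch, not the crate's code: `FieldError` is a hypothetical stand-in for `YamlError`, and the field and location strings in the usage are made up. It rejects non-numeric input, negative input such as "-5" (which cannot parse as u64), and zero.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's `YamlError`: records which field
/// failed, where it was found, and why.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldError {
    pub field: String,
    pub location: String,
    pub reason: String,
}

impl fmt::Display for FieldError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid {} in {}: {}", self.field, self.location, self.reason)
    }
}

/// Parses a strictly positive u64 (sketch). Non-numeric and negative input
/// fail in `str::parse`; zero parses fine, so it is rejected explicitly.
pub fn parse_positive_u64(value: &str, field: &str, location: &str) -> Result<u64, FieldError> {
    // Builds an error that always carries the field name and location.
    let err = |reason: String| FieldError {
        field: field.to_string(),
        location: location.to_string(),
        reason,
    };
    let n = value.parse::<u64>().map_err(|e| err(e.to_string()))?;
    if n == 0 {
        return Err(err("must be a positive integer".to_string()));
    }
    Ok(n)
}
```

A u32 variant would differ only in the target type passed to `parse`.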

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant RemoteManifest as remote::manifest::fetch()
    participant HTTP as Reqwest
    participant YAML as load_yaml
    participant Parser as local_db_manifest::parse_manifest_doc
    participant LocalDb as LocalDbManifest

    Client->>RemoteManifest: fetch(url)
    RemoteManifest->>HTTP: GET url
    HTTP-->>RemoteManifest: response text
    RemoteManifest->>YAML: load_yaml(response)
    YAML-->>RemoteManifest: StrictYaml doc
    RemoteManifest->>Parser: parse_manifest_doc(doc)
    Parser->>Parser: validate header, networks, orderbooks
    Parser-->>RemoteManifest: LocalDbManifest
    RemoteManifest-->>Client: Result<LocalDbManifest, FetchManifestError>
    
    Note over LocalDb: Consumer may call .find(chain_id, address)
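The flow in the diagram (HTTP GET, then YAML parsing, with each failure mapped into a single error enum) can be sketched synchronously using only std. Everything here is hypothetical: the HTTP layer is an injected closure rather than reqwest, `parse_manifest` is a toy that reads a single `version:` line rather than calling `load_yaml` and `parse_manifest_doc`, and the real `fetch` is async. The point is how `From` impls let `?` route both failure kinds into `FetchManifestError`.

```rust
/// Hypothetical error types for the two failure modes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HttpError(pub String);
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseError(pub String);

#[derive(Debug, PartialEq, Eq)]
pub enum FetchManifestError {
    Http(HttpError),
    Parse(ParseError),
}

// These conversions are what allow `?` to lift either error into the enum.
impl From<HttpError> for FetchManifestError {
    fn from(e: HttpError) -> Self {
        Self::Http(e)
    }
}
impl From<ParseError> for FetchManifestError {
    fn from(e: ParseError) -> Self {
        Self::Parse(e)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct Manifest {
    pub version: u32,
}

/// Toy parser standing in for YAML loading plus manifest validation.
fn parse_manifest(text: &str) -> Result<Manifest, ParseError> {
    let raw = text
        .lines()
        .find_map(|l| l.trim().strip_prefix("version:"))
        .ok_or_else(|| ParseError("missing version".to_string()))?;
    let version = raw
        .trim()
        .parse::<u32>()
        .map_err(|e| ParseError(format!("bad version: {}", e)))?;
    Ok(Manifest { version })
}

/// GET the body via the injected getter, then parse it.
pub fn fetch_with<F>(url: &str, get: F) -> Result<Manifest, FetchManifestError>
where
    F: Fn(&str) -> Result<String, HttpError>,
{
    let body = get(url)?;
    let manifest = parse_manifest(&body)?;
    Ok(manifest)
}
```

Injecting the getter makes the error paths testable without a mock server; the PR itself uses httpmock for that.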

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to review closely:
    • Validation/error construction in local_db_manifest.rs (YamlError locations and variants)
    • Error conversions and async handling in remote/manifest.rs
    • Correctness and edge cases in utils.rs (zero/negative parsing, URL parsing)
    • Behavior-preservation in local_db_sync.rs after replacing local helpers with utils functions

Possibly related issues

Possibly related PRs

Suggested labels

rust

Suggested reviewers

  • 0xgleb
  • hardyjosh

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title "Implement parse and fetch logic for local db manifest" directly and accurately describes the primary changes in the changeset. The title clearly refers to the two main deliverables: parsing logic is implemented in the new local_db_manifest.rs module with parse_manifest_doc() and related parsing functions, while fetch logic is implemented in remote/manifest.rs with the async fetch() function. The title is concise, uses clear language without vague terms, and provides sufficient specificity that a teammate reviewing the git history would understand the core purpose of the changes.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b683159 and 4c34623.

📒 Files selected for processing (2)
  • crates/settings/src/local_db_manifest.rs (1 hunks)
  • crates/settings/src/remote/manifest.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
crates/*/{src,tests,benches,examples}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Rust code must pass clippy with all warnings denied (cargo clippy --workspace --all-targets --all-features -D warnings)

Files:

  • crates/settings/src/remote/manifest.rs
  • crates/settings/src/local_db_manifest.rs
crates/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Rust code lives in the workspace under crates/* (e.g., cli, common, bindings, js_api, quote, subgraph, settings, math, integration_tests)

Files:

  • crates/settings/src/remote/manifest.rs
  • crates/settings/src/local_db_manifest.rs
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Format Rust with nix develop -c cargo fmt --all
Lint Rust with nix develop -c rainix-rs-static (preconfigured flags)
Use snake_case for Rust crates/modules and PascalCase for types

Files:

  • crates/settings/src/remote/manifest.rs
  • crates/settings/src/local_db_manifest.rs
🧠 Learnings (27)
📓 Common learnings
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:24:53.708Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, provide a separate comment after each review with PR size statistics in the format: `TOTAL=number ADDITIONS=number DELETIONS=number`, excluding lock files like package-lock.json and cargo.lock.
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:25:47.873Z
Learning: After each new commit review for rainlanguage/rain.orderbook repository, generate a PR size report in a separate comment with the format: "TOTAL=number ADDITIONS=number DELETIONS=number". This report should exclude irrelevant files such as lock files (e.g., package-lock.json, cargo.lock).
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T22:14:22.646Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, I must consistently report the PR size after EVERY new commit in the exact format "TOTAL=number\nADDITIONS=number\nDELETIONS=number" without any additional text or formatting blocks. The report must exclude lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock) and can be included either as part of the review or as a separate comment.
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T22:14:22.646Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, I must consistently report the PR size after EVERY new commit in the exact format "TOTAL=number\nADDITIONS=number\nDELETIONS=number" without any formatting blocks. The report must exclude lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock) and can be included either as part of the review or as a separate comment.
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:25:47.873Z
Learning: After each new commit review for rainlanguage/rain.orderbook repository, generate a PR size report in a separate comment with the exact format: "TOTAL=number\nADDITIONS=number\nDELETIONS=number". This report should exclude irrelevant files such as lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock).
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:26:31.566Z
Learning: For PR #1884 and future PRs in rainlanguage/rain.orderbook repository, provide PR size reports in a separate comment with ONLY the exact format "TOTAL=number\nADDITIONS=number\nDELETIONS=number", without any additional text or explanation. Reports should exclude irrelevant files such as lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock).
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T22:46:08.530Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, the PR size report should be the only content in the comment - no text before it, no text after it, no formatting blocks, just the raw report in the exact format: "TOTAL=number\nADDITIONS=number\nDELETIONS=number". The report must exclude lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock).
📚 Learning: 2025-05-19T12:25:02.658Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1858
File: crates/subgraph/src/cynic_client.rs:27-29
Timestamp: 2025-05-19T12:25:02.658Z
Learning: In the Rain Orderbook codebase, when passing a URL from `self.get_base_url()` to `reqwest::Client::post()`, the `.clone()` is necessary because `get_base_url()` returns a reference (`&Url`), but an owned `Url` is needed for the reqwest client in the async context to avoid lifetime issues.

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-08-26T14:52:37.000Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2099
File: crates/common/src/hyper_rpc.rs:3-7
Timestamp: 2025-08-26T14:52:37.000Z
Learning: In the rain.orderbook codebase, creating new reqwest::Client instances per request in HyperRpcClient is not considered an issue by the maintainers, despite potential performance benefits of client reuse.

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-06-17T16:21:24.384Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1903
File: crates/settings/src/yaml/orderbook.rs:371-377
Timestamp: 2025-06-17T16:21:24.384Z
Learning: In crates/settings/src/yaml/orderbook.rs tests, the user findolor considers RPC ordering in Vec<Url> assertions to be intentional and not a test brittleness issue. The ordering of RPCs in tests should be preserved as specified.

Applied to files:

  • crates/settings/src/remote/manifest.rs
  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-10-06T08:05:02.438Z
Learnt from: CR
Repo: rainlanguage/rain.orderbook PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-10-06T08:05:02.438Z
Learning: Applies to crates/*/tests/**/*.rs : Rust crates must have passing tests (cargo test --workspace or --package <crate>)

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-07-11T06:40:49.511Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1983
File: crates/js_api/src/gui/mod.rs:964-1199
Timestamp: 2025-07-11T06:40:49.511Z
Learning: In crates/js_api/src/gui/mod.rs tests, findolor prefers to keep large hardcoded YAML strings (like the validation test YAML) inline in the test file rather than extracting them to separate external files for maintainability purposes.

Applied to files:

  • crates/settings/src/remote/manifest.rs
  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-07-09T07:42:48.458Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1975
File: crates/js_api/src/config.rs:44-45
Timestamp: 2025-07-09T07:42:48.458Z
Learning: User findolor prefers to defer code improvements and refactoring suggestions to future PRs when they are not directly related to the current PR's scope, even when the suggestions are valid improvements.

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-06-24T08:45:10.971Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1947
File: crates/common/src/raindex_client/orders.rs:462-1643
Timestamp: 2025-06-24T08:45:10.971Z
Learning: User findolor prefers to keep tests simple and avoid refactoring for maintainability when the code is stable and viewed as "one time thing" that won't require frequent changes.

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-06-05T12:20:20.455Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1831
File: packages/ui-components/src/lib/components/transactions/getStatusEmoji.ts:13-22
Timestamp: 2025-06-05T12:20:20.455Z
Learning: The user findolor prefers not to add test coverage for fallback cases in the getStatusEmoji function when using ts-pattern's .otherwise() method.

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-09-02T08:04:34.729Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2117
File: crates/js_api/src/gui/select_tokens.rs:421-424
Timestamp: 2025-09-02T08:04:34.729Z
Learning: User findolor in the rainlanguage/rain.orderbook codebase prefers not to change test code when tests are already passing, even if there appear to be type mismatches or potential issues. Working tests should be left alone unless there are actual errors.

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-06-11T12:39:28.613Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1917
File: packages/webapp/src/__tests__/resetActiveOrderbookRef.test.ts:32-40
Timestamp: 2025-06-11T12:39:28.613Z
Learning: In this repository, the maintainer (findolor) is fine with using `as any` casts inside test files; such casts should not be flagged as issues.

Applied to files:

  • crates/settings/src/remote/manifest.rs
📚 Learning: 2025-10-06T11:13:29.956Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2123
File: crates/common/src/raindex_client/local_db/mod.rs:23-29
Timestamp: 2025-10-06T11:13:29.956Z
Learning: In `crates/common/src/raindex_client/local_db/mod.rs`, the `Default` implementation for `LocalDb` that creates an RPC client pointing to `http://localhost:4444` is acceptable because the RPC client must be explicitly configured before actual usage in production scenarios.

Applied to files:

  • crates/settings/src/remote/manifest.rs
  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-10-14T07:51:55.148Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2202
File: crates/common/src/raindex_client/local_db/sync.rs:33-34
Timestamp: 2025-10-14T07:51:55.148Z
Learning: In `crates/common/src/raindex_client/local_db/sync.rs`, the hard-coded `DEFAULT_SYNC_CHAIN_ID` constant (set to `SUPPORTED_LOCAL_DB_CHAINS[0]`) will be replaced with proper chain ID handling in downstream PRs as part of the multi-network/orderbook implementation.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-09-01T13:35:33.688Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2116
File: crates/settings/src/orderbook.rs:569-0
Timestamp: 2025-09-01T13:35:33.688Z
Learning: In crates/settings/src/orderbook.rs, the deployment-block field is required in orderbook YAML configuration and should cause a parsing error when missing, not default to any value.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-03-31T14:36:11.049Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1474
File: crates/js_api/src/yaml/mod.rs:37-44
Timestamp: 2025-03-31T14:36:11.049Z
Learning: The OrderbookYaml implementation in crates/js_api/src/yaml/mod.rs intentionally parses YAML on demand without caching results. This is a deliberate design choice by the author to process YAML only when needed rather than optimizing for repeated calls.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-06-18T19:23:33.747Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1938
File: crates/settings/src/yaml/mod.rs:176-178
Timestamp: 2025-06-18T19:23:33.747Z
Learning: In crates/settings/src/yaml/mod.rs, the YamlError enum has two distinct error variants: `KeyNotFound(String)` for when a specific YAML key is not found in a hash/map, and `NotFound(String)` for when other types of entities (like networks, orderbooks, etc.) are not found in the configuration. These serve different purposes and should not be consolidated.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-07-04T10:26:24.289Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1956
File: crates/common/src/raindex_client/mod.rs:116-135
Timestamp: 2025-07-04T10:26:24.289Z
Learning: In crates/common/src/raindex_client/mod.rs, the get_multi_subgraph_args method intentionally treats Some(empty vector) the same as None for chain_ids parameter. Both cases should return all networks to support UI behavior where no selection or empty selection means "show all networks". Only when specific chain IDs are provided should the results be filtered.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-04-29T06:19:15.446Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1706
File: crates/quote/src/js_api/mod.rs:57-61
Timestamp: 2025-04-29T06:19:15.446Z
Learning: In the rain.orderbook codebase, the `get_id` function is designed to use `U256::from_str()` rather than `U256::from_hex()` for parsing order hashes, despite the potential for hex string inputs.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-08-02T03:55:25.215Z
Learnt from: brusherru
Repo: rainlanguage/rain.orderbook PR: 2036
File: packages/orderbook/test/js_api/filters.test.ts:19-36
Timestamp: 2025-08-02T03:55:25.215Z
Learning: In the rainlanguage/rain.orderbook project's WASM tests, the pattern of chaining `.value!` calls on WASM result types (like from VaultsFilterBuilder methods) is the established and preferred approach for handling WASM results, and should not be refactored into intermediate variables as it would add unnecessary verbosity without improving the code.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-06-18T18:24:32.049Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1938
File: crates/settings/src/yaml/orderbook.rs:180-199
Timestamp: 2025-06-18T18:24:32.049Z
Learning: In crates/settings/src/yaml/orderbook.rs, the user prefers to avoid refactoring duplicate search logic between get_orderbook_by_address and get_orderbook_by_network_key when there are only 2 functions, indicating they would consider it if more similar functions are added in the future.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-07-16T10:40:05.717Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2000
File: crates/common/src/raindex_client/vaults.rs:183-183
Timestamp: 2025-07-16T10:40:05.717Z
Learning: In the rainlanguage/rain.orderbook codebase, user findolor considers breaking changes from Option<U256> to U256 for required fields like decimals in RaindexVaultToken to be acceptable and safe, even when they affect multiple usage sites across the codebase.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-07-04T09:02:57.301Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1956
File: crates/common/src/fuzz/mod.rs:64-64
Timestamp: 2025-07-04T09:02:57.301Z
Learning: In rainlanguage/rain.orderbook, user findolor prefers to limit type consistency changes to only the parts directly related to the current work scope. For example, when updating chain_id fields from u64 to u32 in fuzz-related code, unrelated files like tauri-app wallet commands can remain as u64 if they serve different purposes and aren't part of the current changes.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-03-31T13:57:59.660Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1474
File: crates/js_api/src/yaml/mod.rs:31-34
Timestamp: 2025-03-31T13:57:59.660Z
Learning: The OrderbookYaml constructor in crates/js_api/src/yaml/mod.rs does not need early YAML validation. The author prefers to validate YAML only when it's actually used rather than during initialization.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-09-24T10:59:25.666Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 2163
File: crates/js_api/src/gui/state_management.rs:196-203
Timestamp: 2025-09-24T10:59:25.666Z
Learning: In crates/js_api/src/gui/state_management.rs, the RainMetaDocumentV1Item::hash(false) method returns a slice, so wrapping the result with FixedBytes() is necessary to convert it to the expected FixedBytes type for DotrainGuiStateV1.dotrain_hash field.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-10-06T08:05:02.438Z
Learnt from: CR
Repo: rainlanguage/rain.orderbook PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-10-06T08:05:02.438Z
Learning: Applies to crates/*/{src,tests,benches,examples}/**/*.rs : Rust code must pass clippy with all warnings denied (cargo clippy --workspace --all-targets --all-features -D warnings)

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-06-18T19:24:40.518Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1938
File: crates/settings/src/yaml/orderbook.rs:185-199
Timestamp: 2025-06-18T19:24:40.518Z
Learning: In crates/settings/src/yaml/orderbook.rs, the user prefers not to refactor get_orderbook_by_network_key to handle multiple orderbooks per network key since their current architecture maintains a one-to-one mapping between orderbooks and networks. They would consider the refactoring if the system evolves to support multiple orderbooks per network in the future.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
📚 Learning: 2025-04-07T09:54:21.782Z
Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1483
File: crates/settings/src/yaml/orderbook.rs:45-45
Timestamp: 2025-04-07T09:54:21.782Z
Learning: The validation in OrderbookYaml's new method that includes RemoteNetworksCfg::parse_all_from_yaml is intentional and should remain as is, without conditional handling for users that only have local networks.

Applied to files:

  • crates/settings/src/local_db_manifest.rs
🧬 Code graph analysis (2)
crates/settings/src/remote/manifest.rs (2)
crates/settings/src/local_db_manifest.rs (1)
  • parse_manifest_doc (50-61)
crates/settings/src/yaml/mod.rs (1)
  • load_yaml (363-369)
crates/settings/src/local_db_manifest.rs (2)
crates/settings/src/utils.rs (3)
  • parse_positive_u32 (4-24)
  • parse_positive_u64 (26-46)
  • parse_url (48-56)
crates/settings/src/yaml/mod.rs (2)
  • require_hash (411-440)
  • require_string (371-406)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: test
  • GitHub Check: standard-tests (ubuntu-latest, test-js-bindings)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-legal)
  • GitHub Check: git-clean
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-static)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-artifacts)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-rs-static)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-test)
  • GitHub Check: standard-tests (ubuntu-latest, ob-rs-test, true)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-wasm-artifacts)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-sol-test)
  • GitHub Check: standard-tests (ubuntu-latest, rainix-rs-artifacts, true)
  • GitHub Check: build-tauri (ubuntu-22.04, true)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Deploy-Docs-Preview
  • GitHub Check: Deploy-Preview-Push
🔇 Additional comments (9)
crates/settings/src/remote/manifest.rs (3)

1-12: LGTM! Clean error handling design.

The error enum appropriately wraps the two failure modes (network/HTTP errors and YAML parsing errors) with automatic From conversions. The use of thiserror makes this idiomatic and maintainable.


14-19: LGTM! Fetch logic correctly integrates with existing parsing.

The function cleanly delegates to the existing YAML parsing infrastructure and properly propagates errors through the type system.


21-185: LGTM! Comprehensive test coverage.

The test suite effectively covers:

  • Happy path with full manifest validation
  • Forward compatibility (unknown fields ignored)
  • YAML parsing error scenarios
  • Type validation errors
  • HTTP error handling

The use of httpmock for async HTTP mocking is appropriate and the tests are well-structured.

crates/settings/src/local_db_manifest.rs (6)

1-48: LGTM! Well-structured data model and helpers.

The data structures cleanly model the manifest hierarchy, and the find method provides a straightforward way to locate an orderbook by chain ID and address across networks. The version helper functions offer a clear API for compatibility checks.
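A sketch of the lookup described here, assuming (hypothetically) that networks are stored as an ordered list of `(key, network)` pairs, chain IDs are u32, and addresses are raw 20-byte arrays; the real types in local_db_manifest.rs may differ. It follows the behavior noted later in the review: the search stops at the first network whose chain ID matches, then scans that network's orderbooks.

```rust
/// Hypothetical simplified address type (20 raw bytes).
pub type Address = [u8; 20];

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestOrderbook {
    pub address: Address,
    pub dump_url: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestNetwork {
    pub chain_id: u32,
    pub orderbooks: Vec<ManifestOrderbook>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocalDbManifest {
    /// Ordered (network key, network) pairs.
    pub networks: Vec<(String, ManifestNetwork)>,
}

impl LocalDbManifest {
    /// Finds the first network with `chain_id`, then the orderbook with
    /// `address` inside it. Both steps are linear scans.
    pub fn find(&self, chain_id: u32, address: Address) -> Option<&ManifestOrderbook> {
        self.networks
            .iter()
            .map(|(_, network)| network)
            .find(|network| network.chain_id == chain_id)?
            .orderbooks
            .iter()
            .find(|ob| ob.address == address)
    }
}
```

Because the search commits to the first matching network, a second network with the same chain ID is never consulted, which is the ambiguity raised in the review comment on lines 33-40.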


50-99: LGTM! Manifest parsing with version compatibility check.

The parsing logic correctly enforces manifest version compatibility at parse time (lines 74-86), failing fast with a clear error message when an incompatible version is encountered. Good use of location context in error messages.
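A minimal sketch of such a fail-fast version check, assuming an exact-match policy and a hypothetical `MANIFEST_VERSION` value of 1; the PR's actual constant and compatibility rule are not shown in this review.

```rust
/// Hypothetical value for illustration only.
pub const MANIFEST_VERSION: u32 = 1;

/// Exact-match compatibility (an assumption; a real policy could differ).
pub fn is_compatible(version: u32) -> bool {
    version == MANIFEST_VERSION
}

/// Rejects incompatible versions immediately, naming where the value was read.
pub fn check_version(version: u32, location: &str) -> Result<u32, String> {
    if is_compatible(version) {
        Ok(version)
    } else {
        Err(format!(
            "{}: unsupported manifest version {}, expected {}",
            location, version, MANIFEST_VERSION
        ))
    }
}
```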


101-134: LGTM! Network parsing with empty key rejection.

The parsing correctly rejects empty network keys (lines 119-127), ensuring data quality and preventing ambiguous configurations. Error messages include proper location context.
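The empty-key rule can be sketched as a small validator that short-circuits on the first bad key. The function names and error format are hypothetical. It rejects only the empty string; whether whitespace-only keys should also fail is a separate decision that the review does not address.

```rust
/// Rejects an empty network key, reporting where it was found.
pub fn validate_network_key(key: &str, location: &str) -> Result<(), String> {
    if key.is_empty() {
        return Err(format!("{}: network key must not be empty", location));
    }
    Ok(())
}

/// Validates every key; collecting into `Result` stops at the first error.
pub fn validate_network_keys<'a, I>(keys: I, location: &str) -> Result<Vec<&'a str>, String>
where
    I: IntoIterator<Item = &'a str>,
{
    keys.into_iter()
        .map(|k| validate_network_key(k, location).map(|()| k))
        .collect()
}
```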


136-166: LGTM! Clean network parsing logic.

The function appropriately validates the network structure, parses the chain ID, and delegates orderbook parsing. Location strings provide good error context.


168-220: LGTM! Thorough orderbook field parsing.

The function validates all required fields with appropriate type parsing:

  • Address validation via Address::from_str
  • URL syntax validation via parse_url
  • Positive integer validation for block numbers and timestamps
  • Bytes parsing for block hash with error mapping

Error handling consistently includes location context for debugging.
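As an example of the "bytes parsing with error mapping" step, here is a std-only sketch that decodes a hex block hash and attaches the location to every error. The `block-hash` field name and the message format are assumptions, and where the review mentions parsing into a Bytes type, this sketch returns `Vec<u8>` and does not enforce a 32-byte length.

```rust
/// Decodes an optionally 0x-prefixed hex string into bytes (sketch).
pub fn parse_block_hash(value: &str, location: &str) -> Result<Vec<u8>, String> {
    let hex = value.strip_prefix("0x").unwrap_or(value);
    // Checking for ASCII hex digits first also makes the byte slicing below safe.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("{}: block-hash contains non-hex characters", location));
    }
    if hex.len() % 2 != 0 {
        return Err(format!("{}: block-hash has an odd number of hex digits", location));
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| {
            u8::from_str_radix(&hex[i..i + 2], 16)
                .map_err(|e| format!("{}: invalid block-hash: {}", location, e))
        })
        .collect()
}
```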


222-626: LGTM! Excellent comprehensive test coverage.

The test suite thoroughly validates:

  • Header parsing with valid and incompatible versions
  • Empty network key rejection (both helper and full parse)
  • Individual field parsing (addresses, URLs, integers, bytes)
  • Missing required fields at all levels (manifest, network, orderbook)
  • Zero/invalid values for numeric fields
  • Type mismatches
  • Cross-network search functionality with positive and negative cases
  • Version helper functions

The tests use appropriate pattern matching to validate specific error variants and provide good coverage of edge cases.



@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 27, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9fb619 and b683159.

📒 Files selected for processing (6)
  • crates/settings/src/lib.rs (2 hunks)
  • crates/settings/src/local_db_manifest.rs (1 hunks)
  • crates/settings/src/local_db_sync.rs (2 hunks)
  • crates/settings/src/remote/manifest.rs (1 hunks)
  • crates/settings/src/remote/mod.rs (1 hunks)
  • crates/settings/src/utils.rs (1 hunks)
🔇 Additional comments (6)
crates/settings/src/local_db_manifest.rs (2)

33-40: Define behavior when multiple networks share the same chain_id

find() returns the first network with a matching chain_id, which is ambiguous if duplicates exist. Either enforce uniqueness of chain_id across networks at parse time or expose a lookup that takes a network key.

Would you like a follow-up PR to (a) enforce unique chain_id across networks during parsing, or (b) add find_by_network_key/find_all_by_chain_id?
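Option (a) could look roughly like the following parse-time check. It is written against a simplified, hypothetical list of `(network key, chain id)` pairs rather than the real manifest types, and it reports both keys that claim the same chain ID.

```rust
use std::collections::HashMap;

/// Fails if two networks share a chain ID, naming both network keys.
pub fn ensure_unique_chain_ids(networks: &[(&str, u32)]) -> Result<(), String> {
    let mut seen: HashMap<u32, &str> = HashMap::new();
    for &(key, chain_id) in networks {
        // `insert` returns the previous key if this chain ID was already present.
        if let Some(first) = seen.insert(chain_id, key) {
            return Err(format!(
                "chain id {} used by both '{}' and '{}'",
                chain_id, first, key
            ));
        }
    }
    Ok(())
}
```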


93-97: Clarify whether empty orderbooks are valid

The parser allows an empty list. If at least one orderbook is required per network, add a non-empty check and test. If empty is valid, consider documenting that invariant.
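If the team decides that at least one orderbook is required, the check could be as small as this generic helper (hypothetical name and message format), applied after the orderbook list is parsed.

```rust
/// Returns the list unchanged, or an error naming the field and location if it is empty.
pub fn require_non_empty<T>(items: Vec<T>, field: &str, location: &str) -> Result<Vec<T>, String> {
    if items.is_empty() {
        Err(format!("{}: {} must contain at least one entry", location, field))
    } else {
        Ok(items)
    }
}
```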

crates/settings/src/local_db_sync.rs (1)

1-1: LGTM: centralizing numeric parsing in utils is the right direction

Good reuse; keeps error formatting consistent with other parsers. Please run the existing tests to confirm error strings remain identical.

Also applies to: 36-66

crates/settings/src/lib.rs (1)

7-7: LGTM: public modules exported

local_db_manifest and utils exposure looks good and matches new usage.

Also applies to: 24-24

crates/settings/src/remote/mod.rs (1)

2-2: Module presence and visibility verified

The file crates/settings/src/remote/manifest.rs exists (5,478 bytes) and is properly declared as pub mod manifest; in mod.rs. The module is correctly exposed with public visibility.

crates/settings/src/remote/manifest.rs (1)

14-19: Verify trust boundaries for manifest URL source.

The review concern is technically valid: the fetch() function accepts a URL and makes an unguarded reqwest::get() call. However, the function is currently only used in tests and no production call sites were found. More importantly, similar patterns exist throughout the remote module (fetch_networks, fetch_tokens) that also lack SSRF validation.

The key question: are these URLs trusted (e.g., hardcoded application config) or untrusted (user-provided input)? From the codebase, URLs appear to originate from YAML configuration files, but the full config loading mechanism and trust model aren't visible from code inspection alone.

If URLs can be user-controlled or externally sourced, the reviewer's suggestion to enforce http(s)-only and block loopback/private IPs is sound defense-in-depth. If they're always application-controlled configuration, it may be acceptable with documented trust boundaries. Clarify the intended use case and document the trust assumptions.
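Here is a sketch of the loopback/private-IP part. It assumes the host has already been resolved to an `IpAddr`, and the helper name `is_internal_ip` is hypothetical. Checking the hostname alone is not enough, because a public name can resolve to an internal address. A real guard would therefore apply this check to the resolved addresses.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// True if `ip` points at the local machine or a private/link-local network,
/// i.e. a destination a remote manifest URL should not be allowed to reach.
pub fn is_internal_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_internal_v4(v4),
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) are judged as IPv4.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_internal_v4(v4);
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
                || (first & 0xffc0) == 0xfe80 // fe80::/10 link local
        }
    }
}

fn is_internal_v4(v4: Ipv4Addr) -> bool {
    v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
}
```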

Comment thread crates/settings/src/local_db_manifest.rs
Comment thread crates/settings/src/local_db_manifest.rs Outdated
Comment on lines +109 to +113
            let dump_url_str =
                require_string(ob_yaml, Some("dump-url"), Some(location_ob.clone()))?;
            let dump_url = parse_url(&dump_url_str, "dump-url", location_ob.clone())?;

            let end_block = parse_positive_u64(
Contributor

@coderabbitai coderabbitai Bot Oct 27, 2025

⚠️ Potential issue | 🟠 Major

Limit dump-url to http/https to prevent unsafe schemes

Current parse accepts any URL scheme. To avoid SSRF/unsafe fetches (e.g., file:, data:, ftp:), restrict to http(s) at parse-time and surface a clear error.

Apply in two steps:

  1. Add parse_http_url in utils (see comment in utils.rs).
  2. Use it here:
-            let dump_url = parse_url(&dump_url_str, "dump-url", location_ob.clone())?;
+            let dump_url = crate::utils::parse_http_url(&dump_url_str, "dump-url", location_ob.clone())?;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/settings/src/local_db_manifest.rs around lines 109 to 113, the code
currently calls parse_url which accepts any URL scheme; replace that with the
new parse_http_url (implemented in utils.rs as requested) so only http and https
schemes are allowed at parse time. Update the use/import if necessary, call
parse_http_url(&dump_url_str, "dump-url", location_ob.clone()) in place of
parse_url, and ensure the error returned from parse_http_url clearly indicates
that only http/https schemes are permitted (preserve the existing location
context).

Collaborator Author

The dump URL is already a Url type by the time we save it in our struct; we do the validation when populating the field.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how dump_url is used in the codebase
rg -nP --type=rust -C3 'dump_url' -g '!target/**'

Length of output: 1838


🏁 Script executed:

#!/bin/bash
# Check if there's any URL scheme validation in the codebase
rg -nP --type=rust -C3 '(scheme|http|https).*[Uu]rl|[Uu]rl.*(scheme|http|https)' -g '!target/**'

Length of output: 72502


🏁 Script executed:

#!/bin/bash
# Look at the parse_url implementation in utils.rs
cat crates/settings/src/utils.rs

Length of output: 3559


🏁 Script executed:

#!/bin/bash
# Check where dump_url is actually used/fetched in the codebase
rg -nP --type=rust -C5 '\.dump_url\b' -g '!target/**'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if reqwest has any scheme restrictions
rg -nP --type=rust -C3 'reqwest.*get|get.*reqwest' crates/settings/src/remote/manifest.rs

Length of output: 264


@findolor, I understand that parse_url validates the URL format when populating the field. However, my concern is specifically about the URL scheme. Currently, parse_url accepts any valid URL scheme including potentially unsafe ones like file://, data://, ftp://, etc.

Looking at the code:

  • parse_url (line 111) validates that the string is a well-formed URL
  • But it doesn't restrict the scheme to only http:// or https://

When dump_url is eventually fetched (I don't see it being used yet in the codebase), allowing non-HTTP schemes could pose security risks:

  • file:// could expose local files
  • data:// could bypass security controls
  • Other schemes might have unexpected behavior

Benefits of validating scheme at parse-time:

  1. ✅ Early error detection with clear message to manifest authors
  2. ✅ Security-by-default - prevent misconfigurations before they're deployed
  3. ✅ Self-documenting - makes it explicit that only HTTP(S) is supported

Are you planning to validate the scheme when dump_url is actually used? If so, where does that validation happen? Or would you prefer to add scheme validation at parse-time as suggested?
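Below is a std-only sketch of a parse-time scheme allowlist. It works on plain strings and uses two hypothetical helpers, `scheme_of` and `check_http_scheme`. In the crate itself, `Url::parse` already exposes a lowercased `url.scheme()`, so matching on that is the simpler route, as the `parse_http_url` suggestion in utils.rs does. This version only illustrates the rule and shows the file://, data: and ftp:// rejections.

```rust
/// Lowercased scheme of `s` (the part before the first ':'), or None if it
/// is not a syntactically valid RFC 3986 scheme.
fn scheme_of(s: &str) -> Option<String> {
    let (scheme, _) = s.split_once(':')?;
    let mut chars = scheme.chars();
    let first = chars.next()?;
    if !first.is_ascii_alphabetic() {
        return None;
    }
    if !chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.')) {
        return None;
    }
    Some(scheme.to_ascii_lowercase())
}

/// Accepts only http and https; file, data, ftp, etc. get an actionable error.
pub fn check_http_scheme(field: &str, value: &str) -> Result<(), String> {
    match scheme_of(value).as_deref() {
        Some("http") | Some("https") => Ok(()),
        Some(other) => Err(format!(
            "{field}: unsupported url scheme '{other}'; only http/https are allowed"
        )),
        None => Err(format!("{field}: not an absolute url")),
    }
}
```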

Comment on lines +119 to +121
            let end_block_hash =
                require_string(ob_yaml, Some("end-block-hash"), Some(location_ob.clone()))?;

Contributor

@coderabbitai coderabbitai Bot Oct 27, 2025

🧹 Nitpick | 🔵 Trivial

Validate end-block-hash shape (0x + 32 bytes)

If this is an Ethereum block hash, enforce 0x prefix and 32-byte hex length (66 chars). Prevents bad data landing in state.

             let end_block_hash =
                 require_string(ob_yaml, Some("end-block-hash"), Some(location_ob.clone()))?;
+            if !(end_block_hash.starts_with("0x") && end_block_hash.len() == 66) {
+                return Err(YamlError::Field {
+                    kind: FieldErrorKind::InvalidValue {
+                        field: "end-block-hash".to_string(),
+                        reason: "expected 0x-prefixed 32-byte hex".to_string(),
+                    },
+                    location: location_ob.clone(),
+                });
+            }
🤖 Prompt for AI Agents
In crates/settings/src/local_db_manifest.rs around lines 119 to 121, the
retrieved end_block_hash string must be validated to ensure it is an
Ethereum-style 0x-prefixed 32-byte hex (66 chars) and reject malformed values;
after calling require_string(...) assign to end_block_hash, then check that
end_block_hash.starts_with("0x") and end_block_hash.len() == 66 and that the hex
payload (strip the "0x") decodes as valid hex bytes (or matches a regex like
r"^0x[0-9a-fA-F]{64}$"); if validation fails, return an appropriate error (with
location_ob context) instead of accepting the value so bad data cannot land in
state.
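Here is a std-only sketch of the strict check. It decodes into `[u8; 32]`, so that, as with `FixedBytes<32>`/`B256`, a wrong length cannot reach the struct. `parse_block_hash` is a hypothetical name, and the `String` error stands in for `YamlError::Field`. The test fixtures in remote/manifest.rs use short values such as "0xabc", which this check would reject.

```rust
/// Decodes an Ethereum-style block hash ("0x" + exactly 64 hex digits) into 32 bytes.
pub fn parse_block_hash(s: &str) -> Result<[u8; 32], String> {
    let hex = s
        .strip_prefix("0x")
        .ok_or_else(|| "expected 0x prefix".to_string())?;
    if hex.len() != 64 {
        return Err(format!("expected 64 hex digits, got {}", hex.len()));
    }
    // Check every character first: this keeps the slicing below on ASCII
    // boundaries and rejects inputs like "+f" that from_str_radix would accept.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("expected only hex digits after 0x".to_string());
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        // Two hex digits per byte; cannot fail after the check above.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).expect("validated hex");
    }
    Ok(out)
}
```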

Collaborator Author

updated the type to be Bytes from alloy

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current state of the ManifestOrderbook struct and end_block_hash parsing
cat crates/settings/src/local_db_manifest.rs | grep -A 10 "pub struct ManifestOrderbook"
echo "---"
cat crates/settings/src/local_db_manifest.rs | grep -A 5 "end_block_hash"

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Check if there's a parse_bytes or similar function in utils
cat crates/settings/src/utils.rs | grep -i "bytes\|parse"
echo "---"
# Check recent commits on this branch
git log --oneline -5

Length of output: 2069


@findolor Great solution! Using Bytes from alloy will automatically handle hex validation.

One refinement to consider: since Ethereum block hashes are always exactly 32 bytes, you could use alloy::primitives::FixedBytes<32> (or the B256 type alias) instead of Bytes. This provides stronger compile-time guarantees:

use alloy::primitives::FixedBytes;
use std::str::FromStr;

pub struct ManifestOrderbook {
    pub address: Address,
    pub dump_url: Url,
    pub end_block: u64,
    pub end_block_hash: FixedBytes<32>, // or B256
    pub end_block_time_ms: u64,
}

Then parsing becomes:

let end_block_hash_str = require_string(ob_yaml, Some("end-block-hash"), Some(location_ob.clone()))?;
let end_block_hash = FixedBytes::<32>::from_str(&end_block_hash_str).map_err(|e| YamlError::Field {
    kind: FieldErrorKind::InvalidValue {
        field: "end-block-hash".to_string(),
        reason: e.to_string(),
    },
    location: location_ob.clone(),
})?;

This enforces the 32-byte constraint at the type level, and FromStr validates the hex format and exact length automatically. Either approach works well though! ✨

Comment on lines +1 to +4
use crate::local_db_manifest::{parse_manifest_doc, LocalDbManifest};
use crate::yaml::{load_yaml, YamlError};
use thiserror::Error;
use url::Url;
Contributor

@coderabbitai coderabbitai Bot Oct 27, 2025

🧹 Nitpick | 🔵 Trivial

Minor: module docs and tracing.

Add a brief rustdoc and #[tracing::instrument(skip(...))] on fetch for observability.

🤖 Prompt for AI Agents
In crates/settings/src/remote/manifest.rs around lines 1 to 4, add a
module-level rustdoc describing that this module handles fetching and parsing
remote manifests, and annotate the fetch function with
#[tracing::instrument(skip(...))] for observability; specifically, import
tracing::instrument if needed and place #[tracing::instrument(skip(client,
credentials, request_body))] (or skip whichever parameters are non‑Debug/contain
secrets such as HTTP client, auth tokens, or request bodies) directly above the
fetch function signature so that calls are traced but sensitive/non‑serializable
arguments are not logged.

Collaborator Author

not needed

Contributor

@findolor, understood! I've noted your decision. Feel free to let me know if you need anything else reviewed.

Comment on lines +6 to +12
#[derive(Error, Debug)]
pub enum FetchManifestError {
    #[error(transparent)]
    ReqwestError(#[from] reqwest::Error),
    #[error(transparent)]
    Yaml(#[from] YamlError),
}
Contributor

@coderabbitai coderabbitai Bot Oct 27, 2025

🧹 Nitpick | 🔵 Trivial

Extend error type for domain-specific failures (size cap, scheme).

Add a concrete variant for oversized bodies; optionally add UnsupportedScheme if you later enforce http(s)-only pre-checks. This preserves intent and avoids overloading reqwest::Error.

 #[derive(Error, Debug)]
 pub enum FetchManifestError {
     #[error(transparent)]
     ReqwestError(#[from] reqwest::Error),
     #[error(transparent)]
     Yaml(#[from] YamlError),
+    #[error("manifest body too large ({0} bytes)")]
+    BodyTooLarge(usize),
+    // Optional if you decide to pre-validate:
+    // #[error("unsupported URL scheme: {0}")]
+    // UnsupportedScheme(String),
 }
🤖 Prompt for AI Agents
In crates/settings/src/remote/manifest.rs around lines 6 to 12, the
FetchManifestError enum only wraps reqwest::Error and YamlError which hides
domain-specific failures; add concrete variants such as OversizedBody { size:
usize, limit: usize } (or OversizedBody(usize, usize)) to represent when a
response exceeds the size cap, and optionally UnsupportedScheme(String) if you
later pre-check schemes; update places that currently map oversized/unsupported
cases into reqwest::Error to instead construct these new variants (and implement
From conversions only where appropriate), and update error messages/tests to use
the new variants so domain failures are explicit rather than overloaded into
reqwest::Error.
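Here is a std-only sketch of the extended error type. The real enum derives `thiserror::Error` and wraps `reqwest::Error`/`YamlError` via `#[from]`. In this sketch those payloads are modelled as `String`, and `Display` is written by hand to show the messages the `#[error(...)]` attributes would produce. The `BodyTooLarge { size, limit }` shape follows the prompt's `OversizedBody { size, limit }` idea rather than the single-field variant in the diff.

```rust
use std::fmt;

/// Std-only stand-in for FetchManifestError (string payloads instead of #[from]).
#[derive(Debug)]
pub enum FetchManifestError {
    Request(String),
    Yaml(String),
    BodyTooLarge { size: usize, limit: usize },
    UnsupportedScheme(String),
}

impl fmt::Display for FetchManifestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Mirrors #[error(transparent)]: forward the inner message unchanged.
            Self::Request(msg) | Self::Yaml(msg) => f.write_str(msg),
            Self::BodyTooLarge { size, limit } => {
                write!(f, "manifest body too large ({size} bytes, limit {limit})")
            }
            Self::UnsupportedScheme(s) => write!(f, "unsupported URL scheme: {s}"),
        }
    }
}

impl std::error::Error for FetchManifestError {}
```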

Collaborator Author

not needed

Contributor

@findolor, understood. I've noted your decision.

Comment on lines +14 to +19
pub async fn fetch(url: Url) -> Result<LocalDbManifest, FetchManifestError> {
    let text = reqwest::get(url.to_string()).await?.text().await?;
    let doc = load_yaml(&text)?;
    let manifest = parse_manifest_doc(&doc)?;
    Ok(manifest)
}
Contributor

@coderabbitai coderabbitai Bot Oct 27, 2025

⚠️ Potential issue | 🟠 Major

Treat non-2xx as errors and avoid needless allocation.

Use error_for_status() so 4xx/5xx don’t parse as YAML, and pass Url directly (IntoUrl). Minimal patch:

-    let text = reqwest::get(url.to_string()).await?.text().await?;
+    let text = reqwest::get(url)
+        .await?
+        .error_for_status()?
+        .text()
+        .await?;

Stronger hardening (timeouts, pooling, size cap). Consider a shared Client with sane defaults and a max body size to avoid hangs/OOM. As per coding guidelines.

Outside-range additions:

// imports
use std::{sync::OnceLock, time::Duration};
use reqwest::redirect::Policy;

// global client + cap
static CLIENT: OnceLock<reqwest::Client> = OnceLock::new();
const MAX_MANIFEST_BYTES: usize = 512 * 1024; // 512 KiB cap (tune as needed)

And replace the body of fetch with:

let client = CLIENT.get_or_init(|| {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(10))
        .redirect(Policy::limited(5))
        .user_agent(concat!("rain.orderbook/", env!("CARGO_PKG_VERSION")))
        .build()
        .expect("reqwest client")
});

let resp = client.get(url.clone()).send().await?.error_for_status()?;
if let Some(len) = resp.content_length() {
    if len > MAX_MANIFEST_BYTES as u64 {
        return Err(FetchManifestError::BodyTooLarge(len as usize));
    }
}
let bytes = resp.bytes().await?;
if bytes.len() > MAX_MANIFEST_BYTES {
    return Err(FetchManifestError::BodyTooLarge(bytes.len()));
}
let text = String::from_utf8_lossy(&bytes);
let doc = load_yaml(&text)?;
let manifest = parse_manifest_doc(&doc)?;
Ok(manifest)
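The status and size checks in that body can be sketched without reqwest, as pure functions over the status code, the optional Content-Length and the buffered body. All names here are hypothetical. `check_status` mirrors `error_for_status()`, which turns only 4xx and 5xx responses into errors.

```rust
use std::borrow::Cow;

/// Cap taken from the suggestion (512 KiB).
pub const MAX_MANIFEST_BYTES: usize = 512 * 1024;

#[derive(Debug, PartialEq)]
pub enum BodyCheckError {
    /// 4xx/5xx status, the cases error_for_status() turns into errors.
    Status(u16),
    /// Declared or actual body size above MAX_MANIFEST_BYTES.
    TooLarge(usize),
}

pub fn check_status(status: u16) -> Result<(), BodyCheckError> {
    if (400..600).contains(&status) {
        Err(BodyCheckError::Status(status))
    } else {
        Ok(())
    }
}

/// Cheap early rejection from the Content-Length header, when the server sends one.
pub fn check_declared_len(content_length: Option<u64>) -> Result<(), BodyCheckError> {
    match content_length {
        Some(len) if len > MAX_MANIFEST_BYTES as u64 => Err(BodyCheckError::TooLarge(
            usize::try_from(len).unwrap_or(usize::MAX),
        )),
        _ => Ok(()),
    }
}

/// Re-checks the real body (the header may be absent or wrong), then decodes it
/// with from_utf8_lossy as the suggestion does.
pub fn body_to_text(body: &[u8]) -> Result<Cow<'_, str>, BodyCheckError> {
    if body.len() > MAX_MANIFEST_BYTES {
        return Err(BodyCheckError::TooLarge(body.len()));
    }
    Ok(String::from_utf8_lossy(body))
}
```

One caveat applies to both this sketch and the suggested code: when Content-Length is absent, `resp.bytes()` buffers the whole body before the size check runs. Reading with `Response::chunk()` in a loop and stopping once the running total passes the cap would also bound memory.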

Collaborator Author

not needed

Contributor

@findolor, understood! I'll skip these hardening improvements for now.

Comment on lines +21 to +180
#[cfg(test)]
mod tests {
    use super::*;
    use alloy::primitives::address;
    use httpmock::MockServer;

    #[tokio::test]
    async fn test_fetch_manifest_happy_path() {
        let server = MockServer::start_async().await;
        let yaml = r#"
manifest-version: 1
db-schema-version: 1
networks:
  mainnet:
    chain-id: 1
    orderbooks:
      - address: "0x0000000000000000000000000000000000000001"
        dump-url: "http://example.com/dump1"
        end-block: 123
        end-block-hash: "0xabc"
        end-block-time-ms: 1000
"#;

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200)
                    .header("content-type", "application/x-yaml")
                    .body(yaml);
            })
            .await;

        let manifest = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap();

        assert_eq!(manifest.manifest_version, 1);
        assert_eq!(manifest.db_schema_version, 1);
        let net = manifest.networks.get("mainnet").unwrap();
        assert_eq!(net.chain_id, 1);
        assert_eq!(net.orderbooks.len(), 1);
        assert_eq!(
            net.orderbooks[0].address,
            address!("0x0000000000000000000000000000000000000001")
        );
        assert_eq!(net.orderbooks[0].end_block, 123);
        assert_eq!(net.orderbooks[0].end_block_hash, "0xabc");
        assert_eq!(net.orderbooks[0].end_block_time_ms, 1000);

        // find helper
        let found = manifest.find(1, address!("0x0000000000000000000000000000000000000001"));
        assert!(found.is_some());
    }

    #[tokio::test]
    async fn test_fetch_manifest_unknown_fields_ignored() {
        let server = MockServer::start_async().await;
        let yaml = r#"
manifest-version: 1
db-schema-version: 1
extra-root: ignored
networks:
  goerli:
    chain-id: 5
    extra: ignored
    orderbooks:
      - address: "0x0000000000000000000000000000000000000002"
        dump-url: "http://example.com/dump2"
        end-block: 555
        end-block-hash: "0xdef"
        end-block-time-ms: 2000
        extra-ob: ignored
"#;

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200).body(yaml);
            })
            .await;

        let manifest = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap();

        assert!(manifest.networks.contains_key("goerli"));
        let net = manifest.networks.get("goerli").unwrap();
        assert_eq!(net.chain_id, 5);
        assert_eq!(net.orderbooks.len(), 1);
    }

    #[tokio::test]
    async fn test_fetch_manifest_invalid_yaml() {
        let server = MockServer::start_async().await;
        let yaml = "manifest-version: [\n"; // malformed

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200).body(yaml);
            })
            .await;

        let err = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap_err();
        match err {
            // Some malformed YAML inputs are surfaced as ScanError by the loader
            FetchManifestError::Yaml(YamlError::ScanError(_)) => {}
            // In certain cases, incomplete structures may parse into BadValue and
            // be reported later as a field error; accept that as invalid YAML too.
            FetchManifestError::Yaml(YamlError::Field { .. }) => {}
            _ => panic!("expected YAML scan or field error"),
        }
    }

    #[tokio::test]
    async fn test_fetch_manifest_invalid_types_and_values() {
        let server = MockServer::start_async().await;
        let yaml = r#"
manifest-version: 1
db-schema-version: 1
networks:
  mainnet:
    chain-id: 1
    orderbooks:
      - address: 123 # invalid type
        dump-url: "not-a-url"
        end-block: 0
        end-block-hash: 999 # invalid type
        end-block-time-ms: 0
"#;

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200).body(yaml);
            })
            .await;

        let err = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap_err();
        match err {
            FetchManifestError::Yaml(YamlError::Field { .. }) => {}
            _ => panic!("expected field error"),
        }
    }

    #[tokio::test]
    async fn test_fetch_manifest_http_error_path() {
        // Use an unsupported scheme to deterministically trigger a reqwest error
        let url = Url::parse("ftp://example.com").unwrap();
        let err = fetch(url).await.unwrap_err();
        match err {
            FetchManifestError::ReqwestError(_) => {}
            other => panic!("expected reqwest error, got {other:?}"),
        }
    }
}
Contributor

@coderabbitai coderabbitai Bot Oct 27, 2025

🧹 Nitpick | 🔵 Trivial

Tests: add coverage for status errors and large bodies.

  • Add a 404/500 case and assert FetchManifestError::ReqwestError when using error_for_status().
  • Add a “too large” response (> MAX_MANIFEST_BYTES) to assert BodyTooLarge.
  • If enforcing scheme/http(s)-only, keep ftp test but assert your new error variant instead of reqwest’s.
🤖 Prompt for AI Agents
In crates/settings/src/remote/manifest.rs around lines 21 to 180, the tests
currently cover the happy path, malformed YAML, invalid values and an ftp reqwest
error but lack coverage for HTTP status errors and oversize bodies; add tests
that (1) mock a 404 (and optionally 500) response and assert fetch(...) returns
FetchManifestError::ReqwestError (coming from error_for_status), (2) mock a
response whose body length exceeds MAX_MANIFEST_BYTES and assert fetch(...)
returns the BodyTooLarge error variant, and (3) if the implementation enforces
scheme (http/https) instead of relying on reqwest to error for ftp, update the
existing ftp test to expect your custom scheme error variant rather than
ReqwestError; add these tests to the same tests module using MockServer
similarly to the existing tests.

Contributor

@findolor, understood! I've noted your decision to proceed without the additional test coverage.

Collaborator Author

not needed

Contributor

@findolor, understood! I'll move forward without those additional tests.

Comment on lines +4 to +24
pub fn parse_positive_u32(value: &str, field: &str, location: String) -> Result<u32, YamlError> {
    let parsed: u32 = value
        .parse()
        .map_err(|e: std::num::ParseIntError| YamlError::Field {
            kind: FieldErrorKind::InvalidValue {
                field: field.to_string(),
                reason: e.to_string(),
            },
            location: location.clone(),
        })?;
    if parsed == 0 {
        return Err(YamlError::Field {
            kind: FieldErrorKind::InvalidValue {
                field: field.to_string(),
                reason: "must be a positive integer".to_string(),
            },
            location,
        });
    }
    Ok(parsed)
}
Contributor

🧹 Nitpick | 🔵 Trivial

Deduplicate the positive integer parsers; avoid cloning location

Minor ergonomics/perf:

  • Factor the common logic into a generic helper to remove duplication.
  • Accept &str for location and to_string inside, avoiding repeated clones at call sites.

Example:

fn parse_positive<T>(s: &str, field: &str, location: &str) -> Result<T, YamlError>
where
    T: std::str::FromStr + PartialEq + Default,
    <T as std::str::FromStr>::Err: std::fmt::Display,
{
    let parsed: T = s.parse().map_err(|e| YamlError::Field {
        kind: FieldErrorKind::InvalidValue { field: field.to_string(), reason: e.to_string() },
        location: location.to_string(),
    })?;
    if parsed == T::default() {
        return Err(YamlError::Field {
            kind: FieldErrorKind::InvalidValue { field: field.to_string(), reason: "must be a positive integer".to_string() },
            location: location.to_string(),
        });
    }
    Ok(parsed)
}

Then define parse_positive_u32/parse_positive_u64 as thin wrappers calling parse_positive.

Also applies to: 26-46

🤖 Prompt for AI Agents
crates/settings/src/utils.rs lines 4-24 (and similar code at 26-46): the current
parse_positive_u32 duplicates parsing/validation logic and clones the location;
replace with a single generic helper parse_positive<T>(s: &str, field: &str,
location: &str) -> Result<T, YamlError> where T: FromStr + PartialEq + Default
and <T as FromStr>::Err: Display, that parses s, maps parse errors to
YamlError::Field using location.to_string() once, checks parsed == T::default()
to return the same "must be a positive integer" Field error (using
location.to_string()), and then implement parse_positive_u32 and
parse_positive_u64 as thin wrappers that call parse_positive with &str location
to avoid cloning at call sites; update call sites to pass &str for location.
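Here is a compilable version of the generic helper, with minimal stand-ins for `YamlError` and `FieldErrorKind`. Their shapes are hypothetical; the crate's real enums have more variants. One caveat: `T::default()` is 0, so this rejects only zero. For signed types negative values would pass, which is why only unsigned wrappers are defined here.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Minimal stand-ins for the crate's error types (hypothetical shapes).
#[derive(Debug, PartialEq)]
pub enum FieldErrorKind {
    InvalidValue { field: String, reason: String },
}

#[derive(Debug, PartialEq)]
pub enum YamlError {
    Field { kind: FieldErrorKind, location: String },
}

fn invalid(field: &str, reason: String, location: &str) -> YamlError {
    YamlError::Field {
        kind: FieldErrorKind::InvalidValue { field: field.to_string(), reason },
        location: location.to_string(),
    }
}

/// Parses `s` as T and rejects T::default() (0 for the unsigned integers).
pub fn parse_positive<T>(s: &str, field: &str, location: &str) -> Result<T, YamlError>
where
    T: FromStr + PartialEq + Default,
    <T as FromStr>::Err: Display,
{
    let parsed = s
        .parse::<T>()
        .map_err(|e| invalid(field, e.to_string(), location))?;
    if parsed == T::default() {
        return Err(invalid(field, "must be a positive integer".to_string(), location));
    }
    Ok(parsed)
}

pub fn parse_positive_u32(s: &str, field: &str, location: &str) -> Result<u32, YamlError> {
    parse_positive(s, field, location)
}

pub fn parse_positive_u64(s: &str, field: &str, location: &str) -> Result<u64, YamlError> {
    parse_positive(s, field, location)
}
```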

Comment on lines +48 to +56
pub fn parse_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
    Url::parse(value).map_err(|e: UrlParseError| YamlError::Field {
        kind: FieldErrorKind::InvalidValue {
            field: field.to_string(),
            reason: e.to_string(),
        },
        location,
    })
}
Contributor

@coderabbitai coderabbitai Bot Oct 27, 2025

🛠️ Refactor suggestion | 🟠 Major

Add http(s)-only URL parser and use it for network fetches

To avoid unsafe schemes, provide a dedicated validator.

 pub fn parse_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
     Url::parse(value).map_err(|e: UrlParseError| YamlError::Field {
         kind: FieldErrorKind::InvalidValue {
             field: field.to_string(),
             reason: e.to_string(),
         },
         location,
     })
 }
+
+/// Parse URL and restrict scheme to http/https.
+pub fn parse_http_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
+    let url = parse_url(value, field, location.clone())?;
+    match url.scheme() {
+        "http" | "https" => Ok(url),
+        _ => Err(YamlError::Field {
+            kind: FieldErrorKind::InvalidValue {
+                field: field.to_string(),
+                reason: "unsupported url scheme; only http/https are allowed".to_string(),
+            },
+            location,
+        }),
+    }
+}

Optional: add a unit test asserting file:// is rejected.

🤖 Prompt for AI Agents
In crates/settings/src/utils.rs around lines 48 to 56, the current parse_url
accepts any valid URL scheme; add a new http(s)-only validator (e.g.,
parse_http_url) that calls Url::parse(value) and then checks url.scheme() is
"http" or "https", returning the same YamlError::Field InvalidValue if the
scheme is not allowed; update all network-fetch call sites to use the new
parse_http_url instead of parse_url; optionally add a unit test that verifies a
"file://" URL is rejected.

Collaborator Author

not needed

Contributor

@findolor, understood! I'll skip this suggestion.

    version == MANIFEST_VERSION
}

pub fn parse_manifest_doc(doc: &StrictYaml) -> Result<LocalDbManifest, YamlError> {
Collaborator

can we refactor this function? it's four screens long

Collaborator

btw there is a clippy rule we can enable to flag this automatically

Collaborator Author

i think we can enable clippy rule in a separate PR since it might flag some other things in the repo
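For reference when that follow-up happens: the rule is most likely clippy's `too_many_lines` lint. It sits in the allow-by-default `pedantic` group and flags functions longer than `too-many-lines-threshold`, which is set in clippy.toml and defaults to 100 lines. The sketch below assumes one wants to try the lint on a single module before enabling it crate-wide; the module and function names are placeholders.

```rust
// Lint levels are scoped: this enables too_many_lines only for items inside
// the module, so the rest of the repository is not flagged yet.
// `manifest_parsing` and `parse_manifest_doc_placeholder` are placeholder names.
#[warn(clippy::too_many_lines)]
pub mod manifest_parsing {
    pub fn parse_manifest_doc_placeholder() {}
}
```

Crate-wide, the equivalent would be `#![warn(clippy::too_many_lines)]` at the top of lib.rs, or `too_many_lines = "warn"` under a `[lints.clippy]` table in Cargo.toml.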

@findolor findolor requested a review from 0xgleb October 28, 2025 06:28
@findolor findolor linked an issue Oct 29, 2025 that may be closed by this pull request
Base automatically changed from local-db-yaml-spec-bump to main October 31, 2025 17:51
@findolor findolor merged commit 1211c7d into main Nov 3, 2025
19 checks passed
@findolor findolor deleted the local-db-manifest-new branch November 3, 2025 08:57
@github-actions
Contributor

github-actions Bot commented Nov 3, 2025

@coderabbitai assess this PR size classification for the totality of the PR with the following criteria and report it in your comment:

S/M/L PR Classification Guidelines:

This guide helps classify merged pull requests by effort and complexity rather than just line count. The goal is to assess the difficulty and scope of changes after they have been completed.

Small (S)

Characteristics:

  • Simple bug fixes, typos, or minor refactoring
  • Single-purpose changes affecting 1-2 files
  • Documentation updates
  • Configuration tweaks
  • Changes that require minimal context to review

Review Effort: Would have taken 5-10 minutes

Examples:

  • Fix typo in variable name
  • Update README with new instructions
  • Adjust configuration values
  • Simple one-line bug fixes
  • Import statement cleanup

Medium (M)

Characteristics:

  • Feature additions or enhancements
  • Refactoring that touches multiple files but maintains existing behavior
  • Breaking changes with backward compatibility
  • Changes requiring some domain knowledge to review

Review Effort: Would have taken 15-30 minutes

Examples:

  • Add new feature or component
  • Refactor common utility functions
  • Update dependencies with minor breaking changes
  • Add new component with tests
  • Performance optimizations
  • More complex bug fixes

Large (L)

Characteristics:

  • Major feature implementations
  • Breaking changes or API redesigns
  • Complex refactoring across multiple modules
  • New architectural patterns or significant design changes
  • Changes requiring deep context and multiple review rounds

Review Effort: Would have taken 45+ minutes

Examples:

  • Complete new feature with frontend/backend changes
  • Protocol upgrades or breaking changes
  • Major architectural refactoring
  • Framework or technology upgrades

Additional Factors to Consider

When deciding between sizes, also consider:

  • Test coverage impact: More comprehensive test changes lean toward larger classification
  • Risk level: Changes to critical systems bump up a size category
  • Team familiarity: Novel patterns or technologies increase complexity

Notes:

  • the assessment must be for the totality of the PR, that means comparing the base branch to the last commit of the PR
  • the assessment output must be exactly one of: S, M or L (single-line comment) in format of: SIZE={S/M/L}
  • do not include any additional text, only the size classification
  • your assessment comment must not include tips or additional sections
  • do NOT tag me or anyone else on your comment

@coderabbitai
Contributor

coderabbitai Bot commented Nov 3, 2025

SIZE=M


🧠 Learnings used
Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:24:42.275Z
Learning: For PR #1884, after each commit review, I should provide a new comment with a PR size report in the format: `TOTAL=number ADDITIONS=number DELETIONS=number`, excluding irrelevant files such as lock files (package-lock.json, cargo.lock, etc.).

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:24:42.275Z
Learning: After each commit in PR #1884, I should provide a new comment with the PR size report in the format: `TOTAL=number ADDITIONS=number DELETIONS=number`. The report should exclude irrelevant files such as lock files (e.g., package-lock.json, cargo.lock).

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:25:47.873Z
Learning: After each new commit review for rainlanguage/rain.orderbook repository, generate a PR size report in a separate comment with the format: "TOTAL=number ADDITIONS=number DELETIONS=number". This report should exclude irrelevant files such as lock files (e.g., package-lock.json, cargo.lock).

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:25:47.873Z
Learning: After each new commit review for rainlanguage/rain.orderbook repository, generate a PR size report in a separate comment with the exact format: "TOTAL=number\nADDITIONS=number\nDELETIONS=number". This report should exclude irrelevant files such as lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock).

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T22:14:22.646Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, I must consistently report the PR size after EVERY new commit in the exact format "TOTAL=number\nADDITIONS=number\nDELETIONS=number" without any formatting blocks. The report must exclude lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock) and can be included either as part of the review or as a separate comment.

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T22:14:22.646Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, I must consistently report the PR size after EVERY new commit in the exact format "TOTAL=number\nADDITIONS=number\nDELETIONS=number" without any additional text or formatting blocks. The report must exclude lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock) and can be included either as part of the review or as a separate comment.

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T22:46:08.530Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, the PR size report should be the only content in the comment - no text before it, no text after it, no formatting blocks, just the raw report in the exact format: "TOTAL=number\nADDITIONS=number\nDELETIONS=number". The report must exclude lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock).

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:24:53.708Z
Learning: For PR #1884 in rainlanguage/rain.orderbook, provide a separate comment after each review with PR size statistics in the format: `TOTAL=number ADDITIONS=number DELETIONS=number`, excluding lock files like package-lock.json and cargo.lock.

Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1971
File: .github/workflows/vercel-docs-prod.yaml:32-34
Timestamp: 2025-07-11T12:30:09.245Z
Learning: findolor prefers not to pin Node versions explicitly in GitHub Actions workflows when using Nix with webapp-shell, considering the current setup sufficient for deterministic builds in the rainlanguage/rain.orderbook project.

Learnt from: findolor
Repo: rainlanguage/rain.orderbook PR: 1971
File: .github/workflows/vercel-docs-prod.yaml:10-14
Timestamp: 2025-07-11T12:29:49.284Z
Learning: In GitHub Actions workflows for the rainlanguage/rain.orderbook project, findolor is comfortable with job names that may seem misleading (like "Deploy-Docs-Preview" in a production workflow) when there's only one job in the workflow, as the context makes it clear enough.

Learnt from: rouzwelt
Repo: rainlanguage/rain.orderbook PR: 0
File: :0-0
Timestamp: 2025-05-21T21:26:31.566Z
Learning: For PR #1884 and future PRs in rainlanguage/rain.orderbook repository, provide PR size reports in a separate comment with ONLY the exact format "TOTAL=number\nADDITIONS=number\nDELETIONS=number", without any additional text or explanation. Reports should exclude irrelevant files such as lock files (package-lock.json, yarn.lock, pnpm-lock.yaml, cargo.lock).

coderabbitai Bot mentioned this pull request on Nov 17, 2025 (4 tasks)

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Manifest Config + Fetch Logic
  • Make Local DB Bootstrap Dumps Configurable

3 participants