Implement parse and fetch logic for local db manifest #2273
Conversation
@coderabbitai review
Walkthrough

Adds YAML manifest parsing and related data types (local manifest), remote manifest fetching over HTTP, shared parsing utilities, and exposes these modules from the settings crate. Refactors local_db_sync to use the new utilities.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant RemoteManifest as remote::manifest::fetch()
    participant HTTP as Reqwest
    participant YAML as load_yaml
    participant Parser as local_db_manifest::parse_manifest_doc
    participant LocalDb as LocalDbManifest
    Client->>RemoteManifest: fetch(url)
    RemoteManifest->>HTTP: GET url
    HTTP-->>RemoteManifest: response text
    RemoteManifest->>YAML: load_yaml(response)
    YAML-->>RemoteManifest: StrictYaml doc
    RemoteManifest->>Parser: parse_manifest_doc(doc)
    Parser->>Parser: validate header, networks, orderbooks
    Parser-->>RemoteManifest: LocalDbManifest
    RemoteManifest-->>Client: Result<LocalDbManifest, FetchManifestError>
    Note over LocalDb: Consumer may call .find(chain_id, address)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (2 passed)
✅ Actions performed: Review triggered.
Actionable comments posted: 10
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
- crates/settings/src/lib.rs (2 hunks)
- crates/settings/src/local_db_manifest.rs (1 hunk)
- crates/settings/src/local_db_sync.rs (2 hunks)
- crates/settings/src/remote/manifest.rs (1 hunk)
- crates/settings/src/remote/mod.rs (1 hunk)
- crates/settings/src/utils.rs (1 hunk)
🔇 Additional comments (6)
crates/settings/src/local_db_manifest.rs (2)
33-40: Define behavior when multiple networks share the same chain_id

find() returns the first network with the matching chain_id, which is ambiguous if duplicates exist. Either enforce uniqueness of chain_id across networks at parse time or expose a lookup that takes the network key.
Would you like a follow-up PR to (a) enforce unique chain_id across networks during parsing, or (b) add find_by_network_key/find_all_by_chain_id?
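A parse-time uniqueness check for option (a) could be sketched roughly as follows. Note this is a hypothetical helper using simplified stand-in types and a plain String error; the actual parser would use the crate's ManifestNetwork and YamlError types instead:

```rust
use std::collections::HashMap;

// Simplified stand-in for the manifest network type discussed above (hypothetical).
#[derive(Debug)]
struct ManifestNetwork {
    chain_id: u32,
}

// Reject manifests where two network keys declare the same chain-id.
fn check_unique_chain_ids(
    networks: &HashMap<String, ManifestNetwork>,
) -> Result<(), String> {
    let mut seen: HashMap<u32, &str> = HashMap::new();
    for (key, net) in networks {
        if let Some(prev) = seen.insert(net.chain_id, key.as_str()) {
            return Err(format!(
                "duplicate chain-id {} declared by networks '{}' and '{}'",
                net.chain_id, prev, key
            ));
        }
    }
    Ok(())
}
```

Running such a check once after all networks are parsed would make find(chain_id, address) unambiguous by construction.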
93-97: Clarify whether empty orderbooks are valid

The parser allows an empty list. If at least one orderbook is required per network, add a non-empty check and a test. If empty is valid, consider documenting that invariant.
crates/settings/src/local_db_sync.rs (1)
1-1: LGTM: centralizing numeric parsing in utils is the right direction

Good reuse; keeps error formatting consistent with other parsers. Please run the existing tests to confirm the error strings remain identical.
Also applies to: 36-66
crates/settings/src/lib.rs (1)
7-7: LGTM: public modules exported

Exposing local_db_manifest and utils looks good and matches the new usage.
Also applies to: 24-24
crates/settings/src/remote/mod.rs (1)
2-2: Module presence and visibility verified

The file crates/settings/src/remote/manifest.rs exists (5,478 bytes) and is properly declared as `pub mod manifest;` in mod.rs. The module is correctly exposed with public visibility.

crates/settings/src/remote/manifest.rs (1)
14-19: Verify trust boundaries for manifest URL source.

The review concern is technically valid: the `fetch()` function accepts a URL and makes an unguarded `reqwest::get()` call. However, the function is currently only used in tests and no production call sites were found. More importantly, similar patterns exist throughout the remote module (`fetch_networks`, `fetch_tokens`) that also lack SSRF validation.

The key question: are these URLs trusted (e.g., hardcoded application config) or untrusted (user-provided input)? From the codebase, URLs appear to originate from YAML configuration files, but the full config loading mechanism and trust model aren't visible from code inspection alone.

If URLs can be user-controlled or externally sourced, the reviewer's suggestion to enforce http(s)-only and block loopback/private IPs is sound defense-in-depth. If they're always application-controlled configuration, it may be acceptable with documented trust boundaries. Clarify the intended use case and document the trust assumptions.
```rust
let dump_url_str =
    require_string(ob_yaml, Some("dump-url"), Some(location_ob.clone()))?;
let dump_url = parse_url(&dump_url_str, "dump-url", location_ob.clone())?;
// ...
let end_block = parse_positive_u64(
```
There was a problem hiding this comment.
Limit dump-url to http/https to prevent unsafe schemes
Current parse accepts any URL scheme. To avoid SSRF/unsafe fetches (e.g., file:, data:, ftp:), restrict to http(s) at parse-time and surface a clear error.
Apply in two steps:

- Add parse_http_url in utils (see comment in utils.rs).
- Use it here:

```diff
- let dump_url = parse_url(&dump_url_str, "dump-url", location_ob.clone())?;
+ let dump_url = crate::utils::parse_http_url(&dump_url_str, "dump-url", location_ob.clone())?;
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In crates/settings/src/local_db_manifest.rs around lines 109 to 113, the code
currently calls parse_url which accepts any URL scheme; replace that with the
new parse_http_url (implemented in utils.rs as requested) so only http and https
schemes are allowed at parse time. Update the use/import if necessary, call
parse_http_url(&dump_url_str, "dump-url", location_ob.clone()) in place of
parse_url, and ensure the error returned from parse_http_url clearly indicates
that only http/https schemes are permitted (preserve the existing location
context).
dump url is already Url type by the time we save it in our struct. we do the validation when populating the field
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check how dump_url is used in the codebase
rg -nP --type=rust -C3 'dump_url' -g '!target/**'
```

Length of output: 1838

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's any URL scheme validation in the codebase
rg -nP --type=rust -C3 '(scheme|http|https).*[Uu]rl|[Uu]rl.*(scheme|http|https)' -g '!target/**'
```

Length of output: 72502

🏁 Script executed:

```bash
#!/bin/bash
# Look at the parse_url implementation in utils.rs
cat crates/settings/src/utils.rs
```

Length of output: 3559

🏁 Script executed:

```bash
#!/bin/bash
# Check where dump_url is actually used/fetched in the codebase
rg -nP --type=rust -C5 '\.dump_url\b' -g '!target/**'
```

Length of output: 0

🏁 Script executed:

```bash
#!/bin/bash
# Check if reqwest has any scheme restrictions
rg -nP --type=rust -C3 'reqwest.*get|get.*reqwest' crates/settings/src/remote/manifest.rs
```

Length of output: 264
@findolor, I understand that parse_url validates the URL format when populating the field. However, my concern is specifically about the URL scheme. Currently, parse_url accepts any valid URL scheme, including potentially unsafe ones like file://, data://, ftp://, etc.

Looking at the code:

- parse_url (line 111) validates that the string is a well-formed URL
- But it doesn't restrict the scheme to only http:// or https://

When dump_url is eventually fetched (I don't see it being used yet in the codebase), allowing non-HTTP schemes could pose security risks:

- file:// could expose local files
- data:// could bypass security controls
- Other schemes might have unexpected behavior

Benefits of validating the scheme at parse time:

- ✅ Early error detection with a clear message to manifest authors
- ✅ Security by default: prevent misconfigurations before they're deployed
- ✅ Self-documenting: makes it explicit that only HTTP(S) is supported

Are you planning to validate the scheme when dump_url is actually used? If so, where does that validation happen? Or would you prefer to add scheme validation at parse-time as suggested?
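The core of the scheme check debated here can be sketched without any URL-parsing crate at all. This is a hypothetical std-only helper for illustration; the concrete suggestion later in this thread (parse_http_url) parses with the url crate and inspects Url::scheme() instead:

```rust
// Hypothetical stand-alone check: accept only http/https URLs.
// A real implementation would parse with the `url` crate and match on
// `Url::scheme()`; this sketch just splits on the "://" separator.
fn has_http_scheme(raw: &str) -> bool {
    match raw.split_once("://") {
        Some((scheme, rest)) if !rest.is_empty() => {
            let s = scheme.to_ascii_lowercase();
            s == "http" || s == "https"
        }
        _ => false,
    }
}
```

Gating at parse time with a check like this (or the url-crate equivalent) is what turns a misconfigured file:// dump-url into an early, explicit manifest error instead of a surprise at fetch time.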
```rust
let end_block_hash =
    require_string(ob_yaml, Some("end-block-hash"), Some(location_ob.clone()))?;
```
🧹 Nitpick | 🔵 Trivial
Validate end-block-hash shape (0x + 32 bytes)
If this is an Ethereum block hash, enforce 0x prefix and 32-byte hex length (66 chars). Prevents bad data landing in state.
```diff
 let end_block_hash =
     require_string(ob_yaml, Some("end-block-hash"), Some(location_ob.clone()))?;
+if !(end_block_hash.starts_with("0x") && end_block_hash.len() == 66) {
+    return Err(YamlError::Field {
+        kind: FieldErrorKind::InvalidValue {
+            field: "end-block-hash".to_string(),
+            reason: "expected 0x-prefixed 32-byte hex".to_string(),
+        },
+        location: location_ob.clone(),
+    });
+}
```
🤖 Prompt for AI Agents
In crates/settings/src/local_db_manifest.rs around lines 119 to 121, the
retrieved end_block_hash string must be validated to ensure it is an
Ethereum-style 0x-prefixed 32-byte hex (66 chars) and reject malformed values;
after calling require_string(...) assign to end_block_hash, then check that
end_block_hash.starts_with("0x") and end_block_hash.len() == 66 and that the hex
payload (strip the "0x") decodes as valid hex bytes (or matches a regex like
r"^0x[0-9a-fA-F]{64}$"); if validation fails, return an appropriate error (with
location_ob context) instead of accepting the value so bad data cannot land in
state.
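The shape check described above can be captured in a small std-only validator. This is a hypothetical helper for illustration (the author ultimately switched the field to an alloy byte type instead, as the next comment notes):

```rust
// Validate an Ethereum-style block hash: "0x" prefix + 64 hex digits (32 bytes).
fn is_valid_block_hash(s: &str) -> bool {
    s.len() == 66
        && s.starts_with("0x")
        && s[2..].bytes().all(|b| b.is_ascii_hexdigit())
}
```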
updated the type to be Bytes from alloy
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check the current state of the ManifestOrderbook struct and end_block_hash parsing
cat crates/settings/src/local_db_manifest.rs | grep -A 10 "pub struct ManifestOrderbook"
echo "---"
cat crates/settings/src/local_db_manifest.rs | grep -A 5 "end_block_hash"
```

Length of output: 855

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a parse_bytes or similar function in utils
cat crates/settings/src/utils.rs | grep -i "bytes\|parse"
echo "---"
# Check recent commits on this branch
git log --oneline -5
```

Length of output: 2069
@findolor Great solution! Using Bytes from alloy will automatically handle hex validation.
One refinement to consider: since Ethereum block hashes are always exactly 32 bytes, you could use alloy::primitives::FixedBytes<32> (or the B256 type alias) instead of Bytes. This provides stronger compile-time guarantees:
```rust
use alloy::primitives::FixedBytes;

pub struct ManifestOrderbook {
    pub address: Address,
    pub dump_url: Url,
    pub end_block: u64,
    pub end_block_hash: FixedBytes<32>, // or B256
    pub end_block_time_ms: u64,
}
```

Then parsing becomes:

```rust
let end_block_hash_str =
    require_string(ob_yaml, Some("end-block-hash"), Some(location_ob.clone()))?;
let end_block_hash = FixedBytes::<32>::from_str(&end_block_hash_str).map_err(|e| YamlError::Field {
    kind: FieldErrorKind::InvalidValue {
        field: "end-block-hash".to_string(),
        reason: e.to_string(),
    },
    location: location_ob.clone(),
})?;
```

This enforces the 32-byte constraint at the type level and validates the 0x prefix and hex format automatically through FromStr. Either approach works well though! ✨
```rust
use crate::local_db_manifest::{parse_manifest_doc, LocalDbManifest};
use crate::yaml::{load_yaml, YamlError};
use thiserror::Error;
use url::Url;
```
🧹 Nitpick | 🔵 Trivial
Minor: module docs and tracing.
Add a brief rustdoc and #[tracing::instrument(skip(...))] on fetch for observability.
🤖 Prompt for AI Agents
In crates/settings/src/remote/manifest.rs around lines 1 to 4, add a
module-level rustdoc describing that this module handles fetching and parsing
remote manifests, and annotate the fetch function with
#[tracing::instrument(skip(...))] for observability; specifically, import
tracing::instrument if needed and place #[tracing::instrument(skip(client,
credentials, request_body))] (or skip whichever parameters are non‑Debug/contain
secrets such as HTTP client, auth tokens, or request bodies) directly above the
fetch function signature so that calls are traced but sensitive/non‑serializable
arguments are not logged.
@findolor, understood! I've noted your decision. Feel free to let me know if you need anything else reviewed.
```rust
#[derive(Error, Debug)]
pub enum FetchManifestError {
    #[error(transparent)]
    ReqwestError(#[from] reqwest::Error),
    #[error(transparent)]
    Yaml(#[from] YamlError),
}
```
🧹 Nitpick | 🔵 Trivial
Extend error type for domain-specific failures (size cap, scheme).
Add a concrete variant for oversized bodies; optionally add UnsupportedScheme if you later enforce http(s)-only pre-checks. This preserves intent and avoids overloading reqwest::Error.
```diff
 #[derive(Error, Debug)]
 pub enum FetchManifestError {
     #[error(transparent)]
     ReqwestError(#[from] reqwest::Error),
     #[error(transparent)]
     Yaml(#[from] YamlError),
+    #[error("manifest body too large ({0} bytes)")]
+    BodyTooLarge(usize),
+    // Optional if you decide to pre-validate:
+    // #[error("unsupported URL scheme: {0}")]
+    // UnsupportedScheme(String),
 }
```

🤖 Prompt for AI Agents
In crates/settings/src/remote/manifest.rs around lines 6 to 12, the
FetchManifestError enum only wraps reqwest::Error and YamlError which hides
domain-specific failures; add concrete variants such as OversizedBody { size:
usize, limit: usize } (or OversizedBody(usize, usize)) to represent when a
response exceeds the size cap, and optionally UnsupportedScheme(String) if you
later pre-check schemes; update places that currently map oversized/unsupported
cases into reqwest::Error to instead construct these new variants (and implement
From conversions only where appropriate), and update error messages/tests to use
the new variants so domain failures are explicit rather than overloaded into
reqwest::Error.
@findolor, understood. I've noted your decision.
```rust
pub async fn fetch(url: Url) -> Result<LocalDbManifest, FetchManifestError> {
    let text = reqwest::get(url.to_string()).await?.text().await?;
    let doc = load_yaml(&text)?;
    let manifest = parse_manifest_doc(&doc)?;
    Ok(manifest)
}
```
Treat non-2xx as errors and avoid needless allocation.
Use error_for_status() so 4xx/5xx don’t parse as YAML, and pass Url directly (IntoUrl). Minimal patch:
```diff
- let text = reqwest::get(url.to_string()).await?.text().await?;
+ let text = reqwest::get(url)
+     .await?
+     .error_for_status()?
+     .text()
+     .await?;
```

Stronger hardening (timeouts, pooling, size cap): consider a shared Client with sane defaults and a max body size to avoid hangs/OOM. As per coding guidelines.
Outside-range additions:
```rust
// imports
use std::{sync::OnceLock, time::Duration};
use reqwest::redirect::Policy;

// global client + cap
static CLIENT: OnceLock<reqwest::Client> = OnceLock::new();
const MAX_MANIFEST_BYTES: usize = 512 * 1024; // 512 KiB cap (tune as needed)
```

And replace the body of fetch with:
```rust
let client = CLIENT.get_or_init(|| {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(10))
        .redirect(Policy::limited(5))
        .user_agent(concat!("rain.orderbook/", env!("CARGO_PKG_VERSION")))
        .build()
        .expect("reqwest client")
});
let resp = client.get(url.clone()).send().await?.error_for_status()?;
if let Some(len) = resp.content_length() {
    if len > MAX_MANIFEST_BYTES as u64 {
        return Err(FetchManifestError::BodyTooLarge(len as usize));
    }
}
let bytes = resp.bytes().await?;
if bytes.len() > MAX_MANIFEST_BYTES {
    return Err(FetchManifestError::BodyTooLarge(bytes.len()));
}
let text = String::from_utf8_lossy(&bytes);
let doc = load_yaml(&text)?;
let manifest = parse_manifest_doc(&doc)?;
Ok(manifest)
```
Ok(manifest)There was a problem hiding this comment.
@findolor, understood! I'll skip these hardening improvements for now.
```rust
#[cfg(test)]
mod tests {
    use super::*;
    use alloy::primitives::address;
    use httpmock::MockServer;

    #[tokio::test]
    async fn test_fetch_manifest_happy_path() {
        let server = MockServer::start_async().await;
        let yaml = r#"
manifest-version: 1
db-schema-version: 1
networks:
  mainnet:
    chain-id: 1
    orderbooks:
      - address: "0x0000000000000000000000000000000000000001"
        dump-url: "http://example.com/dump1"
        end-block: 123
        end-block-hash: "0xabc"
        end-block-time-ms: 1000
"#;

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200)
                    .header("content-type", "application/x-yaml")
                    .body(yaml);
            })
            .await;

        let manifest = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap();

        assert_eq!(manifest.manifest_version, 1);
        assert_eq!(manifest.db_schema_version, 1);
        let net = manifest.networks.get("mainnet").unwrap();
        assert_eq!(net.chain_id, 1);
        assert_eq!(net.orderbooks.len(), 1);
        assert_eq!(
            net.orderbooks[0].address,
            address!("0x0000000000000000000000000000000000000001")
        );
        assert_eq!(net.orderbooks[0].end_block, 123);
        assert_eq!(net.orderbooks[0].end_block_hash, "0xabc");
        assert_eq!(net.orderbooks[0].end_block_time_ms, 1000);

        // find helper
        let found = manifest.find(1, address!("0x0000000000000000000000000000000000000001"));
        assert!(found.is_some());
    }

    #[tokio::test]
    async fn test_fetch_manifest_unknown_fields_ignored() {
        let server = MockServer::start_async().await;
        let yaml = r#"
manifest-version: 1
db-schema-version: 1
extra-root: ignored
networks:
  goerli:
    chain-id: 5
    extra: ignored
    orderbooks:
      - address: "0x0000000000000000000000000000000000000002"
        dump-url: "http://example.com/dump2"
        end-block: 555
        end-block-hash: "0xdef"
        end-block-time-ms: 2000
        extra-ob: ignored
"#;

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200).body(yaml);
            })
            .await;

        let manifest = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap();

        assert!(manifest.networks.contains_key("goerli"));
        let net = manifest.networks.get("goerli").unwrap();
        assert_eq!(net.chain_id, 5);
        assert_eq!(net.orderbooks.len(), 1);
    }

    #[tokio::test]
    async fn test_fetch_manifest_invalid_yaml() {
        let server = MockServer::start_async().await;
        let yaml = "manifest-version: [\n"; // malformed

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200).body(yaml);
            })
            .await;

        let err = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap_err();
        match err {
            // Some malformed YAML inputs are surfaced as ScanError by the loader
            FetchManifestError::Yaml(YamlError::ScanError(_)) => {}
            // In certain cases, incomplete structures may parse into BadValue and
            // be reported later as a field error; accept that as invalid YAML too.
            FetchManifestError::Yaml(YamlError::Field { .. }) => {}
            _ => panic!("expected YAML scan or field error"),
        }
    }

    #[tokio::test]
    async fn test_fetch_manifest_invalid_types_and_values() {
        let server = MockServer::start_async().await;
        let yaml = r#"
manifest-version: 1
db-schema-version: 1
networks:
  mainnet:
    chain-id: 1
    orderbooks:
      - address: 123 # invalid type
        dump-url: "not-a-url"
        end-block: 0
        end-block-hash: 999 # invalid type
        end-block-time-ms: 0
"#;

        server
            .mock_async(|when, then| {
                when.method("GET").path("/");
                then.status(200).body(yaml);
            })
            .await;

        let err = fetch(Url::parse(&server.base_url()).unwrap())
            .await
            .unwrap_err();
        match err {
            FetchManifestError::Yaml(YamlError::Field { .. }) => {}
            _ => panic!("expected field error"),
        }
    }

    #[tokio::test]
    async fn test_fetch_manifest_http_error_path() {
        // Use an unsupported scheme to deterministically trigger a reqwest error
        let url = Url::parse("ftp://example.com").unwrap();
        let err = fetch(url).await.unwrap_err();
        match err {
            FetchManifestError::ReqwestError(_) => {}
            other => panic!("expected reqwest error, got {other:?}"),
        }
    }
}
```
🧹 Nitpick | 🔵 Trivial
Tests: add coverage for status errors and large bodies.
- Add a 404/500 case and assert FetchManifestError::ReqwestError when using error_for_status().
- Add a “too large” response (> MAX_MANIFEST_BYTES) to assert BodyTooLarge.
- If enforcing scheme/http(s)-only, keep ftp test but assert your new error variant instead of reqwest’s.
🤖 Prompt for AI Agents
In crates/settings/src/remote/manifest.rs around lines 21 to 180, the tests
currently cover happy/path, malformed YAML, invalid values and an ftp reqwest
error but lack coverage for HTTP status errors and oversize bodies; add tests
that (1) mock a 404 (and optionally 500) response and assert fetch(...) returns
FetchManifestError::ReqwestError (coming from error_for_status), (2) mock a
response whose body length exceeds MAX_MANIFEST_BYTES and assert fetch(...)
returns the BodyTooLarge error variant, and (3) if the implementation enforces
scheme (http/https) instead of relying on reqwest to error for ftp, update the
existing ftp test to expect your custom scheme error variant rather than
ReqwestError; add these tests to the same tests module using MockServer
similarly to the existing tests.
@findolor, understood! I've noted your decision to proceed without the additional test coverage.
```rust
pub fn parse_positive_u32(value: &str, field: &str, location: String) -> Result<u32, YamlError> {
    let parsed: u32 = value
        .parse()
        .map_err(|e: std::num::ParseIntError| YamlError::Field {
            kind: FieldErrorKind::InvalidValue {
                field: field.to_string(),
                reason: e.to_string(),
            },
            location: location.clone(),
        })?;
    if parsed == 0 {
        return Err(YamlError::Field {
            kind: FieldErrorKind::InvalidValue {
                field: field.to_string(),
                reason: "must be a positive integer".to_string(),
            },
            location,
        });
    }
    Ok(parsed)
}
```
🧹 Nitpick | 🔵 Trivial
Deduplicate the positive integer parsers; avoid cloning location
Minor ergonomics/perf:
- Factor the common logic into a generic helper to remove duplication.
- Accept &str for location and to_string inside, avoiding repeated clones at call sites.
Example:
```rust
fn parse_positive<T>(s: &str, field: &str, location: &str) -> Result<T, YamlError>
where
    T: std::str::FromStr + PartialEq + Default,
    <T as std::str::FromStr>::Err: std::fmt::Display,
{
    let parsed: T = s.parse().map_err(|e| YamlError::Field {
        kind: FieldErrorKind::InvalidValue { field: field.to_string(), reason: e.to_string() },
        location: location.to_string(),
    })?;
    if parsed == T::default() {
        return Err(YamlError::Field {
            kind: FieldErrorKind::InvalidValue { field: field.to_string(), reason: "must be a positive integer".to_string() },
            location: location.to_string(),
        });
    }
    Ok(parsed)
}
```

Then define parse_positive_u32/parse_positive_u64 as thin wrappers calling parse_positive.
Also applies to: 26-46
🤖 Prompt for AI Agents
crates/settings/src/utils.rs lines 4-24 (and similar code at 26-46): the current
parse_positive_u32 duplicates parsing/validation logic and clones the location;
replace with a single generic helper parse_positive<T>(s: &str, field: &str,
location: &str) -> Result<T, YamlError> where T: FromStr + PartialEq + Default
and <T as FromStr>::Err: Display, that parses s, maps parse errors to
YamlError::Field using location.to_string() once, checks parsed == T::default()
to return the same "must be a positive integer" Field error (using
location.to_string()), and then implement parse_positive_u32 and
parse_positive_u64 as thin wrappers that call parse_positive with &str location
to avoid cloning at call sites; update call sites to pass &str for location.
```rust
pub fn parse_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
    Url::parse(value).map_err(|e: UrlParseError| YamlError::Field {
        kind: FieldErrorKind::InvalidValue {
            field: field.to_string(),
            reason: e.to_string(),
        },
        location,
    })
}
```
🛠️ Refactor suggestion | 🟠 Major
Add http(s)-only URL parser and use it for network fetches
To avoid unsafe schemes, provide a dedicated validator.
```diff
 pub fn parse_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
     Url::parse(value).map_err(|e: UrlParseError| YamlError::Field {
         kind: FieldErrorKind::InvalidValue {
             field: field.to_string(),
             reason: e.to_string(),
         },
         location,
     })
 }
+
+/// Parse URL and restrict scheme to http/https.
+pub fn parse_http_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
+    let url = parse_url(value, field, location.clone())?;
+    match url.scheme() {
+        "http" | "https" => Ok(url),
+        _ => Err(YamlError::Field {
+            kind: FieldErrorKind::InvalidValue {
+                field: field.to_string(),
+                reason: "unsupported url scheme; only http/https are allowed".to_string(),
+            },
+            location,
+        }),
+    }
+}
```

Optional: add a unit test asserting file:// is rejected.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
pub fn parse_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
    Url::parse(value).map_err(|e: UrlParseError| YamlError::Field {
        kind: FieldErrorKind::InvalidValue {
            field: field.to_string(),
            reason: e.to_string(),
        },
        location,
    })
}

/// Parse URL and restrict scheme to http/https.
pub fn parse_http_url(value: &str, field: &str, location: String) -> Result<Url, YamlError> {
    let url = parse_url(value, field, location.clone())?;
    match url.scheme() {
        "http" | "https" => Ok(url),
        _ => Err(YamlError::Field {
            kind: FieldErrorKind::InvalidValue {
                field: field.to_string(),
                reason: "unsupported url scheme; only http/https are allowed".to_string(),
            },
            location,
        }),
    }
}
```
🤖 Prompt for AI Agents
In crates/settings/src/utils.rs around lines 48 to 56, the current parse_url
accepts any valid URL scheme; add a new http(s)-only validator (e.g.,
parse_http_url) that calls Url::parse(value) and then checks url.scheme() is
"http" or "https", returning the same YamlError::Field InvalidValue if the
scheme is not allowed; update all network-fetch call sites to use the new
parse_http_url instead of parse_url; optionally add a unit test that verifies a
"file://" URL is rejected.
@findolor, understood! I'll skip this suggestion.
```rust
    version == MANIFEST_VERSION
}

pub fn parse_manifest_doc(doc: &StrictYaml) -> Result<LocalDbManifest, YamlError> {
```
can we refactor this function? it's four screens long
btw there is a clippy rule we can enable to flag this automatically
i think we can enable clippy rule in a separate PR since it might flag some other things in the repo
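The lint in question is presumably clippy's function-length check (clippy::too_many_lines). Assuming a Cargo workspace, a follow-up PR could enable it roughly like this (the threshold value shown is illustrative):

```toml
# Cargo.toml (hypothetical workspace fragment): warn on overly long functions.
[workspace.lints.clippy]
too_many_lines = "warn"
```

The line-count threshold can then be tuned in clippy.toml via the too-many-lines-threshold setting, so existing long functions elsewhere in the repo can be grandfathered in gradually rather than failing CI all at once.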
@coderabbitai assess this PR size classification for the totality of the PR with the following criteria and report it in your comment:

S/M/L PR Classification Guidelines: This guide helps classify merged pull requests by effort and complexity rather than just line count. The goal is to assess the difficulty and scope of changes after they have been completed.

Small (S)
Characteristics: …
Review Effort: Would have taken 5-10 minutes
Examples: …

Medium (M)
Characteristics: …
Review Effort: Would have taken 15-30 minutes
Examples: …

Large (L)
Characteristics: …
Review Effort: Would have taken 45+ minutes
Examples: …

Additional Factors to Consider: When deciding between sizes, also consider: …

Notes: …
SIZE=M
Motivation
See issue:
Solution
- utils.rs under the settings crate for common parse logic
By submitting this for review, I'm confirming I've done the following:
Summary by CodeRabbit
New Features
Refactor
Tests