-
-
Notifications
You must be signed in to change notification settings - Fork 157
feat: Support configuring Parseable via TOML config file #1467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: “niladrix719” <niladrix719@gmail.com>
WalkthroughAdds TOML-based configuration loading and a CLI option to specify a TOML file. Introduces a new config loader module that resolves and parses TOML, applies values as environment defaults, can inject storage settings into CLI args, and updates initialization to support multiple storage backends. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application Start
participant Loader as config_loader
participant FS as File System
participant TOML as TOML Parser
participant Env as Environment
participant CLI as CLI Parser
participant Init as Storage Init
App->>Loader: parse_cli_with_config()
Loader->>Loader: locate_config_file(args, env, default)
Loader->>FS: read config file (if found)
FS-->>Loader: file contents
Loader->>TOML: parse TOML -> FileConfig
TOML-->>Loader: FileConfig
Loader->>Loader: merge inline & top-level env maps
Loader->>Env: set env vars for keys not already present
Loader->>Loader: inject storage args (if no storage subcommand)
Loader->>CLI: parse modified args into Cli
CLI-->>Loader: Cli
Loader-->>App: Cli with resolved config
App->>Init: initialize storage (Local / S3 / Blob / Gcs)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/parseable/mod.rs (1)
126-164: Consider extracting common metastore construction logic.The S3, Blob, and GCS branches have identical structure with only the storage type differing. This duplication could be reduced with a helper function or macro.
Consider extracting the common pattern:
fn create_parseable_with_metastore<S: ObjectStorageProvider + 'static>( options: Options, #[cfg(feature = "kafka")] kafka: KafkaConfig, storage: S, ) -> Parseable { let metastore = ObjectStoreMetastore { storage: storage.construct_client(), }; Parseable::new( options, #[cfg(feature = "kafka")] kafka, Arc::new(storage), Arc::new(metastore), ) }Then each branch becomes:
StorageOptions::S3(args) => create_parseable_with_metastore(args.options, args.kafka, args.storage),src/config_loader.rs (3)
27-33: Consider using more specific error kinds.All configuration errors are currently mapped to
ErrorKind::Io, including validation failures (e.g., missing storage config at line 98-101). Using more specific error kinds likeErrorKind::ValueValidationorErrorKind::MissingRequiredArgumentwould provide clearer error messages.Example:
pub fn parse_cli_with_config() -> Cli { let raw_args: Vec<OsString> = std_env::args_os().collect(); let adjusted_args = apply_config(raw_args) .unwrap_or_else(|err| { let kind = if err.to_string().contains("storage") { ErrorKind::MissingRequiredArgument } else { ErrorKind::Io }; Error::raw(kind, err.to_string()).exit() }); Cli::parse_from(adjusted_args) }
90-92: P_CONFIG_FILE assignment needs path conversion.Line 91 passes
&config_path(aPathBuf) directly toset_env_var. This works becausePathBufimplementsAsRef<OsStr>, but the value should be displayed as a string for better user experience in CLI help/telemetry.if std_env::var_os("P_CONFIG_FILE").is_none() { - set_env_var("P_CONFIG_FILE", &config_path); + set_env_var("P_CONFIG_FILE", config_path.display().to_string()); }
221-277: Consider adding more comprehensive test coverage.The existing tests cover basic scenarios well, but additional test cases would improve robustness:
- Config file precedence (CLI
--config-file>P_CONFIG_FILE> defaultparseable.toml)- Error handling (invalid TOML, missing required storage)
- Nested table rejection
- Array value serialization format
- Behavior when
P_CONFIG_FILEis already set- Validation logic for Local storage (staging path conflicts, hot tier restrictions)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
Cargo.toml(1 hunks)README.md(1 hunks)src/cli.rs(1 hunks)src/config_loader.rs(1 hunks)src/lib.rs(1 hunks)src/parseable/mod.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-10-28T02:10:41.140Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1453
File: src/parseable/mod.rs:397-400
Timestamp: 2025-10-28T02:10:41.140Z
Learning: In Parseable enterprise deployments with multiple query nodes, hot tier configuration must be persisted in object storage so that newly started query nodes can fetch and synchronize the hot tier settings at startup (file: src/parseable/mod.rs, function: create_stream_and_schema_from_storage).
Applied to files:
src/parseable/mod.rssrc/config_loader.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/parseable/mod.rssrc/config_loader.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.
Applied to files:
src/parseable/mod.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/parseable/mod.rs
📚 Learning: 2025-10-21T02:22:24.403Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/parseable/mod.rs:419-432
Timestamp: 2025-10-21T02:22:24.403Z
Learning: In Parseable's internal stream creation (`create_internal_stream_if_not_exists` in `src/parseable/mod.rs`), errors should not propagate to fail server initialization. The function creates both pmeta and pbilling internal streams, and failures are logged but the function always returns `Ok(())` to ensure server startup resilience. Individual stream creation failures should not prevent syncing of successfully created streams.
Applied to files:
src/parseable/mod.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
Repo: parseablehq/parseable PR: 1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/parseable/mod.rs
📚 Learning: 2025-08-25T01:33:36.437Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/storage.rs:51-68
Timestamp: 2025-08-25T01:33:36.437Z
Learning: In the Parseable storage system, only one storage provider (localfs, s3, azureblob, or gcs) is meant to be initialized per process, which ensures that global metrics like STORAGE_REQUEST_RESPONSE_TIME and STORAGE_FILES_SCANNED are registered exactly once without risk of duplicate registration panics.
Applied to files:
src/parseable/mod.rs
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).
Applied to files:
src/parseable/mod.rs
🧬 Code graph analysis (2)
src/cli.rs (1)
src/option.rs (1)
file_path(101-113)
src/parseable/mod.rs (1)
src/config_loader.rs (1)
parse_cli_with_config(27-33)
🔇 Additional comments (10)
src/lib.rs (1)
25-25: LGTM!The private module declaration correctly integrates the new configuration loading subsystem into the crate.
README.md (1)
88-106: LGTM!The documentation clearly explains the TOML configuration feature, including auto-discovery, custom paths, and the precedence hierarchy (CLI > env vars > TOML).
src/cli.rs (1)
69-78: LGTM!The
--config-fileoption is well-defined with appropriate validation, environment variable binding, and clear help text. Theglobal = trueflag correctly ensures the option is available across all subcommands.src/parseable/mod.rs (2)
92-93: LGTM! Config-driven initialization enables TOML support.The switch from direct CLI parsing to
config_loader::parse_cli_with_config()correctly integrates TOML configuration support while maintaining backward compatibility.
97-111: Verify validation consistency across storage backends for hot tier incompatibility claim.The validation at lines 106-111 prevents hot tier with local-store, but examination reveals:
- Hot tier includes explicit S3 sync functionality (
hottier.rs:download_from_s3(), "sync the hot tier files from S3...")- Hot tier is architecturally a local cache layer that syncs from remote storage (e.g., S3)
- S3, Blob, and GCS branches lack equivalent validation
This suggests either:
- The local-store constraint requires clarification (why is hot tier incompatible with local storage when it works with S3?)
- Similar validations should apply consistently across backends, OR
- Validations should be added to backends that currently lack them
Confirm the intended behavior and apply validations consistently.
src/config_loader.rs (4)
114-156: LGTM! Config file location logic is well-structured.The precedence order (CLI → env → default) is correctly implemented, and the CLI argument parsing properly handles both space-separated and
=-joined forms of the--config-fileoption.
158-180: LGTM! Environment variable override logic is correct.The function properly validates env var names, respects already-set variables, and handles unsupported value types (nested tables) with appropriate warnings.
194-219: LGTM! Helper predicates are well-implemented.The storage injection logic correctly identifies when to inject storage configuration, and the validation functions appropriately handle edge cases.
189-189: Verify whether downstream code consumes array environment variables and whether TOML format causes parsing failures.The investigation confirms the concern is valid: arrays at line 189 are serialized using TOML's
to_string()format (producing strings like[1, 2, 3]), then passed directly toset_env_var()without reformatting. Environment variables conventionally use simpler formats (CSV or space-separated), and no tests were found demonstrating the intended array handling.However, I found no evidence of downstream code consuming these array environment variables or actual parsing failures. The impact depends on whether your application or dependencies read these array env vars and expect a different format.
Cargo.toml (1)
65-65: Update to toml crate version 0.9.5 or verify version 0.9.8 availability.The latest stable release is toml 0.9.5 (released Aug 4, 2025), but the Cargo.toml specifies version 0.9.8. Verify whether version 0.9.8 exists as a valid release, or update to 0.9.5. No published security advisories were found for the toml crate version 0.9.8.
Signed-off-by: “niladrix719” <niladrix719@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/config_loader.rs (3)
65-110: Consider documenting precedence rules in code comments.The config merging and injection behavior might not be immediately clear:
- Line 90:
env.extend(inline_env)means top-level TOML keys override[env]table entries- Lines 98-107: Storage injection only happens when no storage subcommand is provided via CLI
Adding brief inline comments explaining these precedence rules would help future maintainers.
162-184: Consider structured logging for warnings (optional).The
eprintln!warnings (lines 165-170, 177-181) provide useful feedback, but if Parseable has a structured logging system elsewhere, consider using it here for consistency and to allow filtering/formatting of warnings.
225-281: Expand test coverage to include error and edge cases.Current tests validate the happy path, but consider adding tests for:
- Invalid TOML syntax (parse errors)
- Missing config file with explicit path
- Array values in config (to verify serialization)
- Nested tables (should be ignored with warnings)
- Empty config file
This would improve confidence in error handling and edge case behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/config_loader.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-28T02:10:41.140Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1453
File: src/parseable/mod.rs:397-400
Timestamp: 2025-10-28T02:10:41.140Z
Learning: In Parseable enterprise deployments with multiple query nodes, hot tier configuration must be persisted in object storage so that newly started query nodes can fetch and synchronize the hot tier settings at startup (file: src/parseable/mod.rs, function: create_stream_and_schema_from_storage).
Applied to files:
src/config_loader.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/config_loader.rs
🔇 Additional comments (2)
src/config_loader.rs (2)
35-53: LGTM: Race condition properly addressed.The
ENV_LOCKmutex correctly serializes all environment variable mutations, addressing the data race concern from the previous review. The safety comment clearly documents the guarantee.
186-196: Clarify whether TOML arrays should be supported as environment variables.Array values are handled in the code (line 193) but lack tests and documentation, unlike explicitly unsupported tables. The behavior converts arrays to TOML syntax
["a", "b"], which differs from standard env var formats. Either arrays should returnNonelike tables (if unsupported), or the design and format should be documented with examples and tests if intentional.
Fixes #1157
Description
Goal: allow configuring Parseable via a TOML file so users don’t juggle dozens of P_* env vars.
Approach: load a config file before Clap runs, hydrate env defaults, and let CLI/env overrides keep their precedence.
Key changes: new config_loader, --config-file flag, added toml dependency, docs/tests for the workflow
This PR has:
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.