Skip to content

Conversation

@niladrix719
Copy link

@niladrix719 niladrix719 commented Nov 23, 2025

Fixes #1157

Description

Goal: allow configuring Parseable via a TOML file so users don’t juggle dozens of P_* env vars.

Approach: load a config file before Clap runs, hydrate env defaults, and let CLI/env overrides keep their precedence.

Key changes: new config_loader, --config-file flag, added toml dependency, docs/tests for the workflow


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features

    • Added TOML configuration support for environment defaults (defaults to parseable.toml)
    • New CLI option to specify a custom config file
    • Config values map to environment variables with CLI flags taking precedence
    • Storage backend selection can be provided via TOML (multiple backends supported)
  • Documentation

    • Added "Configure with TOML" guide and examples to the README

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: “niladrix719” <niladrix719@gmail.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 23, 2025

Walkthrough

Adds TOML-based configuration loading and a CLI option to specify a TOML file. Introduces a new config loader module that resolves and parses TOML, applies values as environment defaults, can inject storage settings into CLI args, and updates initialization to support multiple storage backends.

Changes

Cohort / File(s) Summary
Dependency addition
Cargo.toml
Added toml = "0.9.8" dependency for TOML parsing.
Documentation
README.md
Added "Configure with TOML" section describing automatic parseable.toml loading, --config-file / P_CONFIG_FILE, config-to-env mapping, and precedence (CLI flags > env > TOML).
CLI augmentation
src/cli.rs
Added pub config_file: Option<PathBuf> to Cli with --config-file/-c, P_CONFIG_FILE env support, and file_path validation.
Configuration loading module
src/config_loader.rs
New module providing pub fn parse_cli_with_config() -> Cli, config discovery (locate_config_file), TOML parsing/merging (apply_config), env application (skip already-set vars), storage injection into args, concurrency guard for env mutations, error propagation, and tests.
Module integration
src/lib.rs
Added mod config_loader; to include the new module in the crate.
Initialization refactor
src/parseable/mod.rs
Replaced direct Cli::parse() use with parse_cli_with_config(); expanded storage initialization to handle multiple backends (Local, S3, Blob, Gcs) and adjusted imports and error handling accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application Start
    participant Loader as config_loader
    participant FS as File System
    participant TOML as TOML Parser
    participant Env as Environment
    participant CLI as CLI Parser
    participant Init as Storage Init

    App->>Loader: parse_cli_with_config()
    Loader->>Loader: locate_config_file(args, env, default)
    Loader->>FS: read config file (if found)
    FS-->>Loader: file contents
    Loader->>TOML: parse TOML -> FileConfig
    TOML-->>Loader: FileConfig
    Loader->>Loader: merge inline & top-level env maps
    Loader->>Env: set env vars for keys not already present
    Loader->>Loader: inject storage args (if no storage subcommand)
    Loader->>CLI: parse modified args into Cli
    CLI-->>Loader: Cli
    Loader-->>App: Cli with resolved config
    App->>Init: initialize storage (Local / S3 / Blob / Gcs)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review config discovery and precedence (CLI arg > existing env > TOML).
  • Verify env application safety (ENV_LOCK) and correct serialization of TOML values.
  • Inspect storage injection logic and associated error paths in src/parseable/mod.rs.
  • Check tests in src/config_loader.rs for concurrency and environment isolation.

Poem

In burrows of code I hop and compile,
A TOML tucked in makes config worthwhile.
I set defaults gently, never overwrite,
CLI leads the chorus, env keeps the light.
Hooray — storage chosen, the rabbit's delight! 🐰

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly describes the main feature: adding TOML config file support for Parseable configuration, directly matching the changeset's primary objective.
Description check ✅ Passed The description covers the goal, approach, and key changes; includes a reference to issue #1157; and confirms all three checklist items are completed.
Linked Issues check ✅ Passed The PR successfully implements the objective from issue #1157 by adding TOML config file reading, environment variable hydration, and maintaining CLI/env precedence as required.
Out of Scope Changes check ✅ Passed All changes directly support TOML configuration: new config_loader module, --config-file flag, toml dependency, and updates to initialization logic are all in-scope for issue #1157.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 23, 2025

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@niladrix719
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

nitisht added a commit to parseablehq/.github that referenced this pull request Nov 23, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/parseable/mod.rs (1)

126-164: Consider extracting common metastore construction logic.

The S3, Blob, and GCS branches have identical structure with only the storage type differing. This duplication could be reduced with a helper function or macro.

Consider extracting the common pattern:

fn create_parseable_with_metastore<S: ObjectStorageProvider + 'static>(
    options: Options,
    #[cfg(feature = "kafka")] kafka: KafkaConfig,
    storage: S,
) -> Parseable {
    let metastore = ObjectStoreMetastore {
        storage: storage.construct_client(),
    };
    Parseable::new(
        options,
        #[cfg(feature = "kafka")] kafka,
        Arc::new(storage),
        Arc::new(metastore),
    )
}

Then each branch becomes:

StorageOptions::S3(args) => create_parseable_with_metastore(args.options, args.kafka, args.storage),
src/config_loader.rs (3)

27-33: Consider using more specific error kinds.

All configuration errors are currently mapped to ErrorKind::Io, including validation failures (e.g., missing storage config at line 98-101). Using more specific error kinds like ErrorKind::ValueValidation or ErrorKind::MissingRequiredArgument would provide clearer error messages.

Example:

pub fn parse_cli_with_config() -> Cli {
    let raw_args: Vec<OsString> = std_env::args_os().collect();
    let adjusted_args = apply_config(raw_args)
        .unwrap_or_else(|err| {
            let kind = if err.to_string().contains("storage") {
                ErrorKind::MissingRequiredArgument
            } else {
                ErrorKind::Io
            };
            Error::raw(kind, err.to_string()).exit()
        });

    Cli::parse_from(adjusted_args)
}

90-92: P_CONFIG_FILE assignment needs path conversion.

Line 91 passes &config_path (a PathBuf) directly to set_env_var. This works because PathBuf implements AsRef<OsStr>, but the value should be displayed as a string for better user experience in CLI help/telemetry.

     if std_env::var_os("P_CONFIG_FILE").is_none() {
-        set_env_var("P_CONFIG_FILE", &config_path);
+        set_env_var("P_CONFIG_FILE", config_path.display().to_string());
     }

221-277: Consider adding more comprehensive test coverage.

The existing tests cover basic scenarios well, but additional test cases would improve robustness:

  • Config file precedence (CLI --config-file > P_CONFIG_FILE > default parseable.toml)
  • Error handling (invalid TOML, missing required storage)
  • Nested table rejection
  • Array value serialization format
  • Behavior when P_CONFIG_FILE is already set
  • Validation logic for Local storage (staging path conflicts, hot tier restrictions)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7adda55 and c317271.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • Cargo.toml (1 hunks)
  • README.md (1 hunks)
  • src/cli.rs (1 hunks)
  • src/config_loader.rs (1 hunks)
  • src/lib.rs (1 hunks)
  • src/parseable/mod.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-10-28T02:10:41.140Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1453
File: src/parseable/mod.rs:397-400
Timestamp: 2025-10-28T02:10:41.140Z
Learning: In Parseable enterprise deployments with multiple query nodes, hot tier configuration must be persisted in object storage so that newly started query nodes can fetch and synchronize the hot tier settings at startup (file: src/parseable/mod.rs, function: create_stream_and_schema_from_storage).

Applied to files:

  • src/parseable/mod.rs
  • src/config_loader.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.

Applied to files:

  • src/parseable/mod.rs
  • src/config_loader.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-10-21T02:22:24.403Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/parseable/mod.rs:419-432
Timestamp: 2025-10-21T02:22:24.403Z
Learning: In Parseable's internal stream creation (`create_internal_stream_if_not_exists` in `src/parseable/mod.rs`), errors should not propagate to fail server initialization. The function creates both pmeta and pbilling internal streams, and failures are logged but the function always returns `Ok(())` to ensure server startup resilience. Individual stream creation failures should not prevent syncing of successfully created streams.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
Repo: parseablehq/parseable PR: 1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-08-25T01:33:36.437Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/storage.rs:51-68
Timestamp: 2025-08-25T01:33:36.437Z
Learning: In the Parseable storage system, only one storage provider (localfs, s3, azureblob, or gcs) is meant to be initialized per process, which ensures that global metrics like STORAGE_REQUEST_RESPONSE_TIME and STORAGE_FILES_SCANNED are registered exactly once without risk of duplicate registration panics.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).

Applied to files:

  • src/parseable/mod.rs
🧬 Code graph analysis (2)
src/cli.rs (1)
src/option.rs (1)
  • file_path (101-113)
src/parseable/mod.rs (1)
src/config_loader.rs (1)
  • parse_cli_with_config (27-33)
🔇 Additional comments (10)
src/lib.rs (1)

25-25: LGTM!

The private module declaration correctly integrates the new configuration loading subsystem into the crate.

README.md (1)

88-106: LGTM!

The documentation clearly explains the TOML configuration feature, including auto-discovery, custom paths, and the precedence hierarchy (CLI > env vars > TOML).

src/cli.rs (1)

69-78: LGTM!

The --config-file option is well-defined with appropriate validation, environment variable binding, and clear help text. The global = true flag correctly ensures the option is available across all subcommands.

src/parseable/mod.rs (2)

92-93: LGTM! Config-driven initialization enables TOML support.

The switch from direct CLI parsing to config_loader::parse_cli_with_config() correctly integrates TOML configuration support while maintaining backward compatibility.


97-111: Verify validation consistency across storage backends for hot tier incompatibility claim.

The validation at lines 106-111 prevents hot tier with local-store, but examination reveals:

  1. Hot tier includes explicit S3 sync functionality (hottier.rs: download_from_s3(), "sync the hot tier files from S3...")
  2. Hot tier is architecturally a local cache layer that syncs from remote storage (e.g., S3)
  3. S3, Blob, and GCS branches lack equivalent validation

This suggests either:

  • The local-store constraint requires clarification (why is hot tier incompatible with local storage when it works with S3?)
  • Similar validations should apply consistently across backends, OR
  • Validations should be added to backends that currently lack them

Confirm the intended behavior and apply validations consistently.

src/config_loader.rs (4)

114-156: LGTM! Config file location logic is well-structured.

The precedence order (CLI → env → default) is correctly implemented, and the CLI argument parsing properly handles both space-separated and =-joined forms of the --config-file option.


158-180: LGTM! Environment variable override logic is correct.

The function properly validates env var names, respects already-set variables, and handles unsupported value types (nested tables) with appropriate warnings.


194-219: LGTM! Helper predicates are well-implemented.

The storage injection logic correctly identifies when to inject storage configuration, and the validation functions appropriately handle edge cases.


189-189: Verify whether downstream code consumes array environment variables and whether TOML format causes parsing failures.

The investigation confirms the concern is valid: arrays at line 189 are serialized using TOML's to_string() format (producing strings like [1, 2, 3]), then passed directly to set_env_var() without reformatting. Environment variables conventionally use simpler formats (CSV or space-separated), and no tests were found demonstrating the intended array handling.

However, I found no evidence of downstream code consuming these array environment variables or actual parsing failures. The impact depends on whether your application or dependencies read these array env vars and expect a different format.

Cargo.toml (1)

65-65: Update to toml crate version 0.9.5 or verify version 0.9.8 availability.

The latest stable release is toml 0.9.5 (released Aug 4, 2025), but the Cargo.toml specifies version 0.9.8. Verify whether version 0.9.8 exists as a valid release, or update to 0.9.5. No published security advisories were found for the toml crate version 0.9.8.

Signed-off-by: “niladrix719” <niladrix719@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/config_loader.rs (3)

65-110: Consider documenting precedence rules in code comments.

The config merging and injection behavior might not be immediately clear:

  • Line 90: env.extend(inline_env) means top-level TOML keys override [env] table entries
  • Lines 98-107: Storage injection only happens when no storage subcommand is provided via CLI

Adding brief inline comments explaining these precedence rules would help future maintainers.


162-184: Consider structured logging for warnings (optional).

The eprintln! warnings (lines 165-170, 177-181) provide useful feedback, but if Parseable has a structured logging system elsewhere, consider using it here for consistency and to allow filtering/formatting of warnings.


225-281: Expand test coverage to include error and edge cases.

Current tests validate the happy path, but consider adding tests for:

  • Invalid TOML syntax (parse errors)
  • Missing config file with explicit path
  • Array values in config (to verify serialization)
  • Nested tables (should be ignored with warnings)
  • Empty config file

This would improve confidence in error handling and edge case behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c317271 and b09645f.

📒 Files selected for processing (1)
  • src/config_loader.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-28T02:10:41.140Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1453
File: src/parseable/mod.rs:397-400
Timestamp: 2025-10-28T02:10:41.140Z
Learning: In Parseable enterprise deployments with multiple query nodes, hot tier configuration must be persisted in object storage so that newly started query nodes can fetch and synchronize the hot tier settings at startup (file: src/parseable/mod.rs, function: create_stream_and_schema_from_storage).

Applied to files:

  • src/config_loader.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.

Applied to files:

  • src/config_loader.rs
🔇 Additional comments (2)
src/config_loader.rs (2)

35-53: LGTM: Race condition properly addressed.

The ENV_LOCK mutex correctly serializes all environment variable mutations, addressing the data race concern from the previous review. The safety comment clearly documents the guarantee.


186-196: Clarify whether TOML arrays should be supported as environment variables.

Array values are handled in the code (line 193) but lack tests and documentation, unlike explicitly unsupported tables. The behavior converts arrays to TOML syntax ["a", "b"], which differs from standard env var formats. Either arrays should return None like tables (if unsupported), or the design and format should be documented with examples and tests if intentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow configuring parseable instance with a toml file

1 participant