Skip to content

Conversation

@nikhilsinhaparseable
Copy link
Contributor

@nikhilsinhaparseable nikhilsinhaparseable commented Oct 27, 2025

helps for a new querier to fetch the hottier info at server start
and sync the files from object store

Summary by CodeRabbit

  • New Features

    • Hot tier configuration is now stored as a full, restorable configuration and can be initialized automatically at server startup.
    • Streams expose current hot-tier status and configuration for inspection.
  • Improvements

    • Hot tier state is synced more reliably across restarts with backward-compatibility for older metadata.
  • Bug Fixes

    • Disabling or deleting a hot tier now reliably clears persisted hot-tier metadata so UI/state reflect the change.

helps for a new querier to fetch the hottier info at server start
and sync the files from object store
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 27, 2025

Walkthrough

Hot-tier handling changed from a boolean flag to an optional StreamHotTier object across metadata, storage, migration, and in-memory stream logic. A startup initializer was added to create missing per-stream hot-tier metadata files; startup flows now invoke this initializer before downloading hot-tier data from S3.

Changes

Cohort / File(s) Summary
HTTP Handler Integration
src/handlers/http/logstream.rs
Updated calls to set_hot_tier to pass Some(hottier.clone()) instead of true; persistence now stores Option<StreamHotTier> in stream metadata.
Startup Initialization
src/handlers/http/modal/mod.rs, src/handlers/http/modal/query_server.rs, src/handlers/http/modal/server.rs
Added initialize_hot_tier_metadata_on_startup(&HotTierManager) and integrated calls into query-server and modal-server startup/init flows; failures are logged and startup proceeds to download_from_s3().
Core Data Model & Serialization
src/hottier.rs, src/storage/mod.rs, src/metadata.rs
StreamHotTier derives Clone, PartialEq, Eq; added pub hot_tier: Option<StreamHotTier> to LogStreamMetadata and hot_tier: Option<StreamHotTier> to ObjectStoreFormat (serde default/skip).
In-Memory Stream & Storage Sync
src/parseable/streams.rs, src/parseable/mod.rs
set_hot_tier signature changed to accept Option<StreamHotTier>; added get_hot_tier() and is_hot_tier_enabled(); storage-to-memory init now propagates hot_tier (with backward-compat handling for previous boolean flag).
Metadata Migration & Persistence
src/migration/mod.rs
Included hot_tier in ObjectStoreFormatLogStreamMetadata mapping so persisted metadata carries the optional hot-tier object.
Metastore Simplification
src/metastore/metastores/object_store_metastore.rs
Minor refactor: delete_overview() condensed to a single-line Ok(self.storage.delete_object(&path).await?).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Startup as Server Startup
    participant Init as initialize_hot_tier_metadata_on_startup
    participant HTM as HotTierManager
    participant Storage as Object Store (metadata)
    participant Mem as In-Memory Streams

    Startup->>Init: call initialize_hot_tier_metadata_on_startup(hot_tier_manager)
    Init->>HTM: list in-memory hot-tier configs
    loop each stream missing local metadata
        Init->>Storage: check metadata file
        alt missing
            Init->>Init: create default hot-tier metadata (used_size=0, available=size, oldest_entry=None)
            Init->>Storage: write metadata file
        end
    end
    Init-->>Startup: return (log per-stream errors)
    Startup->>HTM: call download_from_s3()
    Startup->>Mem: apply hot-tier state to in-memory streams
Loading
sequenceDiagram
    autonumber
    participant Init as parseable initialization
    participant Storage as Stored LogStreamMetadata
    participant Mem as In-Memory Stream

    Init->>Storage: read LogStreamMetadata (includes hot_tier & legacy flag)
    alt hot_tier present
        Init->>Mem: stream.set_hot_tier(Some(hot_tier))
    else hot_tier absent but hot_tier_enabled == true
        Init->>Mem: stream.set_hot_tier(None)  %% enable via legacy flag
    else no hot-tier info
        Init-->>Mem: no hot-tier action
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Focus areas:
    • src/parseable/mod.rs — storage→memory propagation and backward-compat logic.
    • src/handlers/http/modal/mod.rs — async startup initialization, file I/O and per-stream error handling.
    • src/parseable/streams.rs & src/handlers/http/logstream.rs — updated set_hot_tier signature and all call sites.
    • Cross-module serialization (ObjectStoreFormat) and migration mapping (src/migration/mod.rs) for persisted compatibility.

Possibly related PRs

Suggested labels

for next release

Suggested reviewers

  • praveen5959
  • parmesant

Poem

🐰
I nibble bytes at break of day,
turned flags to objects on my way.
I wrote small files so streams wake warm,
and hopped off happy, tail in form. 🔥

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is largely incomplete compared to the required template. The provided description consists of only a single, incomplete sentence ("helps for a new querier to fetch the hottier info at server start and sync the files from object store") and lacks the structured sections specified in the template: there is no issue reference, no formal "Description" section with goals and rationale, no key changes listed, and critically, none of the required checklist items are addressed. The description is also vague in its phrasing and does not provide sufficient detail about testing, comments, or documentation. Follow the pull request description template by adding: a formal Description section explaining the goal and rationale for the changes, a structured list of key changes across the modified files, and completion of the checklist items confirming that testing has been performed, code comments have been added where appropriate, and documentation has been updated. Consider providing more detail about how the hot tier configuration persistence enables the new querier startup behavior.
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "chore: persist hottier config in object storage" is concise and directly related to the main change in the pull request. It accurately captures the core architectural change of storing hot tier configuration in object storage metadata instead of transient state. While the PR also includes startup initialization and syncing functionality, the title appropriately focuses on the primary change—persistence—which enables the other features. The title is clear, specific, and scannable for understanding the changeset's intent.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/handlers/http/logstream.rs (1)

452-495: Missing in-memory state update on hot tier deletion.

The delete handler updates persistent metadata (lines 483-484) but doesn't clear the in-memory stream state. After deletion:

  • ✓ Persistent metadata cleared (hot_tier = None)
  • ✓ Hot tier files deleted
  • ✗ In-memory state unchanged (stream.get_hot_tier() still returns old value)

This causes state inconsistency. Add stream.set_hot_tier(None) before updating persistent metadata.

     hot_tier_manager.delete_hot_tier(&stream_name).await?;

+    let stream = PARSEABLE.get_stream(&stream_name)?;
+    stream.set_hot_tier(None);
+
     let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice(
         &PARSEABLE
🧹 Nitpick comments (4)
src/storage/mod.rs (1)

124-127: Consider deriving hot_tier_enabled from hot_tier presence.

Both hot_tier_enabled: bool and hot_tier: Option<StreamHotTier> coexist in the struct. The enabled state could be derived from hot_tier.is_some(), reducing redundancy. However, retaining both fields may be intentional for backward compatibility with existing metadata.

src/metadata.rs (1)

90-91: Consider deriving enabled state from hot tier presence.

Similar to ObjectStoreFormat, both hot_tier_enabled and hot_tier fields coexist. The boolean could be derived from hot_tier.is_some(). This redundancy may be intentional for backward compatibility.

src/handlers/http/modal/mod.rs (1)

598-645: Well-structured startup initialization function.

The implementation correctly:

  • Collects hot tier configurations without holding locks during async operations (lines 605-621)
  • Checks for existing metadata files before creating new ones (line 611)
  • Sets appropriate default values for new metadata (lines 626-631)
  • Handles errors gracefully by logging and continuing (lines 637-640)

Consider adding info-level logging when successfully initializing hot tier metadata to provide visibility into the startup process:

         if let Err(e) = hot_tier_manager
             .put_hot_tier(&stream_name, &mut hot_tier_metadata)
             .await
         {
             warn!(
                 "Failed to initialize hot tier metadata for stream {}: {}",
                 stream_name, e
             );
+        } else {
+            info!("Initialized hot tier metadata for stream {}", stream_name);
         }
src/parseable/mod.rs (1)

368-383: Consider refactoring to pass hot tier fields to constructor.

Making metadata mutable to set fields after construction works but is less idiomatic. Consider updating LogStreamMetadata::new() to accept hot_tier_enabled and hot_tier parameters, allowing immutable initialization.

Example refactor in src/metadata.rs:

pub fn new(
    created_at: String,
    time_partition: String,
    time_partition_limit: Option<NonZeroU32>,
    custom_partition: Option<String>,
    static_schema_flag: bool,
    static_schema: HashMap<String, Arc<Field>>,
    stream_type: StreamType,
    schema_version: SchemaVersion,
    log_source: Vec<LogSourceEntry>,
    telemetry_type: TelemetryType,
    hot_tier_enabled: bool,
    hot_tier: Option<StreamHotTier>,
) -> Self {
    // ... existing logic ...
}

Then in this file:

-let mut metadata = LogStreamMetadata::new(
+let metadata = LogStreamMetadata::new(
     created_at,
     time_partition,
     time_partition_limit,
     custom_partition,
     static_schema_flag,
     static_schema,
     stream_type,
     schema_version,
     log_source,
     telemetry_type,
+    hot_tier_enabled,
+    hot_tier.clone(),
 );
-
-// Set hot tier fields from the stored metadata
-metadata.hot_tier_enabled = hot_tier_enabled;
-metadata.hot_tier = hot_tier.clone();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f2b324 and 727ab5a.

📒 Files selected for processing (11)
  • src/handlers/http/logstream.rs (3 hunks)
  • src/handlers/http/modal/mod.rs (2 hunks)
  • src/handlers/http/modal/query_server.rs (2 hunks)
  • src/handlers/http/modal/server.rs (2 hunks)
  • src/hottier.rs (1 hunks)
  • src/metadata.rs (2 hunks)
  • src/metastore/metastores/object_store_metastore.rs (1 hunks)
  • src/migration/mod.rs (2 hunks)
  • src/parseable/mod.rs (2 hunks)
  • src/parseable/streams.rs (2 hunks)
  • src/storage/mod.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-24T11:54:20.259Z
Learnt from: parmesant
PR: parseablehq/parseable#1449
File: src/metastore/metastores/object_store_metastore.rs:83-98
Timestamp: 2025-10-24T11:54:20.259Z
Learning: In the `get_overviews` method in `src/metastore/metastores/object_store_metastore.rs`, using `.ok()` to convert all storage errors to `None` when fetching overview objects is the intended behavior. This intentionally treats missing files and other errors (network, permissions, etc.) the same way.

Applied to files:

  • src/metastore/metastores/object_store_metastore.rs
🧬 Code graph analysis (5)
src/parseable/mod.rs (2)
src/metadata.rs (1)
  • new (99-136)
src/parseable/streams.rs (1)
  • new (118-135)
src/parseable/streams.rs (2)
src/validator.rs (1)
  • hot_tier (137-148)
src/hottier.rs (1)
  • get_hot_tier (174-189)
src/handlers/http/modal/query_server.rs (1)
src/handlers/http/modal/mod.rs (1)
  • initialize_hot_tier_metadata_on_startup (601-645)
src/handlers/http/modal/server.rs (1)
src/handlers/http/modal/mod.rs (1)
  • initialize_hot_tier_metadata_on_startup (601-645)
src/migration/mod.rs (1)
src/validator.rs (1)
  • hot_tier (137-148)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (15)
src/metastore/metastores/object_store_metastore.rs (1)

113-113: LGTM! Formatting cleanup.

The single-line expression is more concise and equally readable.

src/storage/mod.rs (2)

28-28: LGTM! Import added for new hot tier field.


248-248: LGTM! Proper default initialization.

src/metadata.rs (1)

29-29: LGTM! Import added for new field type.

src/hottier.rs (1)

54-54: LGTM! Necessary trait derives added.

Clone is required for storing hot tier objects (used in logstream.rs:400,421), and PartialEq/Eq enable equality comparisons. All derives are safe for this struct.

src/handlers/http/modal/query_server.rs (2)

25-25: LGTM! Import for startup initialization.


145-149: Verify error handling strategy for hot tier initialization.

The initialization error is logged but doesn't fail the startup. This resilient approach allows the server to start even if hot tier metadata initialization encounters issues. Confirm this is the intended behavior—streams without initialized hot tier metadata may not benefit from hot tier functionality until manually configured.

src/parseable/streams.rs (4)

56-56: LGTM! Import for hot tier type.


925-929: LGTM! Hot tier setter properly synchronizes state.

The method correctly:

  • Accepts Option<StreamHotTier> instead of a boolean
  • Sets both hot_tier and hot_tier_enabled fields
  • Derives the enabled flag from hot_tier.is_some(), maintaining consistency

931-933: LGTM! Hot tier getter implementation.

Returns a clone of the hot tier configuration, which is appropriate for data behind a read lock.


935-937: LGTM! Enabled flag getter.

Provides a simple boolean check for hot tier status without needing to retrieve the full configuration.

src/handlers/http/logstream.rs (1)

377-432: LGTM! Hot tier enabling properly updates both states.

The implementation correctly updates:

  1. In-memory state via stream.set_hot_tier(Some(hottier.clone())) (line 400)
  2. Persistent metadata via stream_metadata.hot_tier = Some(hottier.clone()) (line 421)

Both states remain synchronized.

src/handlers/http/modal/server.rs (2)

28-28: LGTM! Import for startup initialization.


147-151: Verify error handling strategy for hot tier initialization.

Similar to query_server.rs, initialization failures are logged but don't prevent startup. Confirm this resilient approach aligns with operational requirements.

src/migration/mod.rs (1)

358-413: LGTM! Hot tier field correctly propagated through metadata flow.

The changes correctly destructure and pass the hot_tier field from ObjectStoreFormat to LogStreamMetadata, maintaining consistency with the updated metadata model.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/handlers/http/modal/mod.rs (1)

623-642: Well-implemented startup initialization.

The default values are sensible (fresh metadata starts with zero usage and full availability), error handling appropriately allows startup to proceed despite per-stream failures, and the version defaulting is good defensive coding.

If you anticipate many streams with hot tier configurations, consider parallelizing the initialization loop using futures::stream utilities to speed up startup, though the current sequential approach is perfectly acceptable for typical deployments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 727ab5a and 43a4a58.

📒 Files selected for processing (1)
  • src/handlers/http/modal/mod.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (2)
src/handlers/http/modal/mod.rs (2)

40-40: LGTM!

The import additions are necessary for the new startup initialization function.


604-621: Excellent lock management.

The pattern of collecting stream configurations into a Vec while holding the lock, then releasing it before async operations, is exactly the right approach. This prevents blocking other readers during I/O.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/parseable/mod.rs (1)

397-400: Clarify the consistency enforcement logic.

The conditional call to set_hot_tier serves two purposes that aren't clear from the current comment:

  1. Consistency enforcement: When hot_tier is Some, calling set_hot_tier(Some(...)) ensures hot_tier_enabled is set to true, correcting any potential storage inconsistency where hot_tier exists but hot_tier_enabled is false.

  2. Backward compatibility preservation: By only calling set_hot_tier when hot_tier is Some, the code preserves the backward compatibility case where hot_tier_enabled = true but hot_tier = None (old storage format).

Consider expanding the comment to explain this logic:

-        // Set hot tier configuration in memory based on stored metadata
+        // Enforce consistency: when hot_tier config exists, ensure hot_tier_enabled is true.
+        // Skip calling set_hot_tier when hot_tier is None to preserve backward compatibility
+        // case where hot_tier_enabled=true but hot_tier=None (old storage format).
         if let Some(hot_tier_config) = hot_tier {
             stream.set_hot_tier(Some(hot_tier_config));
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43a4a58 and 4de6b85.

📒 Files selected for processing (2)
  • src/parseable/mod.rs (2 hunks)
  • src/parseable/streams.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/parseable/mod.rs (3)
src/validator.rs (2)
  • hot_tier (137-148)
  • stream_name (36-71)
src/parseable/streams.rs (1)
  • new (118-135)
src/metadata.rs (1)
  • new (99-136)
src/parseable/streams.rs (2)
src/validator.rs (1)
  • hot_tier (137-148)
src/hottier.rs (1)
  • get_hot_tier (174-189)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (1)
src/parseable/streams.rs (1)

925-937: LGTM! Hot tier accessor methods are well-implemented.

The three new methods correctly manage hot tier state:

  • set_hot_tier properly enforces consistency by deriving hot_tier_enabled from hot_tier.is_some()
  • get_hot_tier and is_hot_tier_enabled provide clean read access
  • All methods correctly acquire the appropriate locks

@nitisht nitisht merged commit 0c35ecb into parseablehq:main Oct 28, 2025
13 checks passed
@nikhilsinhaparseable nikhilsinhaparseable deleted the persist-hot-tier-info branch October 28, 2025 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants