-
-
Couldn't load subscription status.
- Fork 153
chore: persist hottier config in object storage #1453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: persist hottier config in object storage #1453
Conversation
helps for a new querier to fetch the hottier info at server start and sync the files from object store
WalkthroughHot-tier handling changed from a boolean flag to an optional Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Startup as Server Startup
participant Init as initialize_hot_tier_metadata_on_startup
participant HTM as HotTierManager
participant Storage as Object Store (metadata)
participant Mem as In-Memory Streams
Startup->>Init: call initialize_hot_tier_metadata_on_startup(hot_tier_manager)
Init->>HTM: list in-memory hot-tier configs
loop each stream missing local metadata
Init->>Storage: check metadata file
alt missing
Init->>Init: create default hot-tier metadata (used_size=0, available=size, oldest_entry=None)
Init->>Storage: write metadata file
end
end
Init-->>Startup: return (log per-stream errors)
Startup->>HTM: call download_from_s3()
Startup->>Mem: apply hot-tier state to in-memory streams
sequenceDiagram
autonumber
participant Init as parseable initialization
participant Storage as Stored LogStreamMetadata
participant Mem as In-Memory Stream
Init->>Storage: read LogStreamMetadata (includes hot_tier & legacy flag)
alt hot_tier present
Init->>Mem: stream.set_hot_tier(Some(hot_tier))
else hot_tier absent but hot_tier_enabled == true
Init->>Mem: stream.set_hot_tier(None) %% enable via legacy flag
else no hot-tier info
Init-->>Mem: no hot-tier action
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/handlers/http/logstream.rs (1)
452-495: Missing in-memory state update on hot tier deletion.The delete handler updates persistent metadata (lines 483-484) but doesn't clear the in-memory stream state. After deletion:
- ✓ Persistent metadata cleared (
hot_tier = None)- ✓ Hot tier files deleted
- ✗ In-memory state unchanged (
stream.get_hot_tier()still returns old value)This causes state inconsistency. Add
stream.set_hot_tier(None)before updating persistent metadata.hot_tier_manager.delete_hot_tier(&stream_name).await?; + let stream = PARSEABLE.get_stream(&stream_name)?; + stream.set_hot_tier(None); + let mut stream_metadata: ObjectStoreFormat = serde_json::from_slice( &PARSEABLE
🧹 Nitpick comments (4)
src/storage/mod.rs (1)
124-127: Consider derivinghot_tier_enabledfromhot_tierpresence.Both
hot_tier_enabled: boolandhot_tier: Option<StreamHotTier>coexist in the struct. The enabled state could be derived fromhot_tier.is_some(), reducing redundancy. However, retaining both fields may be intentional for backward compatibility with existing metadata.src/metadata.rs (1)
90-91: Consider deriving enabled state from hot tier presence.Similar to
ObjectStoreFormat, bothhot_tier_enabledandhot_tierfields coexist. The boolean could be derived fromhot_tier.is_some(). This redundancy may be intentional for backward compatibility.src/handlers/http/modal/mod.rs (1)
598-645: Well-structured startup initialization function.The implementation correctly:
- Collects hot tier configurations without holding locks during async operations (lines 605-621)
- Checks for existing metadata files before creating new ones (line 611)
- Sets appropriate default values for new metadata (lines 626-631)
- Handles errors gracefully by logging and continuing (lines 637-640)
Consider adding info-level logging when successfully initializing hot tier metadata to provide visibility into the startup process:
if let Err(e) = hot_tier_manager .put_hot_tier(&stream_name, &mut hot_tier_metadata) .await { warn!( "Failed to initialize hot tier metadata for stream {}: {}", stream_name, e ); + } else { + info!("Initialized hot tier metadata for stream {}", stream_name); }src/parseable/mod.rs (1)
368-383: Consider refactoring to pass hot tier fields to constructor.Making
metadatamutable to set fields after construction works but is less idiomatic. Consider updatingLogStreamMetadata::new()to accepthot_tier_enabledandhot_tierparameters, allowing immutable initialization.Example refactor in
src/metadata.rs:pub fn new( created_at: String, time_partition: String, time_partition_limit: Option<NonZeroU32>, custom_partition: Option<String>, static_schema_flag: bool, static_schema: HashMap<String, Arc<Field>>, stream_type: StreamType, schema_version: SchemaVersion, log_source: Vec<LogSourceEntry>, telemetry_type: TelemetryType, hot_tier_enabled: bool, hot_tier: Option<StreamHotTier>, ) -> Self { // ... existing logic ... }Then in this file:
-let mut metadata = LogStreamMetadata::new( +let metadata = LogStreamMetadata::new( created_at, time_partition, time_partition_limit, custom_partition, static_schema_flag, static_schema, stream_type, schema_version, log_source, telemetry_type, + hot_tier_enabled, + hot_tier.clone(), ); - -// Set hot tier fields from the stored metadata -metadata.hot_tier_enabled = hot_tier_enabled; -metadata.hot_tier = hot_tier.clone();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
src/handlers/http/logstream.rs(3 hunks)src/handlers/http/modal/mod.rs(2 hunks)src/handlers/http/modal/query_server.rs(2 hunks)src/handlers/http/modal/server.rs(2 hunks)src/hottier.rs(1 hunks)src/metadata.rs(2 hunks)src/metastore/metastores/object_store_metastore.rs(1 hunks)src/migration/mod.rs(2 hunks)src/parseable/mod.rs(2 hunks)src/parseable/streams.rs(2 hunks)src/storage/mod.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-24T11:54:20.259Z
Learnt from: parmesant
PR: parseablehq/parseable#1449
File: src/metastore/metastores/object_store_metastore.rs:83-98
Timestamp: 2025-10-24T11:54:20.259Z
Learning: In the `get_overviews` method in `src/metastore/metastores/object_store_metastore.rs`, using `.ok()` to convert all storage errors to `None` when fetching overview objects is the intended behavior. This intentionally treats missing files and other errors (network, permissions, etc.) the same way.
Applied to files:
src/metastore/metastores/object_store_metastore.rs
🧬 Code graph analysis (5)
src/parseable/mod.rs (2)
src/metadata.rs (1)
new(99-136)src/parseable/streams.rs (1)
new(118-135)
src/parseable/streams.rs (2)
src/validator.rs (1)
hot_tier(137-148)src/hottier.rs (1)
get_hot_tier(174-189)
src/handlers/http/modal/query_server.rs (1)
src/handlers/http/modal/mod.rs (1)
initialize_hot_tier_metadata_on_startup(601-645)
src/handlers/http/modal/server.rs (1)
src/handlers/http/modal/mod.rs (1)
initialize_hot_tier_metadata_on_startup(601-645)
src/migration/mod.rs (1)
src/validator.rs (1)
hot_tier(137-148)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (15)
src/metastore/metastores/object_store_metastore.rs (1)
113-113: LGTM! Formatting cleanup.The single-line expression is more concise and equally readable.
src/storage/mod.rs (2)
28-28: LGTM! Import added for new hot tier field.
248-248: LGTM! Proper default initialization.src/metadata.rs (1)
29-29: LGTM! Import added for new field type.src/hottier.rs (1)
54-54: LGTM! Necessary trait derives added.
Cloneis required for storing hot tier objects (used inlogstream.rs:400,421), andPartialEq/Eqenable equality comparisons. All derives are safe for this struct.src/handlers/http/modal/query_server.rs (2)
25-25: LGTM! Import for startup initialization.
145-149: Verify error handling strategy for hot tier initialization.The initialization error is logged but doesn't fail the startup. This resilient approach allows the server to start even if hot tier metadata initialization encounters issues. Confirm this is the intended behavior—streams without initialized hot tier metadata may not benefit from hot tier functionality until manually configured.
src/parseable/streams.rs (4)
56-56: LGTM! Import for hot tier type.
925-929: LGTM! Hot tier setter properly synchronizes state.The method correctly:
- Accepts
Option<StreamHotTier>instead of a boolean- Sets both
hot_tierandhot_tier_enabledfields- Derives the enabled flag from
hot_tier.is_some(), maintaining consistency
931-933: LGTM! Hot tier getter implementation.Returns a clone of the hot tier configuration, which is appropriate for data behind a read lock.
935-937: LGTM! Enabled flag getter.Provides a simple boolean check for hot tier status without needing to retrieve the full configuration.
src/handlers/http/logstream.rs (1)
377-432: LGTM! Hot tier enabling properly updates both states.The implementation correctly updates:
- In-memory state via
stream.set_hot_tier(Some(hottier.clone()))(line 400)- Persistent metadata via
stream_metadata.hot_tier = Some(hottier.clone())(line 421)Both states remain synchronized.
src/handlers/http/modal/server.rs (2)
28-28: LGTM! Import for startup initialization.
147-151: Verify error handling strategy for hot tier initialization.Similar to
query_server.rs, initialization failures are logged but don't prevent startup. Confirm this resilient approach aligns with operational requirements.src/migration/mod.rs (1)
358-413: LGTM! Hot tier field correctly propagated through metadata flow.The changes correctly destructure and pass the
hot_tierfield fromObjectStoreFormattoLogStreamMetadata, maintaining consistency with the updated metadata model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/handlers/http/modal/mod.rs (1)
623-642: Well-implemented startup initialization.The default values are sensible (fresh metadata starts with zero usage and full availability), error handling appropriately allows startup to proceed despite per-stream failures, and the version defaulting is good defensive coding.
If you anticipate many streams with hot tier configurations, consider parallelizing the initialization loop using
futures::streamutilities to speed up startup, though the current sequential approach is perfectly acceptable for typical deployments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/handlers/http/modal/mod.rs(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (2)
src/handlers/http/modal/mod.rs (2)
40-40: LGTM!The import additions are necessary for the new startup initialization function.
604-621: Excellent lock management.The pattern of collecting stream configurations into a Vec while holding the lock, then releasing it before async operations, is exactly the right approach. This prevents blocking other readers during I/O.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/parseable/mod.rs (1)
397-400: Clarify the consistency enforcement logic.The conditional call to
set_hot_tierserves two purposes that aren't clear from the current comment:
Consistency enforcement: When
hot_tierisSome, callingset_hot_tier(Some(...))ensureshot_tier_enabledis set totrue, correcting any potential storage inconsistency wherehot_tierexists buthot_tier_enabledisfalse.Backward compatibility preservation: By only calling
set_hot_tierwhenhot_tierisSome, the code preserves the backward compatibility case wherehot_tier_enabled = truebuthot_tier = None(old storage format).Consider expanding the comment to explain this logic:
- // Set hot tier configuration in memory based on stored metadata + // Enforce consistency: when hot_tier config exists, ensure hot_tier_enabled is true. + // Skip calling set_hot_tier when hot_tier is None to preserve backward compatibility + // case where hot_tier_enabled=true but hot_tier=None (old storage format). if let Some(hot_tier_config) = hot_tier { stream.set_hot_tier(Some(hot_tier_config)); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/parseable/mod.rs(2 hunks)src/parseable/streams.rs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/parseable/mod.rs (3)
src/validator.rs (2)
hot_tier(137-148)stream_name(36-71)src/parseable/streams.rs (1)
new(118-135)src/metadata.rs (1)
new(99-136)
src/parseable/streams.rs (2)
src/validator.rs (1)
hot_tier(137-148)src/hottier.rs (1)
get_hot_tier(174-189)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (1)
src/parseable/streams.rs (1)
925-937: LGTM! Hot tier accessor methods are well-implemented.The three new methods correctly manage hot tier state:
set_hot_tierproperly enforces consistency by derivinghot_tier_enabledfromhot_tier.is_some()get_hot_tierandis_hot_tier_enabledprovide clean read access- All methods correctly acquire the appropriate locks
helps for a new querier to fetch the hottier info at server start
and sync the files from object store
Summary by CodeRabbit
New Features
Improvements
Bug Fixes