
Conversation


@nikhilsinhaparseable nikhilsinhaparseable commented Feb 9, 2026

Accept the header x-p-dataset-tag in the dataset creation request.
Supported values: agent-observability, k8s-observability, database-observability.
For any other value, the server logs a warning and stores the tag as null.

The field is stored in stream.json (ObjectStoreFormat) and in LogStreamMetadata (the in-memory struct).
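
For reference, a minimal std-only sketch of the value handling described above. The actual DatasetTag in src/handlers/mod.rs additionally carries serde kebab-case support, and the parse_dataset_tag helper below is purely illustrative:

```rust
use std::convert::TryFrom;
use std::fmt;

/// Header accepted at dataset creation time (DATASET_TAG_KEY in this PR).
pub const DATASET_TAG_KEY: &str = "x-p-dataset-tag";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DatasetTag {
    AgentObservability,
    K8sObservability,
    DatabaseObservability,
}

impl TryFrom<&str> for DatasetTag {
    type Error = String;

    fn try_from(value: &str) -> Result<Self, Self::Error> {
        match value {
            "agent-observability" => Ok(Self::AgentObservability),
            "k8s-observability" => Ok(Self::K8sObservability),
            "database-observability" => Ok(Self::DatabaseObservability),
            other => Err(format!("unsupported dataset tag: {other}")),
        }
    }
}

impl fmt::Display for DatasetTag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::AgentObservability => "agent-observability",
            Self::K8sObservability => "k8s-observability",
            Self::DatabaseObservability => "database-observability",
        })
    }
}

/// Illustrative helper: any unsupported value is logged as a warning and the
/// tag is stored as None (null), matching the behaviour described above.
fn parse_dataset_tag(raw: Option<&str>) -> Option<DatasetTag> {
    raw.and_then(|value| match DatasetTag::try_from(value) {
        Ok(tag) => Some(tag),
        Err(err) => {
            // Stand-in for the server's logging (e.g. tracing::warn!).
            eprintln!("warn: {err}; storing dataset_tag as null");
            None
        }
    })
}

fn main() {
    assert_eq!(
        parse_dataset_tag(Some("k8s-observability")),
        Some(DatasetTag::K8sObservability)
    );
    assert_eq!(parse_dataset_tag(Some("not-a-tag")), None);
}
```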

Summary by CodeRabbit

Release Notes

  • New Features
    • Added dataset tag support for log streams, enabling categorization by observability type (Agent Observability, K8s Observability, or Database Observability).
    • Dataset tags can be specified via HTTP headers when creating or updating streams and are persisted in stream metadata.


coderabbitai bot commented Feb 9, 2026

Walkthrough

This PR introduces dataset tagging support across the log stream infrastructure by adding a DatasetTag enum, extending metadata structures to carry the tag, updating function signatures for stream creation and restoration, and propagating the tag through both header parsing and storage serialization flows.

Changes

  • DatasetTag Type Definition (src/handlers/mod.rs): New public enum DatasetTag with three variants (AgentObservability, K8sObservability, DatabaseObservability) and serde kebab-case support; TryFrom<&str> and Display implementations for parsing and rendering; new public constant DATASET_TAG_KEY.
  • Stream Metadata and Storage (src/metadata.rs, src/storage/mod.rs, src/parseable/streams.rs): New dataset_tag: Option<DatasetTag> field on LogStreamMetadata, ObjectStoreFormat, and StreamInfo; LogStreamMetadata::new extended to accept and assign the dataset tag; new public accessor get_dataset_tag() on Stream (see the sketch after this list).
  • Stream Creation and Initialization (src/parseable/mod.rs, src/migration/mod.rs): create_stream, create_stream_if_not_exists, and create_update_stream now accept dataset_tag: Option<DatasetTag>; the tag is propagated through stream creation and restoration from both headers and storage, and storage-backed restoration now reads and preserves it.
  • HTTP Request Processing (src/handlers/http/modal/utils/logstream_utils.rs, src/handlers/http/ingest.rs): New dataset_tag: Option<DatasetTag> field on PutStreamHeaders; header parsing extracts DATASET_TAG_KEY with a fallback on invalid values; two ingest call sites pass the dataset tag to create_stream_if_not_exists.
  • Call Site Updates (src/connectors/kafka/processor.rs, src/storage/field_stats.rs): Calls to create_stream_if_not_exists updated with an additional None argument for the new dataset tag parameter.
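
For illustration, a sketch of how the optional tag could be carried on the persisted and in-memory metadata, assuming serde (derive) and serde_json as dependencies. The struct and field names follow the summary above; the serde attributes and everything else are illustrative:

```rust
use serde::{Deserialize, Serialize};

// The enum is repeated here with serde attributes so the variants serialize
// in kebab-case, matching the accepted header values.
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum DatasetTag {
    AgentObservability,
    K8sObservability,
    DatabaseObservability,
}

/// Persisted to stream.json in object storage.
#[derive(Debug, Serialize, Deserialize)]
pub struct ObjectStoreFormat {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub dataset_tag: Option<DatasetTag>,
    // ...remaining stream metadata fields elided
}

/// In-memory per-stream metadata.
#[derive(Debug)]
pub struct LogStreamMetadata {
    pub dataset_tag: Option<DatasetTag>,
    // ...remaining fields elided
}

impl LogStreamMetadata {
    /// Constructor extended to accept the tag, mirroring LogStreamMetadata::new.
    pub fn new(dataset_tag: Option<DatasetTag>) -> Self {
        Self { dataset_tag }
    }
}

fn main() -> serde_json::Result<()> {
    let on_disk = ObjectStoreFormat {
        dataset_tag: Some(DatasetTag::AgentObservability),
    };
    // Prints {"dataset_tag":"agent-observability"}
    println!("{}", serde_json::to_string(&on_disk)?);

    // Restoring the in-memory view preserves the tag read from storage.
    let in_memory = LogStreamMetadata::new(on_disk.dataset_tag);
    assert_eq!(in_memory.dataset_tag, Some(DatasetTag::AgentObservability));
    Ok(())
}
```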

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant HTTPHandler as HTTP Handler
    participant StreamCreation as Stream Creation
    participant Storage as Storage Layer

    Client->>HTTPHandler: PUT /stream with DATASET_TAG header
    HTTPHandler->>HTTPHandler: Parse PutStreamHeaders<br/>(extract dataset_tag)
    HTTPHandler->>StreamCreation: create_update_stream<br/>(dataset_tag)
    StreamCreation->>StreamCreation: LogStreamMetadata::new<br/>(with dataset_tag)
    StreamCreation->>Storage: Serialize ObjectStoreFormat<br/>(with dataset_tag)
    Storage->>Storage: Persist metadata to storage
    Storage-->>StreamCreation: Success
    StreamCreation-->>HTTPHandler: Stream created
    HTTPHandler-->>Client: 200 OK

    Note over Storage: On recovery/restart
    Storage->>StreamCreation: Load ObjectStoreFormat<br/>(with dataset_tag)
    StreamCreation->>StreamCreation: Restore LogStreamMetadata<br/>(with dataset_tag)
    StreamCreation->>StreamCreation: Stream available with<br/>dataset_tag preserved
```
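
To exercise the PUT path in the diagram, a hypothetical client call, assuming the reqwest crate (blocking feature) and a locally running server; the endpoint path, port, and credentials are illustrative defaults, and only the header name and value come from this PR:

```rust
// Hypothetical client-side request; only the x-p-dataset-tag header and its
// value are taken from the PR, everything else is an assumed local setup.
fn main() -> Result<(), reqwest::Error> {
    let response = reqwest::blocking::Client::new()
        .put("http://localhost:8000/api/v1/logstream/demo")
        .basic_auth("admin", Some("admin"))
        .header("x-p-dataset-tag", "k8s-observability")
        .send()?;

    // An unsupported header value would not fail the request: the server logs
    // a warning and persists the tag as null instead.
    println!("status: {}", response.status());
    Ok(())
}
```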

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • de-sh
  • parmesant

Poem

🐰 A rabbit hops through streams so bright,
With dataset tags now in sight,
From headers parsed with careful care,
To storage saved, metadata's there,
The tagging flows, both left and right! 🏷️

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The PR description covers the key implementation details but lacks the structured template sections, with no notes on testing, comments/documentation, or the rationale for the chosen approach. Resolution: follow the repository template by adding 'Description', 'Key changes', testing confirmation, and documentation sections so reviewers have complete context.

✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title 'allow tags at dataset creation' accurately summarizes the main change: adding support for dataset tags during dataset creation, the primary feature introduced across all file modifications.



No actionable comments were generated in the recent review. 🎉



@nitisht nitisht merged commit 92b0ed0 into parseablehq:main Feb 9, 2026
12 checks passed
@nikhilsinhaparseable nikhilsinhaparseable deleted the tag-dataset branch February 9, 2026 13:19