diff --git a/docs/environment-variables.md b/docs/environment-variables.md index 9ba6481b..fe605386 100644 --- a/docs/environment-variables.md +++ b/docs/environment-variables.md @@ -333,6 +333,8 @@ OpenObserve is configured using the following environment variables. | ZO_NATS_DELIVER_POLICY | all | Starting point in the stream for message delivery. Allowed values are all, last, new. | | ZO_NATS_SUB_CAPACITY | 65535 | Maximum subscription capacity. | | ZO_NATS_QUEUE_MAX_SIZE | 2048 | Maximum queue size in megabytes. | +| ZO_NATS_KV_WATCH_MODULES | | Defines which internal modules use the NATS Key-Value Watcher instead of the default NATS Queue for event synchronization. Add one or more module prefixes separated by commas, such as /nodes/ or /user_sessions/. When left empty, all modules use the default NATS Queue mechanism. | +| ZO_NATS_EVENT_STORAGE | memory | Controls how NATS JetStream stores event data. Use memory for high-speed, in-memory event storage or file for durable, disk-based storage that persists across restarts.
Performance Benchmark Results:
• File Storage: 10,965 ops/sec (10.71 MB/s throughput, ~911 µs mean latency)
• Memory Storage: 16,957 ops/sec (16.56 MB/s throughput, ~589 µs mean latency)
Memory storage offers ~55 percent higher throughput and lower latency, while file storage ensures durability. | ## S3 and Object Storage @@ -573,7 +575,7 @@ OpenObserve is configured using the following environment variables. | ZO_QUICK_MODE_ENABLED | false | Indicates if quick mode is enabled. | | ZO_QUICK_MODE_NUM_FIELDS | 500 | The number of fields to consider for quick mode. | | ZO_QUICK_MODE_STRATEGY | | Possible values are `first`, `last`, `both`. | - +| ZO_QUICK_MODE_FORCE_ENABLED | true | | ## Miscellaneous | Environment Variable | Default Value | Description | diff --git a/docs/images/add-regex-pattern.png b/docs/images/add-regex-pattern.png index 48ed42ff..84bf226d 100644 Binary files a/docs/images/add-regex-pattern.png and b/docs/images/add-regex-pattern.png differ diff --git a/docs/images/apply-multiple-reg-pattern.png b/docs/images/apply-multiple-reg-pattern.png new file mode 100644 index 00000000..bfd98e20 Binary files /dev/null and b/docs/images/apply-multiple-reg-pattern.png differ diff --git a/docs/images/config-hash-pattern-ingestion-time.png b/docs/images/config-hash-pattern-ingestion-time.png new file mode 100644 index 00000000..f8e27518 Binary files /dev/null and b/docs/images/config-hash-pattern-ingestion-time.png differ diff --git a/docs/images/config-hash-pattern-query-time.png b/docs/images/config-hash-pattern-query-time.png new file mode 100644 index 00000000..01c027ac Binary files /dev/null and b/docs/images/config-hash-pattern-query-time.png differ diff --git a/docs/images/drop-at-ingestion-time-test-config.png b/docs/images/drop-at-ingestion-time-test-config.png index 4eab6e3c..5d4e9a44 100644 Binary files a/docs/images/drop-at-ingestion-time-test-config.png and b/docs/images/drop-at-ingestion-time-test-config.png differ diff --git a/docs/images/drop-at-query-time-test-config.png b/docs/images/drop-at-query-time-test-config.png index 7793bdb9..e842e249 100644 Binary files a/docs/images/drop-at-query-time-test-config.png and b/docs/images/drop-at-query-time-test-config.png differ diff --git a/docs/images/example-1-query-recommendations.png b/docs/images/example-1-query-recommendations.png new file mode 100644 index 00000000..26a945ed Binary files /dev/null and b/docs/images/example-1-query-recommendations.png differ diff --git a/docs/images/example-2-query-recommendations.png b/docs/images/example-2-query-recommendations.png new file mode 100644 index 00000000..c9c7e707 Binary files /dev/null and b/docs/images/example-2-query-recommendations.png differ diff --git a/docs/images/extended-retention.png b/docs/images/extended-retention.png index 2dc8084e..ed7591b5 100644 Binary files a/docs/images/extended-retention.png and b/docs/images/extended-retention.png differ diff --git a/docs/images/hashed-at-ingestion-time.png b/docs/images/hashed-at-ingestion-time.png new file mode 100644 index 00000000..d98996bc Binary files /dev/null and b/docs/images/hashed-at-ingestion-time.png differ diff --git a/docs/images/hashed-at-query-time.png b/docs/images/hashed-at-query-time.png new file mode 100644 index 00000000..02f06325 Binary files /dev/null and b/docs/images/hashed-at-query-time.png differ diff --git a/docs/images/match-all-hash.png b/docs/images/match-all-hash.png new file mode 100644 index 00000000..fe80a35d Binary files /dev/null and b/docs/images/match-all-hash.png differ diff --git a/docs/images/organization-in-openobserve.png b/docs/images/organization-in-openobserve.png index 021e4151..cf54150a 100644 Binary files a/docs/images/organization-in-openobserve.png 
and b/docs/images/organization-in-openobserve.png differ diff --git a/docs/images/organization-role-permission.png b/docs/images/organization-role-permission.png index 925a840f..7885414a 100644 Binary files a/docs/images/organization-role-permission.png and b/docs/images/organization-role-permission.png differ diff --git a/docs/images/redact-at-ingestion-time-test-config.png b/docs/images/redact-at-ingestion-time-test-config.png index 23c0618d..671a5ab5 100644 Binary files a/docs/images/redact-at-ingestion-time-test-config.png and b/docs/images/redact-at-ingestion-time-test-config.png differ diff --git a/docs/images/redact-at-query-test-config.png b/docs/images/redact-at-query-test-config.png index 05bd3d07..59aa66ac 100644 Binary files a/docs/images/redact-at-query-test-config.png and b/docs/images/redact-at-query-test-config.png differ diff --git a/docs/images/redact-or-drop-during-regex-pattern-execution.png b/docs/images/redact-or-drop-during-regex-pattern-execution.png index 739200ff..930c86ea 100644 Binary files a/docs/images/redact-or-drop-during-regex-pattern-execution.png and b/docs/images/redact-or-drop-during-regex-pattern-execution.png differ diff --git a/docs/images/regex-pattern-execution-time.png b/docs/images/regex-pattern-execution-time.png index 3b3846bd..1db00e56 100644 Binary files a/docs/images/regex-pattern-execution-time.png and b/docs/images/regex-pattern-execution-time.png differ diff --git a/docs/images/regex-selection-view.png b/docs/images/regex-selection-view.png index 1119fc9c..8f5132fe 100644 Binary files a/docs/images/regex-selection-view.png and b/docs/images/regex-selection-view.png differ diff --git a/docs/images/select-query-recommendations.png b/docs/images/select-query-recommendations.png new file mode 100644 index 00000000..b273c9e4 Binary files /dev/null and b/docs/images/select-query-recommendations.png differ diff --git a/docs/images/stream-details-access.png b/docs/images/stream-details-access.png index 12f15444..4659d664 100644 Binary files a/docs/images/stream-details-access.png and b/docs/images/stream-details-access.png differ diff --git a/docs/images/stream-details-configuration.png b/docs/images/stream-details-configuration.png new file mode 100644 index 00000000..b5a977b3 Binary files /dev/null and b/docs/images/stream-details-configuration.png differ diff --git a/docs/images/stream-details-schema-settings.png b/docs/images/stream-details-schema-settings.png new file mode 100644 index 00000000..4fd3824f Binary files /dev/null and b/docs/images/stream-details-schema-settings.png differ diff --git a/docs/images/stream-details.png b/docs/images/stream-details.png index 9c908b34..a157d0b9 100644 Binary files a/docs/images/stream-details.png and b/docs/images/stream-details.png differ diff --git a/docs/images/stream-fields.png b/docs/images/stream-fields.png new file mode 100644 index 00000000..78b643c8 Binary files /dev/null and b/docs/images/stream-fields.png differ diff --git a/docs/images/stream-name.png b/docs/images/stream-name.png new file mode 100644 index 00000000..19e965da Binary files /dev/null and b/docs/images/stream-name.png differ diff --git a/docs/images/stream-start-end-time.png b/docs/images/stream-start-end-time.png new file mode 100644 index 00000000..59789f32 Binary files /dev/null and b/docs/images/stream-start-end-time.png differ diff --git a/docs/images/use-query-recommendations.png b/docs/images/use-query-recommendations.png new file mode 100644 index 00000000..a63d0b79 Binary files /dev/null and 
b/docs/images/use-query-recommendations.png differ diff --git a/docs/ingestion/.pages b/docs/ingestion/.pages index 9692d1fb..901db369 100644 --- a/docs/ingestion/.pages +++ b/docs/ingestion/.pages @@ -1,5 +1,5 @@ nav: - - Index: index.md + - Ingestion Overview: index.md - Logs: logs - Metrics: metrics - Traces: traces diff --git a/docs/ingestion/index.md b/docs/ingestion/index.md index 40a306f4..36e8fe28 100644 --- a/docs/ingestion/index.md +++ b/docs/ingestion/index.md @@ -3,7 +3,6 @@ description: >- Ingest logs, metrics, and traces into OpenObserve via OTEL, Fluentbit, APIs, syslog, Prometheus, or programmatically using Go, Python, or curl. --- -# Ingestion Logs metrics and traces can be ingested into OpenObserve from a variety of sources. This section describes how to ingest data from the following sources: @@ -17,6 +16,11 @@ Logs metrics and traces can be ingested into OpenObserve from a variety of sourc 1. [Fluent-bit](logs/fluent-bit) 1. [Fluentd](logs/fluentd) 1. [Amazon Kinesis Firehose](logs/kinesis_firehose) +1. [Syslog](logs/syslog) +1. [Python](logs/python) +1. [Go](logs/go) +1. [Curl](logs/curl) + ### APIs diff --git a/docs/user-guide/alerts/alerts.md b/docs/user-guide/alerts/alerts.md index 56938ccb..a96ef987 100644 --- a/docs/user-guide/alerts/alerts.md +++ b/docs/user-guide/alerts/alerts.md @@ -3,101 +3,88 @@ description: >- Learn how alerting works in OpenObserve. Supports real-time and scheduled alerts with thresholds, frequency, silence periods, and aggregation options. --- -# Alerts +## What are alerts? +Alerts automatically monitor your data streams and notify you when conditions are met. When predefined conditions trigger, alerts send notifications to your configured destinations. -Alerting provides mechanism to notify users when certain conditions are met. OpenObserve supports both scheduled and real time alerts. For the most part you should use Standard alerts as they are more efficient and can be used for most use cases. - -Real time alerts are useful when you want to be notified of a condition as soon as it occurs. Realtime alerts are suited primarily in the scenarios like "panic" in log or known malicious ip address in logs. Realtime alerts are evaluated at ingestion time based on condition specified, they are evaluated per record and can be computationally expensive. - -## Concepts - -Following is the definition of the fields in alerts: - -- **Threshold**: The threshold above/below which the alert will trigger. e.g. if the threshold is >100 and the query returns a value of 101 then the alert will trigger. - - For Scheduled - Standard: - - Threshold is measured against the number of records returned by the SQL query - - For Scheduled - With aggregation: - - This is fired whenever the SQL query returns more than `0` records - - For Scheduled with SQL: - - Threshold is measured against the number of records returned by the SQL query - -- **Period**: Period for which the query should run. e.g. 10 minutes means that whenever the query will run it will use the last 10 minutes of data. If the query runs at 4:00 PM then it will use the data from 3:50 PM to 4:00 PM. - -- **Frequency**: How often the alert should be evaluated. 2 minutes means that the query will be run every 2 minutes and will be evaluated based on the parameters provided. - -- **Silence notification for**: If the alert triggers then how long should it wait before sending another notification. This is to prevent overloading of alert messages. e.g. 
if the alert triggers at 4:00 PM and the silence notification is set to 10 minutes then it will not send another notification until 4:10 PM even if the alert is still after 1 minute. This is to avoid spamming the user with notifications. - -- Aggregation: The aggregation function to be used for the query. e.g. if the query is `SELECT COUNT(*) FROM table` then the aggregation function is `COUNT`. If the query is `SELECT AVG(column) FROM table` then the aggregation function is `AVG`. - -OpenObserve supports following kinds of alerts: - -## Standard alerts - -Standard alerts are evaluated at frequency (every 1 minute by default ) for the condition of the alert, over duration specified as part of alert. If the condition evaluates to true a notification is sent to alert destination. Additionally user can delay notification after a notification is generated once for specified time duration. - -For example: - -> A user wants to be notified of condition if error code 500 occurs more than 15 time for duration of 2 mins & wants such evaluation to happen at every 1 minute frequency. - -Watch this video to understand more. - - - -### Scheduled - Standard - -You can configure the condition which will be converted to SQL query and executed at specified frequency. - -We can configure the alert like this: - -**Threshold** is measured against the number of records returned by the SQL query +--- -![Standard alert](../../images/alerts/standard_alert.png) +## Alert types +There are two types of alerts in OpenObserve: Real-time and Scheduled. -The above alert configuration will result in the following SQL query (It's simplified for understanding): +- **Real-time alerts**: They monitor data continuously and trigger instantly when conditions are met. Use for critical events requiring instant action. For example, when the error count exceeds 10, alert sends notification to the destination within seconds. +- **Scheduled alerts**: They run at fixed intervals to evaluate aggregated or historical data. Use for routine monitoring and trend analysis. For example, every hour, the alert evaluates your data and checks if average response time exceeds 500ms. If the condition is met, the alert sends a notification. -```sql -select count(*) from default where severity = 'INFO' -``` +--- -The above query will be run every 1 minute for the last 2 minutes data. If count(*) > 3 (Threshold) then the alert will trigger. Additionally, the alert will not send another notification for 10 minutes after the first notification is sent. +## Core components +### Stream +The data source being monitored. -### Scheduled - With aggregation +### Period +Time window evaluated per alert run. If period is 30 minutes, the alert evaluates the last 30 minutes of data each run. -We fire when record count > 0 +### Frequency +How often the alert evaluation runs. Real-time alerts run continuously; scheduled alerts run as per the configured frequency. -![Standard alert](../../images/alerts/aggregation_alert.png) +### Condition: Quick mode and SQL mode +Conditions determine when the alert fires. +OpenObserve supports two modes: +??? note "Quick mode (Real-time and scheduled alerts)" + Build conditions using the UI. Combine fields, operators, and values with `OR` or `AND` logic. Group conditions for complex nested logic. 
-### Scheduled - with SQL + - **Logic**: + + - `OR`: Fire if ANY condition is true + - `AND`: Fire only if ALL conditions are true + - **Example**: `error_count > 100` OR `response_time > 500` + - **Operators**: `>` (greater than), `<` (less than), `==` (equal to) + - **Groups**: Nest multiple conditions for complex scenarios + - **Groups example**: `(error_count > 100 AND response_time > 500) OR status == "critical"`: This alert condition fires when BOTH error count AND response time are high, OR when status is critical -Threshold = number of records returned +??? note "SQL mode (Scheduled alerts only)" + Write custom SQL queries and VRL logic to define precise trigger conditions. Useful for complex filtering, aggregations, and multi-window comparisons. -### Scheduled - with PromQL + - Requires knowledge of SQL and VRL + - Enables advanced workflows with multi-window analysis -TODO +### Destination +Where alerts are sent. Choose one or combine multiple: +- **Email**: Send to team members or distribution lists. Requires SMTP configuration. +- **Webhook**: Send to external systems via HTTP. Integrates with Slack, Microsoft Teams, Jira, ServiceNow, and more. +- **Actions**: Execute custom Python scripts. Most flexible; can send to Slack AND log to stream simultaneously. Supports stateful workflows -## Real time alerts +### Silence Period +The silence period prevents duplicate notifications by temporarily pausing alert triggers after firing. -Real time alerts are evaluated at ingestion time based on condition specified , they are evaluated per record. +--- -For example: +## Multi-window alerts +Compare current data against historical data to detect anomalies and trends. +Raw numbers alone cannot reveal trends. Multi-window alerts provide context by comparing current results with past data to detect anomalies and performance shifts.
+For example, 200 errors in 30 minutes is critical if you normally see 50, but normal if you typically see 180-210. -> A user wants to be notified of when API response time is more than 100 ms +### Workflow: - -![Real Time Alert](../../images/alerts_realtime.png) - +1. **Set up windows**: Define current window (time period to monitor) and reference windows (historical periods to compare) +2. **Write SQL**: Query data for all windows +3. **Write VRL logic**: Compare results and calculate differences +4. **Set threshold**: Alert triggers if comparison exceeds your condition -Please note we selected `Slack` destination for demo, but you can add others in `Alert destination`. +### SQL and VRL editor +After configuring windows, navigate to **Conditions** > **SQL mode** > **View Editor**. Use the **SQL Editor** to query data and **VRL Editor** to process results. Run queries to see output, apply VRL to see combined results, then set your alert condition. -Watch this video to understand more. +### Use cases +- **Spike detection**: Detect sudden increases in error counts by comparing the current window with a previous time period. +- **Performance degradation**: Identify when average response times are trending upward compared to historical data. +- **Anomalous behavior**: Detect unusual activity patterns in user behavior, traffic, or system performance that deviate from expected norms. - +--- ## FAQ -**Q**: If I set the frequency to 5 minutes and the current time is 23:03, when will the next runs happen? -**A**: OpenObserve aligns the next run to the nearest upcoming time that is divisible by the frequency, starting from the top of the hour in the configured timezone. This ensures that all runs occur at consistent and predictable intervals. +**Question**: If I set the frequency to 5 minutes and the current time is 23:03, when will the next runs happen?
+**Answer**: OpenObserve aligns the next run to the nearest upcoming time that is divisible by the frequency, starting from the top of the hour in the configured timezone. This ensures that all runs occur at consistent and predictable intervals.
**Example**
If the current time is 23:03, here is when the next run will occur for different frequencies: diff --git a/docs/user-guide/enrichment-tables/enrichment-table-upload-recovery.md b/docs/user-guide/enrichment-tables/enrichment-table-upload-recovery.md index 8b5851c8..37f764bb 100644 --- a/docs/user-guide/enrichment-tables/enrichment-table-upload-recovery.md +++ b/docs/user-guide/enrichment-tables/enrichment-table-upload-recovery.md @@ -59,4 +59,25 @@ When no local disk cache is available: - The querier fetches the latest enrichment data from the metadata database, such as PostgreSQL, and the remote storage system, such as S3. It then provides the data to the restarting node. +## Region-based caching in multi-region super clusters +In a multi-region super cluster deployment, enrichment tables are typically queried from all regions when a node starts up and rebuilds its cache. While this ensures data completeness, it can slow startup or cause failures if one or more regions are unavailable. + +To address this, OpenObserve Enterprise supports primary region–based caching, controlled by the environment variable `ZO_ENRICHMENT_TABLE_GET_REGION`. + +### Requirements + +- Available only in Enterprise Edition. +- Requires Super Cluster to be enabled. +- The `ZO_ENRICHMENT_TABLE_GET_REGION` variable must specify a valid region name. + +### How it works +When a node starts, OpenObserve calls internal methods such as `get_enrichment_table_data()` and `cache_enrichment_tables()` to retrieve enrichment table data.
+The boolean parameter `apply_primary_region_if_specified` controls whether to use only the primary region for these fetch operations. + +In a multi-region super cluster deployment, when `apply_primary_region_if_specified = true`, OpenObserve checks the value of `ZO_ENRICHMENT_TABLE_GET_REGION`. + +- If `ZO_ENRICHMENT_TABLE_GET_REGION` specifies a primary region, the node queries only that region to fetch enrichment table data during cache initialization. +- If `ZO_ENRICHMENT_TABLE_GET_REGION` is not set, or the region name is empty, OpenObserve continues to query all regions as before. + + diff --git a/docs/user-guide/federated-search/how-to-use-federated-search.md b/docs/user-guide/federated-search/how-to-use-federated-search.md index a319cdee..2ef71487 100644 --- a/docs/user-guide/federated-search/how-to-use-federated-search.md +++ b/docs/user-guide/federated-search/how-to-use-federated-search.md @@ -27,7 +27,7 @@ Query your current cluster when you know the data is in your cluster or when you 4. Select one specific cluster from the **Region** dropdown. 5. Select **Run query**. -> For detailed explanation, see **Normal cluster query execution** in the [Federated Search Architecture](../federated-search/federated-search-architecture/) page. +> For detailed explanation, see **Normal cluster query execution** in the [Federated Search Architecture](https://openobserve.ai/docs/user-guide/federated-search/federated-search-architecture/) page.
**Result**
@@ -54,7 +54,7 @@ Use federated search when you need data from multiple clusters. 4. Leave the **Region** dropdown unselected, or select multiple clusters. 5. Select **Run query**. -> For detailed explanation, see **Federated search for one different cluster** and **Federated search for multiple clusters** in the [Federated Search Architecture](../federated-search-architecture/) page. +> For detailed explanation, see **Federated search for one different cluster** and **Federated search for multiple clusters** in the [Federated search architecture](https://openobserve.ai/docs/user-guide/federated-search/federated-search-architecture/) page.
**Result**
@@ -75,4 +75,4 @@ Use this quick reference to understand how region selection affects query execut **Next step** -- [Federated Search Architecture](../federated-search-architecture/) \ No newline at end of file +- [Federated search architecture](https://openobserve.ai/docs/user-guide/federated-search/federated-search-architecture/) \ No newline at end of file diff --git a/docs/user-guide/federated-search/index.md b/docs/user-guide/federated-search/index.md index b25b2c13..6cfc77fb 100644 --- a/docs/user-guide/federated-search/index.md +++ b/docs/user-guide/federated-search/index.md @@ -41,7 +41,7 @@ Before using federated search, understand these core concepts: > **Important**: Querying your current cluster uses normal cluster query execution, not federated search architecture. -> For detailed technical explanations of deployment modes, architecture, and how queries execute, see the [Federated Search Architecture](../federated-search-architecture/) page. +> For detailed technical explanations of deployment modes, architecture, and how queries execute, see the [Federated search architecture](https://openobserve.ai/docs/user-guide/federated-search/federated-search-architecture/) page. ## When to use federated search @@ -61,5 +61,5 @@ Before using federated search, understand these core concepts: **Next steps** -- [How to Use Federated Search](../how-to-use-federated-search/) -- [Federated Search Architecture](../federated-search-architecture/) \ No newline at end of file +- [How to use federated search](https://openobserve.ai/docs/user-guide/federated-search/how-to-use-federated-search/) +- [Federated search architecture](https://openobserve.ai/docs/user-guide/federated-search/federated-search-architecture/) \ No newline at end of file diff --git a/docs/user-guide/identity-and-access-management/organizations.md b/docs/user-guide/identity-and-access-management/organizations.md index 304188cb..01c5d844 100644 --- a/docs/user-guide/identity-and-access-management/organizations.md +++ b/docs/user-guide/identity-and-access-management/organizations.md @@ -11,48 +11,48 @@ Organizations provide logical boundaries for separating data, users, and access ![Organizations in OpenObserve](../../images/organization-in-openobserve.png) -## Organization Types +## Organization types OpenObserve supports two types of organizations: - **Default organization:** Automatically created for each user upon account creation. Typically named **default** and owned by the user. The UI labels it as type **default**. - **Custom organization:** Any organization other than the **default**. These are created manually using the UI or ingestion (if enabled). Displayed in the UI as type **custom**. -!!! Info "What Is **_meta** Organization?" - **_meta Organization** is considered as a **custom** organization. It is a system-level organization that exists in both single-node and multi-node (HA) deployments. - - - The **_meta** organization provides visibility into the health and status of the OpenObserve instance, including node metrics, resource usage, and configuration across all organizations. - - Use the **IAM > Roles > Permission** in the **_meta** organization to manage users across all organizations and control who can list, create, update, or delete organizations. - -## Access - -In OpenObserve, access to organization-level operations, such as listing, creating, updating, or deleting organizations, depends on the deployment mode. 
- -### Open-Source Mode -Any authenticated user can create new organizations using the Add Organization button in the UI. -### Enterprise Mode with RBAC Enabled -- Access to organization management is strictly controlled through RBAC, which must be configured in the _meta organization. -- The **root** user always has unrestricted access to all organizations, including **_meta**. -- Only roles defined in **_meta** can include permissions for managing organizations. -- The **organization** module is available in the role editor only within the **_meta** organization. - -!!! Info "How to Grant Organization Management Access?" - To delegate organization management to users in enterprise mode: - - 1. Switch to the **_meta** organization. - 2. Go to **IAM > Roles**. - 3. Create a new role or edit an existing one. - 4. In the **Permissions** tab, locate the Organizations module. - 5. Select the required operations: - - - **List**: View existing organizations - - **Create**: Add new organizations - - **Update**: Modify organization details - - **Delete**: Remove organizations - 6. Click **Save**.
- ![Grant Organization Management Access in OpenObserve](../../images/organization-role-permission.png) - - Once this role is assigned to a user within the **_meta** organization, they will have access to manage organizations across the system. +### _meta organization +**_meta Organization** is considered as a **custom** organization. It is a system-level organization that exists in both single-node and multi-node (HA) deployments. + +- The **_meta** organization provides visibility into the health and status of the OpenObserve instance, including node metrics, resource usage, and configuration across all organizations. +- Use the **IAM > Roles > Permission** in the **_meta** organization to manage users across all organizations and control who can list, create, update, or delete organizations. + +!!! note "Who can access" + ## Who can access + In OpenObserve, access to organization-level operations, such as listing, creating, updating, or deleting organizations, depends on the deployment mode. + + ### Access in the open-source mode + Any authenticated user can create new organizations using the **Add Organization** button in the UI. + ### Access in the enterprise mode with RBAC enabled + - Access to organization management is strictly controlled through RBAC, which must be configured in the _meta organization. + - The **root** user always has unrestricted access to all organizations, including **_meta**. + - Only roles defined in **_meta** can include permissions for managing organizations. + - The **organization** module is available in the role editor only within the **_meta** organization. + +## How to grant organization management access? +To delegate organization management to users in enterprise mode: + +1. Switch to the **_meta** organization. +2. Go to **IAM > Roles**. +3. Create a new role or edit an existing one. +4. In the **Permissions** tab, locate the Organizations module. +5. Select the required operations: + + - **Create**: Add new organizations + - **Update**: Modify organization details + !!! note "Note" + By default, OpenObserve displays the list of organizations a user belongs to. You do not need to explicitly grant permission to view or retrieve organization details. +6. Click **Save**.
+![Grant Organization Management Access in OpenObserve](../../images/organization-role-permission.png) + +Once this role is assigned to a user within the **_meta** organization, they will have access to manage organizations across the system. ## Create an Organization diff --git a/docs/user-guide/management/aggregation-cache.md b/docs/user-guide/management/aggregation-cache.md index 873a04a6..1303f012 100644 --- a/docs/user-guide/management/aggregation-cache.md +++ b/docs/user-guide/management/aggregation-cache.md @@ -5,6 +5,8 @@ description: Learn how streaming aggregation works in OpenObserve Enterprise. --- This page explains what streaming aggregation is and shows how to use it to improve query performance with aggregation cache in OpenObserve. +!!! info "Availability" + This feature is available in Enterprise Edition. === "Overview" diff --git a/docs/user-guide/management/audit-trail.md b/docs/user-guide/management/audit-trail.md index 030e645e..4d738f1e 100644 --- a/docs/user-guide/management/audit-trail.md +++ b/docs/user-guide/management/audit-trail.md @@ -9,7 +9,7 @@ description: >- !!! info "Availability" This feature is available in Enterprise Edition and Cloud. Not available in Open Source. -## What is Audit Trail +## What is audit trail? Audit Trail records user actions across all organizations in OpenObserve. It captures non-ingestion API calls and helps you monitor activity and improve security. !!! note "Who can access" diff --git a/docs/user-guide/management/sensitive-data-redaction.md b/docs/user-guide/management/sensitive-data-redaction.md index 195ce602..4867a298 100644 --- a/docs/user-guide/management/sensitive-data-redaction.md +++ b/docs/user-guide/management/sensitive-data-redaction.md @@ -1,27 +1,29 @@ --- title: Sensitive Data Redaction in OpenObserve Enterprise -description: Learn how to redact or drop sensitive data using regex patterns during log ingestion or query time in OpenObserve Enterprise Edition. +description: Learn how to redact, hash, or drop sensitive data using regex patterns during log ingestion or query time in OpenObserve Enterprise Edition. --- -This document explains how to configure and manage regex patterns for redacting or dropping sensitive data in OpenObserve. +This document explains how to configure and manage regex patterns to redact, hash, and drop sensitive data in OpenObserve. !!! info "Availability" This feature is available in Enterprise Edition and Cloud. Not available in Open Source. ## Overview -The **Sensitive Data Redaction** feature helps prevent accidental exposure of sensitive data by applying regex-based detection to values ingested into streams and to values already stored in streams. Based on this detection, sensitive values can be either **redacted** or **dropped**. This ensures data is protected before it is stored and hidden when displayed in query results. You can configure these actions to run at ingestion time or at query time. +The **Sensitive Data Redaction** feature helps prevent accidental exposure of sensitive data by applying regex-based detection to values ingested into streams and to values already stored in streams. Based on this detection, sensitive values can be either **redacted**, **hashed**, or **dropped**. This ensures data is protected before it is stored and hidden when displayed in query results. You can configure these actions to run at ingestion time or at query time. 
**Ingestion time** -> **Note**: Use ingestion time redaction or drop when you want to ensure sensitive data is never stored on disk. This is the most secure option for compliance requirements, as the original sensitive data cannot be recovered once it's redacted or dropped during ingestion. +> **Note**: Use ingestion time redaction, hash, or drop when you want to ensure sensitive data is never stored on disk. This is the most secure option for compliance requirements, as the original sensitive data cannot be recovered once it is redacted, hashed, or dropped during ingestion. -- **Redaction**: Sensitive data is masked before being stored on disk. +- **Redact**: Sensitive data is masked before being stored on disk. +- **Hash**: Sensitive data is replaced with a [searchable](#search-hashed-values-using-match_all_hash) hash before being stored on disk. - **Drop**: Sensitive data is removed before being stored on disk. **Query time** -> **Note**: If you have already ingested sensitive data and it's stored on disk, you can use query time redaction or drop to protect it. This allows you to apply sensitive data redaction to existing data. +> **Note**: If you have already ingested sensitive data and it is stored on disk, you can use query time redaction or drop to protect it. This allows you to apply sensitive data redaction to existing data. - **Redaction**: Sensitive data is read from disk but masked before results are displayed. +- **Hash**: Sensitive data is read from disk but masked with a [searchable](#search-hashed-values-using-match_all_hash) hash before results are displayed. - **Drop**: Sensitive data is read from disk but excluded from the query results. !!! note "Where to find" @@ -46,11 +48,12 @@ The **Sensitive Data Redaction** feature helps prevent accidental exposure of se - To associate patterns with stream fields, users need List permission on **Regexp Patterns** AND edit permission on **Streams** modules. -!!! warning "Important Note" +!!! warning "Important note" - Regex patterns can only be applied to fields with UTF8 data type. - The stream must have ingested data before you can apply regex patterns. Empty streams will not show field options for pattern association. -## Create Regex Patterns + +## Create regex patterns **To create a regex pattern:** @@ -111,7 +114,7 @@ The **Sensitive Data Redaction** feature helps prevent accidental exposure of se **Example**
The following screenshots illustrate the pattern creation process: - 1. Review the logs that includes PII. + 1. Review the logs that include PII.
The `message` field in the `pii_test_stream` contains names, email addresses, IP addresses, SSNs, and credit card numbers.
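For reference, a record of this shape could be ingested with a curl call along the following lines. This is only an illustrative sketch: the credentials, host, and field values are placeholders, not the exact ones used in the screenshots.

```bash
# Illustrative only: ingest one log record containing several PII values
# into the pii_test_stream stream of the default organization.
curl -u example@example.com:PASSWORD -k \
  https://example.zinclabs/api/default/pii_test_stream/_json \
  -d '[{"level":"info","job":"test","message":"John Doe (SSN 123-45-6789) logged in from 192.168.1.100 using john.doe@company.com and card 4111-1111-1111-1111"}]'
```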
@@ -124,7 +127,7 @@ The **Sensitive Data Redaction** feature helps prevent accidental exposure of se **Email Addresses**: ![Email regex](../../images/email-regex.png) -## Apply Regex Patterns +## Apply regex patterns Once your patterns are created and tested, you can apply them to specific fields in a stream to redact or drop sensitive data during ingestion or at query time.
**To apply a pattern to a field:** @@ -147,8 +150,8 @@ Once your patterns are created and tested, you can apply them to specific fields After selecting a pattern, a detail view appears. ![Regex selection](../../images/regex-selection-view.png) -??? "Step 3: Choose whether to Redact or Drop" - ### Step 3: Choose whether to Redact or Drop +??? "Step 3: Choose whether to Redact, Hash, or Drop" + ### Step 3: Choose whether to Redact, Hash, or Drop ![Regex pattern execution action- redact or drop](../../images/redact-or-drop-during-regex-pattern-execution.png) When applying a regex pattern, you must choose one of the following actions in the pattern details screen: @@ -158,28 +161,33 @@ Once your patterns are created and tested, you can apply them to specific fields - Replaces only the matching portion of the field value with `[REDACTED]`, while preserving the rest of the field. - Use this when the field contains both sensitive and non-sensitive information and you want to retain the overall context. + **Hash**: + + - Replaces the matched sensitive value with a **deterministic hashed token** while keeping its position within the field. + + **Drop**: - Removes the entire field from the log record if the regex pattern matches. - Use this when the entire field should be excluded from storage or analysis. - Select the appropriate action between Redact and Drop. + Select the appropriate action. ??? "Step 4: Choose when the action needs to be executed" ### Step 4: Choose when the action needs to be executed - In the pattern details screen, select when the chosen action (redact or drop) should be executed, at ingestion time, query time, or both. - + In the pattern details screen, select when the chosen action (redact, hash, or drop) should be executed, at ingestion time, query time, or both. + ![Regex pattern execution time](../../images/regex-pattern-execution-time.png) **Ingestion**: - - The data is redacted or dropped before it is written to disk. + - The data is redacted, hashed, or dropped before it is written to disk. - This ensures that sensitive information is never stored in OpenObserve. - Example: If an email address is redacted at ingestion, only the masked value `[REDACTED]` will be stored in the logs. **Query**: - - The data is stored in its original form but is redacted or dropped only when queried. + - The data is stored in its original form but is redacted, hashed, or dropped only when queried. - This allows administrators to preserve the original data while preventing exposure of sensitive values during searches. - Example: An email address stored in raw form will be hidden as `[REDACTED]` in query results. @@ -192,15 +200,11 @@ Once your patterns are created and tested, you can apply them to specific fields 1. To add the regex pattern to Applied Patterns, click **Add Pattern**. ![Add regex pattern](../../images/add-regex-pattern.png) 2. Select **Update Changes**. - ![Update regex patterns](../../images/update-regex-patterns.png) ??? "Step 6: (Optional) Apply multiple patterns" You can apply multiple patterns to the same field, as shown below: - Configure a regex pattern to detect emails: - ![regex-patterns-redact](../../images/regex-patterns-redact.png) - Configure a regex pattern to IP addresses: - ![regex-patterns-drop](../../images/regex-patterns-drop.png) + ![apply-multiple-reg-pattern](../../images/apply-multiple-reg-pattern.png) All applied patterns will appear in the left-hand panel with check marks. ???
"Step 7: Save configuration" @@ -208,7 +212,7 @@ Once your patterns are created and tested, you can apply them to specific fields When finished, click **Update Changes** to save the configuration. This activates the regex rules for the selected field. -## Test Redaction and Drop Operations +## Test Redact, Hash and Drop operations The following regex patterns are applied to the `message` field of the `pii_test` stream: @@ -220,7 +224,7 @@ The following regex patterns are applied to the `message` field of the `pii_test | Credit Card | Drop | Query | Excludes credit card numbers from results | ??? "Test 1: Redact at ingestion time" - ### Test 1: Redact at ingestion time + ### Redact at ingestion time **Pattern Configuration**: ![redact-at-ingestion-time-test-config](../../images/redact-at-ingestion-time-test-config.png) @@ -230,7 +234,7 @@ The following regex patterns are applied to the `message` field of the `pii_test 2. Select the `pii_test` stream from the dropdown. 3. Ingest a log entry containing a full name in the message field. ```bash - $ curl -u root@example.com:FNIB8MWspXZRkRgS -k https://dev2.internal.zinclabs.dev/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"User John Doe logged in successfully"}]' + $ curl -u example@example.com:FNIB8MWshsuhyehH -k https://example.zinclabs/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"User John Doe logged in successfully"}]' {"code":200,"status":[{"name":"pii_test","successful":1,"failed":0}]} ``` 4. Set the time range to include the test data. @@ -246,7 +250,7 @@ The following regex patterns are applied to the `message` field of the `pii_test ??? "Test 2: Drop at ingestion time" - ### Test 2: Drop at ingestion time + ### Drop at ingestion time **Pattern Configuration**: ![drop-at-query-time-test-config](../../images/drop-at-ingestion-time-test-config.png) @@ -254,9 +258,9 @@ The following regex patterns are applied to the `message` field of the `pii_test 1. From the left-hand menu, select **Logs**. 2. Select the `pii_test` stream from the dropdown. - 3. Ingest a log entry containing a IP address in the message field. + 3. Ingest a log entry containing an IP address in the message field. ```bash - $ curl -u root@example.com:FNIB8MWspXZRkRgS -k https://dev2.internal.zinclabs.dev/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"Connection from IP 192.168.1.100 established"}]' + $ curl -u example@example.com:FNIB8MWshsuhyehH -k https://example.zinclabs/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"Connection from IP 192.168.1.100 established"}]' {"code":200,"status":[{"name":"pii_test","successful":1,"failed":0}]} ``` 4. Set the time range to include the test data. @@ -270,8 +274,26 @@ The following regex patterns are applied to the `message` field of the `pii_test - Other fields remain intact. - This demonstrates field-level drop at ingestion. -??? "Test 3: Redact at query time" - ### Test 3: Redact at query time +??? "Test 3: Hash at ingestion time" + ### Hash at ingestion time + **Pattern Configuration**: + ![config-hash-pattern-ingestion-time](../../images/config-hash-pattern-ingestion-time.png) + + **Test Steps:** + + 1. From the left-hand menu, select **Logs**. + 2. Select the `pii_test` stream from the dropdown. + 3. Ingest a log entry containing a card details in the logs field. 
+ ```bash + $ curl -u example@example.com:FNIB8MWshsuhyehH -k https://example.zinclabs/api/default/pii_test/_json -d '[{"job":"test","level":"info","log":"Payment processed with card 4111-1111-1111-1111"}]' + ``` + 4. Set the time range to include the test data. + 5. Click **Run Query**. + 6. Verify results: + ![hashed-at-ingestion-time](../../images/hashed-at-ingestion-time.png) + +??? "Test 4: Redact at query time" + ### Redact at query time **Pattern Configuration**: ![redact-at-query-test-config](../../images/redact-at-query-test-config.png) @@ -279,9 +301,9 @@ The following regex patterns are applied to the `message` field of the `pii_test 1. From the left-hand menu, select **Logs**. 2. Select the `pii_test` stream from the dropdown. - 3. Ingest a log entry containing a email addresses in the message field. + 3. Ingest a log entry containing an email addresses in the message field. ```bash - $ curl -u root@example.com:FNIB8MWspXZRkRgS -k https://dev2.internal.zinclabs.dev/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"Password reset requested for john.doe@company.com"}]' + $ curl -u example@example.com:FNIB8MWshsuhyehH -k https://example.zinclabs/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"Password reset requested for john.doe@company.com"}]' {"code":200,"status":[{"name":"pii_test","successful":1,"failed":0}]} ``` @@ -297,8 +319,8 @@ The following regex patterns are applied to the `message` field of the `pii_test - Useful for compliance while maintaining data for authorized access. -??? "Test 4: Drop at query time" - ### Test 4: Drop at query time +??? "Test 5: Drop at query time" + ### Drop at query time **Pattern Configuration**: ![Drop at Query Time- Test Config](../../images/drop-at-query-time-test-config.png) @@ -308,7 +330,7 @@ The following regex patterns are applied to the `message` field of the `pii_test 2. Select the `pii_test` stream from the dropdown. 3. Ingest a log entry containing credit card details in the message field. ```bash - $ curl -u root@example.com:FNIB8MWspXZRkRgS -k https://dev2.internal.zinclabs.dev/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"Payment processed with card 4111-1111-1111-1111"}]' + $ curl -u example@example.com:FNIB8MWshsuhyehH -k https://example.zinclabs/api/default/pii_test/_json -d '[{"level":"info","job":"test","message":"Payment processed with card 4111-1111-1111-1111"}]' {"code":200,"status":[{"name":"pii_test","successful":1,"failed":0}]} ``` 4. Set the time range to include the test data. @@ -322,6 +344,36 @@ The following regex patterns are applied to the `message` field of the `pii_test - The `message` field with the credit card details gets dropped in query results. - This demonstrates field-level drop at query time. +??? "Test 6: Hash at query time" + ### Hash at query time + **Pattern Configuration**: + ![config-hash-pattern-query-time](../../images/config-hash-pattern-query-time.png) + + **Test Steps:** + + 1. From the left-hand menu, select **Logs**. + 2. Select the `pii_test` stream from the dropdown. + 3. Ingest a log entry containing a card details in the logs field. + ```bash + $ curl -u example@example.com:FNIB8MWshsuhyehH -k https://example.zinclabs/api/default/pii_test/_json -d '[{"job":"test","level":"info","log":"Payment processed with card 4111-1111-1111-1111"}]' + ``` + 4. Set the time range to include the test data. + 5. Click **Run Query**. + 6. 
Verify results: + ![hash-at-query-time](../../images/hashed-at-query-time.png) + +## Search hashed values using `match_all_hash` +The `match_all_hash` user-defined function (UDF) complements the SDR Hash feature. It allows you to search for logs that contain the hashed equivalent of a specific sensitive value. +When data is hashed using Sensitive Data Redaction, the original value is replaced with a deterministic hash. You can use `match_all_hash()` to find all records that contain the hashed token, even though the original value no longer exists in storage. +Example: +```sql +match_all_hash('4111-1111-1111-1111') +``` +This query returns all records where the SDR Hash of the provided value exists in any field. +In the example below, it retrieves the log entry containing +[REDACTED:907fe4882defa795fa74d530361d8bfb], the hashed version of the given card number. +![match-all-hash](../../images/match-all-hash.png) + ## Limitations diff --git a/docs/user-guide/metrics/downsampling-metrics.md b/docs/user-guide/metrics/downsampling-metrics.md index b3f81efb..c60469a7 100644 --- a/docs/user-guide/metrics/downsampling-metrics.md +++ b/docs/user-guide/metrics/downsampling-metrics.md @@ -10,7 +10,7 @@ This guide provides an overview of downsampling, including its configuration, ru Downsampling summarizes historical data into fewer data points. Each summarized data point is calculated using an aggregation method, such as the last recorded value, the average, or the total, applied over a defined time block. -## Configure Downsampling +## Configure downsampling Downsampling is configured using the following environment variables.: @@ -21,7 +21,7 @@ Downsampling is configured using the following environment variables.: > Refer to the [Downsampling Rule](#downsampling-rule) section.
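For a standalone or Docker-based deployment, the same variables can be set directly in the process environment before starting OpenObserve. This is only a sketch; the values are illustrative and mirror the Helm example that follows:

```bash
# Illustrative values: downsampling interval of 3600 seconds and a rule that
# keeps one 5-minute average for o2_cpu_usage data older than 30 days.
export O2_COMPACT_DOWNSAMPLING_INTERVAL="3600"
export O2_METRICS_DOWNSAMPLING_RULES="o2_cpu_usage:avg:30d:5m"
```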
-#### Downsampling Configuration For Helm Chart Users +#### Downsampling configuration for Helm Chart users Add the environment variables under the `enterprise.parameters` section in your `values.yaml` file: ``` @@ -32,7 +32,7 @@ enterprise: `O2_METRICS_DOWNSAMPLING_RULES`: "o2_cpu_usage:avg:30d:5m" ``` -#### Downsampling Configuration For Terraform Users +#### Downsampling configuration for Terraform users Set the same variables in your `terraform.tfvars` file: ``` @@ -42,7 +42,7 @@ Set the same variables in your `terraform.tfvars` file: > **Note**: **After setting the environment variables, make sure to redeploy the OpenObserve instance for the changes to apply.** -### Downsampling Rule +### Downsampling rule User-defined rules determine how downsampling is applied to metrics streams. You can define multiple downsampling rules to target different streams or use different configurations. @@ -64,14 +64,14 @@ Here: - **offset**: It defines the age of data eligible for downsampling. For example, 15d for applying downsampling on data older than 15 days. - **step**: The time block used to group data points. For example, 30m for applying downsampling to retain one value every 30 minutes. -### Sample Downsampling Rules +### Sample downsampling rules -#### Single Rule +#### Single rule ```yaml O2_METRICS_DOWNSAMPLING_RULES: "o2_cpu_metrics:avg:30d:5m" ``` Retains one average value every 5 minutes for `o2_cpu_metrics` data older than 30 days.
-**Multiple Rules** +**Multiple rules** ```yaml O2_METRICS_DOWNSAMPLING_RULES: "o2_cpu_metrics:avg:30d:5m, o2_app_logs:last:10d:10m" ``` @@ -82,7 +82,7 @@ O2_METRICS_DOWNSAMPLING_RULES: "o2_cpu_.*:sum:10d:60m" ``` Targets all streams starting with `o2_cpu_`, and for each matching stream, retains one hourly sum for data older than 10 days. -### Downsampling Example +### Downsampling example **Scenario**
A system is recording CPU usage data every 10 seconds to the stream `o2_cpu_usage`, generating a large volume of high-resolution metrics. Over time, this data becomes too granular and expensive to store or query efficiently for historical analysis. @@ -94,7 +94,7 @@ Downsample data older than 30 days to retain one average for every 2-minute time `O2_COMPACT_DOWNSAMPLING_INTERVAL` = "180" `O2_METRICS_DOWNSAMPLING_RULES` = "o2_cpu_usage:avg:30d:2m" -**Input Metrics**
+**Input metrics**
```json @@ -158,7 +158,7 @@ Downsample data older than 30 days to retain one average for every 2-minute time { "timestamp": "2024-03-01 00:08:50", "cpu": 20.2 } ``` -**Downsampling Time Blocks (Step = 2m) and Average CPU Usage** +**Downsampling time blocks (Step = 2m) and average CPU usage** - Time Block 1: From 00:00:00 to 00:01:59, average CPU usage is 20.55 - Time Block 2: From 00:02:00 to 00:03:59, average CPU usage is 21.75 @@ -166,7 +166,7 @@ Downsample data older than 30 days to retain one average for every 2-minute time - Time Block 4: From 00:06:00 to 00:07:59, average CPU usage is 21.65 - Time Block 5: From 00:08:00 to 00:09:59, average CPU usage is 20.88 (not processed yet) -**Downsampling Job Runs and Outputs** +**Downsampling job runs and outputs** Job 1 runs at 00:03:00 and processes Time Block 1
Output: diff --git a/docs/user-guide/metrics/file-access-time-metric.md b/docs/user-guide/metrics/file-access-time-metric.md index a3d6e472..bcf68579 100644 --- a/docs/user-guide/metrics/file-access-time-metric.md +++ b/docs/user-guide/metrics/file-access-time-metric.md @@ -3,11 +3,11 @@ description: >- Analyze file access age in OpenObserve to gauge query performance. Buckets track how recently files were accessed, revealing hot vs. cold data trends. --- -## What Is File Access Time Metric? +## What is file access time metric? This histogram metric helps analyze the age of files accessed by the querier. This helps in understanding the distribution of file access times across queries and evaluating system performance. -## How Does It Works? +## How does it work? The metric tracks file age in hourly buckets ranging from 1 hour to 32 hours. Each data point represents how long ago a file was accessed during query execution. **The metric is exposed as:** @@ -16,7 +16,7 @@ The metric tracks file age in hourly buckets ranging from 1 hour to 32 hours. Ea Zo_file_access_time_bucket ``` -## Example Usage +## Example usage To calculate the 95th percentile of file access age for logs over a 5-minute window: ``` diff --git a/docs/user-guide/performance/.pages b/docs/user-guide/performance/.pages index d40557f5..d3c5e4df 100644 --- a/docs/user-guide/performance/.pages +++ b/docs/user-guide/performance/.pages @@ -4,3 +4,4 @@ nav: - Monitor Download Queue Size and Disk Cache Metrics: monitor-download-queue-size-and-disk-cache-metrics.md - Configure Disk Cache Eviction Strategy: disk-cache-strategy.md - Tantivy Index: tantivy-index.md + - Broadcast Join: broadcast-join.md diff --git a/docs/user-guide/performance/broadcast-join.md b/docs/user-guide/performance/broadcast-join.md new file mode 100644 index 00000000..ddcfebc7 --- /dev/null +++ b/docs/user-guide/performance/broadcast-join.md @@ -0,0 +1,247 @@ +--- +title: Broadcast Join - OpenObserve Query Optimization Feature +description: Broadcast join in OpenObserve: 43% faster queries, 99.9% less network transfer. Automatic optimization for enrichment tables and subqueries. +--- + +This document explains how broadcast join works as an automatic query optimization feature in OpenObserve. + +!!! info "Availability" + This feature is available in Enterprise Edition. + +## Overview + +Broadcast Join is a query optimization feature in OpenObserve that dramatically improves the performance of distributed join operations. Instead of shuffling large amounts of data across the network, this optimization broadcasts smaller datasets to all computing nodes, enabling local join processing. + +**Key Benefits:** +- 43% faster query execution, verified by reducing time from 14 seconds to 8 seconds on production queries +- 99.9% reduction in network data transfer, verified by reducing transfer from 8.9 million rows to 5.7 thousand rows +- Automatic activation when conditions are met +- Works across superclusters + +--- + +## How broadcast join works + +Broadcast join reduces data transmission and improves parallel execution by broadcasting the smaller dataset to all computing nodes instead of shuffling large amounts of data across the network. + +When the smaller side of a join operation is small enough to fit in memory, it is broadcast to all nodes performing the join operation, allowing each node to perform the join locally. + +--- + +## Two types of broadcast join + +OpenObserve automatically applies broadcast join in two scenarios: + +### 1. 
Enrichment table pattern +**When**: Joining a large dataset with a small, pre-loaded reference table +**Example**: Adding service metadata to logs +**Key**: Enrichment table must already be in memory on all nodes + +### 2. Subquery pattern +**When**: Using IN-list queries with a limited subquery result +**Example**: Filtering by a list of selected namespaces or trace IDs +**Key**: Subquery must have LIMIT and return < 10,000 rows + +--- + +## Enrichment table broadcast join + +### When it applies + +This optimization is automatically triggered when **all three conditions** are met: + +1. **Enrichment table is loaded in memory** and consistent across all nodes, including superclusters +2. **Enrichment table is on the LEFT side** of the join +3. **Right side is a table scan** with optional filter + +### Example + +```sql +SELECT logs.service_name, + enrich.region, + enrich.team_owner, + COUNT(*) AS total_errors +FROM logs +JOIN service_metadata AS enrich + ON logs.service_name = enrich.service_name +WHERE logs.status_code = 500 +GROUP BY logs.service_name, enrich.region, enrich.team_owner; +``` + +### Execution flow + +1. The enrichment table (`service_metadata`) is already available in memory on all nodes +2. The `logs` table is scanned with the filter (`status_code = 500`) +3. Each node performs the join locally between its log partition and the enrichment table +4. The leader node merges results from all nodes + +--- + +## Subquery broadcast join + +### When it applies + +This optimization is automatically triggered when **all four conditions** are met: + +1. **Subquery has Aggregate + LIMIT**: Must produce a bounded result set +2. **Main query is a simple scan**: Table scan with optional filters +3. **Only one join**: Two tables only, not three or more +4. **Within size limits**: Subquery result less than 10,000 rows and less than 10 MB + +### Example 1: Kubernetes namespace filtering + +```sql +SELECT kubernetes_namespace_name, + array_agg(DISTINCT kubernetes_container_name) AS container_name +FROM default +WHERE log LIKE '%zinc%' + AND kubernetes_namespace_name IN ( + SELECT DISTINCT kubernetes_namespace_name + FROM default + WHERE log LIKE '%zinc%' + ORDER BY kubernetes_namespace_name + LIMIT 10 + ) +GROUP BY kubernetes_namespace_name +ORDER BY kubernetes_namespace_name +LIMIT 10; +``` + +### Example 2: Trace ID lookup + +```sql +SELECT trace_id, + array_agg(DISTINCT service_name) AS name +FROM trace_list_index +WHERE trace_id IN ( + SELECT DISTINCT trace_id + FROM trace_list_index + ORDER BY trace_id + LIMIT 10000 + ) +GROUP BY trace_id +ORDER BY trace_id +LIMIT 10; +``` + +**Performance**: +- Without Broadcast Join: 14 seconds +- With Broadcast Join: 8 seconds, representing a 43% improvement +- Data transfer: Reduced from 8,938,099 rows to 5,743 rows + +### Execution flow + +1. Leader node executes the subquery and returns 10 namespaces +2. Subquery results are saved to object storage in Arrow format +3. Result set is broadcast to all follower nodes +4. Each follower performs the join locally with its partition of the main table +5. Leader merges results from all nodes + +--- + +## Configuration + +Broadcast join is enabled by default. 
You can adjust the limits for the subquery pattern: + +### Maximum rows +```bash +export ZO_FEATURE_BROADCAST_JOIN_LEFT_SIDE_MAX_ROWS=10000 +``` +**Default**: 10,000 rows +**Purpose**: Subquery results exceeding this will not trigger broadcast join + +### Maximum size +```bash +export ZO_FEATURE_BROADCAST_JOIN_LEFT_SIDE_MAX_SIZE=10485760 +``` +**Default**: 10 MB +**Purpose**: Subquery results exceeding this size will not trigger broadcast join + +**Note**: These limits only apply to the subquery pattern. Enrichment tables are managed separately. + +--- + +## When broadcast join is not applied + +### When enrichment pattern will not trigger: + +- Enrichment table is not pre-loaded in memory +- Enrichment table is on the RIGHT side of the join +- Query has multiple join operations +- Enrichment table state is inconsistent across nodes + +### When subquery pattern will not trigger: + +- Subquery result exceeds the configured limit of 10,000 rows +- Subquery result exceeds the configured limit of 10 MB +- Subquery lacks LIMIT clause +- Subquery lacks Aggregate operation +- Main query is not a simple table scan +- Query involves three or more tables in multi-table joins + +### Performance note: + +**Low cardinality IN-lists**, for example IN (1, 2, 3): For queries with low cardinality, the performance is basically the same as ordinary queries. Broadcast join is most effective for high cardinality IN-list queries. + +--- + +## Verifying broadcast join is active + +Use `EXPLAIN` to view the query execution plan: + +```sql +EXPLAIN SELECT ... +``` + +**For Subquery Pattern**, the plan will show: +- Temporary storage path for the broadcasted data +- The subquery is executed separately from the main query + +**For Enrichment Pattern**, the plan will show: +- Enrichment table positioned as the left side of the join +- No data repartitioning for the enrichment table + +--- + +## Troubleshooting + +### Issue: Broadcast join not triggering + +**Check**: +1. Does the subquery have both Aggregate and LIMIT? +2. Count subquery rows: `SELECT COUNT(*) FROM (subquery)` +3. Is it a two-table join only? +4. Verify configuration: `echo $ZO_FEATURE_BROADCAST_JOIN_LEFT_SIDE_MAX_ROWS` + +**For enrichment tables**: +1. Is the table loaded in memory as an enrichment table? +2. Is it on the LEFT side of the join? +3. Is the right side a simple table scan? + +### Issue: Performance degradation + +**Possible causes**: +- Broadcasted table is too large; adjust limits +- High network latency to object storage +- Query does not benefit from broadcast; try standard join + +**Solution**: Add more selective filters to reduce subquery size + +--- + +## Summary + +Broadcast Join optimizes distributed queries by broadcasting small datasets instead of shuffling large ones. OpenObserve supports two patterns: + +**Enrichment tables**: Pre-loaded reference tables joined with large datasets with zero network overhead + +**Subquery pattern**: IN-list queries with limited results showing 99.9% reduction in data transfer + +Both patterns activate automatically when conditions are met, requiring minimal configuration while delivering substantial performance improvements. 
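+
+As a quick recap of the subquery conditions above, the sketch below reuses the namespace example to contrast a query shape that qualifies with one that does not. Treat it as illustrative only; whether the optimization actually fires still depends on the row and size limits described in the Configuration section.
+
+```sql
+-- Qualifies: the subquery is aggregated (DISTINCT) and bounded by LIMIT,
+-- and the statement joins only two table references
+SELECT kubernetes_namespace_name, COUNT(*) AS events
+FROM default
+WHERE kubernetes_namespace_name IN (
+    SELECT DISTINCT kubernetes_namespace_name
+    FROM default
+    WHERE log LIKE '%zinc%'
+    ORDER BY kubernetes_namespace_name
+    LIMIT 10
+  )
+GROUP BY kubernetes_namespace_name;
+
+-- Does not qualify: without LIMIT the subquery result is unbounded,
+-- so the query falls back to a standard distributed join
+SELECT kubernetes_namespace_name, COUNT(*) AS events
+FROM default
+WHERE kubernetes_namespace_name IN (
+    SELECT DISTINCT kubernetes_namespace_name
+    FROM default
+    WHERE log LIKE '%zinc%'
+  )
+GROUP BY kubernetes_namespace_name;
+```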
+ +**Key takeaways**: +- 43% faster queries verified in production +- 99.9% less network data transfer +- Automatic optimization without requiring query rewriting +- Works across superclusters +- Enterprise-ready with configurable limits \ No newline at end of file diff --git a/docs/user-guide/pipelines/pipelines.md b/docs/user-guide/pipelines/pipelines.md index d9722a07..afbe8ddd 100644 --- a/docs/user-guide/pipelines/pipelines.md +++ b/docs/user-guide/pipelines/pipelines.md @@ -34,6 +34,9 @@ Use real-time pipelines when you need immediate processing, such as monitoring l A scheduled pipeline automates the processing of historical data from an existing stream at user-defined intervals. This is useful when you need to extract, transform, and load (ETL) data at regular intervals without manual intervention. ![Scheduled Pipelines in OpenObserve](../../images/pipelines-new-%20scheduled.png) +!!! note "Performance" + OpenObserve maintains a cache for scheduled pipelines to prevent the alert manager from making unnecessary database calls. This cache becomes particularly beneficial when the number of scheduled pipelines is high. For example, with 500 scheduled pipelines, the cache eliminates 500 separate database queries each time the pipelines are triggered, significantly improving performance. + #### How they work 1. **Source**: To create a scheduled pipeline, you need an existing stream, which serves as the source stream. @@ -44,7 +47,7 @@ A scheduled pipeline automates the processing of historical data from an existin ![Scheduled Pipelines Transform in OpenObserve](../../images/pipeline-new-scheduled-condition.png) 4. **Destination**: The transformed data is sent to the following destination(s) for storage or further processing: - **Stream**: The supported destination stream types are Logs, Metrics, Traces, or Enrichment tables.
**Note**: Enrichment Tables can only be used as destination streams in scheduled pipelines. - - **Remote**: Select **Remote** if you wish to send data to [external destination](#external-pipeline-destinations). + - **Remote**: Select **Remote** if you wish to send data to [external destination](https://openobserve.ai/docs/user-guide/pipelines/remote-destination/). #### Frequency and Period The scheduled pipeline runs based on the user-defined **Frequency** and **Period**. @@ -60,20 +63,6 @@ The scheduled pipeline runs based on the user-defined **Frequency** and **Period #### When to use Use scheduled pipelines for tasks that require processing at fixed intervals instead of continuously, such as generating periodic reports and processing historical data in batches. -## External Pipeline Destinations -OpenObserve allows you to route pipeline data to external destinations. - -To configure an external destination for pipelines: - -1. Navigate to the **Pipeline Destination** configuration page. You can access the configuration page while setting up the remote pipeline destination from the pipeline editor or directly from **Management** (Settings icon in the navigation menu) > **Pipeline Destinations** > **Add Destination**. -2. In the **Add Destination** form, provide a descriptive name for the external destination. -3. Under **URL**, specify the endpoint where the data should be sent. -4. Select the HTTP method based on your requirement. -5. Add headers for authentication. In the **Header** field, enter authentication-related details (e.g., Authorization). In the **Value** field, provide the corresponding authentication token. -6. Use the toggle **Skip TLS Verify** to enable or disable Transport Layer Security (TLS) verification.
-**Note**: Enable the **Skip TLS Verify** toggle to bypass security and certificate verification checks for the selected destination. Use with caution, as disabling verification may expose data to security risks. You may enable the toggle for development or testing environments but is not recommended for production unless absolutely necessary. -![Remote Destination](../../images/pipeline-new-remote-destination.png) - ## Next Steps - [Create and Use Pipelines](../use-pipelines/) diff --git a/docs/user-guide/streams/.pages b/docs/user-guide/streams/.pages index 18f2a1a2..60604675 100644 --- a/docs/user-guide/streams/.pages +++ b/docs/user-guide/streams/.pages @@ -4,4 +4,7 @@ nav: - Stream Details: stream-details.md - Schema Settings: schema-settings.md - Extended Retention: extended-retention.md - - Summary Streams: summary-streams.md \ No newline at end of file + - Summary Streams: summary-streams.md + - Field and Index Types in Streams: fields-and-index-in-streams.md + - Query Recommendations Stream: query-recommendations.md + \ No newline at end of file diff --git a/docs/user-guide/streams/data-type-and-index-type-in-streams.md b/docs/user-guide/streams/data-type-and-index-type-in-streams.md new file mode 100644 index 00000000..31d97842 --- /dev/null +++ b/docs/user-guide/streams/data-type-and-index-type-in-streams.md @@ -0,0 +1,76 @@ +This guide explains how to define and configure stream fields in OpenObserve, including supported field types and index options. + + +## Define stream fields + +When creating a stream, each field requires three components: +![define strean fields](../../images/stream-fields.png) + +1. **Field Name**: A unique identifier for the field within the stream. +2. **Data Type**: The data type that values in this field will use. +3. **Index Type**: The indexing strategy to optimize query performance. + +## Data types + +Each field in a stream must have a defined data type. OpenObserve supports the following: + +- **utf8**: Text fields. Use for messages, names, or IDs. +- **int64**: Signed 64-bit integers. Use for counts, timestamps, or status codes. +- **uint64**: Unsigned 64-bit integers. Use for numeric IDs or large counters. +- **float64**: Floating point numbers. Use for durations, sizes, or metrics. +- **boolean**: True or false flags. Use for condition checks or binary states. + +Maintain consistent data types across all records. Mixed types in the same field can cause indexing and query issues. + +## Full text search + +Each field may use one index type to accelerate queries. Once applied, it cannot be changed. Choose based on query patterns and field characteristics. + +- **Best for**: utf8 fields with unstructured text such as logs or messages. +- **Common filters**: `match_all('timeout')`. +- **Performance**: High speed for large text searches. +- **Storage impact**: Approximately 25 percent overhead. +- **Avoid if**: The field contains structured values like hostnames or IDs. + +## Secondary index + +- **Best for**: Fields with repeated values such as status codes or namespaces. +- **Common filters**: Equality or list inclusion, such as `status_code = 500` or `namespace IN ('core', 'admin')`. +- **Performance**: Fast exact-match lookups. +- **Storage impact**: Moderate and grows with cardinality. +- **Avoid if**: The field has mostly unique values. + +## Bloom filter + +- **Best for**: High-cardinality fields such as trace_id or request_id. +- **Common filters**: Specific equality filters, such as `trace_id = 'abc123'`. 
+- **Performance**: Quickly skips unrelated data. +- **Storage impact**: Minimal. +- **Avoid if**: The field has few distinct values or if you filter using ranges. + +## Key-value partition + +- **Best for**: Low-cardinality fields such as environment or service. +- **Common filters**: Exact value filters, such as `environment = 'prod'`. +- **Performance**: Reads only relevant partition. +- **Storage impact**: Efficient if files remain large. +- **Avoid if**: The field has many distinct values. + +## Prefix partition + +- **Best for**: utf8 fields with repeated prefixes such as hexadecimal identifiers. +- **Common filters**: Filters for specific full values, such as `session_id = '1a3f...'`, where '1a' is the prefix. +- **Performance**: Limits scanning to a group based on prefix. +- **Storage impact**: Lower index size and improved compression. +- **Avoid if**: The field has no meaningful prefix structure. + +## Hash partition + +- **Best for**: Fields with many values that are unevenly distributed. +- **Common filters**: Equality filters, such as `user_id = 'u4567'`. +- **Performance**: Distributes records evenly into fixed partitions. +- **Storage impact**: Balanced partition sizes and predictable access. +- **Avoid if**: The field has only a few distinct values. + + + diff --git a/docs/user-guide/streams/extended-retention.md b/docs/user-guide/streams/extended-retention.md index 2b95ee32..af3b2301 100644 --- a/docs/user-guide/streams/extended-retention.md +++ b/docs/user-guide/streams/extended-retention.md @@ -3,16 +3,14 @@ description: >- Retain key stream data beyond default limits with Extended Retention in OpenObserve. Preserve specific time ranges in streams for up to 10 years. --- -## Extended Retention - -The Extended Retention feature in **Stream Details** allows you to retain specific segments of your stream data beyond the configured stream-level or global retention period. - -## When to Use Extended Retention +The **Extended Retention** feature in **Stream Details** allows you to retain specific segments of your stream data beyond the configured stream-level or global retention period. +## When to use extended retention This feature is helpful when you want to preserve logs, metrics, or traces related to specific incidents or investigations. For example, logs from a known incident that occurred last month, you can configure extended retention for those specific time ranges. -## How to Apply Extended Retention +## How to apply extended retention +![Streams Extended Retention](../../images/extended-retention.png) 1. From the **Streams** page, select the **Explore** icon from the **Actions** column. 2. Navigate to [**Stream Details**](../../user-guide/streams/stream-details.md#access-the-stream-details). @@ -23,8 +21,6 @@ For example, logs from a known incident that occurred last month, you can config !!! Note The data within the selected time range will be retained for an additional 3,650 days. -![Streams Extended Retention](../../images/extended-retention.png) - !!! Note - **Flexible Range Selection**: You can add multiple time ranges. This enables retaining non-contiguous segments of data, such as a two-day period this week and another from the previous month. - **Retain Within Allowed Window**: You can only apply extended retention to data that currently exists. You cannot select ranges older than the current retention limit because that data has already been deleted and cannot be restored. 
diff --git a/docs/user-guide/streams/index.md b/docs/user-guide/streams/index.md index 445294f0..222df8fd 100644 --- a/docs/user-guide/streams/index.md +++ b/docs/user-guide/streams/index.md @@ -1,9 +1,11 @@ Streams define how observability data is ingested, stored, indexed, and queried in OpenObserve. This guide introduces key concepts related to streams and explains how to create and use them. -Learn more: - -- [Streams in OpenObserve](streams-in-openobserve.md) -- [Stream Details](stream-details.md) -- [Schema Settings](schema-settings.md) -- [Extended Retention](extended-retention.md) -- [Summary Streams](summary-streams.md) \ No newline at end of file +!!! note "Learn more:" + - ### [Streams in OpenObserve](streams-in-openobserve.md) + - ### [Stream Details](stream-details.md) + - ### [Schema Settings](schema-settings.md) + - ### [Extended Retention](extended-retention.md) + - ### [Summary Streams](summary-streams.md) + - ### [Data Type and Index Types in Streams](data-type-and-index-type-in-streams.md) + - ### [Field and Index Types in Streams](fields-and-index-in-streams.md) + - ### [Query Recommendations Stream](query-recommendations.md) \ No newline at end of file diff --git a/docs/user-guide/streams/query-recommendations.md b/docs/user-guide/streams/query-recommendations.md new file mode 100644 index 00000000..c3388e0b --- /dev/null +++ b/docs/user-guide/streams/query-recommendations.md @@ -0,0 +1,69 @@ +--- +title: Query Recommendations Stream in OpenObserve +description: Understand the purpose, structure, and usage of the query_recommendations stream in the _meta organization in OpenObserve. +--- + +This document explains the function and application of the query_recommendations stream within the _meta organization in OpenObserve. It provides guidance for users who want to optimize query performance using system-generated recommendations based on observed query patterns. + +!!! info "Availability" + This feature is available in Enterprise Edition. + +## Overview +OpenObserve continuously analyzes user queries across streams to identify optimization opportunities. These suggestions are stored in the `query_recommendations` stream under the `_meta` organization. The recommendations focus on improving performance by suggesting secondary indexes when patterns in field access indicate consistent and potentially costly lookups. + + +!!! note "Where to find it" + The query recommendations are published into the `query_recommendations` stream under the `_meta` organization. + ![select-query-recommendations](../../images/select-query-recommendations.png) + +!!! note "Who can access it" + All Enterprise Edition users with access to the `_meta` organization can access the `query_recommendations` stream. + +!!! note "When to use it" + Use this stream when: + + - You notice slow query performance for specific fields or patterns. + - You are planning schema-level optimizations. + - You want to validate whether frequently queried fields would benefit from indexing. + +## How to use it +1. Switch to the `_meta` organization in OpenObserve. +2. Go to the **Logs** section. +3. From the stream selection dropdown, select the `query_recommendations` stream. +4. Select the desired time range. +5. Click **Run query**. 
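+
+If you prefer to narrow the output with SQL rather than scrolling through raw results, a minimal sketch such as the one below can help; the field names come from the table in the next section, and the `stream_name` value is only an example.
+
+```sql
+-- Show recommendations for a single stream, most frequent patterns first
+SELECT stream_name, column_name, operator, occurrences, recommendation
+FROM query_recommendations
+WHERE stream_name = 'default'
+ORDER BY occurrences DESC;
+```
+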
+![use-query-recommendations](../../images/use-query-recommendations.png) + +## Field descriptions +| Field | Description | +|-----------------------|-----------------------------------------------------------------------------| +| `_timestamp` | Time when the recommendation was recorded. | +| `column_name` | Field name in the stream that the recommendation applies to. | +| `stream_name` | The stream where this field was queried. | +| `all_operators` | All operators observed for the field (example: =, >, <). | +| `operator` | Primary operator considered for recommendation. | +| `occurrences` | Number of times the field was queried with the specified operator. | +| `total_occurrences` | Total number of queries examined. | +| `num_distinct_values` | Count of distinct values seen in the field. | +| `duration_hrs` | Duration (in hours) over which this pattern was observed. | +| `reason` | Explanation behind the recommendation. | +| `recommendation` | Specific action suggested (typically, create secondary index). | +| `type` | Always `SecondaryIndexStreamSettings` for this stream. | + +## Examples and how to interpret them + +**Example 1**
+![example-1-query-recommendations](../../images/example-1-query-recommendations.png) +This recommendation indicates that across the last 360000000 hours of query data, the `job` field in the `default` stream was queried with an equality (`=`) operator 1220 times out of 1220 total queries. Since all queries used this field with the `=` operator, a secondary index could improve performance. + +!!! note "Interpretation" + Add a secondary index on the `job` field in the `default` stream for improved performance. + +
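+For context, the access pattern behind this recommendation looks roughly like the hypothetical query below; the filter value is made up, but the repeated equality lookup on `job` is what drives the suggestion.
+
+```sql
+-- Repeated equality filters like this make `job` a secondary-index candidate
+SELECT *
+FROM default
+WHERE job = 'example-job'
+LIMIT 100;
+```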
+ +**Example 2**
+![example-2-query-recommendations](../../images/example-2-query-recommendations.png) +This recommendation is for the `status` field in the `alert_test` stream. All 5 queries used `status` with an equality operator. Although the number is small, the uniform pattern indicates a potential for future optimization. + +!!! note "Interpretation" + Consider indexing status if query volume increases or performance becomes a concern. \ No newline at end of file diff --git a/docs/user-guide/streams/schema-settings.md b/docs/user-guide/streams/schema-settings.md index dd4022cb..5ff52bf5 100644 --- a/docs/user-guide/streams/schema-settings.md +++ b/docs/user-guide/streams/schema-settings.md @@ -3,16 +3,18 @@ description: >- Manage stream schemas in OpenObserve to optimize query performance. Define field types, set index types, and enable user-defined schemas for efficiency. --- -The **Schema Settings** tab in [Stream Details](stream-details.md) allows you to inspect and manage the schema used to store and query ingested data. A schema defines the structure of log data in a stream, including: +The **Schema Settings** tab in [Stream Details](stream-details.md) allows you to inspect and manage the schema used to store and query ingested data. A schema defines the structure of log data within a stream and includes: -- The fields present in the logs -- The detected data types for each field -- The index type set for the field by user +- The fields detected during ingestion +- The inferred data types for each field +- The index type assigned to each field +- Optional sensitive data redaction rules Each field represents a key from the JSON log, automatically detected during ingestion. Fields are shown with their name, inferred data type, and any associated index. -## Field Type Detection +![schema settings](../../images/stream-details-schema-settings.png) +## Field type When OpenObserve receives logs, it automatically infers the data type of each field. For example: @@ -20,20 +22,13 @@ For example: - `58.0` as `Float64` - `"58%"` as `Utf8` -![schema settings field type detection](../../images/schema-settings-fieldtype-detection.png) +## Index Type +You can modify or assign an index type to a field to improve search performance. Indexing can reduce the amount of data that must be scanned during queries. -!!! Note - **Update the field type with caution.** - Once a field is detected as a certain type, changing the type in future log entries, for example, from **Utf8** to **Int64**, can result in inconsistent search behavior. It is recommended to maintain consistent field types across all log entries. +To learn more, visit the [Fields and Index in Streams](streams/fields-and-index-in-streams) page. -## Index Types - -You can assign an index type to a field to improve search performance. Indexing can reduce the amount of data that must be scanned during queries. OpenObserve supports multiple index types, such as KeyValue filters and hash partitions. - -![schema settings index types](../../images/schema-settings-index-type.png) !!! Warning - Once an index type is applied to a field and saved, it can only be **disabled.** - You cannot assign a new index type to that field after disabling the current one. This limitation exists because index types affect how data is stored on disk. Changing the index after storing data may lead to inconsistent query results or data retrieval failures. + Changing the index after storing data may lead to inconsistent query results or data retrieval failures. 
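+
+As a rough guide when assigning an index type, match it to the filters you run most often. The sketch below pairs common filter styles with the index type that typically serves them; the stream and field names are illustrative.
+
+```sql
+-- Full Text Search: unstructured text lookups
+SELECT * FROM default WHERE match_all('timeout');
+
+-- Secondary Index: exact matches on values that repeat across records
+SELECT * FROM default WHERE status_code = 500;
+
+-- Bloom Filter: pinpointing a single high-cardinality value
+SELECT * FROM default WHERE trace_id = 'abc123';
+```
+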
## User-Defined Schema (UDS) @@ -47,7 +42,7 @@ All other fields will either be ignored or stored in a special `_raw` field if t To enable UDS support, set the following environment variable `ZO_ALLOW_USER_DEFINED_SCHEMAS` to `true` . -## How to Add a User-Defined Schema +### How to add a User-Defined Schema 1. From the **Streams** page, click the **Stream Details** option under the **Actions** column. 2. Go to the **Schema Settings** tab. @@ -55,9 +50,8 @@ To enable UDS support, set the following environment variable `ZO_ALLOW_USER_DEF 4. Click **Add to Defined Schema**. 5. Save your changes using the **Update Settings** button. -![schema settings user defined schema](../../images/schema-settings-user-defined-schemas.png) -Once this is done: +After you save the changes: - The schema interface switches to show **User Defined Schema** and **Other Fields** tabs. - Only fields under **User Defined Schema** will be searchable. @@ -65,6 +59,12 @@ Once this is done: You can also manually add a field to the schema using the **Add Field(s)** button. This is useful when a field may not have appeared in the logs yet but is expected later. For example, an `error_code` field that appears only during failures can be added before the actual error happens using this. + +## Sensitive Data Redaction (SDR) +Sensitive Data Redaction (SDR) lets you redact or drop sensitive data during ingestion or at query time using regex-based rules. + +For detailed steps to create and manage SDR rules, refer to the [Sensitive Data Redaction](https://openobserve.ai/docs/user-guide/management/sensitive-data-redaction/) guide. + ## Next Steps - [Extended Retention](extended-retention.md) diff --git a/docs/user-guide/streams/stream-details.md b/docs/user-guide/streams/stream-details.md index d8f40653..67b2fcb3 100644 --- a/docs/user-guide/streams/stream-details.md +++ b/docs/user-guide/streams/stream-details.md @@ -3,51 +3,81 @@ description: >- View and manage detailed stream settings in OpenObserve, including retention, query limits, schema, and usage stats via the Stream Details panel. --- -After After you complete [data ingestion](streams-in-openobserve.md#ingest-data-into-stream), use **Stream Details** to view configuration and usage information for a stream and update its settings. +After After you complete [data ingestion](streams-in-openobserve.md#ingest-data-into-stream), use **Stream Details** to view stream details and manage stream settings. -This guide explains how to open **Stream Details** and the information it displays. +## Overview +Use **Stream Details** to inspect schemas, apply sensitive-data redaction using regex patterns, set data retention and query limits, and configure ingestion options such as flatten level, storing original data, and enabling distinct values. -## Access the Stream Details -From the **Streams** page, select the **Explore** icon from the **Actions** column. -![stream details access](../../images/stream-details-access.png) +!!! note "Where to find it" + 1. Go to **Streams**. + 2. In the **Actions** column, select **Stream Details**. + ![stream details access](../../images/stream-details-access.png) -## Permission -User roles that have permission to update Streams can modify the stream settings from the **Stream Details** page. The permission needs to be assigned to appropriate user roles using role-based access control (RBAC). +!!! note "Who can access it" + User roles that have permission to update Streams can modify the stream settings from the **Stream Details** page. 
The permission needs to be assigned to appropriate user roles using role-based access control (RBAC). -![stream details permission](../../images/stream-details-permission.png) -## Stream Details -**Stream Details** shows key configuration and usage information for the selected stream: +## Stream details +The header cards in the **Stream Details** page shows high-level status for the selected stream: -![stream details](../../images/stream-details.png) - -## General Information - -- **Stream Name**: Name of the selected stream. -- **Docs Count**: Total number of ingested events. +- **Stream Name**: Name of the selected stream. +![stream name](../../images/stream-name.png) +- **Events**: Total number of ingested events. +![stream details](../../images/stream-details.png) - **Ingested Data**: Uncompressed size of the stored data. - **Compressed Size**: Storage space used after compression. - **Index Size**: Size of the tantivy files generated for full text search index. Other index types, such as KeyValue filters and hash partitions, do not affect this value. -- **Start Time and End Time**: Start time is the timestamp of the oldest data and end time is the timestamp of the newest data. +![stream-start-end-time](../../images/stream-start-end-time.png) +- **Time range**: Start time is the timestamp of the oldest data and end time is the timestamp of the newest data. If the ingested data has a `_timestamp` field, it will be according to that. If the ingested data does not have a ` _timestamp` field, then the start time will be the oldest time of ingestion and end time will be the newest time of ingestion. + +## Schema Settings +Inspect and manage the schema of a stream.
+This tab allows you to review detected fields, assign index types, define user-defined schemas, and apply sensitive data redaction patterns.
+![schema settings](../../images/stream-details-schema-settings.png) +
+To learn more about Schema Settings, visit the [Schema Settings](https://openobserve.ai/docs/user-guide/streams/schema-settings/) page. + + +## Extended Retention +![extended retention](../../images/extended-retention.png) +Allows you to retain specific segments of your stream data beyond the configured stream-level or global retention period. To learn more, see the [Extended Retention](https://openobserve.ai/docs/user-guide/streams/extended-retention/) page. + +## Configuration +The **Configuration** tab provides options to configure stream-level limits and ingestion behavior. +![stream-details-configuration](../../images/stream-details-configuration.png) + +### Data Retention in days +Sets how long data is retained in this stream. If not configured, the global retention period applies. Default global is 30 days. + +### Max Query Range in hours +Sets the maximum time span allowed per query. This can help reduce query load. Note that this is stream and org specific. + +- You can set a global value as the maximum query range, for all streams across all organizations using the following environment variable: +``` +ZO_DEFAULT_MAX_QUERY_RANGE_DAYS +``` +However, when a non-zero Max query range is set on a stream, the value set through **Stream Details** overrides the global value.
By default, both the environment variable and the **Max Query Range** value are set to zero, which means there is no limit. -!!! Note +### Flatten Level + +Controls how deeply nested JSON is flattened during ingestion for this stream. +Flattening converts nested structures into dot-delimited keys. - - If the ingested data has a `_timestamp` field, it will be according to that. - - If the ingested data does not have a ` _timestamp` field, then the start time will be the oldest time of ingestion and end time will be the newest time of ingestion. +- Streams can have different nesting depths. Set a suitable level here without changing the global default. +- The flatten level accepts non-negative integers. A value of 0 means no limit, all nested fields are flattened. +If left blank, the stream inherits the global ZO_INGEST_FLATTEN_LEVEL. The UI displays the current global value for context. -## Retention and Query Settings +### Use Stream Stats for Partitioning +When you enable this toggle, OpenObserve assumes that all your data present in the stream is equally split and creates equal sized partitions between the start and end time. -- **Data Retention in Days**: Sets how long data is retained in this stream. If not configured, the global retention period applies. By default, this is 30 days. -- **Max Query Range in Hours**: Sets the maximum time span allowed per query. This can help reduce query load. Note that this is stream and org specific. +### Store Original Data +Keeps the raw, original document alongside parsed fields. +Useful for audits and reprocessing. Increases storage because raw payloads are preserved. - You can set a global value as the maximum query range, for all streams across all organizations using the following environment variable: +### Enable Distinct Values +Enables tracking of distinct values for fields to speed up filters and aggregations that rely on unique values. +Improves responsiveness on high-cardinality fields. - ``` - ZO_DEFAULT_MAX_QUERY_RANGE_DAYS - ``` - > However, when a non-zero Max query range is set on a stream, the value set through **Stream Details** overrides the global value.
By default, both the environment variable and the **Max Query Range** value are set to zero, which means there is no limit. - -- **Use Stream Stats for Partitioning**: When you enable this toggle, OpenObserve assumes that all your data present in the stream is equally split and creates equal sized partitions between the start and end time. ## Troubleshooting diff --git a/docs/user-guide/streams/streams-in-openobserve.md b/docs/user-guide/streams/streams-in-openobserve.md index 184e1eef..55637807 100644 --- a/docs/user-guide/streams/streams-in-openobserve.md +++ b/docs/user-guide/streams/streams-in-openobserve.md @@ -6,7 +6,7 @@ description: >- --- Streams define how observability data is ingested, stored, indexed, and queried in OpenObserve. This guide introduces key concepts related to streams and explains how to create and use them. -## What Is a Stream +## What is a stream? A stream is a logical container that holds one type of observability data, such as logs, metrics, or traces. It is the required entry point for data ingestion in OpenObserve. Every log, metric, or trace must be associated with a stream at the time of ingestion. @@ -59,6 +59,8 @@ The following steps vary for **Cloud** and **Self-hosted** deployment: > If RBAC is enabled, ensure that you have required permissions to create streams. === "Create Streams in OpenObserve Cloud" + ### Create streams in OpenObserve Cloud + 1. Select the organization from the top navigation bar. 2. From the left navigation menu, select **Streams**. 3. Click **Add Stream.** @@ -68,13 +70,20 @@ The following steps vary for **Cloud** and **Self-hosted** deployment: - Select the **Stream Type**. - Specify the **Data Retention** in days. For example, enter 14 to keep data for 14 days after ingestion. When the period ends, OpenObserve removes the data automatically. To keep data longer, select **Extended Retention** in the Stream Details sidebar. - - (Optional) Use the **Add Fields**, if you wish to create fields to the **User Defined Schema**. Learn more about [user defined schema](../../user-guide/streams/schema-settings.md#user-defined-schema-uds). + - (Optional) Use the **Add Fields** section if you wish to define fields for your stream: + - **Field Name**: Name of the field + - **Data Type**: Select from utf8, int64, uint64, float64, or boolean + - **Index Type**: Choose an indexing strategy (Secondary Index, Full Text Search, KeyValue Partition, Prefix Partition, or Hash Partition)
+ For detailed information about each field type and index strategy, see [Field and Index Types in Streams](fields-and-index-in-streams.md) + + These fields create a User Defined Schema. Learn more about [user defined schema](../../user-guide/streams/schema-settings.md#user-defined-schema-uds). 5. Click **Create Stream**. The new stream appears on the Streams page. Ingest data into the stream to populate and start using it. -=== "Create Streams in OpenObserve Self-Hosted" +=== "Create streams in OpenObserve self-hosted" + ### Create streams in OpenObserve self-hosted 1. Select the organization from the top navigation bar. 2. From the left navigation menu, select **Streams**. 3. Click **Add Stream.** @@ -84,7 +93,12 @@ The following steps vary for **Cloud** and **Self-hosted** deployment: - Select the **Stream Type**. - Specify the **Data Retention** in days. For example, enter 14 to keep data for 14 days after ingestion. When the period ends, OpenObserve removes the data automatically. To keep data longer, select **Extended Retention** in the Stream Details sidebar. - - In the **Add Fields** section, you must define at least one field name and field type. This creates a user-defined schema at stream creation. + - In the **Add Fields** section, you must define at least one field with: + - **Field Name**: Name of the field + - **Data Type**: Select from utf8, int64, uint64, float64, or boolean + - **Index Type**: Choose an indexing strategy (Secondary Index, Full Text Search, KeyValue Partition, Prefix Partition, or Hash Partition)
+ For detailed information about each field type and index strategy, see [Field and Index Types in Streams](fields-and-index-in-streams.md) + This creates a user-defined schema at stream creation. Learn more about [user defined schema](../../user-guide/streams/schema-settings.md#user-defined-schema-uds). ??? info "Click to see how User-defined Schema works." Let us say you define the following fields while creating a stream: @@ -98,7 +112,7 @@ The following steps vary for **Cloud** and **Self-hosted** deployment: - These fields (job, code, message) will appear under User Defined Schema in the [Stream Details](../../user-guide/streams/stream-details.md) page. - Any additional fields not defined earlier will appear under All Fields. - This creates a user-defined schema at stream creation. Learn more about [user defined schema](../../user-guide/streams/schema-settings.md#user-defined-schema-uds). + 5. Click **Create Stream**. !!! Note diff --git a/overrides/partials/footer.html b/overrides/partials/footer.html index 7b616e4a..181f9d89 100644 --- a/overrides/partials/footer.html +++ b/overrides/partials/footer.html @@ -183,7 +183,8 @@ // Configuration const NEWSLETTER_CONFIG = window.NEWSLETTER_CONFIG || { RECAPTCHA_SITE_KEY: "6LdvUdMrAAAAAJksqV0YEwNBEBGL2SB90Gebun5n", - NEWSLETTER_ENDPOINT: "https://5cciazu22tev222enhrbx6w3y40hscvu.lambda-url.us-east-2.on.aws", + NEWSLETTER_ENDPOINT: + "https://5cciazu22tev222enhrbx6w3y40hscvu.lambda-url.us-east-2.on.aws", FALLBACK_ENDPOINT: "https://5cciazu22tev222enhrbx6w3y40hscvu.lambda-url.us-east-2.on.aws", }; @@ -1062,21 +1063,29 @@ footerData.forEach((section) => { const column = document.createElement("div"); column.innerHTML = ` -
${ - section.title - }
- - `; +
+ ${section.title} +
+ + `; container.appendChild(column); }); diff --git a/overrides/partials/header.html b/overrides/partials/header.html index 5f483430..85ffeb92 100644 --- a/overrides/partials/header.html +++ b/overrides/partials/header.html @@ -152,7 +152,7 @@