diff --git a/docs/images/distinct-values-access.png b/docs/images/distinct-values-access.png
new file mode 100644
index 00000000..2dad0bec
Binary files /dev/null and b/docs/images/distinct-values-access.png differ
diff --git a/docs/images/metadata-distinct-values.png b/docs/images/metadata-distinct-values.png
new file mode 100644
index 00000000..708489e7
Binary files /dev/null and b/docs/images/metadata-distinct-values.png differ
diff --git a/docs/user-guide/actions/actions-in-openobserve.md b/docs/user-guide/actions/actions-in-openobserve.md
index edaa739b..2d2fcd98 100644
--- a/docs/user-guide/actions/actions-in-openobserve.md
+++ b/docs/user-guide/actions/actions-in-openobserve.md
@@ -6,7 +6,7 @@ description: >-
   This guide explains what Actions are, their types, and use cases.
 
 !!! info "Availability"
-    This feature is available in Enterprise Edition and Cloud. Not available in Open Source.
+    This feature is available in Enterprise Edition. Not available in Open Source and Cloud.
 
 ## What are Actions
 Actions in OpenObserve are user-defined Python scripts that support custom automation workflows. They can be applied to log data directly from the Logs UI or used as alert destinations.
diff --git a/docs/user-guide/streams/distinct-values.md b/docs/user-guide/streams/distinct-values.md
index e69de29b..23f780cc 100644
--- a/docs/user-guide/streams/distinct-values.md
+++ b/docs/user-guide/streams/distinct-values.md
@@ -0,0 +1,46 @@
+---
+title: Distinct Values Stream in OpenObserve
+description: Collects unique values during ingestion, stores them in metadata streams, and supports faster distinct queries in OpenObserve.
+---
+This document explains how the distinct values feature in OpenObserve works.
+## Overview
+The distinct values feature automatically collects unique values for a stream when data is ingested. The system writes these values to disk at a defined interval. Distinct values are stored in a special stream named `distinct_values`, which is used to accelerate distinct queries.
+!!! note "Who can access it"
+    By default, the `Root` user has access. Access for other users is managed through **IAM** permissions in the **Metadata** module.
+
+    ![access to distinct values stream](../../images/distinct-values-access.png)
+!!! note "Where to find it"
+    Distinct values are written into automatically created metadata streams. The naming pattern is `distinct_values__`. For example: `distinct_values_logs_default` and `distinct_values_logs_k8s_events`.
+## Environment Variables
+| Variable | Description | Default |
+| --- | --- | --- |
+| `ZO_DISTINCT_VALUES_INTERVAL` | Defines how often distinct values collected during ingestion are written from memory to the `distinct_values` stream on disk. Batching values at the configured interval prevents frequent small writes. | `10s` |
+| `ZO_DISTINCT_VALUES_HOURLY` | Enables hourly deduplication of distinct values stored in the `distinct_values` stream. When set to `true`, repeated values within one hour are merged into a single record, and a count of occurrences is recorded. | `false` |
+## How it works
+1. During ingestion, OpenObserve automatically collects distinct values for each stream.
+2. These values are held in memory and written to disk in the `distinct_values__` metadata streams under **Streams > Metadata** at intervals defined by `ZO_DISTINCT_VALUES_INTERVAL`.
+![metadata distinct values](../../images/metadata-distinct-values.png)
+3. If `ZO_DISTINCT_VALUES_HOURLY` is enabled, values in the `distinct_values` stream are further deduplicated at the hourly level, with counts aggregated.
+- The `distinct_values` streams help accelerate `DISTINCT` queries by using pre-computed distinct values instead of scanning all ingested logs.
+## Example
+Ingested data:
+```text
+2025/09/10T10:00:01Z, job=test, level=info, service=test, request_id=123
+2025/09/10T10:00:02Z, job=test, level=info, service=test, request_id=124
+2025/09/10T10:01:03Z, job=test, level=info, service=test, request_id=123
+2025/09/10T10:10:00Z, job=test, level=info, service=test, request_id=123
+2025/09/10T11:10:00Z, job=test, level=info, service=test, request_id=123
+```
+With `ZO_DISTINCT_VALUES_INTERVAL=10s`, the system first collects values in memory and then writes them to disk:
+```text
+2025/09/10T10:00:01Z request_id: 123, count: 2
+2025/09/10T10:00:02Z request_id: 124, count: 1
+2025/09/10T10:10:02Z request_id: 123, count: 1
+2025/09/10T11:10:02Z request_id: 123, count: 1
+```
+If `ZO_DISTINCT_VALUES_HOURLY=true`, the system merges values by hour:
+```text
+2025/09/10T10:00:01Z request_id: 123, count: 3
+2025/09/10T10:00:02Z request_id: 124, count: 1
+2025/09/10T11:10:02Z request_id: 123, count: 1
+```
\ No newline at end of file
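
The hourly merge described in the Example section of `distinct-values.md` can be sketched in Python as follows. This is an illustration only, not OpenObserve's implementation: the function name `merge_hourly` and the `(timestamp, value, count)` tuple layout are assumptions. The input records are copied from the interval output in that example. The sketch assumes each merged record keeps the timestamp of the first record seen in its hour, which is what the documented output suggests.

```python
from datetime import datetime

# Interval-level records as (timestamp, request_id, count),
# copied from the ZO_DISTINCT_VALUES_INTERVAL=10s output in the example.
interval_records = [
    ("2025/09/10T10:00:01Z", "123", 2),
    ("2025/09/10T10:00:02Z", "124", 1),
    ("2025/09/10T10:10:02Z", "123", 1),
    ("2025/09/10T11:10:02Z", "123", 1),
]


def merge_hourly(records):
    """Merge records that share the same hour and value, summing their counts.

    Hypothetical helper: keeps the timestamp of the first record in each group.
    """
    merged = {}  # (hour, value) -> (first_timestamp, total_count)
    for ts, value, count in records:
        parsed = datetime.strptime(ts, "%Y/%m/%dT%H:%M:%SZ")
        hour = parsed.replace(minute=0, second=0)  # truncate to the hour
        key = (hour, value)
        if key in merged:
            first_ts, total = merged[key]
            merged[key] = (first_ts, total + count)
        else:
            merged[key] = (ts, count)
    # Dicts preserve insertion order (Python 3.7+), so groups come out
    # in the order their first record appeared.
    return [(ts, value, total) for (_, value), (ts, total) in merged.items()]


for ts, value, total in merge_hourly(interval_records):
    print(f"{ts} request_id: {value}, count: {total}")
```

Run against the interval output, the sketch prints the same three rows as the `ZO_DISTINCT_VALUES_HOURLY=true` block: the two `request_id: 123` records in the 10:00 hour collapse into one record with `count: 3`, while the 11:00 record stays separate.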