diff --git a/.github/workflows/deploy-docs-staging.yaml b/.github/workflows/deploy-docs-staging.yaml index 27dc69c9..d9a3c30b 100644 --- a/.github/workflows/deploy-docs-staging.yaml +++ b/.github/workflows/deploy-docs-staging.yaml @@ -49,8 +49,8 @@ jobs: - name: Deploy to S3 run: | - aws s3 sync ./site s3://openobserve-website-staging/docs --exclude=".git/*" + aws s3 sync ./site s3://openobserve-website-staging/docs --exclude=".git/*" --delete - name: Invalidate CloudFront cache run: | - aws cloudfront create-invalidation --distribution-id E2GZJM0TJIDFRM --paths "/docs/*" \ No newline at end of file + aws cloudfront create-invalidation --distribution-id E2GZJM0TJIDFRM --paths "/docs/*" diff --git a/docs/images/config-remote-destination-header.png b/docs/images/config-remote-destination-header.png new file mode 100644 index 00000000..a831376a Binary files /dev/null and b/docs/images/config-remote-destination-header.png differ diff --git a/docs/images/config-remote-destination-headers.png b/docs/images/config-remote-destination-headers.png new file mode 100644 index 00000000..051e3248 Binary files /dev/null and b/docs/images/config-remote-destination-headers.png differ diff --git a/docs/images/config-remote-destination-method.png b/docs/images/config-remote-destination-method.png new file mode 100644 index 00000000..7f4282bf Binary files /dev/null and b/docs/images/config-remote-destination-method.png differ diff --git a/docs/images/config-remote-destination-output-format.png b/docs/images/config-remote-destination-output-format.png new file mode 100644 index 00000000..08f46792 Binary files /dev/null and b/docs/images/config-remote-destination-output-format.png differ diff --git a/docs/images/config-remote-destination.png b/docs/images/config-remote-destination.png new file mode 100644 index 00000000..9a130e34 Binary files /dev/null and b/docs/images/config-remote-destination.png differ diff --git a/docs/images/current-cluster-query-result.png b/docs/images/current-cluster-query-result.png new file mode 100644 index 00000000..752c7e96 Binary files /dev/null and b/docs/images/current-cluster-query-result.png differ diff --git a/docs/images/current-cluster-query.png b/docs/images/current-cluster-query.png new file mode 100644 index 00000000..64137626 Binary files /dev/null and b/docs/images/current-cluster-query.png differ diff --git a/docs/images/federated-search-multi-select.png b/docs/images/federated-search-multi-select.png new file mode 100644 index 00000000..13dd22a5 Binary files /dev/null and b/docs/images/federated-search-multi-select.png differ diff --git a/docs/images/federated-search-result.png b/docs/images/federated-search-result.png new file mode 100644 index 00000000..be0e0867 Binary files /dev/null and b/docs/images/federated-search-result.png differ diff --git a/docs/images/federated-search.png b/docs/images/federated-search.png new file mode 100644 index 00000000..0fd3bc4e Binary files /dev/null and b/docs/images/federated-search.png differ diff --git a/docs/images/remote-destination-config-from-management.png b/docs/images/remote-destination-config-from-management.png new file mode 100644 index 00000000..7893188d Binary files /dev/null and b/docs/images/remote-destination-config-from-management.png differ diff --git a/docs/images/remote-destination-config-from-pipeline-editor.png b/docs/images/remote-destination-config-from-pipeline-editor.png new file mode 100644 index 00000000..4e41f927 Binary files /dev/null and 
b/docs/images/remote-destination-config-from-pipeline-editor.png differ diff --git a/docs/images/remote-destination-from-pipeline-editor.png b/docs/images/remote-destination-from-pipeline-editor.png new file mode 100644 index 00000000..a93bc4d9 Binary files /dev/null and b/docs/images/remote-destination-from-pipeline-editor.png differ diff --git a/docs/images/sql-reference/not-str-match-with-and.png b/docs/images/sql-reference/not-str-match-with-and.png new file mode 100644 index 00000000..d650b52c Binary files /dev/null and b/docs/images/sql-reference/not-str-match-with-and.png differ diff --git a/docs/images/sql-reference/not-str-match-with-or.png b/docs/images/sql-reference/not-str-match-with-or.png new file mode 100644 index 00000000..1ca4c4f3 Binary files /dev/null and b/docs/images/sql-reference/not-str-match-with-or.png differ diff --git a/docs/images/sql-reference/not-str-match.png b/docs/images/sql-reference/not-str-match.png new file mode 100644 index 00000000..79bd0ead Binary files /dev/null and b/docs/images/sql-reference/not-str-match.png differ diff --git a/docs/images/use-pipeline-destination.png b/docs/images/use-pipeline-destination.png new file mode 100644 index 00000000..99511510 Binary files /dev/null and b/docs/images/use-pipeline-destination.png differ diff --git a/docs/operator-guide/.pages b/docs/operator-guide/.pages index e3b58f20..d5e12518 100644 --- a/docs/operator-guide/.pages +++ b/docs/operator-guide/.pages @@ -7,3 +7,4 @@ nav: - Etcd maintenance: etcd.md - Etcd restore: etcd_restore.md - Nginx proxy: nginx_proxy.md + - Configuration Management: config-management.md \ No newline at end of file diff --git a/docs/operator-guide/config-management.md b/docs/operator-guide/config-management.md new file mode 100644 index 00000000..9b707c74 --- /dev/null +++ b/docs/operator-guide/config-management.md @@ -0,0 +1,113 @@ +--- +title: Custom Configuration File and Dynamic Reloading in OpenObserve +description: Learn how to use custom config paths and dynamic config reloading in OpenObserve to apply changes without restarts. +--- +This guide explains how to use custom configuration file locations and +dynamic configuration reloading in OpenObserve to manage deployments +without system restarts. + +## Overview +Earlier versions of OpenObserve only read the **.env** file from the current +directory with no way to specify a different path. When running +OpenObserve in different deployment modes (Kubernetes, virtual machines, +and systemd services) existing frameworks require config files in +specific locations.

+Additionally, when configuration changes are needed, you need to restart +the whole cluster or the system, which is expensive. Restarts cause +downtime and reduce capacity and bandwidth; you do not want to restart +your whole cluster just to change a simple configuration, such as increasing +an interval by 2 seconds. +

+OpenObserve now provides a custom config file CLI argument (`-c` or +`--config`) to specify config files at any location, and a Config Watcher +that monitors the config file and automatically reloads specific +configurations every 30 seconds without requiring a restart. + +!!! note "Who should use this" + These features are for you if you: + + - Deploy OpenObserve on virtual machines (VMs) + - Run OpenObserve as systemd services or system daemons + - Use custom deployment frameworks without container orchestration tools like Kubernetes + - Need to avoid expensive restarts that cause downtime + +!!! note "Who does NOT need this" + You do not need these features if you: + + - Deploy OpenObserve in Kubernetes clusters + - Use container orchestration platforms + + **Why Kubernetes users do not need this** +
+ In Kubernetes deployments: + + - Configurations are managed through ConfigMaps, Secrets, and environment variables in YAML manifests + - Any configuration change automatically triggers pod restarts and rollouts + - Kubernetes handles configuration updates through its native mechanisms + - The .env file pattern is not used in containerized deployments + +## Custom config file CLI argument + +### What it does +It adds a CLI argument to the OpenObserve binary that allows you to pass +a path to a config file, adding flexibility to load environment +variables from different files. Earlier versions strictly looked for the +.env file in the current directory. + +### How to use + +```bash linenums="1" +# Short form +./openobserve -c /path/to/your/config/file + +``` + +```bash linenums="1" +# Long form +./openobserve --config /path/to/your/config/file +``` + +## Dynamic configuration reload + +### What it does +It adds a job to watch the config file. The watcher monitors the config +file and reloads the configuration if any changes are detected, allowing +configuration updates without restarting OpenObserve. + +### How it works + +- Runs every 30 seconds by default (configurable via `ZO_CONFIG_WATCHER_INTERVAL` environment variable) +- Watches the active config file (.env by default, or custom file if specified using `-c` flag) +- Detects changes to configuration values +- Automatically reloads changes for reloadable configurations only +- If no config file exists, the watcher remains inactive + +### Workflow example + +1. OpenObserve is running with a config file +2. Edit the config file and change a reloadable configuration (for example, stream stats interval from 2 to 5 seconds) +3. Wait up to 30 seconds +4. Change is applied automatically without any restart and downtime + +### Limitation: Only specific configs are reloadable + +Any configuration not in the list below requires a full OpenObserve +restart to take effect. + +### Reloadable environment variables +| Category | Environment Variables | +|----------|----------------------| +| **Data Processing Intervals** | - ZO_FILE_PUSH_INTERVAL
- ZO_MEM_PERSIST_INTERVAL
- ZO_COMPACT_INTERVAL
- ZO_COMPACT_OLD_DATA_INTERVAL
- ZO_COMPACT_SYNC_TO_DB_INTERVAL
- ZO_COMPACT_JOB_RUN_TIMEOUT
- ZO_COMPACT_JOB_CLEAN_WAIT_TIME
- ZO_COMPACT_PENDING_JOBS_METRIC_INTERVAL | +| **Cache Management** | - ZO_MEMORY_CACHE_GC_INTERVAL
- ZO_MEMORY_CACHE_MAX_SIZE
- ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE
- ZO_DISK_CACHE_GC_INTERVAL
- ZO_DISK_CACHE_MAX_SIZE
- ZO_DISK_CACHE_SKIP_SIZE
- ZO_DISK_CACHE_RELEASE_SIZE
- ZO_DISK_CACHE_GC_SIZE
- ZO_S3_SYNC_TO_CACHE_INTERVAL | +| **Metrics and Monitoring** | - ZO_METRICS_LEADER_PUSH_INTERVAL
- ZO_METRICS_LEADER_ELECTION_INTERVAL
- ZO_CALCULATE_STATS_INTERVAL
- ZO_CALCULATE_STATS_STEP_LIMIT
- ZO_TELEMETRY_HEARTBEAT | +| **Scheduling** | - ZO_ALERT_SCHEDULE_INTERVAL
- ZO_DERIVED_STREAM_SCHEDULE_INTERVAL
- ZO_SCHEDULER_CLEAN_INTERVAL
- ZO_SCHEDULER_WATCH_INTERVAL
- ZO_SEARCH_JOB_SCHEDULE_INTERVAL
- ZO_SEARCH_JOB_SCHEDULER_INTERVAL
- ZO_DISTINCT_VALUES_INTERVAL | +| **Job and Process Timeouts** | - ZO_SEARCH_JOB_TIMEOUT
- ZO_SEARCH_JOB_RUN_TIMEOUT
- ZO_SEARCH_JOB_RETENTION
- ZO_SEARCH_JOB_DELETE_INTERVAL
- ZO_HEALTH_CHECK_TIMEOUT | +| **Memory and Buffer Sizes** | - ZO_MEMORY_CACHE_SKIP_SIZE
- ZO_MEMORY_CACHE_RELEASE_SIZE
- ZO_MEMORY_CACHE_GC_SIZE
- ZO_MEM_TABLE_MAX_SIZE | +| **File Sizes** | - ZO_MAX_FILE_SIZE_ON_DISK
- ZO_MAX_FILE_SIZE_IN_MEMORY
- ZO_MAX_FILE_RETENTION_TIME
- ZO_COMPACT_MAX_FILE_SIZE | +| **Network and Protocol Sizes** | - ZO_STREAMING_RESPONSE_CHUNK_SIZE_MB | +| **Batch Sizes** | - ZO_COMPACT_BATCH_SIZE
- ZO_EVENTS_BATCH_SIZE | +| **Pipeline Configuration** | - ZO_PIPELINE_OFFSET_FLUSH_INTERVAL
- ZO_PIPELINE_REMOTE_REQUEST_TIMEOUT
- ZO_PIPELINE_WAL_SIZE_LIMIT
- ZO_PIPELINE_BATCH_SIZE
- ZO_PIPELINE_BATCH_TIMEOUT_MS
- ZO_PIPELINE_BATCH_SIZE_BYTES
- ZO_PIPELINE_BATCH_RETRY_INITIAL_DELAY_MS
- ZO_PIPELINE_BATCH_RETRY_MAX_DELAY_MS
- ZO_PIPELINE_MAX_FILE_SIZE_ON_DISK_MB | +| **WAL and Buffer Configurations** | - ZO_WAL_WRITE_BUFFER_SIZE
- ZO_WAL_WRITE_QUEUE_SIZE | +| **Search Group Configuration** | - O2_SEARCH_GROUP_BASE_SECS
- O2_SEARCH_GROUP_BASE_SPEED
- O2_SEARCH_GROUP_LONG_MAX_CPU
- O2_SEARCH_GROUP_SHORT_MAX_CPU
- O2_SEARCH_GROUP_LONG_MAX_CONCURRENCY
- O2_SEARCH_GROUP_SHORT_MAX_CONCURRENCY
- O2_SEARCH_GROUP_LONG_MAX_MEMORY
- O2_SEARCH_GROUP_SHORT_MAX_MEMORY
- O2_SEARCH_GROUP_USER_SHORT_MAX_CONCURRENCY
- O2_SEARCH_GROUP_USER_LONG_MAX_CONCURRENCY | +| **AI Configuration** | - O2_AI_ENABLED
- O2_AI_API_URL
- O2_AI_MODEL
- O2_AI_PROVIDER
- O2_AI_API_KEY | +| **Other Configurations** | - ZO_ROUTE_MAX_CONNECTIONS
- ZO_ENRICHMENT_TABLE_MERGE_INTERVAL
- ZO_DOWNSAMPLING_DOWNSAMPLING_INTERVAL
- ZO_DATAFUSION_FILE_STAT_CACHE_MAX_ENTRIES
- ZO_SCHEMA_MAX_FIELDS_TO_ENABLE_UDS | diff --git a/docs/operator-guide/etcd_restore.md b/docs/operator-guide/etcd_restore.md index a0c30630..24f0c729 100644 --- a/docs/operator-guide/etcd_restore.md +++ b/docs/operator-guide/etcd_restore.md @@ -3,7 +3,10 @@ description: >- Restore a broken etcd cluster in OpenObserve by restarting pods, resetting data, and rejoining members using CLI and updated Helm configs. --- -# Etcd Cluster Restore (Deprecated) +# Etcd Cluster Restore (Removed) + +!!! warning "Removal notice" + Etcd support has been removed. Use NATS instead. Many users ran into the case where only one of the 3 pods of the etcd cluster works. The other 2 pods keep restarting and cannot come back up. diff --git a/docs/sql-functions/full-text-search.md b/docs/sql-functions/full-text-search.md index 492f51c1..289747ab 100644 --- a/docs/sql-functions/full-text-search.md +++ b/docs/sql-functions/full-text-search.md @@ -25,6 +25,34 @@ This query filters logs from the `default` stream where the `k8s_pod_name` field
![str_match](../images/sql-reference/str-match.png) +--- + +### `not str_match` +**Syntax**: `not str_match(field, 'value')`
+**Description**:
+ +- Filters logs where the specified field does NOT contain the exact string value. +- The match is case-sensitive. +- Only logs that do not include the exact characters and casing specified will be returned. +- Can be combined with other conditions using AND/OR operators. + +**Example**:
+```sql +SELECT * FROM "default" WHERE NOT str_match(k8s_app_instance, 'dev2') +``` +![not str_match](../images/sql-reference/not-str-match.png) + +**Combining multiple NOT conditions with AND:** +```sql +SELECT * FROM "default" WHERE (NOT str_match(k8s_app_instance, 'dev2')) AND (NOT str_match(k8s_cluster, 'dev2')) +``` +![not str_match with AND operator](../images/sql-reference/not-str-match-with-and.png) + +**Combining NOT conditions with OR:** +```sql +SELECT * FROM "default" WHERE NOT ((str_match(k8s_app_instance, 'dev2') OR str_match(k8s_cluster, 'dev2'))) +``` +![not str_match with OR operator](../images/sql-reference/not-str-match-with-or.png) --- ### `str_match_ignore_case` @@ -46,7 +74,7 @@ SELECT * FROM "default" WHERE str_match_ignore_case(k8s_pod_name, 'MAIN-OPENOBSE This query filters logs from the `default` stream where the `k8s_pod_name` field contains any casing variation of `main-openobserve-ingester-1`, such as `MAIN-OPENOBSERVE-INGESTER-1`, `Main-OpenObserve-Ingester-1`, or `main-openobserve-ingester-1`.
![str_match_ignore_case](../images/sql-reference/str-ignore-case.png) - +
--- ### `match_all` @@ -69,6 +97,53 @@ This query returns all logs in the `default` stream where the keyword `openobser
![match_all](../images/sql-reference/match-all.png) + +**More pattern support** +The `match_all` function also supports the following patterns for flexible searching: + +- **Prefix search**: Matches keywords that start with the specified prefix: +```sql +SELECT * FROM "default" WHERE match_all('ab*') +``` +- **Postfix search**: Matches keywords that end with the specified suffix: +```sql +SELECT * FROM "default" WHERE match_all('*ab') +``` +- **Contains search**: Matches keywords that contain the substring anywhere: +```sql +SELECT * FROM "default" WHERE match_all('*ab*') +``` +- **Phrase prefix search**: Matches keywords where the last term uses prefix matching: +```sql +SELECT * FROM "default" WHERE match_all('key1 key2*') +``` +### `not match_all` +**Syntax**: `not match_all('value')` +**Description**:
+ +- Filters logs by excluding records where the keyword appears in any field that has the Index Type set to Full Text Search in the stream settings. +- This function is case-insensitive and excludes matches regardless of the keyword's casing. +- **Important**: Only searches fields configured as Full Text Search fields. Other fields in the record are not evaluated. +- Provides significant performance improvements when used with indexed fields. + +**Example**: +```sql +SELECT * FROM "default" WHERE NOT match_all('foo') +``` +This query returns all logs in the `default` stream where the keyword `foo` does NOT appear in any of the full-text indexed fields. Fields not configured for full-text search are ignored. + +**Combining NOT match_all with NOT str_match**: +```sql +SELECT * FROM "default" WHERE (NOT str_match(f1, 'bar')) AND (NOT match_all('foo')) +``` +This query returns logs where field `f1` does NOT contain `bar` AND no full-text indexed field contains `foo`. In other words, it excludes records that match either condition. + +**Using NOT with OR conditions**: +```sql +SELECT * FROM "default" WHERE NOT (str_match(f1, 'bar') OR match_all('foo')) +``` +This query returns logs where BOTH conditions are false: field `f1` does NOT contain `bar` AND no full-text indexed field contains `foo`. In other words, it excludes records that match either condition. + --- ### `re_match` diff --git a/docs/sql_reference.md b/docs/sql_reference.md index b1631d52..4eac274a 100644 --- a/docs/sql_reference.md +++ b/docs/sql_reference.md @@ -5,7 +5,7 @@ description: >- --- This guide describes the custom SQL functions supported in OpenObserve for querying and processing logs and time series data. These functions extend the capabilities of standard SQL by enabling full-text search, array processing, and time-based aggregations. -## Full-text Search Functions +## Full-text search functions These functions allow you to filter records based on keyword or pattern matches within one or more fields. ### `str_match(field, 'value')` @@ -26,6 +26,31 @@ This query filters logs from the `default` stream where the `k8s_pod_name` field ![str_match](./images/sql-reference/str-match.png) +### `not str_match(field, 'value')` +**Description**:
+ +- Filters logs where the specified field does NOT contain the exact string value. +- The match is case-sensitive. +- Only logs that do not include the exact characters and casing specified will be returned. +- Can be combined with other conditions using AND/OR operators. + +**Example**:
+```sql +SELECT * FROM "default" WHERE NOT str_match(k8s_app_instance, 'dev2') +``` +![not str_match](./images/sql-reference/not-str-match.png) + +**Combining multiple NOT conditions with AND:** +```sql +SELECT * FROM "default" WHERE (NOT str_match(k8s_app_instance, 'dev2')) AND (NOT str_match(k8s_cluster, 'dev2')) +``` +![not str_match with AND operator](./images/sql-reference/not-str-match-with-and.png) + +**Combining NOT conditions with OR:** +```sql +SELECT * FROM "default" WHERE NOT ((str_match(k8s_app_instance, 'dev2') OR str_match(k8s_cluster, 'dev2'))) +``` +![not str_match with OR operator](./images/sql-reference/not-str-match-with-or.png) --- ### `str_match_ignore_case(field, 'value')` **Alias**: `match_field_ignore_case(field, 'value')` (Available in OpenObserve version 0.15.0 and later)
@@ -65,6 +90,51 @@ This query returns all logs in the `default` stream where the keyword `openobser ![match_all](./images/sql-reference/match-all.png) +**More pattern support** +The `match_all` function also supports the following patterns for flexible searching: + +- **Prefix search**: Matches keywords that start with the specified prefix: +```sql +SELECT * FROM "default" WHERE match_all('ab*') +``` +- **Postfix search**: Matches keywords that end with the specified suffix: +```sql +SELECT * FROM "default" WHERE match_all('*ab') +``` +- **Contains search**: Matches keywords that contain the substring anywhere: +```sql +SELECT * FROM "default" WHERE match_all('*ab*') +``` +- **Phrase prefix search**: Matches keywords where the last term uses prefix matching: +```sql +SELECT * FROM "default" WHERE match_all('key1 key2*') +``` +### `not match_all('value')` +**Description**:
+ +- Filters logs by excluding records where the keyword appears in any field that has the Index Type set to Full Text Search in the stream settings. +- This function is case-insensitive and excludes matches regardless of the keyword's casing. +- **Important**: Only searches fields configured as Full Text Search fields. Other fields in the record are not evaluated. +- Provides significant performance improvements when used with indexed fields. + +**Example**: +```sql +SELECT * FROM "default" WHERE NOT match_all('foo') +``` +This query returns all logs in the `default` stream where the keyword `foo` does NOT appear in any of the full-text indexed fields. Fields not configured for full-text search are ignored. + +**Combining NOT match_all with NOT str_match**: +```sql +SELECT * FROM "default" WHERE (NOT str_match(f1, 'bar')) AND (NOT match_all('foo')) +``` +This query returns logs where field `f1` does NOT contain `bar` AND no full-text indexed field contains `foo`. In other words, it excludes records that match either condition. + +**Using NOT with OR conditions**: +```sql +SELECT * FROM "default" WHERE NOT (str_match(f1, 'bar') OR match_all('foo')) +``` +This query returns logs where BOTH conditions are false: field `f1` does NOT contain `bar` AND no full-text indexed field contains `foo`. In other words, it excludes records that match either condition. + --- ### `re_match(field, 'pattern')` **Description**:
@@ -113,7 +183,7 @@ This query returns logs from the `default` stream where the `k8s_container_name` --- -## Array Functions +## Array functions The array functions operate on fields that contain arrays. In OpenObserve, array fields are typically stored as stringified JSON arrays.
For example, in a stream named `default`, there may be a field named `emails` that contains the following value: `["jim@email.com", "john@doe.com", "jene@doe.com"]`
@@ -302,7 +372,7 @@ In this query: --- -## Aggregate Functions +## Aggregate functions Aggregate functions compute a single result from a set of input values. For usage of standard SQL aggregate functions such as `COUNT`, `SUM`, `AVG`, `MIN`, and `MAX`, refer to [PostgreSQL documentation](https://www.postgresql.org/docs/). ### `histogram(field, 'duration')` @@ -324,7 +394,7 @@ FROM "default" GROUP BY key ORDER BY key ``` -**Expected Output**:
+**Expected output**:
This query divides the log data into 30-second intervals. Each row in the result shows: @@ -416,7 +486,7 @@ ORDER BY request_count DESC - Each core maintains hash tables during aggregation across all partitions - Memory usage: 3M entries × 60 cores × 60 partitions = 10.8 billion hash table entries - **Typical Error Message:** + **Typical error message:** ``` Resources exhausted: Failed to allocate additional 63232256 bytes for GroupedHashAggregateStream[20] with 0 bytes already allocated for this reservation - 51510301 bytes remain available for the total pool ``` @@ -434,7 +504,7 @@ ORDER BY request_count DESC **Scenario**
Find the top 10 client IPs by request count from web server logs distributed across 3 follower query nodes. - **Raw Data Distribution**
+ **Raw data distribution**
| Rank | Node 1 | Requests | Node 2 | Requests | Node 3 | Requests | |------|---------|----------|---------|----------|---------|----------| @@ -450,7 +520,7 @@ ORDER BY request_count DESC | 10 | 192.168.1.150 | 440 | 192.168.1.150 | 520 | 192.168.1.150 | 450 | - **Follower Query Nodes Process Data**
+ **Follower query nodes process data**
Each follower node executes the query locally and returns only its top 10 results: @@ -467,7 +537,7 @@ ORDER BY request_count DESC | 9 | 203.0.113.80 | 460 | 10.0.0.25 | 560 | 172.16.0.30 | 490 | | 10 | 192.168.1.150 | 440 | 192.168.1.150 | 520 | 192.168.1.150 | 450 | - **Leader Query Node Aggregates Results**
+ **Leader query node aggregates results**
| Client IP | Node 1 | Node 2 | Node 3 | Total Requests | |-----------|---------|---------|---------|----------------| @@ -482,7 +552,7 @@ ORDER BY request_count DESC | 172.16.0.30 | 480 | 580 | 490 | **1,550** | | 192.168.1.150 | 440 | 520 | 450 | **1,410** | - **Final Top 10 Results:** + **Final top 10 results:** | Rank | Client IP | Total Requests | |------|-----------|----------------| @@ -497,7 +567,7 @@ ORDER BY request_count DESC | 9 | 172.16.0.30 | 1,550 | | 10 | 192.168.1.150 | 1,410 | - **Why Results Are Approximate**
+ **Why results are approximate**
The approx_topk function returns approximate results because it relies on each query node sending only its local top N entries to the leader. The leader combines these partial lists to produce the final result. @@ -599,7 +669,7 @@ ORDER BY distinct_count DESC - Memory usage for distinct counting: Potentially unlimited storage for tracking unique values. - Combined with grouping: Memory requirements become exponentially larger. - **Typical Error Message:** + **Typical error message:** ``` Resources exhausted: Failed to allocate additional 63232256 bytes for GroupedHashAggregateStream[20] with 0 bytes already allocated for this reservation - 51510301 bytes remain available for the total pool ``` @@ -610,7 +680,7 @@ ORDER BY distinct_count DESC SELECT approx_topk_distinct(clientip, clientas, 10) FROM default ``` - **Combined Approach:** + **Combined approach:** - **HyperLogLog**: Handles distinct counting using a fixed **16 kilobytes** data structure per group. - **Space-Saving**: Limits the number of groups returned from each partition to top K. @@ -619,7 +689,7 @@ ORDER BY distinct_count DESC **Example: Web Server User Agent Analysis** Find the top 10 client IPs by unique user agent count from web server logs in the `default` stream. - **Raw Data Distribution** + **Raw data distribution** | Node 1 | Distinct User Agents | Node 2 | Distinct User Agents | Node 3 | Distinct User Agents | |---------|---------------------|---------|---------------------|---------|---------------------| @@ -636,7 +706,7 @@ ORDER BY distinct_count DESC **Note**: Each distinct count is computed using HyperLogLog's 16KB data structure per client IP. - **Follower Query Nodes Process Data** + **Follower query nodes process data** Each follower node executes the query locally and returns only its top 10 results: @@ -653,7 +723,7 @@ ORDER BY distinct_count DESC | 9 | 203.0.113.80 | 220 | 10.0.0.25 | 270 | 172.16.0.30 | 260 | | 10 | 192.168.1.150 | 200 | 192.168.1.150 | 250 | 192.168.1.150 | 240 | - **Leader Query Node Aggregates Results** + **Leader query node aggregates results** | Client IP | Node 1 | Node 2 | Node 3 | Total Distinct User Agents | |-----------|---------|---------|---------|---------------------------| @@ -668,7 +738,7 @@ ORDER BY distinct_count DESC | 172.16.0.30 | 240 | 290 | 260 | **790** | | 192.168.1.150 | 200 | 250 | 240 | **690** | - **Final Top 10 Results:** + **Final top 10 results:** | Rank | Client IP | Total Distinct User Agents | |------|-----------|---------------------------| @@ -684,7 +754,7 @@ ORDER BY distinct_count DESC | 10 | 192.168.1.150 | 690 | - **Why Results Are Approximate** + **Why results are approximate** Results are approximate due to two factors: 1. **HyperLogLog approximation:** Distinct counts are estimated, not exact. diff --git a/docs/storage-management/storage.md b/docs/storage-management/storage.md index e22944d3..d033b43c 100644 --- a/docs/storage-management/storage.md +++ b/docs/storage-management/storage.md @@ -191,7 +191,11 @@ OpenObserve supports multiple metadata store backends, configurable using the `Z - Recommended for production deployments due to reliability and scalability. - The default Helm chart (after February 23, 2024) uses [cloudnative-pg](https://cloudnative-pg.io/) to create a postgres cluster (primary + replica) which is used as the meta store. These instances provide high availability and backup support. -### etcd (Deprecated) +### etcd (Removed) + +!!! warning "Removal notice" + Etcd support has been removed. Use NATS instead. 
+ - Set `ZO_META_STORE=etcd`. - While etcd is used as the cluster coordinator, it was also the default metadata store in Helm charts released before 23 February 2024. This configuration is now deprecated. Helm charts released after 23 February 2024 use PostgreSQL as the default metadata store. diff --git a/docs/user-guide/.pages b/docs/user-guide/.pages index 16c9d0d2..acb26453 100644 --- a/docs/user-guide/.pages +++ b/docs/user-guide/.pages @@ -16,6 +16,7 @@ nav: - Management: management - Profile: profile - Performance: performance + - Federated Search: federated-search - Best Practices: best-practices - Migration: migration diff --git a/docs/user-guide/actions/actions-in-openobserve.md b/docs/user-guide/actions/actions-in-openobserve.md index 20eb164e..edaa739b 100644 --- a/docs/user-guide/actions/actions-in-openobserve.md +++ b/docs/user-guide/actions/actions-in-openobserve.md @@ -5,8 +5,11 @@ description: >- --- This guide explains what Actions are, their types, and use cases. +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. + ## What are Actions -**Actions** in OpenObserve are user-defined Python scripts that support custom automation workflows. They can be applied to log data directly from the Logs UI or used as alert destinations. +Actions in OpenObserve are user-defined Python scripts that support custom automation workflows. They can be applied to log data directly from the Logs UI or used as alert destinations. - Previously, OpenObserve supported log transformations only through VRL (Vector Remap Language). Python scripts written for Actions expand the capabilities of log transformations in OpenObserve. - Additionally, earlier, when an alert gets triggered, users used to get notified via email or webhook. But, with Actions as alert destinations, users can take an immediate action by adding an automation workflow using Actions. diff --git a/docs/user-guide/actions/create-and-use-real-time-actions.md b/docs/user-guide/actions/create-and-use-real-time-actions.md index b4ac08d4..9427ca09 100644 --- a/docs/user-guide/actions/create-and-use-real-time-actions.md +++ b/docs/user-guide/actions/create-and-use-real-time-actions.md @@ -8,6 +8,9 @@ description: >- This guide provides instruction on how to create Real-time Actions. +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. + ## Create a Real-time Action ??? info "Prerequisite" Create a Service Account and Assign a Role diff --git a/docs/user-guide/actions/create-and-use-scheduled-actions.md b/docs/user-guide/actions/create-and-use-scheduled-actions.md index 41436e35..b4813701 100644 --- a/docs/user-guide/actions/create-and-use-scheduled-actions.md +++ b/docs/user-guide/actions/create-and-use-scheduled-actions.md @@ -6,6 +6,9 @@ description: >- --- This guide provides step-by-step instructions for creating and using Scheduled Actions in OpenObserve. +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. + **Scheduled Actions** in OpenObserve allow you to execute custom Python scripts at a specific time, either **once** or on a **recurring schedule** defined using a cron expression. 
Scheduled Actions run based on time, making them suitable for: diff --git a/docs/user-guide/federated-search/.pages b/docs/user-guide/federated-search/.pages new file mode 100644 index 00000000..30ec3685 --- /dev/null +++ b/docs/user-guide/federated-search/.pages @@ -0,0 +1,5 @@ +nav: + +- Federated Search Overview: index.md +- How to Use Federated Search: how-to-use-federated-search.md +- Federated Search Architecture: federated-search-architecture.md diff --git a/docs/user-guide/federated-search/federated-search-architecture.md b/docs/user-guide/federated-search/federated-search-architecture.md new file mode 100644 index 00000000..30cbddf9 --- /dev/null +++ b/docs/user-guide/federated-search/federated-search-architecture.md @@ -0,0 +1,147 @@ +--- +title: Federated Search in OpenObserve - Architecture +description: Technical explanation of OpenObserve deployment modes, normal cluster query execution, and how federated search works across single and multiple clusters. +--- +This document explains the technical architecture of OpenObserve deployments, how queries execute in normal clusters, and how [federated search](../) coordinates queries across clusters in a supercluster. + +!!! info "Availability" + This feature is available in Enterprise Edition. Not available in Open Source and Cloud. + +## Understanding OpenObserve deployments +Before diving into how federated search works, you need to understand how OpenObserve can be deployed. OpenObserve scales from a single machine to a globally distributed infrastructure. + +## Single node deployment +The simplest deployment: one instance of OpenObserve runs all functions on one machine. Data stores locally, and the node processes queries directly. This works for testing or small deployments. + +## Single cluster deployment +When you need scale, multiple specialized nodes work together as a cluster. Each node type has a specific role: + +- **Router**: Entry point that forwards queries to queriers +- **Querier**: Processes queries in parallel with other queriers +- **Ingester**: Receives and stores data in object storage +- **Compactor**: Optimizes files and enforces retention +- **Alertmanager**: Executes alerts and sends notifications + +A single cluster handles more data and provides higher availability than a single node. + +## Supercluster deployment +When you need to operate across multiple geographical regions, multiple clusters connect as a supercluster. This is where federated search becomes relevant. + +!!! note "Key point" + Each cluster in a supercluster operates independently with its own data storage. Data ingested into one cluster stays in that cluster. However, configuration metadata synchronizes across all clusters, allowing unified management. + +## Region and cluster hierarchy +In a supercluster, regions organize clusters geographically. A region may contain one or more clusters. +
+**Example:** +
+ +```bash +Region: us-test-3 + ├─ Cluster: dev3 + └─ Cluster: dev3-backup + +Region: us-test-4 + └─ Cluster: dev4 +``` +Each cluster has independent data storage. Data stays where it was ingested. + +## How queries execute +Understanding query execution helps you understand how federated search works whether querying one cluster or multiple clusters. + +### Normal cluster query execution +This section explains how any OpenObserve cluster processes queries internally, regardless of whether it is a standalone cluster or part of a supercluster. Understanding this internal process is essential because: + +- This is how standalone clusters work +- This is what happens when you query your current cluster in a supercluster without federated search coordination +- During federated search, each individual cluster uses this same internal process to search its own data + +When a cluster receives a query: + +1. Router forwards the query to an available querier. +2. That querier becomes the leader querier. +3. Leader querier parses SQL, identifies data files, creates execution plan. +4. Leader querier distributes work among available queriers. These queriers become worker queriers. +5. All worker queriers search their assigned files in parallel. +6. Worker queriers send results to the leader querier. +7. Leader querier merges results and returns final answer. + +### Query execution for your current cluster in a supercluster +Your current cluster is the cluster you are logged into. When you select your current cluster from the Region dropdown, this is not federated search. +
+For example, if you are logged into Cluster A and you select Cluster A from the Region dropdown, the query executes using the normal cluster query execution process described above. No cross-cluster communication occurs, and no federated search coordination is needed. + +### Federated search for one different cluster in a supercluster +When you select a different cluster from the Region dropdown, not the cluster you are logged into, federated search coordination is used: +
+ +**Step 1: Coordination setup** +
+Your current cluster becomes the leader cluster. +
+ +**Step 2: Query distribution** +
+Leader cluster sends the query to the selected cluster via gRPC. +
+ +**Step 3: Query processing** +
+The selected cluster processes the query using its normal cluster query execution process. +
+ +**Step 4: Result return** +
+The selected cluster sends its results back to the leader cluster. +
+ +**Step 5: Result presentation** +
+The leader cluster displays the results. + +### Federated search for multiple clusters in a supercluster + +When you select multiple clusters, or no cluster at all, from the Region dropdown, federated search extends the query across the selected clusters (or across all clusters when none are selected): +
+ +**Step 1: Coordination setup** +
+Your current cluster becomes the leader cluster. The leader cluster identifies all selected clusters, or all clusters if none are selected, that contain data for the queried stream. These other clusters become worker clusters. +
+ +**Step 2: Query distribution** +
+The leader cluster sends the query to all worker clusters via gRPC. All clusters now have the same query to execute. +
+ +**Step 3: Parallel processing** +
+Each cluster processes the query using its normal cluster query execution process. The leader cluster searches its own data if it contains data for that stream. Worker clusters search their own data. All processing happens simultaneously. +
+ +**Step 4: Result aggregation** +
+Each cluster aggregates its own results internally using its leader querier and worker queriers. Worker clusters send their aggregated results to the leader cluster. The leader cluster merges all results from all clusters and returns the unified response. + +## Metadata synchronization +In a supercluster, clusters share configuration and schema information in real-time while keeping actual data separate. This synchronization happens via NATS, a messaging system that coordinates communication between clusters. +
+While stream schemas are synchronized across all clusters in real-time, the actual data for a stream only exists in the cluster or clusters where it was ingested. + +| **Synchronized across clusters** | **NOT synchronized (stays local)** | +|----------------------------------|-----------------------------------| +| Schema definitions | Log data | +| User-defined functions | Metric data | +| Dashboards and folders | Trace data | +| Alerts and notifications | Raw ingested data | +| Scheduled tasks and reports | Parquet files and WAL files | +| User and organization settings | Search indices | +| System configurations | | +| Job metadata | | +| Enrichment metadata | | + +This design maintains data residency compliance while enabling unified configuration management. + +## Limitations + +**No cluster identification in results:** Query results do not indicate which cluster provided specific data. To identify the source, query each cluster individually. \ No newline at end of file diff --git a/docs/user-guide/federated-search/how-to-use-federated-search.md b/docs/user-guide/federated-search/how-to-use-federated-search.md new file mode 100644 index 00000000..a319cdee --- /dev/null +++ b/docs/user-guide/federated-search/how-to-use-federated-search.md @@ -0,0 +1,78 @@ +--- +title: Federated Search in OpenObserve - How-to Guide +description: Step-by-step instructions for querying your current cluster and performing federated searches across one or more clusters in a supercluster setup. +--- +This document explains how to query your current cluster and how to perform [federated searches](../) across one or more different clusters in a supercluster setup. + +!!! info "Availability" + This feature is available in Enterprise Edition. Not available in Open Source and Cloud. + +## How to query your current cluster in a supercluster + +Query your current cluster when you know the data is in your cluster or when you need the fastest query performance. + +!!! note "What you need to know:" + + - This is not federated search + - You are querying the current cluster. + - No cross-cluster communication occurs. + - Results will include data from the current cluster only. +
+**Steps:** +![current-cluster-query](../../images/current-cluster-query.png) + +1. Navigate to the **Logs** page. +2. Enter your query in the SQL Query Editor. +3. Select a time range. +4. Select your current cluster from the **Region** dropdown. +5. Select **Run query**. + +> For detailed explanation, see **Normal cluster query execution** in the [Federated Search Architecture](../federated-search/federated-search-architecture/) page. +
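+For reference, a minimal query you might enter in step 2 could look like the following. The stream name `default` and the field `k8s_namespace_name` are placeholders; substitute a stream and field that exist in your current cluster.
+
+```sql
+-- Count records per namespace in the current cluster only
+SELECT k8s_namespace_name, COUNT(*) AS record_count
+FROM "default"
+GROUP BY k8s_namespace_name
+ORDER BY record_count DESC
+```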
+ +**Result**
+Data from the selected cluster only. +![current-cluster-query-result](../../images/current-cluster-query-result.png) + + +## How to query one or more different clusters in a supercluster + +Use federated search when you need data from multiple clusters. + +!!! note "What you need to know" + + - Multiple clusters will process your query simultaneously. + - Results will combine data from all selected clusters. + +**Steps** +
+![federated-search-multi-select](../../images/federated-search-multi-select.png) + +1. Navigate to the **Logs** page. +2. Enter your query in the SQL Query Editor. +3. Select a time range. +4. Leave the **Region** dropdown unselected, or select multiple clusters. +5. Select **Run query**. + +> For detailed explanation, see **Federated search for one different cluster** and **Federated search for multiple clusters** in the [Federated Search Architecture](../federated-search-architecture/) page. +
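+As an illustration, a typical cross-cluster investigation query is shown below; only the Region selection differs from a normal query. The stream name `default` is a placeholder, and `match_all` searches the full-text indexed fields of that stream in every selected cluster.
+
+```sql
+-- Find error logs across all selected clusters
+SELECT * FROM "default" WHERE match_all('error')
+```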
+ +**Result**
+Combined data from all selected clusters. +![federated-search-result](../../images/federated-search-result.png) +## Region selection reference + +Use this quick reference to understand how region selection affects query execution: + +| **Region/Cluster Selection** | **Behavior** | **Query Type** | **Communication** | +|------------------------------|--------------|----------------|-------------------| +| None selected | Queries all clusters | Federated search | Cross-cluster via gRPC | +| Your current cluster selected | Queries only your current cluster | Normal cluster query (NOT federated) | Internal only, no cross-cluster | +| One different cluster selected (same region) | Queries only that cluster | Federated search | Cross-cluster via gRPC | +| One different cluster selected (different region) | Queries only that cluster | Federated search | Cross-cluster via gRPC | +| Multiple clusters selected | Queries all selected clusters | Federated search | Cross-cluster via gRPC | + + +**Next step** + +- [Federated Search Architecture](../federated-search-architecture/) \ No newline at end of file diff --git a/docs/user-guide/federated-search/index.md b/docs/user-guide/federated-search/index.md new file mode 100644 index 00000000..b25b2c13 --- /dev/null +++ b/docs/user-guide/federated-search/index.md @@ -0,0 +1,65 @@ +--- +title: Federated Search in OpenObserve - Overview +description: Learn what federated search is, key concepts, prerequisites, and when to use it. +--- +This document provides an overview of federated search in OpenObserve. + +!!! info "Availability" + This feature is available in Enterprise Edition. Not available in Open Source and Cloud. + +## What is federated search? + +Federated search enables querying across multiple OpenObserve clusters that are connected as a supercluster, all from one interface. +
+ +Without federated search, investigating issues across regions requires logging into each cluster separately, running the same query multiple times, and manually combining results. This wastes time during critical incidents. +With federated search, you query once and receive unified results from all clusters. + +!!! note "Prerequisites" + + - OpenObserve Enterprise edition + - Multiple clusters configured as a supercluster + +## How to verify if your environment is in a supercluster +Check whether the Region dropdown appears on the Logs page. If visible, your clusters are configured as a supercluster. +![federated-search](../../images/federated-search.png) + +## Key concepts in federated search + +Before using federated search, understand these core concepts: + +- **Node:** A single instance of OpenObserve running on one machine or server. +- **Cluster:** A group of OpenObserve nodes working together to handle data ingestion, storage, and querying. Each cluster has its own data storage. +- **Region:** A geographical location that contains one or more clusters. For example, Region us-east may contain cluster prod-east-1 and cluster prod-east-2. +- **Supercluster:** Multiple OpenObserve clusters across different geographical regions connected to work as a unified system. This enables federated search capability. +- **Data distribution:** Data ingested into a specific cluster stays in that cluster's storage. It is not replicated to other clusters. This ensures data residency compliance. +- **Metadata synchronization:** Configuration information such as schemas, dashboards, and alerts synchronize across all clusters in a supercluster. This allows unified management while keeping data distributed. +- **Federated search:** The capability to query data across different clusters in a supercluster. Federated search activates when you: + + - Select one or more different clusters, meaning clusters other than your current cluster: The selected clusters' data is searched via federated coordination. + - Select none: All clusters search simultaneously via federated coordination and results are combined. + +> **Important**: Querying your current cluster uses normal cluster query execution, not federated search architecture. + +> For detailed technical explanations of deployment modes, architecture, and how queries execute, see the [Federated Search Architecture](../federated-search-architecture/) page. 
+ +## When to use federated search + +| **Use case** | **Cluster selection** | **Reason** | +|--------------|----------------------|------------| +| Data is in one specific different cluster | Select that different cluster | Access only that cluster's data via federated search | +| Multi-region deployments | Select none or multiple clusters | Query all regions at once via federated search | +| Centralized search across teams | Select none or multiple clusters | Unified visibility across all clusters via federated search | + + +## When not to use federated search + +| **Use case** | **Cluster selection** | **Reason** | +|--------------|----------------------|------------| +| Data is in your current cluster | Select your current cluster | Uses normal cluster query without cross-cluster communication | + + +**Next steps** + +- [How to Use Federated Search](../how-to-use-federated-search/) +- [Federated Search Architecture](../federated-search-architecture/) \ No newline at end of file diff --git a/docs/user-guide/identity-and-access-management/role-based-access-control.md b/docs/user-guide/identity-and-access-management/role-based-access-control.md index d4f86ef0..0aaf3876 100644 --- a/docs/user-guide/identity-and-access-management/role-based-access-control.md +++ b/docs/user-guide/identity-and-access-management/role-based-access-control.md @@ -5,15 +5,16 @@ description: >- --- This guide provides an overview of Role-Based Access Control (RBAC), its features, and how it is implemented in OpenObserve. -## Overview +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. -OpenObserve uses RBAC to manage what actions users can perform based on their assigned roles. Instead of giving all users the same level of access, RBAC ensures that each user can only access the features and data relevant to their role. + - **Enterprise version**: RBAC requires manual configuration using [OpenFGA](https://openfga.dev/api/service). Learn more about [enabling RBAC in OpenObserve Enterprise](enable-rbac-in-openobserve-enterprise.md). + - **Cloud version**: RBAC is preconfigured and does not require setup. + - **Open-source version**: RBAC is not supported. All users have unrestricted access to all features. -RBAC is available in **OpenObserve Enterprise** and **Cloud** versions but is not supported in the open-source version: +## Overview -- **Enterprise version**: RBAC requires manual configuration using [OpenFGA](https://openfga.dev/api/service). Learn more about [enabling RBAC in OpenObserve Enterprise](enable-rbac-in-openobserve-enterprise.md). -- **Cloud version**: RBAC is preconfigured and does not require setup. -- **Open-source version**: RBAC is not supported. All users have unrestricted access to all features. +OpenObserve uses RBAC to manage what actions users can perform based on their assigned roles. Instead of giving all users the same level of access, RBAC ensures that each user can only access the features and data relevant to their role. ## How OpenObserve Implements RBAC diff --git a/docs/user-guide/identity-and-access-management/sso.md b/docs/user-guide/identity-and-access-management/sso.md index 8e2139c4..49b3f9ba 100644 --- a/docs/user-guide/identity-and-access-management/sso.md +++ b/docs/user-guide/identity-and-access-management/sso.md @@ -5,11 +5,13 @@ description: >- --- -> `Applicable to enterprise version` +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. 
+## SSO in OpenObserve OpenObserve, integrates Single Sign-On (SSO) capabilities using Dex, an OpenID Connect Identity (OIDC) and OAuth 2.0 provider. Dex does not have a user database and instead uses external identity providers like LDAP, Google, GitHub, etc. for authentication. -## Setup OpenObserve +## Configure SSO in OpenObserve You must set following environment variables to enable SSO in OpenObserve. diff --git a/docs/user-guide/management/aggregation-cache.md b/docs/user-guide/management/aggregation-cache.md index dd2c2462..873a04a6 100644 --- a/docs/user-guide/management/aggregation-cache.md +++ b/docs/user-guide/management/aggregation-cache.md @@ -5,7 +5,6 @@ description: Learn how streaming aggregation works in OpenObserve Enterprise. --- This page explains what streaming aggregation is and shows how to use it to improve query performance with aggregation cache in OpenObserve. -> This is an enterprise feature. === "Overview" @@ -148,10 +147,18 @@ This page explains what streaming aggregation is and shows how to use it to impr - [approx_percentile_cont](https://datafusion.apache.org/user-guide/sql/aggregate_functions.html#approx-percentile-cont) - [approx_percentile_cont_with_weight](https://datafusion.apache.org/user-guide/sql/aggregate_functions.html#approx-percentile-cont-with-weight) - [approx_topk](https://openobserve.ai/docs/sql-functions/approximate-aggregate/approx-topk/) - - [approx_topk_distinct](http://openobserve.ai/docs/sql-functions/approximate-aggregate/approx-topk-distinct/) + - [approx_topk_distinct](https://openobserve.ai/docs/sql-functions/approximate-aggregate/approx-topk-distinct/) --- + ## Aggregation cache metrics + OpenObserve exposes Prometheus metrics to monitor aggregation cache performance and memory usage. + | Metric | Description | + |--------|-------------| + | `zo_query_aggregation_cache_items` | Monitor to understand cache utilization and verify that streaming aggregation is populating the cache as expected | + | `zo_query_aggregation_cache_bytes` | Monitor memory consumption to ensure the cache stays within acceptable limits and doesn't exhaust system resources | + + --- === "How to use" ## How to use streaming aggregation diff --git a/docs/user-guide/management/audit-trail.md b/docs/user-guide/management/audit-trail.md index f83c2934..030e645e 100644 --- a/docs/user-guide/management/audit-trail.md +++ b/docs/user-guide/management/audit-trail.md @@ -6,8 +6,9 @@ description: >- --- # Audit Trail -> **Note:** This feature is applicable to the Enterprise Edition. - +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. + ## What is Audit Trail Audit Trail records user actions across all organizations in OpenObserve. It captures non-ingestion API calls and helps you monitor activity and improve security. @@ -31,7 +32,7 @@ When audit logging is enabled using the `O2_AUDIT_ENABLED` environment variable, !!! note "Example" The following example shows a captured audit event from the `audit` stream: ![audit-trail](../../images/audit-trail.png) - + !!! 
note "Use cases" Because audit events are stored in a log stream, you can: diff --git a/docs/user-guide/management/cipher-keys.md b/docs/user-guide/management/cipher-keys.md index 0690d786..14857fdc 100644 --- a/docs/user-guide/management/cipher-keys.md +++ b/docs/user-guide/management/cipher-keys.md @@ -7,7 +7,8 @@ description: >- This page explains how to create and manage **Cipher Keys** in OpenObserve and how to use them to decrypt encrypted log data during search queries. The **Cipher Keys** feature is essential for handling sensitive data stored in encrypted formats while still enabling effective log search and analysis, without storing decrypted data on disk. -> **Note:** This feature is applicable to the OpenObserve [Enterprise Edition](../../../openobserve-enterprise-edition-installation-guide/). +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. ## Create Cipher Keys diff --git a/docs/user-guide/management/sensitive-data-redaction.md b/docs/user-guide/management/sensitive-data-redaction.md index 0190e1e2..195ce602 100644 --- a/docs/user-guide/management/sensitive-data-redaction.md +++ b/docs/user-guide/management/sensitive-data-redaction.md @@ -4,7 +4,9 @@ description: Learn how to redact or drop sensitive data using regex patterns dur --- This document explains how to configure and manage regex patterns for redacting or dropping sensitive data in OpenObserve. -> Note: This feature is applicable to the OpenObserve [Enterprise Edition](../../../openobserve-enterprise-edition-installation-guide/). + +!!! info "Availability" + This feature is available in Enterprise Edition and Cloud. Not available in Open Source. ## Overview The **Sensitive Data Redaction** feature helps prevent accidental exposure of sensitive data by applying regex-based detection to values ingested into streams and to values already stored in streams. Based on this detection, sensitive values can be either **redacted** or **dropped**. This ensures data is protected before it is stored and hidden when displayed in query results. You can configure these actions to run at ingestion time or at query time. diff --git a/docs/user-guide/performance/tantivy-index.md b/docs/user-guide/performance/tantivy-index.md index e963a88f..62881025 100644 --- a/docs/user-guide/performance/tantivy-index.md +++ b/docs/user-guide/performance/tantivy-index.md @@ -2,124 +2,145 @@ title: Tantivy Indexing in OpenObserve description: Learn how Tantivy indexing works in OpenObserve, including full-text and secondary indexes, query behaviors with AND and OR operators, and how to verify index usage. --- -This document explains Tantivy indexing in OpenObserve, the types of indexes it builds, how to use the correct query patterns, and how to verify and configure indexing. +This document explains Tantivy indexing in OpenObserve, the types of indexes it builds, how to use the correct query patterns for both single-stream and multi-stream queries, and how to verify and configure indexing. > Tantivy indexing is an open-source feature in OpenObserve. ## What is Tantivy? Tantivy is the inverted index library used in OpenObserve to accelerate searches. An inverted index keeps a map of values or tokens and the row IDs of the records that contain them. When a user searches for a value, the query can use this index to go directly to the matching rows instead of scanning every log record. 
+## Index types Tantivy builds two kinds of indexes in OpenObserve: -## Full-text index -For fields such as `body` or `message` that contain sentences or long text. The field is split into tokens, and each token is mapped to the records that contain it. +??? note "Full-text index" + ### Full-text index + For fields such as `body` or `message` that contain sentences or long text. The field is split into tokens, and each token is mapped to the records that contain it. -**Example log records**
+ !!! note "Example log records:" -- Row 1: `body = "POST /api/metrics error"` -- Row 2: `body = "GET /health ok"` -- Row 3: `body = "error connecting to database"` + - Row 1: `body = "POST /api/metrics error"` + - Row 2: `body = "GET /health ok"` + - Row 3: `body = "error connecting to database"` -The log body `POST /api/metrics error` is stored as tokens `POST`, `api`, `metrics`, `error`. A search for `error` looks up that token in the index and immediately finds the matching records. + The log body `POST /api/metrics error` is stored as tokens `POST`, `api`, `metrics`, `error`. A search for `error` looks up that token in the index and immediately finds the matching records. -## Secondary index -For fields that represent a single exact value. For example, `k8s_namespace_name`. In this case, the entire field value is treated as one token and indexed. +??? note "Secondary index" + ### Secondary index + For fields that represent a single exact value. For example, `kubernetes_namespace_name`. In this case, the entire field value is treated as one token and indexed. -**Example log records**
+ !!! note "Example log records:" -- Row 1: `k8s_namespace_name = ingress-nginx` -- Row 2: `k8s_namespace_name = ziox` -- Row 3: `k8s_namespace_name = ingress-nginx` -- Row 4: `k8s_namespace_name = cert-manager` + - Row 1: `kubernetes_namespace_name = ingress-nginx` + - Row 2: `kubernetes_namespace_name = ziox` + - Row 3: `kubernetes_namespace_name = ingress-nginx` + - Row 4: `kubernetes_namespace_name = cert-manager` -For `k8s_namespace_name`, the index might look like: + For `kubernetes_namespace_name`, the index might look like: -- `ingress-nginx` > [Row 1, Row 3] -- `ziox` > [Row 2] -- `cert-manager` > [Row 4] + - `ingress-nginx` > [Row 1, Row 3] + - `ziox` > [Row 2] + - `cert-manager` > [Row 4] -A query for `k8s_namespace_name = 'ingress-nginx'` retrieves those rows directly, without scanning unrelated records. By keeping these indexes, Tantivy avoids full scans across millions or billions of records. This results in queries that return in milliseconds rather than seconds. + A query for `kubernetes_namespace_name = 'ingress-nginx'` retrieves those rows directly, without scanning unrelated records. By keeping these indexes, Tantivy avoids full scans across millions or billions of records. This results in queries that return in milliseconds rather than seconds. -## Configure Environment Variable -To enable Tantivy indexing, configure the following environment variable: -``` -ZO_ENABLE_INVERTED_INDEX = true -``` - -## Query behavior -Tantivy optimizes queries differently based on whether the field is full-text or secondary. Using the right operator for each field type ensures the query is served from the index instead of scanning logs. - -### Full-text index scenarios +## Configure environment variable +??? note "Enable Tantivy indexing" + ### Enable Tantivy indexing + To enable Tantivy indexing, configure the following environment variable: -**Correct usage**
- -- Use `match_all()` for full-text index fields such as `body` or `message`: -```sql --- Return logs whose body contains the token "error" -WHERE match_all('error'); -``` -- Use `NOT` with `match_all()`: -```sql --- Exclude logs whose body contains the token "error" -WHERE NOT match_all('error'); -``` - -**Inefficient usage**
-```sql
--- Forces full string equality, bypasses token index
-WHERE body = 'error';
-```
+ | Environment Variable | Description | Default Value |
+ |---------------------|-------------|---------------|
+ | `ZO_ENABLE_INVERTED_INDEX` | Enables or disables Tantivy indexing | `true` |
+??? note "Enable Tantivy result cache (optional)"
+ ### Enable Tantivy result cache (optional)
+ The [Tantivy result cache](#tantivy-result-cache) feature enhances search performance by storing index query results. It is disabled by default. To enable and configure the cache, set the following environment variables:
-### Secondary index scenarios
+ | Environment Variable | Description | Default Value |
+ |---------------------|-------------|---------------|
+ | `ZO_INVERTED_INDEX_RESULT_CACHE_ENABLED` | Enables or disables the Tantivy result cache | `false` |
+ | `ZO_INVERTED_INDEX_RESULT_CACHE_MAX_ENTRIES` | Maximum number of cache entries | `10000` |
+ | `ZO_INVERTED_INDEX_RESULT_CACHE_MAX_ENTRY_SIZE` | Maximum size per cache entry in bytes | `20480` (20KB) |
-**Correct usage**
+ For a detailed explanation of how the Tantivy result cache works, memory requirements, and performance impact, refer to the [Tantivy result cache](#tantivy-result-cache) section below.
-- Use `=` or `IN (...)` for secondary index fields such as `k8s_namespace_name`, `k8s_pod_name`, or `k8s_container_name`.
-```sql
--- Single value
-WHERE k8s_namespace_name = 'ingress-nginx';
-
--- Multiple values
-WHERE k8s_namespace_name IN ('ingress-nginx', 'ziox', 'cert-manager');
-```
-- Use NOT with `=` or `IN (...)`
-```sql
--- Exclude one exact value
-WHERE NOT (k8s_namespace_name = 'ingress-nginx');
-
--- Exclude multiple values
-WHERE k8s_namespace_name NOT IN ('ziox', 'cert-manager');
-```
-
-**Inefficient usage**
-```sql
--- Treated as a token search, no advantage over '='
-WHERE match_all('ingress-nginx');
-```
-
-### Mixed scenarios
+## Query behavior
+Tantivy optimizes queries differently based on whether the field is full-text or secondary, and whether the query operates on a single stream or multiple streams. Using the right operator for each field type ensures the query is served from the index instead of scanning logs.
+
+!!! note "Note"
+ The Tantivy index supports logs, metrics, traces, and metadata streams.
+
+### Single-stream queries
+A single-stream query retrieves data from one stream without using JOIN operations or subqueries that involve multiple streams.
+
+#### Full-text index scenarios
+
+!!! info "Correct usage:"
+ - Use `match_all()` for full-text index fields such as `body` or `message`:
+ ```sql linenums="1"
+ -- Return logs whose body contains the token "error"
+ WHERE match_all('error');
+ ```
+ - Use `NOT` with `match_all()`:
+ ```sql linenums="1"
+ -- Exclude logs whose body contains the token "error"
+ WHERE NOT match_all('error');
+ ```
+
+!!! warning "Inefficient usage:"
+ ```sql linenums="1"
+ -- Forces full string equality, bypasses token index
+ WHERE body = 'error';
+ ```
+
+#### Secondary index scenarios
+
+!!! info "Correct usage:"
+ - Use `=` or `IN (...)` for secondary index fields such as `kubernetes_namespace_name`, `kubernetes_pod_name`, or `kubernetes_container_name`.
+ ```sql linenums="1" + -- Single value + WHERE kubernetes_namespace_name = 'ingress-nginx'; + + -- Multiple values + WHERE kubernetes_namespace_name IN ('ingress-nginx', 'ziox', 'cert-manager'); + ``` + - Use NOT with `=` or `IN (...)` + ```sql linenums="1" + -- Exclude one exact value + WHERE NOT (kubernetes_namespace_name = 'ingress-nginx'); + + -- Exclude multiple values + WHERE kubernetes_namespace_name NOT IN ('ziox', 'cert-manager'); + ``` + +!!! warning "Inefficient usage:" + ```sql linenums="1" + -- Treated as a token search, no advantage over '=' + WHERE match_all('ingress-nginx'); + ``` + +#### Mixed scenarios When a query combines full-text and secondary fields, apply the best operator for each part. -**Correct usage** +!!! info "Correct usage:" -```sql -WHERE match_all('error') - AND k8s_namespace_name = 'ingress-nginx'; -``` + ```sql linenums="1" + WHERE match_all('error') + AND kubernetes_namespace_name = 'ingress-nginx'; + ``` -- `match_all('error')` uses full-text index. -- `k8s_namespace_name = 'ingress-nginx'` uses secondary index. + - `match_all('error')` uses full-text index. + - `kubernetes_namespace_name = 'ingress-nginx'` uses secondary index. -**Incorrect usage** +!!! warning "Incorrect usage:" -```sql --- Both operators used incorrectly -WHERE body = 'error' - AND match_all('ingress-nginx'); -``` + ```sql linenums="1" + -- Both operators used incorrectly + WHERE body = 'error' + AND match_all('ingress-nginx'); + ``` -### AND and OR operator behavior +#### AND and OR operator behavior **AND behavior**
@@ -128,9 +149,9 @@ WHERE body = 'error' **Examples** -```sql +```sql linenums="1" -- Fast: both sides indexable -WHERE match_all('error') AND k8s_namespace_name = 'ingress-nginx'; +WHERE match_all('error') AND kubernetes_namespace_name = 'ingress-nginx'; -- Mixed: one side indexable, one not WHERE match_all('error') AND body LIKE '%error%'; @@ -140,22 +161,236 @@ WHERE match_all('error') AND body LIKE '%error%'; - If all branches of the OR are indexable, Tantivy unites the row sets efficiently. - If any branch is not indexable, the entire OR is not indexable. The query runs in DataFusion. - +
**Examples** -```sql +```sql linenums="1" -- Fast: both indexable -WHERE match_all('error') OR k8s_namespace_name = 'ziox'; +WHERE match_all('error') OR kubernetes_namespace_name = 'ziox'; -- Slower: both sides are not indexable WHERE match_all('error') OR body LIKE '%error%'; ``` **NOT with grouped conditions**
-```sql +```sql linenums="1" -- Exclude when either namespace = ziox OR body contains error -WHERE NOT (k8s_namespace_name = 'ziox' OR match_all('error')); +WHERE NOT (kubernetes_namespace_name = 'ziox' OR match_all('error')); +``` + +### Multi-stream queries +A multi-stream query combines data from two or more streams using JOIN operations or subqueries that convert to JOINs internally. OpenObserve applies Tantivy indexing to both sides of a JOIN to accelerate data retrieval. + +#### What are multi-stream queries? +When a subquery converts to a JOIN, OpenObserve combines data from two sources. In a JOIN operation: + +- The left table is the first table in the JOIN operation. It is the base table that the query starts with. +- The right table is the second table in the JOIN operation. It provides additional data that is matched against the left table based on a join condition. + + +The query engine reads rows from the left table, then for each row, it looks up matching rows in the right table using the join condition. +
+**Example:** +```sql linenums="1" +SELECT t1.id FROM t1 JOIN t2 ON t1.id = t2.id +``` +In this query: + +- `t1` is the left table. It is the base table. +- `t2` is the right table. It is the table being matched. +- The join condition `t1.id = t2.id` determines which rows from both tables are combined. + +When a query includes a subquery in a WHERE clause with an IN operator, OpenObserve converts it to a JOIN operation. For example: +```sql linenums="1" +SELECT kubernetes_namespace_name +FROM default +WHERE kubernetes_namespace_name IN ( +SELECT DISTINCT kubernetes_namespace_name +FROM default +WHERE kubernetes_container_name = 'ziox' +); +``` +This query internally converts to a JOIN where: + +- The left table is the outer query, selecting `kubernetes_namespace_name` from `default`. +- The right table is the subquery, selecting distinct `kubernetes_namespace_name` values where `kubernetes_container_name` is `ziox`. + +Tantivy can use indexes on both the left table and the right table to accelerate the query. + +#### How indexing works in multi-stream queries +When OpenObserve executes a multi-stream query: + +1. The query optimizer identifies indexable conditions on both the left table and the right table of the JOIN. +2. Tantivy retrieves row identifiers from the index for each table independently. +3. The query engine combines the results based on the JOIN condition. +4. If both tables use indexes, the query avoids scanning unrelated records entirely. + +For example, +```sql linenums="1" +SELECT DISTINCT kubernetes_namespace_name +FROM default +WHERE kubernetes_pod_name = 'ziox-ingester-0' +AND kubernetes_namespace_name IN ( +SELECT DISTINCT kubernetes_namespace_name +FROM default +WHERE kubernetes_container_name = 'ziox' +) +ORDER BY kubernetes_namespace_name DESC +LIMIT 10; ``` +In this query, the subquery uses the secondary index on `kubernetes_container_name` to find matching namespaces, while the outer query uses the secondary index on `kubernetes_pod_name`. Both sides benefit from Tantivy indexing, eliminating the need for full table scans. + +#### match_all in multi-stream queries +The `match_all()` function is supported in multi-stream queries with specific limitations. OpenObserve checks whether the full-text index field exists in the stream before applying `match_all()`. + +!!! info "Supported scenarios:" + Use `match_all()` in subqueries that filter a single stream: + ```sql linenums="1" + SELECT * + FROM ( + SELECT * + FROM default + WHERE match_all('error') + ) AS filtered_logs; + ``` + Use `match_all()` in both the outer query and a subquery with an IN condition: + ```sql linenums="1" + SELECT * + FROM default + WHERE id IN ( + SELECT id + FROM default + WHERE match_all('error') + ) + AND match_all('critical'); + ``` + In this example, both the subquery and outer query apply full-text search using `match_all()`, and both leverage the full-text index to retrieve matching row identifiers. + +!!! info "Unsupported scenarios:" + Do not use `match_all()` outside a subquery when the subquery contains aggregation or grouping: + + ```sql linenums="1" + SELECT * + FROM ( + SELECT kubernetes_namespace_name, COUNT(*) + FROM default + GROUP BY kubernetes_namespace_name + ORDER BY COUNT(*) + ) AS aggregated + WHERE match_all('error'); + ``` + In this case, `match_all('error')` cannot determine which stream to search because the subquery has already aggregated the data. 
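+If the full-text filter is still needed in such a case, one option is to move it into the subquery so it is applied before the aggregation, which matches the supported single-stream subquery pattern above. The following sketch assumes the same `default` stream and is illustrative rather than a required rewrite:
+```sql linenums="1"
+-- Apply match_all() before the data is aggregated
+SELECT kubernetes_namespace_name, cnt
+FROM (
+    SELECT kubernetes_namespace_name, COUNT(*) AS cnt
+    FROM default
+    WHERE match_all('error')
+    GROUP BY kubernetes_namespace_name
+) AS aggregated
+ORDER BY cnt DESC;
+```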
+ +#### Partitioned search with inverted index +OpenObserve searches individual partitions using the inverted index when executing multi-stream queries. This behavior ensures that queries distribute efficiently across partitions and leverage indexing at the partition level. + + +## Index Optimizer + +### What is the Index Optimizer? +OpenObserve includes an index optimizer that accelerates specific query patterns by using Tantivy indexes more efficiently. The optimizer works automatically for both standalone queries and subqueries when certain conditions are met. + +The optimizer handles four query patterns: count, histogram, top N, and distinct queries. + +### Optimized Query Patterns + +#### Count Queries +!!! note "" + The optimizer accelerates queries that count total records. +
+ **Example:** + ```sql linenums="1" + SELECT COUNT(*) FROM stream WHERE match_all('error') + ``` +
+ **Requirements:** + + - All filters in the WHERE clause must be indexable by Tantivy + +#### Histogram Queries +!!! note "" + The optimizer accelerates queries that generate histogram data grouped by time intervals. +
+ **Example:** + ```sql linenums="1" + SELECT histogram(_timestamp, '1m') AS ts, COUNT(*) AS cnt + FROM table + WHERE match_all('error') + GROUP BY ts + ``` +
+ **Requirements:** + + - All filters in the WHERE clause must be indexable by Tantivy + +#### Top N Queries +!!! note "" + The optimizer accelerates queries that retrieve the top N results based on count, ordered in descending order. +
+ **Example:** + ```sql linenums="1" + SELECT kubernetes_namespace_name, COUNT(*) AS cnt + FROM table + WHERE match_all('error') + GROUP BY kubernetes_namespace_name + ORDER BY cnt DESC + LIMIT 10 + ``` +
+ **Requirements:** + + - All filters in the WHERE clause must be indexable by Tantivy + - The field being grouped must be a secondary index field + +#### Distinct Queries +!!! note "" + The optimizer accelerates queries that retrieve distinct values for a field. +
+ **Example:** + ```sql linenums="1" + SELECT kubernetes_namespace_name + FROM table + WHERE str_match(kubernetes_namespace_name, 'prod') + GROUP BY kubernetes_namespace_name + ORDER BY kubernetes_namespace_name ASC + LIMIT 10 + ``` +
+ **Requirements:**
+ - All filters in the WHERE clause must be indexable by Tantivy
+ - The field in the SELECT clause must be a secondary index field
+ - The WHERE clause must use `str_match()` on that same field
+
+!!! note "General Requirements"
+ For all four query patterns, every filter condition in the WHERE clause must be indexable by Tantivy. Refer to the [Single-stream queries](#single-stream-queries) and [Multi-stream queries](#multi-stream-queries) sections for details on which operators and conditions are indexable.
+
+
+## Tantivy result cache
+
+### What is the Tantivy result cache?
+The Tantivy result cache stores the output of Tantivy index searches to enhance search performance for repeated queries. When the cache is enabled and a query is executed, OpenObserve checks if identical results already exist in the cache. If found, the query retrieves results from the cache instead of re-executing index lookups, significantly reducing search time.
+
+The cache is disabled by default. To enable it, configure the environment variables described in the [Configure environment variable](#configure-environment-variable) section.
+
+### Memory requirements for Tantivy result cache
+The Tantivy result cache requires memory based on the number of entries and the size of each entry. Calculate the memory required using this formula:
+```
+(MAX_ENTRY_SIZE_in_KB × MAX_ENTRIES) / 1024 = Memory required in MB
+```
+**Example calculation with default configuration:**
+```bash
+MAX_ENTRY_SIZE = 20KB
+MAX_ENTRIES = 10000
+
+Memory required = (20 × 10000) / 1024 = 195.31 MB
+```
+
+!!! note "Note"
+ When adjusting `ZO_INVERTED_INDEX_RESULT_CACHE_MAX_ENTRIES` or `ZO_INVERTED_INDEX_RESULT_CACHE_MAX_ENTRY_SIZE`, use this formula to ensure sufficient memory is available.
+
+### Performance impact
+When the cache is enabled and a query result is found in the cache, search time can be reduced from hundreds of milliseconds to a few milliseconds. The cache is most effective for workloads with repeated queries using identical filters.
+ ## Verify if a query is using Tantivy To confirm whether a query used the Tantivy inverted index: @@ -164,4 +399,5 @@ To confirm whether a query used the Tantivy inverted index: 3. Under took_detail, check the value of `idx_took`: - If `idx_took` is greater than `0`, the query used the inverted index. - - If `idx_took` is `0`, the query did not use the inverted index. \ No newline at end of file + - If `idx_took` is `0`, the query did not use the inverted index.
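+As a quick sanity check, you can run the same search once with an indexable filter and once with a non-indexable filter, then compare the `idx_took` values. The queries below are an illustrative sketch and assume a stream named `default`:
+```sql linenums="1"
+-- Indexable filter: idx_took is expected to be greater than 0
+SELECT * FROM default WHERE match_all('error');
+
+-- Non-indexable filter: idx_took is expected to be 0 because the index is bypassed
+SELECT * FROM default WHERE body LIKE '%error%';
+```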
+ diff --git a/docs/user-guide/pipelines/.pages b/docs/user-guide/pipelines/.pages index be678f61..46684ef9 100644 --- a/docs/user-guide/pipelines/.pages +++ b/docs/user-guide/pipelines/.pages @@ -2,6 +2,7 @@ nav: - Pipelines Overview: index.md - Pipelines in OpenObserve: pipelines.md - Create and Use Pipelines: use-pipelines.md + - Remote Destination: remote-destination.md - Import and Export Pipelines: import-and-export-pipelines.md - Manage Pipelines: manage-pipelines.md - Configurable Delay in Scheduled Pipelines: configurable-delay-in-scheduled-pipelines.md diff --git a/docs/user-guide/pipelines/remote-destination.md b/docs/user-guide/pipelines/remote-destination.md new file mode 100644 index 00000000..86702ee9 --- /dev/null +++ b/docs/user-guide/pipelines/remote-destination.md @@ -0,0 +1,197 @@
+---
+title: Pipeline Remote Destinations
+description: Configure and manage remote destinations to send transformed pipeline data to external systems with persistent queuing, retry logic, and high-throughput performance.
+---
+This document explains how to configure remote destinations in OpenObserve pipelines to send transformed data to external systems. It covers the setup process, technical architecture of the persistent queue mechanism, Write-Ahead Log (WAL) file operations, failure handling, retry logic, and performance optimization through environment variables.
+
+=== "How to"
+ ## What is a remote destination?
+ A remote destination allows you to send transformed pipeline data to external systems outside your OpenObserve instance. When you select **Remote** as your destination type in a pipeline, the system routes data to an external endpoint of your choice while ensuring data integrity and reliability through a persistent queue mechanism.
+
+ ## Configuring a remote destination
+
+ ??? "Step 1: Access the Management page"
+ ### Step 1: Access the Management page
+
+ Navigate to the **Pipeline Destination** configuration page using either method:
+
+ - **From the pipeline editor**: While setting up your pipeline, select **Remote** as the destination type > click the **Create New Destination** toggle.
+ ![remote-destination-from-pipeline-editor](../../images/remote-destination-from-pipeline-editor.png)
+ ![remote-destination-config-from-pipeline-editor](../../images/remote-destination-config-from-pipeline-editor.png)
+ - **From Management**: Click the settings icon in the navigation menu > **Pipeline Destinations** > **Add Destination**.
+ ![remote-destination-config-from-management](../../images/remote-destination-config-from-management.png)
+
+ ??? "Step 2: Create the destination"
+ ### Step 2: Create the destination
+ In the **Add Destination** form:
+
+ 1. **Name**: Provide a descriptive name for the external destination. For example, `remote_destination_dev`.
+ 2. **URL**: Specify the endpoint where data should be sent.
+ ![config-remote-destination](../../images/config-remote-destination.png)
+ !!! note "To send the transformed data to another OpenObserve instance:"
+ Use the following URL format: `https://<openobserve_host>/api/<organization>/<stream_name>/_json`
+ **Example**: To send data to a stream called `remote_pipeline` in the `default` organization on a different OpenObserve instance: `https://your-o2-instance.example.com/api/default/remote_pipeline/_json` +
+ After transformation, the transformed data will be sent to the `remote_pipeline` stream under the `default` organization in the destination OpenObserve instance. + !!! note "To send data to an external endpoint:" + Ensure that you provide the complete URL of your external service endpoint. + 3. **Method**: Select the HTTP method based on your requirement. + ![config-remote-destination-method](../../images/config-remote-destination-method.png) + !!! note "To send the transformed data to another OpenObserve instance:" + Select **POST**. + !!! note "To send data to an external endpoint:" + Select the method required by your external service. + 4. **Output Format**: Select the data format for transmission. + ![config-remote-destination-output-format](../../images/config-remote-destination-output-format.png) + !!! note "When to select JSON (default):" + Standard JSON format. Use this when the destination API requires standard JSON arrays or objects. **Use JSON, when you send the transformed data to another OpenObserve instance**. + !!! note "When to select NDJSON (Newline Delimited JSON):" + Each event is sent as a separate JSON object on its own line. Use this when sending the transformed data to observability platforms that expect NDJSON, for example, Datadog and Splunk. + **Important**: Always verify the data format expected by your destination system before selecting. Check the destination's API documentation or ingestion requirements to ensure compatibility. + 5. **Headers**: To send data to an external endpoint, you may need to provide authentication credentials, if required. In the **Header** field, enter Authorization and in the **Value** field, provide the authentication token. + !!! note "To send the transformed data to another OpenObserve instance:" + ![config-remote-destination-header](../../images/config-remote-destination-header.png) + 1. Log in to the destination OpenObserve instance. + 2. Navigate to **Data Sources** > **Databases**. + 3. Copy the authorization token value displayed there. + 4. Paste this token in the **Value** field. + ![config-remote-destination-headers](../../images/config-remote-destination-headers.png) + !!! note "To send data to an external endpoint:" + Add the authentication headers required by your external service. This could be API keys, bearer tokens, or other authentication methods depending on the service. + 6. **Skip TLS Verify**: Use this toggle to enable or disable Transport Layer Security (TLS) verification. Enable this toggle to bypass security and certificate verification checks. **Use with caution, as disabling verification may expose data to security risks.** + 7. Click **Save** to create the destination. + + ??? "Step 3: Use in your pipeline" + ### Step 3: Use in your pipeline + + After creating the remote destination, you can select it from the **Destination** dropdown when configuring the remote destination node in your pipeline. The dropdown displays all previously created remote destinations with their names and URLs for easy identification. + ![use-pipeline-destination](../../images/use-pipeline-destination.png) + + ## Environment variables for remote destination + + | **Environment Variable** | **Description** | + | --- | --- | + | ZO_PIPELINE_REMOTE_STREAM_CONCURRENT_COUNT | • Defines the number of concurrent threads the exporter uses to send data from Write-Ahead Log (WAL) files to the remote destination.
• Controls export parallelism. Higher values increase throughput but also increase CPU usage.
• Set this value to match or slightly exceed the number of CPU cores. Increase when export speed lags behind ingestion, and decrease if CPU usage stays above 80 percent. | + | ZO_PIPELINE_FILE_PUSH_BACK_INTERVAL | • Specifies how long a reader waits before checking the queue again after catching up to the writer.
• Balances latency and CPU utilization. Lower values reduce event latency but raise CPU load; higher values lower CPU usage but increase latency.
• Use 1 second for low-latency pipelines. Increase to 5–10 seconds in resource-limited systems or when small delays are acceptable. | + | ZO_PIPELINE_SINK_TASK_SPAWN_INTERVAL_MS | • Determines how often the scheduler assigns new export tasks to reader threads, measured in milliseconds.
• Controls backlog clearing speed and CPU overhead. Shorter intervals improve responsiveness but raise CPU usage.
• Use 10–50 ms to clear persistent backlogs faster. Use 200–500 ms to reduce CPU load in low-throughput environments. Keep 100 ms for balanced performance. | + | ZO_PIPELINE_MAX_RETRY_COUNT | • Sets the maximum number of retry attempts per WAL file after export failure.
• Prevents endless retries for failed exports and limits disk growth when destinations are unreachable.
• Increase to 10 when the destination is unreliable or often unavailable. Keep the default of 6 for stable networks. | + | ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS | • Defines the longest allowed interval between retry attempts during exponential backoff.
• Ensures failed files are retried at least once in the defined period and prevents retries from spacing out indefinitely.
• Keep the default 24 hours for typical conditions. Increase to 48 hours if the destination experiences long outages. | +=== "Overview" +
+ This section explains the technical architecture and internal mechanisms of remote destinations. After configuring a remote destination, understanding the underlying systems helps with troubleshooting, performance optimization, and operational decisions. + + ## How remote destinations work + The remote destination feature allows you to send pipeline data to external systems. However, the core challenge is that the data must not be lost if the system crashes, restarts, or if the destination becomes temporarily unavailable. +
+ To resolve this issue, OpenObserve writes data to disk first, then sends it to the remote destination. This creates a safety buffer.
+
+ ## How data flows in a pipeline
+ Data moves through five stages:
+ Pipeline > Transformations > Disk Storage > Transmission > Cleanup
+
+ !!! note "Stage details:"
+
+ - **Stage 1 - Pipeline Source:** Data enters from the source stream via the source node of the pipeline.
+ - **Stage 2 - Transformations:** All configured functions and conditions are applied to process the data.
+ - **Stage 3 - Disk Storage:** After all transformations complete, the processed data is written to Write-Ahead Log files on disk. This write happens before any transmission attempt.
+ - **Stage 4 - Network Transmission:** Data is read from disk and sent to the remote destination via HTTP.
+ - **Stage 5 - Cleanup:** After successful transmission and acknowledgment from the destination, the disk files are deleted.
+ **Note**: Disk storage occurs only after all transformations finish. WAL files contain the final processed version of the data, not the original input.
+
+ ## Write-Ahead Log files
+ Write-Ahead Log files, or WAL files, are the disk storage mechanism used in stage 3. These are files written to disk that temporarily hold data between processing and transmission.
+
+ - **How many WAL files are created in advance**: The total number of WAL files created depends on how many remote destinations you have configured. For each remote destination, OpenObserve creates the number of files specified in the `ZO_MEMTABLE_BUCKET_NUM` environment variable.
+ - **Where these files are stored**: The files are written to the `/data/remote_stream_wal/` directory on disk.
+
+ !!! note "Note"
+ During normal operation, when data flows through the pipeline, these files are simultaneously being written to and read from. Files in this state are called **active files**.
+
+ ## How WAL files operate
+ The WAL file system uses a multi-threaded architecture to achieve high throughput.
+
+ ### Writer threads
+ The system uses multiple writer threads, equal to the `ZO_MEMTABLE_BUCKET_NUM` setting, to add data to WAL files.
+ Each writer thread:
+
+ - Receives transformed data from the pipeline
+ - Writes data sequentially to the current active WAL file
+ - Moves to the next file when the current file reaches capacity
+ - Operates continuously as long as data flows through the pipeline
+
+ **Important**: A file reaches capacity when either of two conditions is met: the file size reaches `ZO_PIPELINE_MAX_FILE_SIZE_ON_DISK_MB` or the file has been open for `ZO_PIPELINE_MAX_FILE_RETENTION_TIME_SECONDS`.
+
+ ### Reader threads
+ Multiple reader threads, 30 by default, handle transmission to the remote destination. Each reader thread:
+
+ - Selects a WAL file that contains unsent data
+ - Reads data from the file
+ - Sends the data to the remote destination via HTTP
+ - Tracks successful transmission progress
+
+ Multiple readers enable parallel transmission. While one reader sends data from file A, another reader can simultaneously send data from file B. This parallel processing allows the system to handle high data volumes. In production deployments, the system consistently achieves throughput of 30-40 MB per second.
+
+ ### FIFO ordering
+ The system maintains First-In-First-Out ordering. The oldest data, meaning the data that was transformed earliest, is always transmitted first. This guarantee ensures that data arrives at the destination in the same temporal order it was processed.
+
+ The reader threads coordinate to maintain this ordering even while operating in parallel. Files are assigned to readers based on age, ensuring older files are prioritized. + + ## WAL file lifecycle + WAL files are deleted under four conditions: +

+ **Condition 1: Successful Transmission** +
+ All data in the file has been sent and the destination has acknowledged receipt. The file is immediately deleted. This is the normal deletion path during healthy operation. +

+ **Condition 2: Disk Space Limit** +
+ When remote destination WAL files consume 50% of available disk space (default), the system stops writing new files and deletes the oldest files to free space. Deletion occurs regardless of transmission status. This limit prevents remote destination operations from consuming disk space needed by other OpenObserve components like data ingestion and query processing. The disk space limit is configurable via the `ZO_PIPELINE_WAL_SIZE_LIMIT` environment variable. On a 1 TB disk with the default 50% limit, remote destination files will not exceed approximately 500 GB. +

+ **Condition 3: Data Retention Policy** +
+ WAL files containing data older than the stream's retention period are deleted regardless of transmission status. Each pipeline inherits retention settings from its associated stream. If a stream has 30-day retention, WAL files with data older than 31 days are deleted even if never transmitted. This aligns remote destination data lifecycle with overall retention policy. +

+ **Condition 4: Retry Exhaustion**
+ After repeated transmission failures, the system stops retrying the file. By default, this happens after 6 failed attempts. The file then remains on disk but is no longer scheduled for transmission.
+
+ - This behavior can be changed using the `ZO_PIPELINE_REMOVE_WAL_FILE_AFTER_MAX_RETRY` configuration. When set to `true`, failed files are permanently deleted instead of being kept on disk.
+ - The retry limit is configurable via `ZO_PIPELINE_MAX_RETRY_COUNT`.
+
+ ## Failure handling and retry
+
+ When transmission fails, the system waits before retrying. Wait times increase with each failure: 5 minutes after the first failure, 10 minutes after the second, 20 minutes after the third, and so on, doubling each time. This is called exponential backoff. It gives a failed or overloaded destination time to recover instead of immediately retrying, which would consume bandwidth and potentially worsen the problem.
+
+ - **Maximum wait time**: Retry intervals cannot exceed 24 hours (configurable via `ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS`), ensuring files are retried at least once daily.
+ - **Random variation**: The system adds small random delays to retry times. This prevents many failed files from retrying at the exact same moment and overwhelming the destination. This is known as preventing the "thundering herd" problem, where multiple requests hitting a recovering system simultaneously can cause it to fail again.
+ - **Retry limit**: After 6 failed attempts (configurable via `ZO_PIPELINE_MAX_RETRY_COUNT`), the system stops retrying. File handling then follows the rules described in Condition 4 of the WAL file lifecycle section.
+
+ ## Persistent queue architecture
+ The combination of disk-based storage, multi-threaded processing, FIFO ordering, and retry logic implements a system pattern known as a persistent queue. A persistent queue is a queue that stores items on disk so it survives restarts and failures, preserves order, and resumes transmission without duplication.
+
+ Internally, OpenObserve achieves this pattern through the same components described earlier. Write-Ahead Log files act as the queue storage, the exporter manages the queue, a single writer thread adds transformed records, and multiple reader threads transmit them to the destination in order. Together, these elements ensure fault-tolerant and consistent data flow across restarts and retries. + + ## Storage organization of WAL files + + OpenObserve stores remote destination Write-Ahead Log (WAL) files separately from the files used in normal data ingestion. This separation ensures that export operations do not interfere with the system’s core ingestion and query processes. +
+ Remote destination WAL files are stored in the `/data/remote_stream_wal/` directory, while the standard ingestion process uses the `/data/wal/` directory. + + + ## Performance + Remote destinations in OpenObserve support high-throughput workloads: + + - **Production-validated**: 30-40 MB/second sustained throughput (tested on 4 vCPU nodes) + - **Peak capacity**: 80+ MB/second during traffic spikes + - **Mixed workloads**: Efficiently handles both low-volume streams (1-2 events/hour) and high-volume streams (30+ MB/second) simultaneously + + The system prevents disk pileup on ingester nodes by matching export rates with ingestion rates under normal operating conditions. + + ## Environment variables for remote destination + + | **Environment Variable** | **Description** | + | --- | --- | + | ZO_PIPELINE_REMOTE_STREAM_CONCURRENT_COUNT | • Defines the number of concurrent threads the exporter uses to send data from Write-Ahead Log (WAL) files to the remote destination.
• Controls export parallelism. Higher values increase throughput but also increase CPU usage.
• Set this value to match or slightly exceed the number of CPU cores. Increase when export speed lags behind ingestion, and decrease if CPU usage stays above 80 percent. | + | ZO_PIPELINE_FILE_PUSH_BACK_INTERVAL | • Specifies how long a reader waits before checking the queue again after catching up to the writer.
• Balances latency and CPU utilization. Lower values reduce event latency but raise CPU load; higher values lower CPU usage but increase latency.
• Use 1 second for low-latency pipelines. Increase to 5–10 seconds in resource-limited systems or when small delays are acceptable. | + | ZO_PIPELINE_SINK_TASK_SPAWN_INTERVAL_MS | • Determines how often the scheduler assigns new export tasks to reader threads, measured in milliseconds.
• Controls backlog clearing speed and CPU overhead. Shorter intervals improve responsiveness but raise CPU usage.
• Use 10–50 ms to clear persistent backlogs faster. Use 200–500 ms to reduce CPU load in low-throughput environments. Keep 100 ms for balanced performance. | + | ZO_PIPELINE_MAX_RETRY_COUNT | • Sets the maximum number of retry attempts per WAL file after export failure.
• Prevents endless retries for failed exports and limits disk growth when destinations are unreachable.
• Increase to 10 when the destination is unreliable or often unavailable. Keep the default of 6 for stable networks. | + | ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS | • Defines the longest allowed interval between retry attempts during exponential backoff.
• Ensures failed files are retried at least once in the defined period and prevents retries from spacing out indefinitely.
• Keep the default 24 hours for typical conditions. Increase to 48 hours if the destination experiences long outages. |
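+
+ As a rough starting point, these variables can be set together on the nodes that run pipelines. The values below are an illustrative sketch only; exact value formats and suitable numbers depend on your deployment and should be tuned against observed export speed and CPU usage:
+
+ ```bash
+ # Illustrative example only; tune against your own workload
+ export ZO_PIPELINE_REMOTE_STREAM_CONCURRENT_COUNT=8   # roughly match the CPU core count
+ export ZO_PIPELINE_FILE_PUSH_BACK_INTERVAL=1          # low-latency profile (seconds)
+ export ZO_PIPELINE_SINK_TASK_SPAWN_INTERVAL_MS=100    # balanced scheduling (milliseconds)
+ export ZO_PIPELINE_MAX_RETRY_COUNT=6                  # default retry limit per WAL file
+ export ZO_PIPELINE_MAX_RETRY_TIME_IN_HOURS=24         # cap on exponential backoff
+ ```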