2 changes: 2 additions & 0 deletions _topic_maps/_topic_map.yml
@@ -40,6 +40,8 @@ Topics:
File: configuring-lokistack-otlp
- Name: OpenTelemetry data model
File: opentelemetry-data-model
- Name: Loki query performance troubleshooting
File: loki-query-performance-troubleshooting
---
Name: Upgrading logging
Dir: upgrading
1 change: 1 addition & 0 deletions configuring/configuring-the-log-store.adoc
@@ -55,6 +55,7 @@ include::modules/logging-loki-reliability-hardening.adoc[leveloffset=+2]
include::modules/loki-retention.adoc[leveloffset=+2]
include::modules/loki-memberlist-ip.adoc[leveloffset=+2]
include::modules/loki-restart-hardening.adoc[leveloffset=+2]
//include::modules/enabling-automatic-stream-sharding.adoc[leveloffset=+2]

//Advanced deployment and scalability
[id="advanced_{context}"]
22 changes: 22 additions & 0 deletions configuring/loki-query-performance-troubleshooting.adoc
@@ -0,0 +1,22 @@
:_newdoc-version: 2.18.4
:_template-generated: 2025-09-22
:_mod-docs-content-type: ASSEMBLY
include::_attributes/common-attributes.adoc[]

:toc:
[id="loki-query-performance-troubleshooting_{context}"]
= Loki query performance troubleshooting

:context: loki-query-performance-troubleshooting

This documentation details methods for optimizing your Logging stack to improve query performance and provides steps for troubleshooting.

include::modules/best-practices-for-loki-query-performance.adoc[leveloffset=+1]

include::modules/best-practices-for-loki-labels.adoc[leveloffset=+1]

include::modules/configuration-of-stream-labels-in-loki-operator.adoc[leveloffset=+1]

include::modules/analyzing-loki-query-performance.adoc[leveloffset=+1]

include::modules/query-performance-analysis.adoc[leveloffset=+1]
68 changes: 68 additions & 0 deletions modules/analyzing-loki-query-performance.adoc
@@ -0,0 +1,68 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-10-24
:_mod-docs-content-type: PROCEDURE

[id="analyzing-loki-query-performance_{context}"]
= Analyzing Loki query performance

Every query and subquery in Loki generates a `metrics.go` log line with performance statistics. Queriers emit these log lines for subqueries, and the query frontend emits a single summary `metrics.go` line for each query.
Use these statistics to calculate the query performance metrics.

.Prerequisites
* You have administrator permissions.
* You have access to the {ocp-product-title} web console.
* You installed and configured the {loki-op}.

.Procedure
. In the {ocp-product-title} web console, navigate to *Observe* -> *Metrics*.

. Note the following values:

* *duration*: Denotes the amount of time a query took to run.
* *queue_time*: Denotes the time a query spent in the queue before being processed.
* *chunk_refs_fetch_time*: Denotes the amount of time spent getting chunk information from the index.
* *store_chunks_download_time*: Denotes the amount of time spent getting chunks from cache or storage.

. Calculate the following performance metrics, as shown in the worked example after this list:

** Total query time as `total_duration`:
+
[subs=+quotes]
----
total_duration = *duration* + *queue_time*
----

** Percentage of the total duration that a query spent in the queue as `Queue Time`:
+
[subs=+quotes]
----
Queue Time = *queue_time* / total_duration * 100
----

** Percentage of the total duration spent getting chunk information from the index as `Chunk Refs Fetch Time`:
+
[subs=+quotes]
----
Chunk Refs Fetch Time = *chunk_refs_fetch_time* / total_duration * 100
----

** Percentage of the total duration spent getting chunks from cache or storage as `Chunks Download Time`:
+
[subs=+quotes]
----
Chunks Download Time = *store_chunks_download_time* / total_duration * 100
----

** Percentage of the total duration spent executing the query as `Execution Time`:
+
[subs=+quotes]
----
Execution Time = (*duration* - *chunk_refs_fetch_time* - *store_chunks_download_time*) / total_duration * 100
----
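+
For example, assuming hypothetical values taken from a single `metrics.go` line of `duration` = 8s, `queue_time` = 2s, `chunk_refs_fetch_time` = 1s, and `store_chunks_download_time` = 3s, the calculations work out as follows:
+
----
total_duration        = 8s + 2s = 10s
Queue Time            = 2s / 10s * 100 = 20%
Chunk Refs Fetch Time = 1s / 10s * 100 = 10%
Chunks Download Time  = 3s / 10s * 100 = 30%
Execution Time        = (8s - 1s - 3s) / 10s * 100 = 40%
----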

. Refer to https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/about_openshift_logging/index/analyze-query-performance_loki-query-performance-troubleshooting[Query performance analysis] to understand why each metric might be high and how each metric affects query performance.
20 changes: 20 additions & 0 deletions modules/best-practices-for-loki-labels.adoc
@@ -0,0 +1,20 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-09-25
:_mod-docs-content-type: CONCEPT

[id="best-practices-for-loki-labels_{context}"]
= Best practices for Loki labels

Labels in Loki are the keyspace on which Loki shards incoming data. They are also the index used for finding logs at query-time. You can optimize query performance by properly using labels.

Consider the following criteria when creating labels, as illustrated in the example after this list:

* Labels should describe infrastructure. This could include regions, clusters, servers, applications, namespaces, or environments.

* Labels are long-lived. Label values should generate logs perpetually, or at least for several hours.

* Labels are intuitive for querying.
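
For example, assuming illustrative label names such as `cluster` and `namespace`, the first of the following stream selectors uses low-cardinality, long-lived infrastructure labels that meet these criteria, while the second relies on a per-request value, a trace ID, that does not belong in a label:

[source]
----
{cluster="prod-eu-1", namespace="payments"} |= "error"

{trace_id="4bf92f3577b34da6a3ce929d0e0e4736"}
----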
38 changes: 38 additions & 0 deletions modules/best-practices-for-loki-query-performance.adoc
@@ -0,0 +1,38 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc


:_newdoc-version: 2.18.4
:_template-generated: 2025-09-25
:_mod-docs-content-type: CONCEPT

[id="best-practices-for-loki-query-performance_{context}"]
= Best practices for Loki query performance

You can take the following steps to improve Loki query performance:

* Ensure that you are running the latest version of the {loki-op}.

* Ensure that you have migrated the LokiStack schema to version `v13`.

* Ensure that you use reliable and fast object storage. Loki places significant demands on object storage.
If you are not using an object storage solution from a cloud provider, use solid-state drives (SSDs) for your object storage.
By using SSDs, you can benefit from the high parallelization capabilities of Loki.
+
To better understand how Loki utilizes object storage, you can run the following query in the *Metrics* dashboard of the {ocp-product-title} web console:
+
[source]
----
sum by(status, container, operation) (label_replace(rate(loki_s3_request_duration_seconds_count{namespace="openshift-logging"}[5m]), "status", "${1}xx", "status_code", "([0-9]).."))
----

* The {loki-op} enables automatic stream sharding by default. The default automatic stream sharding mechanism should be adequate in most cases, and you should not need to configure the `perStream*` attributes.

* If you use the OpenTelemetry Protocol (OTLP) data model, you can configure additional stream labels in LokiStack. For more information, see link:https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/configuring/configuring-the-log-store#best-practices-for-loki-labels_loki-query-performance-troubleshooting[Best practices for Loki labels].

* Different types of queries have different performance characteristics. Use simple filter queries instead of regular expressions for better performance.
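+
For example, both of the following queries find log lines that contain the string `error`, but the first query uses a simple line filter and typically performs better than the second query, which uses a regular expression filter. The `kubernetes_namespace_name` stream label is an illustrative assumption; the available labels depend on your data model.
+
[source]
----
{kubernetes_namespace_name="my-app"} |= "error"

{kubernetes_namespace_name="my-app"} |~ "error"
----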

[role="_additional-resources"]
.Additional resources
* link:https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/about_openshift_logging/index/analyzing-loki-query-performance_loki-query-performance-troubleshooting[Analyzing Loki query performance]
59 changes: 59 additions & 0 deletions modules/configuration-of-stream-labels-in-loki-operator.adoc
@@ -0,0 +1,59 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-09-25
:_mod-docs-content-type: CONCEPT

[id="configuration-of-stream-labels-in-loki-operator_{context}"]
= Configuration of stream labels in {loki-op}

How you configure the labels that the {loki-op} uses as stream labels depends on the data model that you are using: ViaQ or OpenTelemetry Protocol (OTLP).

Both models come with a predefined set of stream labels. For more information, see link:https://docs.redhat.com/en/documentation/red_hat_openshift_logging/latest/html/configuring_logging/opentelemetry-data-model[OpenTelemetry data model].

ViaQ model::
ViaQ does not support structured metadata.
To configure stream labels for the ViaQ model, add the configuration in the `ClusterLogForwarder` resource. For example:
+
[source,yaml]
----
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logging-collector
  outputs:
  - name: lokistack-out
    type: lokiStack
    lokiStack:
      target:
        name: logging-loki
        namespace: openshift-logging
      labelKeys:
        application:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        audit:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        infrastructure:
          ignoreGlobal: <true_or_false>
          labelKeys: []
        global: []
----
+
The `lokiStack.labelKeys` field contains the configuration that maps log record keys to the Loki labels that are used to identify streams.
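+
For example, the following minimal sketch shows the `labelKeys` section with concrete values. The `kubernetes.labels.app` and `log_type` keys are illustrative assumptions; use log record keys that exist in your ViaQ records.
+
[source,yaml]
----
labelKeys:
  application:
    ignoreGlobal: false
    labelKeys:
    - kubernetes.labels.app # assumption: a label key present in your application log records
  global:
  - log_type # assumption: a commonly used global stream label
----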

OTLP model::
In the OTLP model, all labels that are not specified as stream labels are attached as structured metadata.

The following are the best practices for creating stream labels:

* The labels have a low cardinality, with at most tens of values.
* The label values are long-lived. For example, the first level of an HTTP path: `/load`, `/save`, and `/update`.
* The labels can be used in queries to improve query performance.
49 changes: 49 additions & 0 deletions modules/query-performance-analysis.adoc
@@ -0,0 +1,49 @@
// Module included in the following assemblies:
//
// * configuring/loki-query-performance-troubleshooting.adoc

:_newdoc-version: 2.18.4
:_template-generated: 2025-09-22
:_mod-docs-content-type: CONCEPT

[id="query-performance-analysis_{context}"]
= Query performance analysis

For best query performance, as much of the total query time as possible should be spent in query execution, denoted by the `Execution Time` metric.
The following table explains why the other performance metrics might be high and the steps you can take to improve them.
You can also reduce the execution time by modifying your queries, thereby improving the overall performance.

[options="header",cols="2,5,5"]
|====
|Issue
|Reason
|Fix
.2+|High `Execution Time`
|Queries might be doing many CPU-intensive operations such as regular expression processing.

a| You can make the following changes:

* Change your queries to reduce or remove regular expressions.
* Add more CPU resources.

|Your queries have many small log lines.

|If your queries process many small lines, execution becomes dependent on how fast Loki can iterate over the lines themselves, which becomes a CPU clock frequency bottleneck. To make such queries faster, you need a faster CPU.


|High `Queue Time`
|You do not have enough queriers running.
|Increase the number of querier replicas in the `LokiStack` spec.

|High `Chunk Refs Fetch Time`
|Insufficient number of index-gateway replicas in the `LokiStack` spec.
|Increase the number of index-gateway replicas or ensure they have enough CPU resources.

|High `Chunks Download Time`
|The chunks might be too small.
|Check the average chunk size by dividing the `total_bytes` value by the `cache_chunk_req` value. The result is the average number of uncompressed bytes per chunk. For best performance, this value should be on the order of megabytes. For example, if `total_bytes` is 400 MB and `cache_chunk_req` is 200,000, the average chunk size is about 2 KB, which is too small. If the chunks are only a few hundred bytes or a few kilobytes in size, revisit your labels to ensure that you are not splitting your data into very small chunks.

|Query timing out
|The query timeout value might be too low.
|Increase the `queryTimeout` value in the `LokiStack` spec.
|====
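
The following `LokiStack` custom resource sketch shows where the fixes from the preceding table might be applied, assuming a `LokiStack` resource named `logging-loki` in the `openshift-logging` namespace. The field paths shown, such as `spec.template` and `spec.limits.global.queries.queryTimeout`, are assumptions to illustrate the idea; this is not a complete resource, so verify the fields against the `LokiStack` API for your {loki-op} version.

[source,yaml]
----
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  template:
    querier:
      replicas: 2 # more querier replicas to reduce a high Queue Time
    indexGateway:
      replicas: 2 # more index-gateway replicas to reduce a high Chunk Refs Fetch Time
  limits:
    global:
      queries:
        queryTimeout: 5m # larger timeout if queries time out
----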