[DOC] Updates for caching doc and linting fixes (#3693)
* Updates for caching doc and linting fixes

* Apply suggestions from code review

Co-authored-by: Joe Elliott <joe.elliott@grafana.com>

---------

Co-authored-by: Joe Elliott <joe.elliott@grafana.com>
knylander-grafana and joe-elliott committed May 22, 2024
1 parent 9c5fc70 commit 41a11b4
Showing 3 changed files with 30 additions and 25 deletions.
39 changes: 21 additions & 18 deletions docs/sources/tempo/operations/caching.md
@@ -10,20 +10,24 @@ weight: 65
Caching is mainly used to improve query performance by storing the bloom filters of all backend blocks, which are accessed on every query.

Tempo uses an external cache to improve query performance.
The supported implementations are [Memcached](https://memcached.org/) and [Redis](https://redis.io/).
Tempo supports [Memcached](https://memcached.org/) and [Redis](https://redis.io/).

For information about search performance, refer to [Tune search performance](https://grafana.com/docs/tempo/latest/operations/backend_search/).

## Memcached

Memcached is one of the cache implementations supported by Tempo.
It is used by default in the Tanka and Helm examples, see [Deploying Tempo]({{< relref "../setup/deployment" >}}).
It's used by default in the Tanka and Helm examples.
Refer to [Deploying Tempo]({{< relref "../setup/deployment" >}}).

### Connection limit

As a cluster grows in size, the number of instances of Tempo connecting to the cache servers also increases.
By default, Memcached has a connection limit of 1024. If this limit is surpassed new connections are refused.
This is resolved by increasing the connection limit of Memcached.
By default, Memcached has a connection limit of 1024.
Memcached refuses new connections when this limit is surpassed.
You can resolve this issue by increasing the connection limit of Memcached.
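
For example, here's a minimal sketch of raising the limit when Memcached runs as a container in Kubernetes. The container spec and image tag are illustrative; the `-c` flag sets the maximum number of simultaneous connections and `-m` sets the cache memory limit in MB.

```yaml
containers:
  - name: memcached
    image: memcached:1.6
    args:
      - "-m"
      - "1024"  # cache memory limit in MB
      - "-c"
      - "4096"  # raise the connection limit above the default of 1024
```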

These errors can be observed using the `tempo_memcache_request_duration_seconds_count` metric.
You can use the `tempo_memcache_request_duration_seconds_count` metric to observe these errors.
For example, by using the following query:

```promql
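# Illustrative sketch only: the exact query from the original document is
# collapsed in this diff view. Assuming the metric carries a status_code label,
# the request rate per status code can be observed with:
sum by (status_code) (
  rate(tempo_memcache_request_duration_seconds_count{}[5m])
)
```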
@@ -36,9 +40,9 @@ This metric is also shown in [the monitoring dashboards]({{< relref "./monitor"

<p align="center"><img src="../caching_memcached_connection_limit.png" alt="QPS and latency of requests to memcached"></p>

Note that the already open connections continue to function, just new connections are refused.
Note that the already open connections continue to function. New connections are refused.

Additionally, Memcached will log the following errors when it can't accept any new requests:
Additionally, Memcached logs the following errors when it can't accept any new requests:

```
accept4(): No file descriptors available
@@ -47,16 +51,16 @@ accept4(): No file descriptors available
Too many open connections
```

When using the [memcached_exporter](https://github.com/prometheus/memcached_exporter), the number of open connections can be observed at `memcached_current_connections`.
When using the [memcached_exporter](https://github.com/prometheus/memcached_exporter), you can observe the number of open connections at `memcached_current_connections`.
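
For example, here's a minimal sketch of a Prometheus alerting rule on connection saturation. It assumes the exporter's `memcached_max_connections` gauge is also scraped; the rule name and threshold are illustrative.

```yaml
groups:
  - name: memcached-connections
    rules:
      - alert: MemcachedConnectionLimitApproaching
        # Fires when more than 80% of the configured connection limit is in use
        expr: memcached_current_connections / memcached_max_connections > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Memcached is using more than 80% of its connection limit
```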

## Cache size control

Tempo querier accesses bloom filters of all blocks while searching for a trace. This essentially mandates the size
of cache to be at-least the total size of the bloom filters (the working set) . However, in larger deployments, the
working set might be larger than the desired size of cache. When that happens, eviction rates on the cache grow high,
and hit rate drop. Not nice!
Tempo querier accesses bloom filters of all blocks while searching for a trace.
This essentially mandates the cache size to be at least the total size of the bloom filters (the working set).
However, in larger deployments, the working set might be larger than the desired size of cache.
When that happens, eviction rates on the cache grow high and hit rates drop.

Tempo provides two config parameters in order to filter down on the items stored in cache.
Tempo provides two configuration parameters to filter down on the items stored in cache.

```
# Min compaction level of block to qualify for caching bloom filter
@@ -68,13 +72,12 @@ Tempo provides two config parameters in order to filter down on the items stored
[cache_max_block_age: <duration>]
```
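
For example, here's a minimal sketch of how these parameters might be set, assuming they live under `storage.trace` as in the pre-2.4 configuration; the values are illustrative.

```yaml
storage:
  trace:
    # Only cache bloom filters for blocks that have been compacted at least twice
    cache_min_compaction_level: 2
    # Only cache bloom filters for blocks newer than 48 hours
    cache_max_block_age: 48h
```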

Using a combination of these config options, we can narrow down on which bloom filters are cached, thereby reducing our
cache eviction rate, and increasing our cache hit rate. Nice!
Using a combination of these configuration options, you can narrow down which bloom filters are cached, thereby reducing the cache eviction rate and increasing the cache hit rate.

In order to decide the values of these config parameters, you can use a cache summary command in the [tempo-cli]({{< relref "./tempo_cli" >}}) that
To decide the values of these configuration parameters, you can use a cache summary command in the [tempo-cli]({{< relref "./tempo_cli" >}}) that
prints a summary of bloom filter shards per day and per compaction level. The result looks something like this:

<p align="center"><img src="../cache-summary.png" alt="Cache summary"></p>

The above image shows the bloom filter shards over 14 days and 6 compaction levels. This can be used to decide the
above configuration parameters.
This image shows the bloom filter shards over 14 days and 6 compaction levels. You can use this information to decide the configuration parameters.
6 changes: 3 additions & 3 deletions docs/sources/tempo/release-notes/v2-4.md
@@ -170,7 +170,7 @@ Those issues were corrected [in PR 3300](https://github.com/grafana/tempo/pull/3
### Cache configuration refactored

The major cache refactor allows multiple role-based caches to be configured. [[PR 3166](https://github.com/grafana/tempo/pull/3166)]
This change resulted in the following fields being deprecated.
This change resulted in several fields being deprecated (refer to the old configuration).
These have all been migrated to a top level `cache:` field.

For more information about the configuration, refer to the [Cache]({{< relref "../configuration#cache" >}}) section.
@@ -188,7 +188,7 @@ storage:
redis:
```

With the new configuration, you create your list of caches,- with either `redis` or `memcached` cluster with your configuration, then define the types of data and roles.
With the new configuration, you create your list of caches, each backed by either a `redis` or `memcached` cluster, and then define the types of data and roles.

```yaml
cache:
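  # Illustrative sketch only: the full example from the release notes is
  # collapsed in this diff view. The backends, roles, and hostnames below are
  # assumptions, not the exact published configuration.
  caches:
    - memcached:
        host: memcached-instance
      roles:
        - bloom
        - parquet-footer
    - redis:
        endpoint: redis-instance
      roles:
        - frontend-search
```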
@@ -206,7 +206,7 @@

## Security fixes

The following vulnerabilities have been addressed:
This release addresses the following vulnerabilities:

* Addressed [CVE-2023-5363](https://github.com/advisories/GHSA-xw78-pcr6-wrg8).
* Updated the `memcached` default image in Jsonnet for multiple CVEs. [PR 3310](https://github.com/grafana/tempo/pull/3310)
10 changes: 6 additions & 4 deletions docs/sources/tempo/setup/upgrade.md
@@ -42,8 +42,9 @@ For information on changing the vParquet version, refer to [Choose a different b
### Cache configuration refactored

The major cache refactor allows multiple role-based caches to be configured. [[PR 3166](https://github.com/grafana/tempo/pull/3166)]
This change resulted in the following fields being deprecated.
These have all been migrated to a top level `cache:` field.
This change resulted in several fields being deprecated (refer to the old configuration).

These fields have all been migrated to a top level `cache:` field.

For more information about the configuration, refer to the [Cache]({{< relref "../configuration#cache" >}}) section.

@@ -224,7 +225,7 @@ For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo

### Default block format changed to vParquet2

While not a breaking change, upgrading to Tempo 2.2 will by default change Tempo’s block format to vParquet2.
While not a breaking change, upgrading to Tempo 2.2 by default changes Tempo’s block format to vParquet2.

To stay on a previous block format, read the [Parquet configuration documentation]({{< relref "../configuration/parquet#choose-a-different-block-format" >}}).
We strongly encourage upgrading to vParquet2 as soon as possible, as it's required for using structural operators in your TraceQL queries and provides query performance improvements, in particular on queries using the `duration` intrinsic.
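
For example, here's a minimal sketch of pinning the block format, assuming the `storage.trace.block.version` setting described in the Parquet configuration documentation:

```yaml
storage:
  trace:
    block:
      version: vParquet  # keep the previous format instead of the vParquet2 default
```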
@@ -263,7 +264,8 @@ For more information on other enhancements, read the [Tempo 2.1 release notes]({

### Remove support for Search on v2 blocks

Users can no longer search blocks in v2 format. Only vParquet and vParquet2 formats support search. The following search configuration options were removed from the overrides section:
Users can no longer search blocks in v2 format. Only the Parquet formats support search.
These search configuration options were removed from the overrides section:

```
overrides:
