From 7da986ff0f2cf8eebd6e614a38a2c4e17b59b76c Mon Sep 17 00:00:00 2001
From: Marc Lopez Rubio
Date: Tue, 1 Apr 2025 14:36:28 +0800
Subject: [PATCH 1/2] Add APM Server known issue for TBS

Signed-off-by: Marc Lopez Rubio
---
 docs/en/observability/apm/known-issues.asciidoc | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/en/observability/apm/known-issues.asciidoc b/docs/en/observability/apm/known-issues.asciidoc
index 94e8828871..53c85cc0af 100644
--- a/docs/en/observability/apm/known-issues.asciidoc
+++ b/docs/en/observability/apm/known-issues.asciidoc
@@ -21,6 +21,17 @@ _Versions: XX.XX.XX, YY.YY.YY, ZZ.ZZ.ZZ_
 // If applicable, link to fix
 ////
 
+[discrete]
+== Tail Sampling may not compact / expired TTLs as quickly as desired, causing increased storage usage.
+
+_Elastic Stack versions: 8.0.0+ < 9.0**_
+
+There are some issues with the Tail Sampling implementation in versions 8.0.0+ < 9.0 that may cause the buffered traces to not be compacted or expired as quickly as desired. This can lead to increased storage usage for longer than the default 30m TTL.
+
+This may manifest in two ways, increased value log (vlog) file size and increased SST (LSM) file size. LSM growth and late compaction is particularly troublesome given how the underlying K/V database performs compactions on its layers. There is noticeable LSM growth for use-cases where traces are under 1KB in size, since they are written to the LSM layer directly.
+
+This issue is fixed in 9.0.0, due to a re-implementation of how the underlying tail sampling databases are used. The new implementation uses a more efficient partitioning scheme, allowing more efficient expiration of traces.
+
 [discrete]
 == APM Server v8.6.x and prior with Elasticsearch v8.15.x and later has broken APM UI

From 14c6ab46f7c09636a768430eedcf94382ed98b5e Mon Sep 17 00:00:00 2001
From: Brandon Morelli
Date: Wed, 2 Jul 2025 14:21:01 -0700
Subject: [PATCH 2/2] Apply suggestions from code review

Co-authored-by: Colleen McGinnis
---
 docs/en/observability/apm/known-issues.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/observability/apm/known-issues.asciidoc b/docs/en/observability/apm/known-issues.asciidoc
index 53c85cc0af..21040409b0 100644
--- a/docs/en/observability/apm/known-issues.asciidoc
+++ b/docs/en/observability/apm/known-issues.asciidoc
@@ -24,9 +24,9 @@ _Versions: XX.XX.XX, YY.YY.YY, ZZ.ZZ.ZZ_
 [discrete]
 == Tail Sampling may not compact / expired TTLs as quickly as desired, causing increased storage usage.
 
-_Elastic Stack versions: 8.0.0+ < 9.0**_
+_Elastic Stack versions: All 8.x versions_
 
-There are some issues with the Tail Sampling implementation in versions 8.0.0+ < 9.0 that may cause the buffered traces to not be compacted or expired as quickly as desired. This can lead to increased storage usage for longer than the default 30m TTL.
+There are some issues with the tail sampling implementation in all 8.x versions that may prevent buffered traces from being compacted or expired as quickly as expected. This can lead to increased storage usage for longer than the default 30m TTL.
 
 This may manifest in two ways, increased value log (vlog) file size and increased SST (LSM) file size. LSM growth and late compaction is particularly troublesome given how the underlying K/V database performs compactions on its layers. There is noticeable LSM growth for use-cases where traces are under 1KB in size, since they are written to the LSM layer directly.