From 81c81eb5a2ba0314fdf03998a36e00f5af66e4e2 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 11:10:27 +0800
Subject: [PATCH 01/14] Update TBS 9.2 numbers

---
 .../observability/apm/transaction-sampling.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index b73406fce7..1ac848a8a6 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -162,16 +162,19 @@ Terminology:
 * Event Indexing Rate: The throughput from the APM Server to Elasticsearch, measured in events per second or documents per second. Note that it should roughly be equal to Event Ingestion Rate * Sampling Rate.
 * Memory Usage: The maximum Resident Set Size (RSS) of APM Server process observed throughout the benchmark.
 
-#### APM Server 9.0
+#### APM Server 9.2
 
 | EC2 instance size | TBS and disk configuration | Event ingestion rate (events/s) | Event indexing rate (events/s) | Memory usage (GB) | Disk usage (GB) |
 |-------------------|------------------------------------------------|---------------------------------|--------------------------------|-------------------|-----------------|
-| c6id.2xlarge | TBS disabled | 47220 | 47220 (100% sampling) | 0.98 | 0 |
-| c6id.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 21310 | 2360 | 1.41 | 13.1 |
-| c6id.2xlarge | TBS enabled, local NVMe SSD from c6id instance | 21210 | 2460 | 1.34 | 12.9 |
-| c6id.4xlarge | TBS disabled | 142200 | 142200 (100% sampling) | 1.12 | 0 |
-| c6id.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 32410 | 3710 | 1.71 | 19.4 |
-| c6id.4xlarge | TBS enabled, local NVMe SSD from c6id instance | 37040 | 4110 | 1.73 | 23.6 |
+| c6gd.2xlarge | TBS disabled | 45120 | 45120 (100% sampling) | 0.95 | 0 |
+| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 17120 | 1527 | 1.48 | 11.3 |
+|
c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 19490 | 1661 | 1.48 | 12.3 |
+| c6gd.4xlarge | TBS disabled | 63460 | 63460 (100% sampling) | 1.45 | 0 |
+| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 26340 | 2248 | 2.09 | 17.8 |
+| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 36620 | 3041 | 2.22 | 21.8 |
+| c6gd.8xlarge | TBS disabled | 119800 | 119800 (100% sampling) | 1.44 | 0 |
+| c6gd.8xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 27620 | 2485 | 2.49 | 16.6 |
+| c6gd.8xlarge | TBS enabled, local NVMe SSD from c6gd instance | 46260 | 3909 | 2.43 | 25.8 |
 
 #### APM Server 8.18
 

From 49a6f9a5b290d7a402f257c53b50dce17b7b15d5 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 11:14:54 +0800
Subject: [PATCH 02/14] Update assumptions

---
 solutions/observability/apm/transaction-sampling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 1ac848a8a6..d8397582c3 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -150,7 +150,7 @@ In an APM Server implementation, the events are stored temporarily on disk inste
 
 It is recommended to use fast disks, ideally Solid State Drives (SSD) with high I/O per second (IOPS), when enabling tail-based sampling. Disk throughput and I/O may become performance bottlenecks for tail-based sampling and APM event ingestion overall. Disk writes are proportional to the event ingest rate, while disk reads are proportional to both the event ingest rate and the sampling rate.
 
-To demonstrate the performance overhead and requirements, here are some reference numbers from a standalone APM Server deployed on AWS EC2 under full load that is receiving APM events containing only traces.
These numbers assume no backpressure from Elasticsearch and a **10% sample rate in the tail sampling policy**.
+To demonstrate the performance overhead and requirements, here are some reference numbers from a standalone APM Server deployed on AWS EC2 under full load that is receiving APM events containing only traces. These numbers assume no backpressure from Elasticsearch, a **uniform 10% sample rate in the tail sampling policy**, events being sent from 1024 agents concurrently, and sufficient disk space.
 
 :::{important}
 These figures are for reference only and may vary depending on factors such as sampling rate, average event size, and the average number of events per distributed trace.

From 5b6b1773b42b4ce12dcf1f4129f6fceb7623e950 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 11:16:26 +0800
Subject: [PATCH 03/14] Add FIXME

---
 solutions/observability/apm/transaction-sampling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index d8397582c3..1fb93e99b6 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -190,7 +190,7 @@ Terminology:
 When interpreting these numbers, note that:
 
 * The metrics are inter-related. For example, it is reasonable to see higher memory usage and disk usage when the event ingestion rate is higher.
-* The event ingestion rate and event indexing rate competes for disk IO. This is why there is an outlier data point where APM Server version 8.18 with a 32GB NVMe SSD shows a higher ingest rate but a slower event indexing rate than in 9.0.
+* The event ingestion rate and event indexing rate competes for disk IO. This is why there is an outlier data point where APM Server version 8.18 with a 32GB NVMe SSD shows a higher ingest rate but a slower event indexing rate than in 9.0.
FIXME
 
 The tail-based sampling implementation in version 9.0 offers significantly better performance compared to version 8.18, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.18, as the database grows larger, the performance slowdown can become disproportionate.
 

From b66e7134213c1a82f3c2dc5f1d4415712cabfe09 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 13:57:13 +0800
Subject: [PATCH 04/14] Add 8.19 numbers

---
 .../observability/apm/transaction-sampling.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 1fb93e99b6..c8b12f0583 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -176,16 +176,19 @@ Terminology:
 | c6gd.8xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 27620 | 2485 | 2.49 | 16.6 |
 | c6gd.8xlarge | TBS enabled, local NVMe SSD from c6gd instance | 46260 | 3909 | 2.43 | 25.8 |
 
-#### APM Server 8.18
+#### APM Server 8.19
 
 | EC2 instance size | TBS and disk configuration | Event ingestion rate (events/s) | Event indexing rate (events/s) | Memory usage (GB) | Disk usage (GB) |
 |-------------------|------------------------------------------------|---------------------------------|--------------------------------|-------------------|-----------------|
-| c6id.2xlarge | TBS disabled | 50260 | 50270 (100% sampling) | 0.98 | 0 |
-| c6id.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 10960 | 50 | 5.24 | 24.3 |
-| c6id.2xlarge | TBS enabled, local NVMe SSD from c6id instance | 11450 | 820 | 7.19 | 30.6 |
-| c6id.4xlarge | TBS disabled | 149200 | 149200 (100% sampling) | 1.14 |
0 |
-| c6id.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11990 | 530 | 26.57 | 33.6 |
-| c6id.4xlarge | TBS enabled, local NVMe SSD from c6id instance | 43550 | 2940 | 28.76 | 109.6 |
+| c6gd.2xlarge | TBS disabled | 45480 | 45480 (100% sampling) | 0.95 | 0 |
+| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11420 | 11.55 | 5.92 | 7.59 |
+| c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 12630 | 86.52 | 5.82 | 7.78 |
+| c6gd.4xlarge | TBS disabled | 61900 | 61900 (100% sampling) | 1.45 | 0 |
+| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 12920 | 37.31 | 11.31 | 8.31 |
+| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 23300 | 574 | 13.31 | 12.24 |
+| c6gd.8xlarge | TBS disabled | 122800 | 122800 (100% sampling) | 1.45 | 0 |
+| c6gd.8xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 13280 | 34.20 | 22.61 | 8.43 |
+| c6gd.8xlarge | TBS enabled, local NVMe SSD from c6gd instance | 35810 | 2480 | 30.41 | 19.23 |
 
 When interpreting these numbers, note that:
 

From 6a8bcbfcbb9343c3ca9ab62d5e5d90f799a2b75a Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 14:01:09 +0800
Subject: [PATCH 05/14] Fix disk usage

---
 solutions/observability/apm/transaction-sampling.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index c8b12f0583..1cfbb691bf 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -181,14 +181,14 @@ Terminology:
 | EC2 instance size | TBS and disk configuration | Event ingestion rate (events/s) | Event indexing rate (events/s) | Memory usage (GB) | Disk usage (GB) |
 |-------------------|------------------------------------------------|---------------------------------|--------------------------------|-------------------|-----------------|
 | c6gd.2xlarge | TBS disabled | 45480
| 45480 (100% sampling) | 0.95 | 0 |
-| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11420 | 11.55 | 5.92 | 7.59 |
-| c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 12630 | 86.52 | 5.82 | 7.78 |
+| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11420 | 11.55 | 5.92 | 30.81 |
+| c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 12630 | 86.52 | 5.82 | 27.70 |
 | c6gd.4xlarge | TBS disabled | 61900 | 61900 (100% sampling) | 1.45 | 0 |
-| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 12920 | 37.31 | 11.31 | 8.31 |
-| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 23300 | 574 | 13.31 | 12.24 |
+| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 12920 | 37.31 | 11.31 | 30.98 |
+| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 23300 | 574 | 13.31 | 50.99 |
 | c6gd.8xlarge | TBS disabled | 122800 | 122800 (100% sampling) | 1.45 | 0 |
-| c6gd.8xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 13280 | 34.20 | 22.61 | 8.43 |
-| c6gd.8xlarge | TBS enabled, local NVMe SSD from c6gd instance | 35810 | 2480 | 30.41 | 19.23 |
+| c6gd.8xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 13280 | 34.20 | 22.61 | 32.01 |
+| c6gd.8xlarge | TBS enabled, local NVMe SSD from c6gd instance | 35810 | 2480 | 30.41 | 86.86 |
 
 When interpreting these numbers, note that:
 

From 96583c119cc3d841b7ffc5bd3f406a736b27443f Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 14:14:01 +0800
Subject: [PATCH 06/14] Explain poor 8.19 event indexing rate

---
 solutions/observability/apm/transaction-sampling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 1cfbb691bf..b034380b3d 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -193,7 +193,7 @@ Terminology:
 When interpreting
these numbers, note that:
 
 * The metrics are inter-related. For example, it is reasonable to see higher memory usage and disk usage when the event ingestion rate is higher.
-* The event ingestion rate and event indexing rate competes for disk IO. This is why there is an outlier data point where APM Server version 8.18 with a 32GB NVMe SSD shows a higher ingest rate but a slower event indexing rate than in 9.0. FIXME
+* The event indexing rate divided by event ingestion rate should be roughly equal to the sampling rate when APM Server is performing normally. However, in 8.19, as the sampling decision handling requiring disk reads are lagging behind, the event indexing rate shows a more significant drop.
 
 The tail-based sampling implementation in version 9.0 offers significantly better performance compared to version 8.18, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.18, as the database grows larger, the performance slowdown can become disproportionate.
 

From 64dcc9b3c00a5626e23a1251af439552f48ad575 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 14:20:09 +0800
Subject: [PATCH 07/14] More terminology

---
 solutions/observability/apm/transaction-sampling.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index b034380b3d..1f12c98bc2 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -161,6 +161,8 @@ Terminology:
 * Event Ingestion Rate: The throughput from the APM agent to the APM Server using the Intake v2 protocol (the protocol used by Elastic APM agents), measured in events per second.
 * Event Indexing Rate: The throughput from the APM Server to Elasticsearch, measured in events per second or documents per second. Note that it should roughly be equal to Event Ingestion Rate * Sampling Rate.
 * Memory Usage: The maximum Resident Set Size (RSS) of APM Server process observed throughout the benchmark.
+* TBS: Tail-based sampling.
+* IOPS: Input/Output Operations Per Second, which is a measure of disk performance.
 
 #### APM Server 9.2
 

From e1f56389e670f3d601bcb9fdccc61d7fe0b0dcd0 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 16:19:09 +0800
Subject: [PATCH 08/14] Refine notes

---
 solutions/observability/apm/transaction-sampling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 1f12c98bc2..45fd6f8bb2 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -195,7 +195,7 @@ Terminology:
 When interpreting these numbers, note that:
 
 * The metrics are inter-related. For example, it is reasonable to see higher memory usage and disk usage when the event ingestion rate is higher.
-* The event indexing rate divided by event ingestion rate should be roughly equal to the sampling rate when APM Server is performing normally. However, in 8.19, as the sampling decision handling requiring disk reads are lagging behind, the event indexing rate shows a more significant drop.
+* Under normal operation, the event indexing rate divided by the event ingestion rate should approximate the configured sampling rate (10% in this case). However, in the version 8.19 numbers above, as APM Server is under full load, sampling decision handling lags behind due to disk read operations that compete with ingest path writes for disk I/O resources, resulting in a significantly lower event indexing rate than expected.
 
 The tail-based sampling implementation in version 9.0 offers significantly better performance compared to version 8.18, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.18, as the database grows larger, the performance slowdown can become disproportionate.
 

From 3899d8425b765de125b94fcfe47f322d70e3b8b1 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 16:27:15 +0800
Subject: [PATCH 09/14] Add note about memory usage

---
 solutions/observability/apm/transaction-sampling.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 45fd6f8bb2..2786e538cc 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -196,6 +196,7 @@ When interpreting these numbers, note that:
 
 * The metrics are inter-related. For example, it is reasonable to see higher memory usage and disk usage when the event ingestion rate is higher.
 * Under normal operation, the event indexing rate divided by the event ingestion rate should approximate the configured sampling rate (10% in this case). However, in the version 8.19 numbers above, as APM Server is under full load, sampling decision handling lags behind due to disk read operations that compete with ingest path writes for disk I/O resources, resulting in a significantly lower event indexing rate than expected.
+* Memory usage measurements differ between versions: version 9.2 numbers reflect only the APM Server process RSS (excluding OS cache), while version 8.19 numbers include OS cache because the database is memory-mapped.
Despite this measurement difference, version 9.0+ typically uses less memory overall due to its significantly smaller database footprint.
 
 The tail-based sampling implementation in version 9.0 offers significantly better performance compared to version 8.18, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.18, as the database grows larger, the performance slowdown can become disproportionate.
 

From bed57de72dc8c3c348b254c377bf71284b0edb86 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 16:28:40 +0800
Subject: [PATCH 10/14] Wording

---
 solutions/observability/apm/transaction-sampling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 2786e538cc..6b90dd9cee 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -196,7 +196,7 @@ When interpreting these numbers, note that:
 
 * The metrics are inter-related. For example, it is reasonable to see higher memory usage and disk usage when the event ingestion rate is higher.
 * Under normal operation, the event indexing rate divided by the event ingestion rate should approximate the configured sampling rate (10% in this case). However, in the version 8.19 numbers above, as APM Server is under full load, sampling decision handling lags behind due to disk read operations that compete with ingest path writes for disk I/O resources, resulting in a significantly lower event indexing rate than expected.
-* Memory usage measurements differ between versions: version 9.2 numbers reflect only the APM Server process RSS (excluding OS cache), while version 8.19 numbers include OS cache because the database is memory-mapped. Despite this measurement difference, version 9.0+ typically uses less memory overall due to its significantly smaller database footprint.
+* Memory usage measurements differ between versions: version 9.2 numbers reflect only the APM Server process RSS (excluding OS cache), while version 8.19 numbers include OS cache because the database is memory-mapped. Despite this measurement difference, version 9.0+ uses significantly less memory overall due to its much smaller database footprint.
 
 The tail-based sampling implementation in version 9.0 offers significantly better performance compared to version 8.18, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.18, as the database grows larger, the performance slowdown can become disproportionate.
 

From 320c7371798715990edfe6cfe61ad6cc87af48ca Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 16:29:19 +0800
Subject: [PATCH 11/14] Update versions

---
 solutions/observability/apm/transaction-sampling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 6b90dd9cee..16f37c136a 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -198,7 +198,7 @@ When interpreting these numbers, note that:
 * Under normal operation, the event indexing rate divided by the event ingestion rate should approximate the configured sampling rate (10% in this case).
However, in the version 8.19 numbers above, as APM Server is under full load, sampling decision handling lags behind due to disk read operations that compete with ingest path writes for disk I/O resources, resulting in a significantly lower event indexing rate than expected.
 * Memory usage measurements differ between versions: version 9.2 numbers reflect only the APM Server process RSS (excluding OS cache), while version 8.19 numbers include OS cache because the database is memory-mapped. Despite this measurement difference, version 9.0+ uses significantly less memory overall due to its much smaller database footprint.
 
-The tail-based sampling implementation in version 9.0 offers significantly better performance compared to version 8.18, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.18, as the database grows larger, the performance slowdown can become disproportionate.
+The tail-based sampling implementation in version 9.0+ offers significantly better performance compared to version 8.x, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.x, as the database grows larger, the performance slowdown can become disproportionate.
 
 ## Sampled data and visualizations [_sampled_data_and_visualizations]
 

From 6f6f6de96b4c995ac4d7b34ec1c414cd9a605631 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 18:11:09 +0800
Subject: [PATCH 12/14] Add sampling rate note

---
 solutions/observability/apm/transaction-sampling.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 16f37c136a..1d1762167f 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -197,6 +197,7 @@ When interpreting these numbers, note that:
 * The metrics are inter-related. For example, it is reasonable to see higher memory usage and disk usage when the event ingestion rate is higher.
 * Under normal operation, the event indexing rate divided by the event ingestion rate should approximate the configured sampling rate (10% in this case). However, in the version 8.19 numbers above, as APM Server is under full load, sampling decision handling lags behind due to disk read operations that compete with ingest path writes for disk I/O resources, resulting in a significantly lower event indexing rate than expected.
 * Memory usage measurements differ between versions: version 9.2 numbers reflect only the APM Server process RSS (excluding OS cache), while version 8.19 numbers include OS cache because the database is memory-mapped. Despite this measurement difference, version 9.0+ uses significantly less memory overall due to its much smaller database footprint.
+* Lower sampling rates result in higher event ingestion rates because less overhead is required for sampling decisions. For example, reducing the sampling rate from 10% to 5% in version 9.2 increases event ingestion rate by 5-10% (data not shown in the tables above).
 
 The tail-based sampling implementation in version 9.0+ offers significantly better performance compared to version 8.x, primarily due to a rewritten storage layer. This new implementation compresses data, as well as cleans up expired data more reliably, resulting in reduced load on disk, memory, and compute resources. This improvement is particularly evident in the event indexing rate on slower disks. In version 8.x, as the database grows larger, the performance slowdown can become disproportionate.
 

From 86eacafa0a9d0b01c73207735ccf20594d87269b Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 18:47:33 +0800
Subject: [PATCH 13/14] Fix instance types

---
 .../observability/apm/transaction-sampling.md | 36 +++++++++----------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 1d1762167f..654cb4a1f8 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -168,29 +168,29 @@ Terminology:
 
 | EC2 instance size | TBS and disk configuration | Event ingestion rate (events/s) | Event indexing rate (events/s) | Memory usage (GB) | Disk usage (GB) |
 |-------------------|------------------------------------------------|---------------------------------|--------------------------------|-------------------|-----------------|
-| c6gd.2xlarge | TBS disabled | 45120 | 45120 (100% sampling) | 0.95 | 0 |
-| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 17120 | 1527 | 1.48 | 11.3 |
-| c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 19490 | 1661 | 1.48 | 12.3 |
-| c6gd.4xlarge | TBS disabled | 63460 | 63460 (100% sampling) | 1.45 | 0 |
-| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 26340 | 2248 | 2.09 | 17.8 |
-| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 36620 | 3041 | 2.22 | 21.8 |
-| c6gd.8xlarge | TBS disabled | 119800
| 119800 (100% sampling) | 1.44 | 0 |
-| c6gd.8xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 27620 | 2485 | 2.49 | 16.6 |
-| c6gd.8xlarge | TBS enabled, local NVMe SSD from c6gd instance | 46260 | 3909 | 2.43 | 25.8 |
+| c6gd.xlarge | TBS disabled | 45120 | 45120 (100% sampling) | 0.95 | 0 |
+| c6gd.xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 17120 | 1527 | 1.48 | 11.3 |
+| c6gd.xlarge | TBS enabled, local NVMe SSD from c6gd instance | 19490 | 1661 | 1.48 | 12.3 |
+| c6gd.2xlarge | TBS disabled | 63460 | 63460 (100% sampling) | 1.45 | 0 |
+| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 26340 | 2248 | 2.09 | 17.8 |
+| c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 36620 | 3041 | 2.22 | 21.8 |
+| c6gd.4xlarge | TBS disabled | 119800 | 119800 (100% sampling) | 1.44 | 0 |
+| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 27620 | 2485 | 2.49 | 16.6 |
+| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 46260 | 3909 | 2.43 | 25.8 |
 
 #### APM Server 8.19
 
 | EC2 instance size | TBS and disk configuration | Event ingestion rate (events/s) | Event indexing rate (events/s) | Memory usage (GB) | Disk usage (GB) |
 |-------------------|------------------------------------------------|---------------------------------|--------------------------------|-------------------|-----------------|
-| c6gd.2xlarge | TBS disabled | 45480 | 45480 (100% sampling) | 0.95 | 0 |
-| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11420 | 11.55 | 5.92 | 30.81 |
-| c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 12630 | 86.52 | 5.82 | 27.70 |
-| c6gd.4xlarge | TBS disabled | 61900 | 61900 (100% sampling) | 1.45 | 0 |
-| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 12920 | 37.31 | 11.31 | 30.98 |
-| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 23300 | 574 | 13.31 | 50.99 |
-| c6gd.8xlarge | TBS disabled | 122800 | 122800 (100% sampling) | 1.45 | 0 |
-| c6gd.8xlarge
| TBS enabled, EBS gp3 volume with 3000 IOPS | 13280 | 34.20 | 22.61 | 32.01 |
-| c6gd.8xlarge | TBS enabled, local NVMe SSD from c6gd instance | 35810 | 2480 | 30.41 | 86.86 |
+| c6gd.xlarge | TBS disabled | 45480 | 45480 (100% sampling) | 0.95 | 0 |
+| c6gd.xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11420 | 11.55 | 5.92 | 30.81 |
+| c6gd.xlarge | TBS enabled, local NVMe SSD from c6gd instance | 12630 | 86.52 | 5.82 | 27.70 |
+| c6gd.2xlarge | TBS disabled | 61900 | 61900 (100% sampling) | 1.45 | 0 |
+| c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 12920 | 37.31 | 11.31 | 30.98 |
+| c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 23300 | 574 | 13.31 | 50.99 |
+| c6gd.4xlarge | TBS disabled | 122800 | 122800 (100% sampling) | 1.45 | 0 |
+| c6gd.4xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 13280 | 34.20 | 22.61 | 32.01 |
+| c6gd.4xlarge | TBS enabled, local NVMe SSD from c6gd instance | 35810 | 2480 | 30.41 | 86.86 |
 
 When interpreting these numbers, note that:
 

From 96c18d35b137986437f16ce335e82eaac64e94c1 Mon Sep 17 00:00:00 2001
From: Carson
Date: Wed, 26 Nov 2025 18:50:09 +0800
Subject: [PATCH 14/14] Formatting

---
 solutions/observability/apm/transaction-sampling.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/solutions/observability/apm/transaction-sampling.md b/solutions/observability/apm/transaction-sampling.md
index 654cb4a1f8..c7a764c274 100644
--- a/solutions/observability/apm/transaction-sampling.md
+++ b/solutions/observability/apm/transaction-sampling.md
@@ -168,9 +168,9 @@ Terminology:
 
 | EC2 instance size | TBS and disk configuration | Event ingestion rate (events/s) | Event indexing rate (events/s) | Memory usage (GB) | Disk usage (GB) |
 |-------------------|------------------------------------------------|---------------------------------|--------------------------------|-------------------|-----------------|
-| c6gd.xlarge | TBS disabled | 45120 | 45120 (100%
sampling) | 0.95 | 0 |
-| c6gd.xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 17120 | 1527 | 1.48 | 11.3 |
-| c6gd.xlarge | TBS enabled, local NVMe SSD from c6gd instance | 19490 | 1661 | 1.48 | 12.3 |
+| c6gd.xlarge | TBS disabled | 45120 | 45120 (100% sampling) | 0.95 | 0 |
+| c6gd.xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 17120 | 1527 | 1.48 | 11.3 |
+| c6gd.xlarge | TBS enabled, local NVMe SSD from c6gd instance | 19490 | 1661 | 1.48 | 12.3 |
 | c6gd.2xlarge | TBS disabled | 63460 | 63460 (100% sampling) | 1.45 | 0 |
 | c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 26340 | 2248 | 2.09 | 17.8 |
 | c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 36620 | 3041 | 2.22 | 21.8 |
@@ -182,9 +182,9 @@ Terminology:
 
 | EC2 instance size | TBS and disk configuration | Event ingestion rate (events/s) | Event indexing rate (events/s) | Memory usage (GB) | Disk usage (GB) |
 |-------------------|------------------------------------------------|---------------------------------|--------------------------------|-------------------|-----------------|
-| c6gd.xlarge | TBS disabled | 45480 | 45480 (100% sampling) | 0.95 | 0 |
-| c6gd.xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11420 | 11.55 | 5.92 | 30.81 |
-| c6gd.xlarge | TBS enabled, local NVMe SSD from c6gd instance | 12630 | 86.52 | 5.82 | 27.70 |
+| c6gd.xlarge | TBS disabled | 45480 | 45480 (100% sampling) | 0.95 | 0 |
+| c6gd.xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 11420 | 11.55 | 5.92 | 30.81 |
+| c6gd.xlarge | TBS enabled, local NVMe SSD from c6gd instance | 12630 | 86.52 | 5.82 | 27.70 |
 | c6gd.2xlarge | TBS disabled | 61900 | 61900 (100% sampling) | 1.45 | 0 |
 | c6gd.2xlarge | TBS enabled, EBS gp3 volume with 3000 IOPS | 12920 | 37.31 | 11.31 | 30.98 |
 | c6gd.2xlarge | TBS enabled, local NVMe SSD from c6gd instance | 23300 | 574 | 13.31 | 50.99 |
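
Review note on the series: the interpretation bullet says the event indexing rate divided by the event ingestion rate should approximate the configured 10% sampling rate. A minimal sketch of that check, using the two c6gd.xlarge "EBS gp3 volume with 3000 IOPS" rows from the final tables, is below. The function names and the 50% tolerance are assumptions made for illustration; they are not part of APM Server or of the benchmark tooling.

```python
# Rates (events/s) copied from the c6gd.xlarge, EBS gp3 rows of the final tables.
# Each value is (event ingestion rate, event indexing rate).
ROWS = {
    "9.2": (17120, 1527),
    "8.19": (11420, 11.55),
}

CONFIGURED_SAMPLING_RATE = 0.10  # the uniform 10% policy stated in the doc


def effective_sampling_ratio(ingestion_rate, indexing_rate):
    """Return indexing rate divided by ingestion rate."""
    return indexing_rate / ingestion_rate


def lags_behind(ingestion_rate, indexing_rate,
                configured=CONFIGURED_SAMPLING_RATE, tolerance=0.5):
    """Hypothetical check: True when the effective ratio is below
    tolerance * configured. The tolerance is an illustrative threshold."""
    ratio = effective_sampling_ratio(ingestion_rate, indexing_rate)
    return ratio < configured * tolerance


if __name__ == "__main__":
    for version, (ingest, index) in ROWS.items():
        ratio = effective_sampling_ratio(ingest, index)
        print(f"{version}: ratio={ratio:.4f} lagging={lags_behind(ingest, index)}")
```

With these inputs the 9.2 row comes out at roughly 9% and the 8.19 row at roughly 0.1%, which matches the explanation added in patch 08 that sampling decision handling lags behind in 8.19 under full load.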