write caching - raft - metrics #17179

bharathv · 2024-03-19T00:23:21Z

Adds a couple of simple metrics for visibility into write caching and raft in general:

vectorized_raft_replicate_ack_all_requests_no_flush - number of quorum ack requests without flush (i.e. with write caching).
vectorized_raft_batch_size_bytes - a histogram of batch sizes as flushed by the replicate batcher; gives visibility into batching at the raft layer.

Both are internal metrics at partition scope.
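For readers unfamiliar with the counter, here is a minimal standalone sketch of the idea: count quorum-ack replicate requests separately depending on whether they wait for a flush. The names `ack_mode` and `replicate_probe` are hypothetical and are not taken from the patch; the real probe registers its counters with the metrics framework, which this sketch leaves out.

```cpp
#include <cstdint>

// Hypothetical stand-in for the replicate consistency modes.
enum class ack_mode { leader_ack, quorum_ack, quorum_ack_no_flush };

// Hypothetical, simplified probe: it only keeps the counts and does not
// register anything with a metrics framework.
class replicate_probe {
public:
    // Called once per replicate request.
    void on_replicate(ack_mode mode) {
        switch (mode) {
        case ack_mode::quorum_ack:
            ++_ack_all_with_flush;
            break;
        case ack_mode::quorum_ack_no_flush:
            // Quorum ack without flush, i.e. the write caching path.
            ++_ack_all_no_flush;
            break;
        case ack_mode::leader_ack:
            ++_ack_leader;
            break;
        }
    }

    uint64_t ack_all_with_flush() const { return _ack_all_with_flush; }
    uint64_t ack_all_no_flush() const { return _ack_all_no_flush; }
    uint64_t ack_leader() const { return _ack_leader; }

private:
    uint64_t _ack_all_with_flush = 0;
    uint64_t _ack_all_no_flush = 0;
    uint64_t _ack_leader = 0;
};
```

Comparing the no-flush count against the with-flush count over time would show how much of the quorum-ack traffic on a partition is going through write caching.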

Backports Required

Release Notes

Improvements

Adds metric for number of quorum ack requests with write caching.

src/v/raft/probe.cc

vbotbuildovich · 2024-03-19T02:36:16Z

new failures in https://buildkite.com/redpanda/redpanda/builds/46415#018e5449-9ae8-4d83-ab4d-e32efc692a3a:

"rptest.tests.raft_availability_test.RaftAvailabilityTest.test_one_node_down"

new failures in https://buildkite.com/redpanda/redpanda/builds/46415#018e545a-543c-40be-a59c-79faa6c1fd15:

"rptest.tests.raft_availability_test.RaftAvailabilityTest.test_one_node_down"

new failures in https://buildkite.com/redpanda/redpanda/builds/46900#018e80f6-6c3a-4ea1-bc08-90cc59b9502a:

"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTest.test_reset_from_cloud.cloud_storage_type=CloudStorageType.S3"

vbotbuildovich · 2024-03-19T06:04:15Z

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46424#018e5506-0524-4f39-8ca7-f0f56ab82c04

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46424#018e5506-0528-4cf4-9d21-cc90302576f1

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46424#018e5518-064c-4684-b8d7-297a9d821396

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46900#018e80f6-6c40-448d-9bbb-25bcbb2355ab

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46900#018e80f2-157f-4026-a2a5-ba11e73c21b2

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47092#018e8bee-a99a-4e13-8434-931e6727e865

bharathv · 2024-03-21T01:49:16Z

Failure: #16561, unrelated.

dotnwat

lgtm

dotnwat · 2024-03-23T18:13:36Z

src/v/raft/replicate_batcher.cc

                for (auto& b : batches) {
+                    total_batch_size += b.size_bytes();


It might be worth a note in the commit message or here about the difference between "batch" size and batching efficiency. I think we are measuring how the batcher accumulates things, but it's a little ambiguous since it is also counting batch size.

Tried to clarify a bit (renamed metric too), lmk if that makes it better.

but it's a little ambiguous since it is also counting batch-size.

I'm trying to measure how efficient batching is by measuring the size (in bytes) of a batch, since that translates to the size of a single append. Larger batches (in bytes) translate to larger writes; that was the thought process. Added this info to the commit message.
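The measurement described above can be sketched as follows: sum `size_bytes()` over everything the batcher flushes together and record one observation per flush. `record_batch`, `log2_histogram`, and `record_flush` are hypothetical stand-ins written for this illustration (only `size_bytes()` and `total_batch_size` appear in the diff), and the power-of-two bucketing is an assumption, not the histogram type used in the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a record batch; only size_bytes() mirrors the diff.
struct record_batch {
    std::size_t bytes;
    std::size_t size_bytes() const { return bytes; }
};

// Hypothetical power-of-two histogram: bucket i counts values in
// [2^i, 2^(i+1)); bucket 0 also holds the value 0, and the last bucket
// absorbs anything larger.
class log2_histogram {
public:
    static constexpr std::size_t num_buckets = 32;

    void record(uint64_t v) {
        std::size_t idx = 0;
        while (v > 1 && idx + 1 < num_buckets) {
            v >>= 1;
            ++idx;
        }
        ++_buckets[idx];
    }

    uint64_t count_at(std::size_t idx) const { return _buckets.at(idx); }

private:
    std::array<uint64_t, num_buckets> _buckets{};
};

// One observation per flush: the total bytes of all batches appended together.
inline std::size_t
record_flush(const std::vector<record_batch>& batches, log2_histogram& hist) {
    std::size_t total_batch_size = 0;
    for (auto& b : batches) {
        total_batch_size += b.size_bytes();
    }
    hist.record(total_batch_size);
    return total_batch_size;
}
```

With this shape, a distribution skewed toward the higher buckets would indicate that the batcher is coalescing many requests into each append, which is the efficiency signal discussed in this thread.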

ztlpn · 2024-03-29T13:35:54Z

src/v/raft/probe.cc

@@ -156,6 +162,14 @@ void probe::setup_metrics(const model::ntp& ntp) {
          [this] { return _full_heartbeat_requests; },
          sm::description("Number of full heartbeats sent by the leader"),
          labels),
+        sm::make_histogram(


I remember there have been warnings about adding new per-ntp histograms because they put undue stress on the metrics-collecting infrastructure (even when aggregated across partitions). As this is more of a performance diagnostics metric, maybe we can get by with a single shard-local histogram without per-ntp granularity?

OK, removing this for now, as discussed offline. Perhaps it would be nice to have a mode that a user can enable on the fly (temporarily) to collect more granular metrics.
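As a rough illustration of the concern raised above, the sketch below compares how many time series a histogram produces when it is registered per partition versus once per shard. It assumes Prometheus-style exposition, where a histogram exports one series per bucket plus a sum and a count; the function names and the example numbers are hypothetical and not measurements from this PR.

```cpp
#include <cstddef>

// Series exported when every partition owns its own histogram.
// Assumption: one series per bucket, plus one sum and one count series.
constexpr std::size_t
per_ntp_series(std::size_t partitions, std::size_t buckets) {
    return partitions * (buckets + 2);
}

// Series exported when each shard owns a single histogram shared by all
// partitions on that shard.
constexpr std::size_t
shard_local_series(std::size_t shards, std::size_t buckets) {
    return shards * (buckets + 2);
}

// Hypothetical node: 1000 partitions, 8 shards, 26 buckets per histogram.
static_assert(per_ntp_series(1000, 26) == 28000);
static_assert(shard_local_series(8, 26) == 224);
```

The per-partition cost grows with the partition count, while the shard-local cost is bounded by the core count, which is why a shard-local histogram is the cheaper option for a diagnostics-only metric.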

bharathv · 2024-03-29T19:00:03Z

/ci-repeat 1

bharathv requested a review from mmaslankaprv March 19, 2024 00:23

github-actions bot added the area/redpanda label Mar 19, 2024

bharathv requested a review from ztlpn March 19, 2024 00:23

bharathv self-assigned this Mar 19, 2024

bharathv force-pushed the wc_p3 branch from cb83668 to c811613 Compare March 19, 2024 03:48

dotnwat previously approved these changes Mar 23, 2024

bharathv added 2 commits March 27, 2024 08:09

raft: clarify replicate_ack_all_requests

1b513f7

raft/metrics: add replicate_ack_all_requests_no_flush

8e1121f

bharathv dismissed dotnwat’s stale review via e6045e9 March 27, 2024 16:13

bharathv force-pushed the wc_p3 branch from c811613 to e6045e9 Compare March 27, 2024 16:13

bharathv requested a review from dotnwat March 27, 2024 16:13

ztlpn reviewed Mar 29, 2024

raft/tests: add a simple test for write caching metric

6dc25b9

bharathv force-pushed the wc_p3 branch from e6045e9 to 6dc25b9 Compare March 29, 2024 15:41

bharathv requested a review from ztlpn March 29, 2024 15:42

ztlpn approved these changes Mar 29, 2024

bharathv merged commit d4b24d5 into redpanda-data:dev Mar 29, 2024
18 checks passed
