metrics: fix kafka_max_offset on read replicas #16263

andrwng · 2024-01-24T00:43:25Z

Read replicas were previously translating their own local log offsets to
return `redpanda_kafka_max_offset` to the metrics endpoint. This differs
from how read replicas calculate the high watermark (HWM) returned to the
Kafka endpoint, which reads it directly from cloud storage.

This adds the same read replica check that we have in the Kafka layer.

Also fixes the metric_sum filtering in ducktape, as this was required for the included test.

Fixes #16259

Backports Required

Release Notes

Bug Fixes

Fixes a bug that caused read replicas to report the wrong value for the `redpanda_kafka_max_offset` metric.

vbotbuildovich · 2024-01-24T03:00:02Z

new failures in https://buildkite.com/redpanda/redpanda/builds/44191#018d392c-7c97-49bd-8552-274b3f1dc481:

"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_follower_fetching_with_maintenance_mode"

new failures in https://buildkite.com/redpanda/redpanda/builds/44191#018d392c-7c9e-4bae-83be-9987280a72a2:

"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_basic_follower_fetching.read_from_object_store=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/44191#018d392c-7c94-4fa9-b4b1-50954e766635:

"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_basic_follower_fetching.read_from_object_store=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/44191#018d393d-6bea-43fc-91fa-3ddbced39eb6:

"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_follower_fetching_with_maintenance_mode"

new failures in https://buildkite.com/redpanda/redpanda/builds/44191#018d393d-6bf5-4df2-a9ab-97276f8bd230:

"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_basic_follower_fetching.read_from_object_store=True"

new failures in https://buildkite.com/redpanda/redpanda/builds/44191#018d393d-6bf2-4a15-83df-4575771b5098:

"rptest.tests.follower_fetching_test.FollowerFetchingTest.test_basic_follower_fetching.read_from_object_store=False"

piyushredpanda · 2024-01-24T03:24:30Z

@andrwng Needs some test updates, it looks like.

Updates the filter used when collecting metrics samples. In some cases, the samples take the form: Sample(name='redpanda_kafka_max_offset', labels={'redpanda_namespace': 'kafka', 'redpanda_partition': '5', 'redpanda_topic': 'panda-topic'}, value=0.0, ...

abhijat · 2024-01-24T07:02:46Z

tests/rptest/services/redpanda.py

+                    labels = sample.labels
+                    if ns:
+                        if "redpanda_namespace" in labels:
+                            if labels["redpanda_namespace"] != ns:
+                                continue
+                        elif "namespace" in labels:
+                            if labels["namespace"] != ns:
+                                continue
+                        else:
+                            assert False, f"Missing namespace label: {sample}"
+                    if topic:
+                        if "redpanda_topic" in labels:
+                            if labels["redpanda_topic"] != topic:
+                                continue
+                        elif "topic" in labels:
+                            if labels["topic"] != topic:
+                                continue
+                        else:
+                            assert False, f"Missing topic label: {sample}"


nit: these checks are a little hard to read, it looks like they could be simplified a bit and collapsed?

Agreed it's a bit ugly. I spent some time earlier trying to clean it up, but couldn't come to anything simpler. Open to suggestions on how to improve it

Actually maybe I'll do this in a follow up, if it works:

if ns:
    assert "redpanda_namespace" in labels or "namespace" in labels, "Missing namespace"
    if labels.get("redpanda_namespace", labels.get("namespace")) != ns:
        continue

Done here #16277
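For readers following the thread, here is a minimal, self-contained sketch of how the collapsed filter could look. It is not the code that landed in #16277; `Sample`, `label_value`, `matches`, and `sum_samples` are hypothetical names used only for illustration, and the only details taken from this PR are the label keys (`redpanda_namespace`/`namespace`, `redpanda_topic`/`topic`) and the "missing label" assertion.

```python
from collections import namedtuple

# Hypothetical stand-in for a metrics sample; the fields mirror the
# Sample(name=..., labels=..., value=...) form quoted in the commit message.
Sample = namedtuple("Sample", ["name", "labels", "value"])


def label_value(labels, *keys):
    """Return the value of the first key present in labels, else None."""
    for key in keys:
        if key in labels:
            return labels[key]
    return None


def matches(sample, ns=None, topic=None):
    """Return True if the sample passes the optional namespace/topic filters."""
    labels = sample.labels
    if ns:
        found = label_value(labels, "redpanda_namespace", "namespace")
        assert found is not None, f"Missing namespace label: {sample}"
        if found != ns:
            return False
    if topic:
        found = label_value(labels, "redpanda_topic", "topic")
        assert found is not None, f"Missing topic label: {sample}"
        if found != topic:
            return False
    return True


def sum_samples(samples, ns=None, topic=None):
    """Sum the values of all samples that pass the filters."""
    return sum(s.value for s in samples if matches(s, ns, topic))
```

Looking up the label through an ordered list of candidate keys keeps one code path for both label spellings, so adding a third spelling would only mean extending the key list.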

vbotbuildovich · 2024-01-24T08:19:40Z

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44199#018d3a42-e5d6-4e16-b602-29b4fb055df7

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44199#018d3a42-e5dd-4cde-a82b-bc83e4b703c6

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44212#018d3b97-0506-43d2-b2d6-e350e21003ba

vbotbuildovich · 2024-01-24T17:09:05Z

/backport v23.3.x

vbotbuildovich · 2024-01-24T17:09:06Z

/backport v23.2.x

vbotbuildovich · 2024-01-24T17:09:07Z

/backport v23.1.x

vbotbuildovich · 2024-01-24T17:10:00Z

Failed to create a backport PR to v23.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16263-v23.1.x-69 remotes/upstream/v23.1.x
git cherry-pick -x e567d6387e1c6b37674328ef355d029c30511434 7e33cc70ff4d0bfd59ed16849177480b79968c4a

Workflow run logs.

vbotbuildovich · 2024-01-24T17:10:03Z

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16263-v23.2.x-82 remotes/upstream/v23.2.x
git cherry-pick -x e567d6387e1c6b37674328ef355d029c30511434 7e33cc70ff4d0bfd59ed16849177480b79968c4a

Workflow run logs.

github-actions bot added the area/redpanda label Jan 24, 2024

andrwng requested review from Lazin, dotnwat and abhijat January 24, 2024 00:43

andrwng self-assigned this Jan 24, 2024

piyushredpanda added this to the v23.2.24 milestone Jan 24, 2024

andrwng force-pushed the cloud-storage-rrr-metric-hwm branch from 6fb81bb to 7a09ddd Compare January 24, 2024 01:01

andrwng added 2 commits January 23, 2024 21:50

andrwng force-pushed the cloud-storage-rrr-metric-hwm branch from 7a09ddd to 7e33cc7 Compare January 24, 2024 05:50

abhijat reviewed Jan 24, 2024

abhijat approved these changes Jan 24, 2024

Lazin approved these changes Jan 24, 2024

andrwng merged commit 86c0d1d into redpanda-data:dev Jan 24, 2024
18 checks passed

This was referenced Jan 24, 2024

[v23.3.x] read replica partitions report incorrect kafka max offset #16271

Closed

[v23.3.x] metrics: fix kafka_max_offset on read replicas #16272

Merged

vbotbuildovich mentioned this pull request Jan 24, 2024

[v23.1.x] metrics: fix kafka_max_offset on read replicas #16273

Closed

vbotbuildovich mentioned this pull request Jan 24, 2024

[v23.2.x] metrics: fix kafka_max_offset on read replicas #16274

Closed

This was referenced Jan 24, 2024

[v23.2.x] metrics: fix kafka_max_offset on read replicas #16275

Merged

[v23.1.x] metrics: fix kafka_max_offset on read replicas #16276

Open

andrwng added a commit to andrwng/redpanda that referenced this pull request Jan 24, 2024

rptest: simplify metric_sum filtering

4c7ce4d

Follow-up to e567d63 (redpanda-data#16263) to simplify the metrics_sum filtering.

andrwng mentioned this pull request Jan 24, 2024

rptest: simplify metric_sum filtering #16277

Merged

7 tasks

andrwng added a commit to andrwng/redpanda that referenced this pull request Jan 24, 2024

rptest: simplify metric_sum filtering

4bb328b

Follow-up to e567d63 (redpanda-data#16263) to simplify the metrics_sum filtering.
