[v21.11.x] kafka: probe to send group offset to prometheus #3440

Merged 9 commits into redpanda-data:v21.11.x on Jan 17, 2022


@ZeDRoman ZeDRoman commented Jan 11, 2022

Cover letter

Backport of #3181 , #3462 , #3482

To monitor consumer group status, we need to export the offset for each topic partition to Prometheus.
Group lag can then be calculated from the per-partition offsets.
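
As an illustration of the lag calculation mentioned above, here is a minimal sketch. It assumes lag is defined as the partition's log end offset minus the group's offset for that partition, and that the log end offsets are obtained from some other source; the function name `group_lag` and the dictionary layout are hypothetical and not part of this PR.

```python
from typing import Dict, Tuple

# (topic, partition) pair; a hypothetical representation for this sketch.
TopicPartition = Tuple[str, int]


def group_lag(
    group_offsets: Dict[TopicPartition, int],
    log_end_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag for every partition the group has an offset for.

    Assumption: lag = log end offset - group offset, clamped at zero.
    Partitions with no known log end offset are skipped.
    """
    lag = {}
    for tp, offset in group_offsets.items():
        if tp not in log_end_offsets:
            continue
        lag[tp] = max(log_end_offsets[tp] - offset, 0)
    return lag


if __name__ == "__main__":
    offsets = {("topic", 0): 5, ("topic", 1): 10}
    ends = {("topic", 0): 8, ("topic", 1): 10}
    print(group_lag(offsets, ends))
```

Summing the values of the returned dictionary would give a total lag for the group.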

Fixes: #1275

Release notes

The Kafka server now exports a group topic-partition offset metric to Prometheus.

To monitor group status, we need to export the
offset for each topic partition to Prometheus.
Added a probe that reports the topic partition offset.

(cherry picked from commit 9c4ae20)
Added group seek command support to rpk ducktape wrapper

(cherry picked from commit 378ebf7)
The ducktape redpanda service wrapper's metrics_sample returns None
if none of the samples match the pattern.

(cherry picked from commit 5f9725c)
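
The "return None when nothing matches" behaviour described in the commit above can be sketched as follows. This is not the actual `metrics_sample` implementation from the redpanda service wrapper: the `Sample` type, the function name `matching_samples`, the substring matching, and the metric names are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Sample(NamedTuple):
    """Hypothetical shape of one scraped metric sample."""
    name: str
    labels: Dict[str, str]
    value: float


def matching_samples(samples: List[Sample], pattern: str) -> Optional[List[Sample]]:
    """Return samples whose name contains `pattern`, or None if none match.

    Returning None (rather than an empty list) lets callers distinguish
    "metric not present yet" with a simple `is None` check.
    """
    found = [s for s in samples if pattern in s.name]
    return found or None


if __name__ == "__main__":
    scraped = [
        Sample("example_group_offset", {"group": "g1", "partition": "0"}, 5.0),
        Sample("example_other_metric", {}, 1.0),
    ]
    print(matching_samples(scraped, "group_offset"))
    print(matching_samples(scraped, "missing"))
```

A test can then poll until the result is no longer None before asserting on the values.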
Added ducktape tests for checking group offsets
in metrics

(cherry picked from commit 66c2131)
@ZeDRoman ZeDRoman changed the title kafka: probe to send group offset to prometheus [v21.11.x] kafka: probe to send group offset to prometheus Jan 11, 2022

jcsp commented Jan 11, 2022

Would be good to investigate test failures on dev before backporting this, to avoid backporting any test instability.

@ivotron ivotron added this to the v21.11.3 milestone Jan 11, 2022

jcsp commented Jan 14, 2022

@ZeDRoman please could you pull the commits from #3482 into this backport?

Same for #3394 once the PR for that merges to dev.

It seems that leadership rebalancing was causing some of the rpk
operations to be a bit flaky. There may be an opportunity here to add
some retrying into rpk/franz-go.

rptest.clients.rpk.RpkException: RpkException<command
/var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-07ad15334ccea297e-1/vectorized/redpanda/vbuild/debug/clang/dist/local/redpanda/bin/rpk
group --brokers docker_n_14:9092,docker_n_6:9092,docker_n_5:9092 seek g2
--to start returned 1, output: , error: unable to list all offsets
successfully: NOT_LEADER_FOR_PARTITION: This server is not the leader
for that topic-partition.

Fixes: redpanda-data#3443

Signed-off-by: Noah Watkins <noah@vectorized.io>
(cherry picked from commit 54a8589)
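
A test-side retry of the kind the commit above hints at could look like the sketch below. It assumes the failing operation raises an exception whose message contains the Kafka error name, as in the `RpkException` output quoted above; the helper name `run_with_retry`, its parameters, and the use of `RuntimeError` are hypothetical and are not taken from rpk, franz-go, or the test suite.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_retry(
    op: Callable[[], T],
    attempts: int = 5,
    backoff_sec: float = 0.5,
    retriable: Sequence[str] = ("NOT_LEADER_FOR_PARTITION",),
) -> T:
    """Call `op`, retrying when the error text names a retriable condition.

    Errors that do not mention a retriable condition, and the error from
    the final attempt, are re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except RuntimeError as e:
            is_retriable = any(code in str(e) for code in retriable)
            if attempt == attempts or not is_retriable:
                raise
            # Give leadership a moment to settle before trying again.
            time.sleep(backoff_sec)
    raise ValueError("attempts must be at least 1")
```

Limiting retries to a named set of transient errors keeps genuine failures from being masked by the retry loop.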
After deleting a topic there may be a delay before the group manager on
a broker removes the in-memory state. Add a retry so that we can
tolerate these windows.

Signed-off-by: Noah Watkins <noah@vectorized.io>
(cherry picked from commit 101445e)
Signed-off-by: Noah Watkins <noah@vectorized.io>
(cherry picked from commit 74fcd0b)
Signed-off-by: Noah Watkins <noah@vectorized.io>
(cherry picked from commit d2f6779)
Some timeouts have been occurring in the group metrics leadership
transfer test. The timeout was occurring on a conjunction of two
conditions, so it wasn't clear which one was timing out. This patch
splits the two waits to give finer-grained information about what is
failing. It also adds some logging for additional context.

Signed-off-by: Noah Watkins <noah@vectorized.io>
(cherry picked from commit 27349be)
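
The idea of splitting a combined wait into two separate waits can be sketched as below. The condition names `leader_changed` and `metrics_visible` and the error messages are hypothetical, chosen only to illustrate the commit above. The `wait_until` helper is defined locally to keep the sketch self-contained; the real tests would more likely use the helper that ducktape provides.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_sec: float,
               backoff_sec: float = 0.1, err_msg: str = "") -> None:
    """Poll `condition` until it is true or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


def wait_for_transfer(leader_changed: Callable[[], bool],
                      metrics_visible: Callable[[], bool],
                      timeout_sec: float = 30) -> None:
    """Wait for each condition separately so a timeout names the culprit.

    A single wait on `leader_changed() and metrics_visible()` would time
    out with one generic message regardless of which condition failed.
    """
    wait_until(leader_changed, timeout_sec,
               err_msg="leadership did not move")
    wait_until(metrics_visible, timeout_sec,
               err_msg="group metrics did not appear on the new leader")
```

With the waits split, the `TimeoutError` message identifies which of the two conditions never became true.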

@jcsp jcsp left a comment

LGTM

@ZeDRoman ZeDRoman merged commit 8a997c5 into redpanda-data:v21.11.x Jan 17, 2022
4 participants