Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consumergroup tests and refactoring #53

Closed
wants to merge 14 commits into from

Conversation

MrTrustworthy
Copy link
Contributor

No description provided.

@MrTrustworthy
Copy link
Contributor Author

I need some input regarding the output of esque describe consumergroup x. I was thinking about having a "simple summary" description per default, and only providing -o yaml etc outputs in case you want to get the full info. Not sure if that's a good idea, but in the spirit of "Kafka Ops for humans" I didn't want to just dump out all info per default.

Short summary currently looks like this:

image

@hfjn
Copy link
Contributor

hfjn commented Sep 18, 2019

@MrTrustworthy I love to make the describe commands more approachable. But we should agree on a level of verbosity that is consistent for all our commands.

@hfjn hfjn mentioned this pull request Sep 18, 2019
@MrTrustworthy
Copy link
Contributor Author

@hfjn exactly why I posted that here :D

Consider the screenshot as my proposal for a level of verbosity for describe. Higher verbosity is provided when you specify -o yaml, which will give you all details.

@hfjn
Copy link
Contributor

hfjn commented Sep 18, 2019

@MrTrustworthy I feel like it doesn't need to be as "speaking" as in your example. I think that might be a pain to maintain for all commands.

@MrTrustworthy
Copy link
Contributor Author

Proposal:

ConsumerGroup pia-mabaya-unit
        active members: 0
        topics: ['shop-product_export_unit'] 
        partitions: 17
        offsets:
               min: 894
               avg: 1908.7
               max: 4052
               total: 32436
        total lag: 90920

@swenzel
Copy link
Contributor

swenzel commented Sep 19, 2019

I'm not sure if min/max/avg over offsets is a useful metric. You'll have to know the watermarks of the corresponding topics for those metrics to be meaningful. I'd propose to use the current relative position in percent between low and high watermark. I.e ((offset-low_water)/(high_water-low_water))*100. This way you can immediately see where the consumer is within the topic. You can then calculate the min/max/avg over those percentages.
Apart from that I also like the less verbose yaml like output 👍

@MrTrustworthy
Copy link
Contributor Author

I really like that idea! Not sure how to call that property yet, but something like Consumergroup Progress: 99.7% seems like a good, compact piece of information.

Still, a relative measure will have to be interpreted differently depending on the total size of the topic. 98% on a topic with 100 messages isn't cause for worry, but on a topic with 100 billion entries over multiple years, it represents a considerable amount of lag. So I'd say we have to pair at least one absolute metric in addition to the progress. Either total lag or topic size (aka sum(high_watermarks)) would be a good choice, or both.

@swenzel
Copy link
Contributor

swenzel commented Sep 19, 2019

Agreed. I'd also keep the lag. Maybe do min/max/avg on that, too. Then you can see if it's only for a few partitions or for all of them.
Could also make sense to do the statistics per topic, since message counts may vary significantly.

@MrTrustworthy
Copy link
Contributor Author

Allright, I'll play around a bit and try to find a combination that makes the most sense when viewed by a user.

One note about the progress calculation (offset-low_water)/(high_water-low_water))*100: I'm not sure how I feel about the inclusion of low_water in the calculation. On one hand, it gives a more precise piece of info because progress is always based on alive messages in the topic. On the other hand, everytime the topic gets retented/compacted, it will cause your progress to move back, which is kinda counter intuitive.

@swenzel
Copy link
Contributor

swenzel commented Sep 19, 2019

I see what you mean...
Not including the low_watermark, however, will lead to very misleading results on very busy topics with retention. Consider low_watermark = 100k and high_watermark=101k.

Maybe progress is the wrong description then... how about relative_offset_location?

We could also invert the calculation (high_water-offset)/(high_water-low_water))*100 and call it relative_lag.
Color coding would also be nice: >=100% (critical, data loss), >=95% (warning, loss imminent), <95% (okay).

Copy link
Contributor

@swenzel swenzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, waiting for an update to the pretty_consumergroup_simple_overview, then everything is fine 👍

@hfjn
Copy link
Contributor

hfjn commented Sep 27, 2019

Added new output formatting and color-coded output. Anyone up for a review? @MrTrustworthy @swenzel @garrettthomaskth
image

@hfjn
Copy link
Contributor

hfjn commented Sep 27, 2019

Ah that screenshot has a faked 100.00% lag to show of the color. :P

@hfjn
Copy link
Contributor

hfjn commented Feb 15, 2020

I will close this for now since there haven’t been any updates in months. 🙂

@hfjn hfjn closed this Feb 15, 2020
@swenzel swenzel deleted the consumergroup-tests-and-refactoring branch April 7, 2020 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants