Support time based lag monitoring #452
I'd be interested. I've not looked at what Burrow is storing, but if you have a consumer with an arbitrary offset X in the past, Burrow would at some point need to ask the broker what the timestamp is for the message stored at X, probably by fetching a message and looking at its timestamp. Having Burrow do that for every topic and partition it cares about would have a performance impact, but if it were configurable enough not to do it too frequently, maybe it would be possible? We currently do that at the consumer level, where the consumer periodically reports a metric based on the timestamp of a sample of the messages it is processing. But it would be nice to be able to do that inside Burrow, in a way that didn't add ridiculous overhead. :)
Running some back-of-the-envelope calculations, I'm finding the following. Keeping an (offset (int64), timestamp (int64), partition (int32)) tuple is 20 bytes. Let's assume a "big" cluster goes up to 200k partitions. That is ~4MB of offset/timestamp data for just the current state of a single cluster.

To find our ceiling, let's be generous and assume topics have a retention of 3 weeks and a scrape frequency of 30s. That means we need 60,480 scrapes to cover the entire partition state, which implies roughly 240GB of scrape history to keep a rough history for all topics. If we move that down to retaining just 1 day of history (let's assume most consumers are up to date), that still implies 11.5GB. This is on top of all the heap already used to capture consumer state and history, which could vary in size but I'll ballpark at 1GB. My understanding is that large Go heaps top out on the order of ~20GB if you want reasonable pause times, and even that could be problematic, so let's try to keep it under 10GB for operator sanity's sake. Moving down to 6 hours of retention we get 2.8GB, probably enough to capture the p90-p99 lag rates.

Some operators run multiple big clusters and need to keep state for all of them; supporting all of them without blowing up the heap starts to get tricky. If we go down an order of magnitude in partitions (~10k) with a retention of a week, which probably covers 90% of use cases, we get 4GB of history. That looks more reasonable. Does that match what people are actually doing in real life?
Keeping state seems like the only way to make this performant. You would have to read the message at the offset to get the timestamp (there is no key-only fetch request in Kafka), which would be brutal for clients reading old history. Maybe you start to get fancy with caching, etc., but that starts to become fraught, and seemingly against my perceived ethos of the project, where we can just go and look at the collected state at any time.
Determining lag by time is a frequent discussion. On the face of it, it's not a bad idea. We may be able to use the existing data to interpolate the timestamps for offsets between stored broker offsets. However, for consumers that are behind, you need to store a lot more information. In addition, if your data is coming in at irregular intervals, there are a lot of problems with this approach. That said, it's worth investigating. I think someone needs to do a proof of concept here and show the code, and figure out what the additional storage looks like.
FWIW, we actually looked pretty far into this, and it doesn't fit very well with how Burrow works, given how much fetching from different places in the topics you have to do to figure out the real lag. FWIW, New Relic (unaffiliated) described the goal well.
And internally at my day job we ended up creating a separate application that uses Burrow to bootstrap consumers and some state hints, but then goes and does that evaluation for each consumer group/partition (also what New Relic seems to have done). I hope to open source the app sooner rather than later, but would happily drop it if someone smarter than me (not hard) can figure out how to shim this into Burrow.
Any news on this? It would be very interesting to have a Burrow module that exposes Kafka lag as a timestamp.
At my day job, I ended up writing something to do this as a secondary application, since it would muck quite a bit with the Burrow internals IIRC: https://github.com/teslamotors/kafka-helmsman/tree/master/kafka_consumer_freshness_tracker. I believe Lightbend also made something similar (https://github.com/lightbend/kafka-lag-exporter), though it has a lot more complexity that IMHO is not necessary in 99% of deployments.
Hi! First, I wanted to say thanks for all the great work done here! It has made our lives as operators worlds easier. Unfortunately, success has bred more requirements.
Our users are interested in understanding consumer lag not as the number of offsets behind, but as how much real time the consumer is behind. This could be formulated two ways:
Understandably, this is something that is somewhat dependent on setting timestamps appropriately (or rather, allowing Kafka to set the timestamps).
Burrow already keeps a recent history of each partition from the broker (in fact, it has both the offset and the timestamp), so it should be able to figure out either case above. The former is probably more interesting, but the latter is easier to compute: you almost always know the most recent broker offset, but might not have an observation that covers the consumer's commit in the case of a new consumer reading from the beginning of the topic history.
Is this something you all would be interested in?