
Server-side histograms. #480

Closed
beorn7 opened this Issue Jan 27, 2015 · 17 comments

beorn7 commented Jan 27, 2015

The client-side summaries cannot be aggregated in a meaningful way. To do that, we need the clients to send just the buckets, and the actual histogram generation has to happen on the server. We need a new function for that (quantile or something), and a new metric type (let's call it histogram).

We might want to consider fancy algorithms to manage aggregatable histograms efficiently. But it is quite possible we'll end up with the trivial bucket approach, simply because buckets are easy to handle (rate to determine the expiration period, sum to aggregate, ...).
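To illustrate why buckets aggregate where client-side summaries do not, here is a minimal Go sketch, assuming every instance exposes the same bucket layout. Bucket counts from several instances can simply be added element-wise, whereas per-instance quantiles (say, each instance's 99th percentile) cannot be combined into the fleet-wide 99th percentile. The function name sumBuckets and the sample counts are hypothetical.

```go
package main

import "fmt"

// sumBuckets adds per-instance bucket counts element-wise. This is only
// meaningful because every instance is assumed to use the same bucket layout.
func sumBuckets(perInstance [][]uint64) []uint64 {
	if len(perInstance) == 0 {
		return nil
	}
	total := make([]uint64, len(perInstance[0]))
	for _, counts := range perInstance {
		if len(counts) != len(total) {
			panic("instances use different bucket layouts")
		}
		for i, c := range counts {
			total[i] += c
		}
	}
	return total
}

func main() {
	// Hypothetical per-bucket counts scraped from two instances.
	instanceA := []uint64{5, 10, 3}
	instanceB := []uint64{1, 2, 20}
	fmt.Println(sumBuckets([][]uint64{instanceA, instanceB})) // [6 12 23]
}
```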

Buckets will be configurable (linear, logarithmic, explicitly set).
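The three layouts can be sketched in Go as generators of bucket upper bounds. The helper names linearBuckets and logBuckets and their parameters are illustrative assumptions, not an existing API; an explicit layout is just a hand-written, sorted slice.

```go
package main

import "fmt"

// linearBuckets returns count upper bounds: start, start+width, start+2*width, ...
func linearBuckets(start, width float64, count int) []float64 {
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = start + float64(i)*width
	}
	return bounds
}

// logBuckets returns count upper bounds: start, start*factor, start*factor^2, ...
func logBuckets(start, factor float64, count int) []float64 {
	bounds := make([]float64, count)
	v := start
	for i := range bounds {
		bounds[i] = v
		v *= factor
	}
	return bounds
}

func main() {
	fmt.Println(linearBuckets(10, 10, 4)) // [10 20 30 40]
	fmt.Println(logBuckets(1, 2, 5))      // [1 2 4 8 16]
	// An explicitly set layout (hypothetical values, in seconds).
	explicit := []float64{0.005, 0.01, 0.05, 0.1, 0.5, 1}
	fmt.Println(explicit)
}
```

Logarithmic layouts keep the relative bucket width constant across orders of magnitude, which tends to suit latencies; linear layouts suit values with a known, narrow range.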

beorn7 self-assigned this Jan 27, 2015

brian-brazil commented Jan 27, 2015

I'm planning on doing this. I'm thinking the simple bucket approach.

mwitkow commented Feb 3, 2015

We're currently considering writing a bucketed version of summary. Basically:

# mydata_more_N counts entries greater than N ms
mydata_more_0{arbitrary_label="arbitrary_value"} 400
mydata_more_4{arbitrary_label="arbitrary_value"} 300
mydata_more_16{arbitrary_label="arbitrary_value"} 250
mydata_more_32{arbitrary_label="arbitrary_value"} 150
mydata_more_64{arbitrary_label="arbitrary_value"} 100
mydata_more_128{arbitrary_label="arbitrary_value"} 10

This would make it super easy to:

  • draw a fraction series: (mydata_more_64 - mydata_more_128) / mydata_more_0 gives the fraction of requests with mydata in the (64, 128] range
  • alert on critical thresholds: mydata_more_32 / mydata_more_0 gives the fraction of requests exceeding 32
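A minimal Go sketch of the two computations in the list, using the example values from the snippet. Because each mydata_more_N counts entries greater than N, the count for a range is the lower threshold's counter minus the higher threshold's counter. The names moreThan, fractionBetween and fractionAbove are hypothetical.

```go
package main

import "fmt"

// moreThan maps a threshold in ms to the number of entries greater than it
// (the example values from the snippet).
var moreThan = map[int]float64{0: 400, 4: 300, 16: 250, 32: 150, 64: 100, 128: 10}

// fractionBetween returns the share of entries in the range (lo, hi].
func fractionBetween(lo, hi int) float64 {
	return (moreThan[lo] - moreThan[hi]) / moreThan[0]
}

// fractionAbove returns the share of entries greater than threshold t.
func fractionAbove(t int) float64 {
	return moreThan[t] / moreThan[0]
}

func main() {
	fmt.Println(fractionBetween(64, 128)) // 0.225
	fmt.Println(fractionAbove(32))        // 0.375
}
```

If these were exposed as ever-increasing counters, a real console or alert would presumably compare rates over a time window rather than raw totals.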

All this with a bucketer set in the client code. It doesn't give you the pretty quantiles that seem to be all the rage, but it is very functional, as long as the distribution of your data doesn't change over time (which is extremely rare).

I thought about implementing this in the Java simple_client and using custom consoles and alerting rules, but I don't want to step ahead of the line if this will be implemented more nicely in the backend.

beorn7 commented Feb 3, 2015

Yeah, we should probably get this done soon, before everybody implements their own work-around... :)

@mwitkow-io Yes, that's exactly the idea. We just want it as a clean data type in the protobuf format, and then a couple of server-side functions in the expression language to get quantiles.

brian-brazil commented Feb 3, 2015

I'll likely be implementing this within the next 2 weeks. I'm planning on having the bucket value as a label.

beorn7 commented Feb 3, 2015

Let's coordinate the ideas before implementing. I have started a document:
https://docs.google.com/a/soundcloud.com/document/d/1uSenXRDjDaJLV3qnSD09GqgPdEEDPjER0mVsnGaCYF0/edit?usp=sharing

Please comment. There are certainly a lot of things to clarify.

beorn7 commented Feb 3, 2015

OK, after the dust has settled, there is a cleaned-up version of the design doc (let's call it that by now... :)

@brian-brazil Have another look. And @juliusv now it might be mature enough for your consumption.

brian-brazil commented Feb 3, 2015

This generally looks good to me.

beorn7 commented Feb 3, 2015

Cool. So whoever grabs part of the work outlined in the doc should mention it here (to avoid redundant work).

brian-brazil commented Feb 3, 2015

I presume we'll need a 0.0.5 exposition format, compatible with 0.0.4?

I've got the java simpleclient anyway.

beorn7 commented Feb 3, 2015

Yeah, I presume so... We should also come up with 1.0.0 at some point. :)

brian-brazil commented Feb 12, 2015

prometheus/client_golang#76 has all the infrastructure work

brian-brazil commented Feb 12, 2015

I'm working on the java simpleclient now.

beorn7 commented Feb 12, 2015

I'm working on client/golang exposition (package prometheus).

beorn7 commented Feb 19, 2015

The only thing missing now is the histogram_quantile function. I'm currently working on it.

(Not yet quantile and quantile_over_time, which have also been requested by some. But they are unrelated; see the design doc.)

And then, of course, support in other libraries... but Go and JVM are there, which means you can already do Apdex now.

beorn7 commented Feb 23, 2015

I'm closing this and will create a separate issue for quantile and quantile_over_time.

beorn7 closed this Feb 23, 2015

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
