different counts, documentation? #22
Comments
I was wondering the same exact thing... |
I do not understand the difference between stats and stats_counts. |
I'm in the same boat. I'm using a client increment function, but I see floats in the graph output for stats.mystat. stats_counts.mystat looks like a more sane value. I would really like to know what each of these is. |
From a quick look, it seems that stats.counter is normalised by the flush interval, so it is a count per second, while stats_counts is the total count received in the flushInterval (default 10s). It's a bit confusing since it's not explained anywhere... |
I think the count under stats.timers (i.e. stats.timers.blah.count) is per flush interval, so you have to divide by 10 (the default) to get per second. |
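To make the distinction in the comment above concrete, here is a minimal Python sketch of how one counter could produce both values at flush time. It is an illustration of the behaviour described in this thread, not statsd's actual code; the helper name `flush_counter` and its parameters are hypothetical, and the 10s default is the flushInterval mentioned above.

```python
def flush_counter(count, flush_interval_s=10):
    """Return (per_second_rate, raw_count) for one flush window.

    Hypothetical helper: it mirrors the behaviour described in this
    thread, not statsd's real implementation.
    """
    per_second = count / flush_interval_s  # the value seen under stats.<name>
    raw_count = count                      # the value seen under stats_counts.<name>
    return per_second, raw_count


# 7 increments received during one 10 s flush window
rate, total = flush_counter(7)
print(rate, total)
```

This would explain why a client that only ever increments by whole numbers still shows floats under stats.mystat.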
From what I understand, statsd sends different values for these 'counts'. stats.timers.foo.count, from issuing a "foo:1|ms", is the absolute count for a flushInterval. However, the stats count values differ from the counter values when displayed in Graphite. Unlike timings, counters are constantly fed into Graphite (with a 0 value if nothing happens). Graphite averages the values it receives, which can be a pain since it really should sum counts. Because it averages by default, a rate works fine, since that is just taking an average of an average. However, the way points get binned based on the storage schema at longer time ranges messes up the absolute counts displayed by Graphite. So basically, with the two values graphed you get:
With stats.timers.foo.count, the count you are measuring is slightly different, since it doesn't send a 0 every flush interval. Since null values won't be used in the average, you only divide by the number of timing measurements you took. So this would be
If you constantly time every interval, this will equal counts/flushInterval; however, that is not always the case. tl;dr How to interpret Graphite graphs:
This is just speculation from my usage so correct me if anything seems wrong =). |
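The zero-versus-null point in the comment above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in that comment (counters report 0 for idle intervals, timers report nothing, and averaging skips nulls); the function name `average_skipping_nulls` and the sample data are made up for the example.

```python
def average_skipping_nulls(points):
    """Average a series of datapoints, ignoring None (null) entries."""
    known = [p for p in points if p is not None]
    return sum(known) / len(known) if known else None


# Six flush intervals, with 3 events in the 1st and 3 in the 4th.
counter_points = [3, 0, 0, 3, 0, 0]            # idle intervals reported as 0
timer_points = [3, None, None, 3, None, None]  # idle intervals not reported

print(average_skipping_nulls(counter_points))  # 1.0
print(average_skipping_nulls(timer_points))    # 3.0
```

Both series describe the same six events, but the averaged timer count only divides by the intervals that actually had data.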
@cwu - Your comments were helpful... Let's take the example of user logins. I want to log every time a user logs in. I'd like to be able to view the rate of logins/sec, as well as do aggregate rollups to see things like "how many users logged into the system, every hour, for the last 2 days". I'm unable to do the latter, and I think it's because whisper uses an average function to aggregate data. As a test, write a script that logs a stat every 1 second. View stats_counts.<stat_name> in Graphite. Now apply the "Summarize" function to the data (Apply Function -> Transform -> Summarize). I would expect that summarizing to 1min would result in a straight line at y=60. Instead I see a straight line at y=10. It looks like you can configure Whisper with a different rollup function... Might be worth testing out... http://readthedocs.org/docs/graphite/en/latest/whisper.html#rollup-aggregation |
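The y=10 versus y=60 result in the test above is what you would expect from averaging instead of summing. A minimal Python sketch, assuming one event per second and the default 10s flush (so each stats_counts point is 10, and one minute holds six points); the `rollup` helper is hypothetical and only imitates the two aggregation methods, it is not Graphite or Whisper code.

```python
def rollup(points, bucket_size, method):
    """Aggregate consecutive groups of `bucket_size` points."""
    buckets = [points[i:i + bucket_size]
               for i in range(0, len(points), bucket_size)]
    if method == "sum":
        return [sum(b) for b in buckets]
    if method == "average":
        return [sum(b) / len(b) for b in buckets]
    raise ValueError(f"unknown method: {method}")


# One event per second with a 10 s flush: every stats_counts point is 10.
one_minute = [10] * 6

print(rollup(one_minute, 6, "average"))  # [10.0]
print(rollup(one_minute, 6, "sum"))      # [60]
```

With averaging, the one-minute bucket stays at 10; only a sum gives the 60 logins per minute that the test expected.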
And the new version of Graphite supports this... see this issue: |
Update: Here is what you need in your storage-aggregation.conf
Which means that everything in stats_counts will get summed when moved to larger retention periods. |
@recursify great info! Thanks. |
This seems resolved. Thanks folks. If there is still something unclear, please reopen the ticket. |
Am I crazy, or should @recursify's comment be written in huge letters in the README? |
An example with different aggregation methods is already in the README. |
Yep, but it doesn't include the default 'pattern = ^stats_counts..*'
This is why we built that. Stats aggregation as a service. Just use it and it works without configuration / deployment / maintenance. It made sense to comment on this here since we are the biggest fans of statsd (despite its limits, namely the maintenance it incurs) and of how it inspired and helped a number of devs solve their day-to-day problems! Apologies in advance for squatting this closed discussion! |
So there are 3 different counts:
stats.mystat
stats.timers.mystat.count
stats_counts.mystat
What is the difference between all of them? A quick look at the source code suggests the first two are "occurrences per second" and the latter is "occurrences per minute", but I'm not so well versed in JavaScript or in what Graphite expects to be sent.
Is that right? Can someone update the readme to explain the differences between different counts?
Thanks.