StatsD [![Build Status](https://secure.travis-ci.org/etsy/statsd.png)](http://travis-ci.org/etsy/statsd)
======

A network daemon that runs on the [Node.js][node] platform and
listens for statistics, like counters and timers, sent over [UDP][udp]
and sends aggregates to one or more pluggable backend services (e.g.,
[Graphite][graphite]).

We ([Etsy][etsy]) [blogged][blog post] about how it works and why we created it.

Concepts
--------

* *buckets*
  Each stat is in its own "bucket". Buckets are not predefined anywhere. They can be named anything that will translate to Graphite (periods make folders, etc.).

* *values*
  Each stat will have a value. How it is interpreted depends on the modifier
  that follows it (for example `c` or `ms`). In general, values should be integers.

* *flush*
  After the flush interval timeout (default 10 seconds), stats are
  aggregated and sent to an upstream backend service.

Counting
--------

    gorets:1|c

This is a simple counter. It adds 1 to the "gorets" bucket. The count stays in memory until the next flush, whose interval is set by `config.flushInterval`.

Timing
------

    glork:320|ms

The glork took 320ms to complete this time. StatsD figures out the 90th percentile,
average (mean), and lower and upper bounds for the flush interval. The percentile
threshold can be tweaked with `config.percentThreshold`.

The percentile threshold can be a single value, or a list of values, and will
generate the following list of stats for each threshold:

    stats.timers.$KEY.mean_$PCT
    stats.timers.$KEY.upper_$PCT

Where `$KEY` is the stats key you specify when sending to StatsD, and
`$PCT` is the percentile threshold.

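As a rough illustration of what these per-threshold stats mean, the sketch below computes `mean_$PCT` and `upper_$PCT` for one timer in plain JavaScript. The function name `percentileStats` and the rounding rule are assumptions made for this example, not code taken from StatsD.

```javascript
// Hypothetical helper, not part of StatsD. It drops the slowest
// (100 - pct)% of the samples, then reports the mean and the upper
// bound of what is left.
function percentileStats(timings, pct) {
  if (timings.length === 0) {
    return null;
  }
  // Sort numerically; the default sort would compare values as strings.
  const sorted = timings.slice().sort((a, b) => a - b);
  // Assumed rounding rule: how many of the largest samples to drop.
  const dropped = Math.round(((100 - pct) / 100) * sorted.length);
  const kept = sorted.slice(0, Math.max(sorted.length - dropped, 1));
  const sum = kept.reduce((total, value) => total + value, 0);
  return {
    ['mean_' + pct]: sum / kept.length,
    ['upper_' + pct]: kept[kept.length - 1],
  };
}

// Ten samples for the "glork" timer, in milliseconds.
const glork = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000];
console.log(percentileStats(glork, 90));
```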
Sampling
--------

    gorets:1|c|@0.1

Tells StatsD that this counter is being sampled and is sent only 1/10th of the time.

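A client-side sketch of sampling, under the assumption that the client decides at random whether to send each increment and that the receiver scales a sampled count by `1 / rate` to estimate the true total. `sampledCounter` and `estimateTotal` are hypothetical names used only here.

```javascript
// Hypothetical client helper: returns the line to send, or null when
// this event is skipped. `random` is injectable so the behaviour can
// be checked deterministically.
function sampledCounter(bucket, rate, random = Math.random) {
  if (random() >= rate) {
    return null; // not sampled this time, send nothing
  }
  return bucket + ':1|c|@' + rate;
}

// Assumed receiver-side scaling: each received line stands for
// 1 / rate real events.
function estimateTotal(receivedCount, rate) {
  return receivedCount * (1 / rate);
}

console.log(sampledCounter('gorets', 0.1, () => 0.05)); // sampled
console.log(sampledCounter('gorets', 0.1, () => 0.5));  // skipped
console.log(estimateTotal(25, 0.1));
```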
Gauges
------
StatsD now also supports gauges, which record arbitrary values.

    gaugor:333|g

Multiple metrics can also be sent as a batch in a single UDP packet, separated by a
newline character.

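The line formats above can be assembled and batched with a few lines of Node.js. This is a sketch rather than an official client: `formatMetric`, `buildPacket` and `sendPacket` are names invented here, and the host and port (`127.0.0.1:8125`) are assumptions that should be replaced with the address your daemon listens on.

```javascript
const dgram = require('dgram');

// Build one metric line: bucket:value|type, plus |@rate when sampled.
function formatMetric(bucket, value, type, sampleRate) {
  let line = bucket + ':' + value + '|' + type;
  if (sampleRate !== undefined) {
    line += '|@' + sampleRate;
  }
  return line;
}

// Join several metric lines into one newline-separated payload.
function buildPacket(lines) {
  return lines.join('\n');
}

// Fire-and-forget UDP send; host and port are assumed values.
function sendPacket(payload, host = '127.0.0.1', port = 8125) {
  const socket = dgram.createSocket('udp4');
  const message = Buffer.from(payload);
  socket.send(message, 0, message.length, port, host, () => socket.close());
}

const packet = buildPacket([
  formatMetric('gorets', 1, 'c'),
  formatMetric('glork', 320, 'ms'),
  formatMetric('gaugor', 333, 'g'),
  formatMetric('gorets', 1, 'c', 0.1),
]);
console.log(packet);
// To send it to a running daemon: sendPacket(packet);
```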
Debugging
---------

There are additional config variables available for debugging:

* `debug` - log exceptions and periodically print out information on counters and timers
* `debugInterval` - interval for printing out information on counters and timers
* `dumpMessages` - print debug info on incoming messages

For more information, see `exampleConfig.js`.

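For orientation, a config that switches on the debugging options might look like the sketch below. The object is assigned to a variable only so that the snippet runs on its own; every value is a placeholder, and both intervals are assumed to be expressed in milliseconds.

```javascript
// Sketch only: placeholder values, intervals assumed to be milliseconds.
const config = {
  flushInterval: 10000,        // flush every 10 seconds
  percentThreshold: [90, 99],  // a single value or a list of values
  debug: true,                 // log exceptions, print counters and timers
  debugInterval: 10000,        // how often to print that information
  dumpMessages: true,          // print debug info on incoming messages
};

console.log(JSON.stringify(config, null, 2));
```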
Supported Backends
------------------

StatsD supports multiple, pluggable, backend modules that can publish
statistics from the local StatsD daemon to a backend service or data
store. Backend services can retain statistics for
longer durations in a time series data store, visualize statistics in
graphs or tables, or generate alerts based on defined thresholds. A
backend can also correlate statistics sent from StatsD daemons running
across multiple hosts in an infrastructure.

StatsD includes the following backends:

* [Graphite][graphite] (`graphite`): Graphite is an open-source
  time-series data store that provides visualization through a
  web-browser interface.
* Console (`console`): The console backend outputs the received
  metrics to stdout (e.g. for seeing what's going on during development).

By default, the `graphite` backend will be loaded automatically. To
select which backends are loaded, set the `backends` configuration
variable to the list of backend modules to load.

Backends are just npm modules which implement the interface described in
section *Backend Interface*. To load a backend, add its module name to the
`backends` variable in your config. Because the name is also
used in the `require` directive, you can load one of the provided backends by
giving the relative path (e.g. `./backends/graphite`).

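As an illustration, the `backends` variable for loading both bundled backends could be set as follows. The path `./backends/console` is assumed by analogy with `./backends/graphite` above, and the variable wrapper is there only to make the snippet self-contained.

```javascript
// Sketch of the relevant config entry. Each name is handed to require(),
// so bundled backends are given as relative paths.
const config = {
  backends: [
    './backends/graphite',
    './backends/console', // assumed path, by analogy with graphite
  ],
};

config.backends.forEach((name) => console.log('would load', name));
```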
Graphite Schema
---------------

Graphite uses "schemas" to define the different round robin datasets it houses (analogous to RRAs in rrdtool). Here's what Etsy is using for the stats databases:

    [stats]
    priority = 110
    pattern = ^stats\..*
    retentions = 10:2160,60:10080,600:262974

That translates to:

* 6 hours of 10 second data (what we consider "near-realtime")
* 1 week of 1 minute data
* 5 years of 10 minute data

This has been a good tradeoff so far between size-of-file (round robin databases are fixed size) and data we care about. Each "stats" database is about 3.2 megs with these retentions.

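The arithmetic behind that translation can be checked with a short script. Each `retentions` entry is read as `secondsPerPoint:numberOfPoints`; the 12 bytes per stored point used for the size estimate is an assumption about Graphite's on-disk format, and file headers are ignored.

```javascript
const retentions = '10:2160,60:10080,600:262974';
const BYTES_PER_POINT = 12; // assumed: 4-byte timestamp + 8-byte value

// Turn "seconds:points" pairs into objects with the covered time span.
const archives = retentions.split(',').map((entry) => {
  const [secondsPerPoint, points] = entry.split(':').map(Number);
  return { secondsPerPoint, points, spanSeconds: secondsPerPoint * points };
});

const HOUR = 3600;
const DAY = 24 * HOUR;
const YEAR = 365.25 * DAY;

archives.forEach((a) => {
  console.log(
    a.secondsPerPoint + 's data:',
    (a.spanSeconds / HOUR).toFixed(1) + ' hours,',
    (a.spanSeconds / DAY).toFixed(1) + ' days,',
    (a.spanSeconds / YEAR).toFixed(2) + ' years'
  );
});

const totalPoints = archives.reduce((sum, a) => sum + a.points, 0);
const megabytes = (totalPoints * BYTES_PER_POINT) / (1024 * 1024);
console.log('approximate file size: ' + megabytes.toFixed(2) + ' MB');
```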
TCP Stats Interface
-------------------

A really simple TCP management interface is available by default on port 8126; the port can be overridden in the configuration file. Inspired by the memcache stats approach, this can be used to monitor a live statsd server. You can interact with the management server by telnetting to port 8126. The following commands are available:

* stats - some stats about the running server
* counters - a dump of all the current counters
* timers - a dump of the current timers

The stats output currently will give you:

* uptime: the number of seconds elapsed since statsd started
* messages.last_msg_seen: the number of elapsed seconds since statsd received a message
* messages.bad_lines_seen: the number of bad lines seen since startup

Each backend will also publish a set of statistics, prefixed by its
module name.

Graphite:

* graphite.last_flush: the number of seconds elapsed since the last successful flush to graphite
* graphite.last_exception: the number of seconds elapsed since the last exception thrown whilst flushing to graphite

A simple nagios check can be found in the utils/ directory. It can be used to check metric thresholds, for example the number of seconds since the last successful flush to graphite.

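A small Node.js client can stand in for telnet when polling the management port. The sketch below assumes the `stats` reply is a series of `name: value` lines and that the server marks the end of the reply with a line reading `END`; both are assumptions about the wire format, and `queryStats` and `parseStats` are hypothetical helper names.

```javascript
const net = require('net');

// Parse "name: value" lines into an object, ignoring anything else.
function parseStats(text) {
  const stats = {};
  text.split('\n').forEach((line) => {
    const match = /^([^:\s]+):\s*(-?\d+(?:\.\d+)?)\s*$/.exec(line);
    if (match) {
      stats[match[1]] = Number(match[2]);
    }
  });
  return stats;
}

// Send the "stats" command to the management port and collect the reply.
function queryStats(host, port, callback) {
  const socket = net.connect(port, host, () => socket.write('stats\n'));
  let reply = '';
  socket.setEncoding('utf8');
  socket.on('data', (chunk) => {
    reply += chunk;
    if (/^END\s*$/m.test(reply)) { // assumed end-of-reply marker
      socket.destroy();
      callback(null, parseStats(reply));
    }
  });
  socket.on('error', (err) => callback(err));
}

// Example reply, written by hand for illustration.
const sample =
  'uptime: 120\nmessages.last_msg_seen: 3\nmessages.bad_lines_seen: 0\nEND\n';
console.log(parseStats(sample));
// Against a live daemon: queryStats('127.0.0.1', 8126, console.log);
```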
Installation and Configuration
------------------------------

* Install node.js
* Clone the project
* Create a config file from exampleConfig.js and put it somewhere
* Start the Daemon:

      node stats.js /path/to/config

Tests
-----

A test framework has been added using nodeunit and some custom code to start and manipulate statsd. Please add tests under test/ for any new features or bug fixes. Testing a live server can be tricky: attempts were made to eliminate race conditions, but it may still be possible to encounter a stuck state. If doing dev work, a `killall node` will kill any stray test servers in the background (don't do this on a production machine!).

Tests can be executed with `./run_tests.sh`.

Backend Interface
-----------------

Backend modules are Node.js [modules][nodemods] that listen for a
number of events emitted from StatsD. Each backend module should
export the following initialization function:

* `init(startup_time, config, events)`: This method is invoked from StatsD to
  initialize the backend module. It accepts three parameters:
  `startup_time` is the startup time of StatsD in epoch seconds,
  `config` is the parsed config file hash, and `events` is the event
  emitter that backends can use to listen for events.

  The backend module should return `true` from init() to indicate
  success. A return of `false` indicates a failure to load the module
  (missing configuration?) and will cause StatsD to exit.

Backends can listen for the following events emitted by StatsD from
the `events` object:

* Event: **'flush'**

  Parameters: `(time_stamp, metrics)`

  Emitted on each flush interval so that backends can push aggregate
  metrics to their respective backend services. The event is passed
  two parameters: `time_stamp` is the current time in epoch seconds
  and `metrics` is a hash representing the StatsD statistics:

  ```
  metrics: {
      counters: counters,
      gauges: gauges,
      timers: timers,
      pctThreshold: pctThreshold
  }
  ```

  Each backend module is passed the same set of statistics, so a
  backend module should treat the metrics as immutable
  structures. StatsD will reset timers and counters after each
  listener has handled the event.

* Event: **'status'**

  Parameters: `(writeCb)`

  Emitted when a user invokes a *stats* command on the management
  server port. It allows each backend module to dump backend-specific
  status statistics to the management port.

  The `writeCb` callback function has a signature of `f(error,
  backend_name, stat_name, stat_value)`. The backend module should
  invoke this method with each stat_name and stat_value that should be
  sent to the management port. StatsD will prefix each stat name with
  the `backend_name`. The backend should set `error` to *null*, or, in
  the case of a failure, an appropriate error.

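Putting the pieces together, a backend module might look roughly like the sketch below. It follows the `init`, `'flush'` and `'status'` contract described above, but the backend name `example` and the stats it reports (`last_flush`, `flush_count`) are invented for illustration.

```javascript
// example-backend.js: a hypothetical backend that only counts flushes.
let lastFlush = 0;
let flushCount = 0;

function onFlush(timeStamp, metrics) {
  // Treat `metrics` as read-only: other backends receive the same object.
  const counterNames = Object.keys(metrics.counters);
  console.log('flush at', timeStamp, 'with', counterNames.length, 'counters');
  lastFlush = timeStamp;
  flushCount += 1;
}

function onStatus(writeCb) {
  // writeCb(error, backend_name, stat_name, stat_value)
  writeCb(null, 'example', 'last_flush', lastFlush);
  writeCb(null, 'example', 'flush_count', flushCount);
}

function init(startupTime, config, events) {
  lastFlush = startupTime;
  events.on('flush', onFlush);
  events.on('status', onStatus);
  return true; // false would signal a load failure and make StatsD exit
}

exports.init = init;
```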
Inspiration
-----------

StatsD was inspired (heavily) by the project (of the same name) at Flickr. Here's a post where Cal Henderson described it in depth:
[Counting and timing](http://code.flickr.com/blog/2008/10/27/counting-timing/). Cal re-released the code recently: [Perl StatsD](https://github.com/iamcal/Flickr-StatsD)

Meta
---------
- IRC channel: `#statsd` on freenode
- Mailing list: `statsd@librelist.com`

Contribute
---------------------

You're interested in contributing to StatsD? *AWESOME*. Here are the basic steps:

Fork StatsD from here: http://github.com/etsy/statsd

1. Clone your fork
2. Hack away
3. If you are adding new functionality, document it in the README
4. If necessary, rebase your commits into logical chunks, without errors
5. Push the branch up to GitHub
6. Send a pull request to the etsy/statsd project.

We'll do our best to get your changes in!

[graphite]: http://graphite.wikidot.com
[etsy]: http://www.etsy.com
[blog post]: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
[node]: http://nodejs.org
[nodemods]: http://nodejs.org/api/modules.html
[udp]: http://en.wikipedia.org/wiki/User_Datagram_Protocol

Contributors
-----------------

In lieu of a list of contributors, check out the commit history for the project:
https://github.com/etsy/statsd/graphs/contributors