f9c8fdf Jul 2, 2014
276 lines (202 sloc) 10.1 KB
######################
### Introduction
######################
This module enables the sending of Statsd[1] statistics directly
from Apache, without the need for a CustomLog processor. It will
send one counter and one timer per request received.
For a request to www.example.com/foo/bar?baz=42, the stat format
would be:
<prefix.>foo.bar.GET.200<.suffix>
Where <prefix> and <suffix> are optionally configured (see below).
The path gets converted from /foo/bar to foo.bar and the HTTP method
(GET) and the response code (200) are also part the stat.
Illegal or undesirable characters as part of the path (like '.',
which is a record separator in graphite, or ':' and '|' which have
special meaning in Statsd), will be replaced by a '_' in the metric.
If you do not want the statname to be inferred from the URL, there
are ways to set the statname via an Apache note, an outgoing header
or staticaly in the configuration. See the StatsdStat directive below
for details.
The module is implemented as a logging hook into Apache's runtime,
meaning it runs after your request backend request is completed
and data is already being sent to the client.
On my very mediocre VM on my laptop, the average overhead per call
was 0.4 milliseconds, so any server grade hardware you have should
be able to do better than that.
I've written a companion module for Varnish[2] as well called
libvmod-statsd[3] in case you're running Varnish instead/also.
[1] http:/github.com/etsy/statsd
[2] http:/varnish-cache.org
[3] https://github.com/jib/libvmod-statsd
######################
### Example
######################
Here's an example configuration. More advanced configuration options
are available as well - see the Configuration section below:
<Location />
Statsd On
StatsdHost statsd.example.com # defaults to localhost
StatsdPort 8155 # defaults to 8125
StatsdTimeUnit microseconds # defaults to milliseconds
StatsdPrefix prod.httpd # defaults to NULL
StatsdSuffix webserver001 # defaults to NULL
</Location>
######################
### Debugging
######################
Because this module is setup as a logging hook, it's not possible to set
an outgoing header with the stat, as the response is already sent. To be
able to discern what the module is doing, it sets a Note[4] that you can
inspect in your logfiles. The Note field is %{statsd}n, is whitespace
separated and will look something like this:
path.from.url.GET.200 942 48 1
The first field is the stat as it is sent to Statsd. The second field is
the request duration (in whatever unit you specified, see below) and the
third field is the amount of bytes sent to statsd. If the third field is
-1, the delivery to Statsd failed and is a telltale sign of a broken
Statsd configuration for this module. The fourth field indicates whether
legacy mode is enabled or not.
Note that if your Statsd server is not on localhost, there's no way for
the kernel to know that the remote host was not available/misconfigured
and can't tell if the delivery failed.
[4] http://httpd.apache.org/docs/current/mod/mod_log_config.html
######################
### Configuration
######################
Note: All the directives can be either set in the Location, Directory or .htaccess
sections of the configuration.
*** Statsd directive
Syntax: Statsd on|off
Default: Statsd off
When mod_statsd is loaded, and Statsd on is set, Apache sends two stats (a
counter and a timer) for every request that is received. For a request to
'www.example.com/foo/bar?baz=42' the stat format would be:
<prefix.>foo.bar.GET.200<.suffix>
Where <prefix> and <suffix> are optionally configured (see below). The path
gets converted from /foo/bar to foo.bar and the HTTP method (GET) and the
response code (200) are also part the stat.
This directive can be used to turn this behavior on or off on a per-location
or per-directory basis. By default, enabling the mod_statsd module will
not trigger this behaviour, unless this directive is set to 'On'.
*** StatsdHost directive
Syntax: StatsdHost hostname
Default: localhost
This directive allows you to set the hostname of your statsd server. By
default it will connect to 'localhost'.
*** StatsdPort directive
Syntax: StatsdPort portnumber
Default: 8125
This directive allows you to set the port that your statsd daemons is
listening on. By default it will connect to 'localhost'.
*** StatsdTimeUnit directive
Syntax: StatsdTimeUnit seconds|milliseconds|microseconds
Default: milliseconds
This directive allows you to set the unit of time that mod_statsd uses
for statsd timers. By default, the timing is set to milliseconds but since
statsd only acceptes integers as its timing numbers, it may be useful to
change this to microseconds for particularly fast services. The option of
seconds is also provided for particularly slow services.
*** StatsdPrefix directive
Syntax: StatsdPrefix prefix
Default: NULL
This directive allows you to set a prefix to be included with every stat
that is sent to statsd. This is particularly useful for identifying the
environment, server, cluster or service that is running this module.
For example, this would be a good way to identify a particular service:
StatsdPrefix production.httpd.apiservice
*** StatsdSuffix directive
Syntax: StatsdSuffix suffix
Default: NULL
This directive allows you to set a suffix to be included with every stat
that is sent to statsd. This is particularly useful for identifying the
environment, server, cluster or service that is running this module.
For example, this would be a good way to aggregate all statistics per
server:
StatsdSuffix apiserver001
*** StatsdStat directive
Syntax: StatsdStat statname
Default: NULL
This directive allows you to unconditionally set the stat name, regardless
of the path that was hit on the webserver. For example, if you host a REST
service, and only care about the top level entry point, this may be a good
default:
<Location /api/userdata>
Statsd On
StatsdStat api.userdata
</Location>
This way, requests to both www.example.com/api/userdata/age/42 as well as
www.example.com/api/userdata/gender/male would be grouped up under the
api.userdata key.
Alternately, you can set the statname via an Apache note from your app
using the 'statsd.stat' note. From PHP, this would work like:
<?php apache_note('statsd.stat', 'set.via.note') ?>
Another option would be to set an outgoing header in your application,
which will be used (but not removed) by mod_statsd to set the statname.
For example:
X-Statsd-Stat: set.via.header
The order of preference for the statname is StatsdStat directive, Apache
note, outgoing header and lastly url inferenence.
Note that both the HTTP method & status code would still be appended to
this key, as would any prefix & suffix you set up.
*** StatsdAggregateStat directive
Syntax: StatsdAggregateStat statname
Default: NULL
Like StatsdStat, this directive allows you to unconditionally set the
aggregate stat name. This is useful to roll up stats for a given endpoint
or server. For example, you may have an API endpoint like this:
<Location /api>
Statsd On
StatsdAggregateStat api._total
</Location>
Now, every request under /api (like, /api/foo, and /api/bar) will get their
individual stats, but additionally, there will be a roll up stat called
'api._total', which will be incremented for any requests under /api.
This makes it possible to do aggregation on the server side, rather than
burdening graphite or other backend systems with either custom aggregation
rules, or complicated query patterns.
*** StatsdExclude directive
Syntax: StatsdExclude regex [, regex, ...]
Default: NULL
This directive allows you to specify a list of regular expressions to
exlude individual part of the URL path from the stat to be sent. For
example, using the following configuration:
<Location />
Statsd On
StatsdExclude ^\d+$ bar
</Location>
A request to www.example.com/foo/42/bar/baz/foobar would produce the
stat foo.baz.GET.200. This is again useful if you have a restful URL
where certain parts of the URL are either dynamic (like IDs) or not
relevant to the statistic gathered.
*** StatsdHTTPVerbs directive
Syntax: StatsdHTTPVerbs verb [, verb, ...]
Default: NULL
This directive allows you to specify a list of HTTP verbs that you want
to log separately. By default, every HTTP verb will get it's own unique
stat assigned. However, badly behaved clients or generally busy services
may mean that your stat space gets very large.
To work around this, you can call out the verbs you want to track explicitly,
and all other verbs will be grouped together under 'OtherVerbs'. Note that
the verbs are case sensitive and should be capitalized.
<Location />
Statsd On
StatsdHTTPVerbs GET POST
</Location>
A GET request to /foo/bar would produce the stat foo.bar.GET.200, but
a HEAD request to /foo/bar would produce foo.bar.OtherVerbs.200
*** StatsdLegacyMode directive
Syntax: StatsdLegacyMode on|off
Default: StatsdLegacyMode on
By default, when mod_statsd is loaded, and Statsd on is set, Apache
sends two stats (a counter and a timer) for every request that is
received.
In newer versions of Statsd[5] (v0.6.0 and up), a new naming scheme was
adopted for stats sent, which, among others, produces counters for every
timer that's sent.
The default in Statsd v0.6.0 is to have 'Legacy Namespace' on by default,
which keeps on sending both the counters & timer, and mod_statsd follows
that default.
If you're using Statsd v0.6.0 or up, and you're NOT using the legacy
namespace feature, you probably want to set this directive to 'off' as
well.
[5] https://github.com/etsy/statsd/blob/v0.6.0/exampleConfig.js#L57