Fast carbon relay+aggregator with admin interfaces for making changes online - production ready
Latest commit e75e3ff Nov 9, 2018

carbon-relay-ng

A relay for carbon streams, in Go. Like carbon-relay from the Graphite project, except it:

  • performs better: it should be able to handle about 100k to 1M metrics per second, depending on configuration and CPU speed.
  • lets you adjust the routing table at runtime, in real time, using the web or telnet interface (though these may have some rough edges).
  • has built-in aggregator functionality for cross-series, cross-time and cross-time-and-series aggregations.
  • supports plaintext and pickle graphite routes (output) and metrics2.0/grafana.net, as well as Kafka, Google PubSub and Amazon CloudWatch.
  • supports a per-route spooling policy for graphite routes (i.e. in case of an endpoint outage, the data can be temporarily queued to disk and delivery resumed later).
  • performs validation on all incoming metrics (see below).
  • supports plaintext, pickle and AMQP (RabbitMQ) inputs.

This makes it easy to fan out to other tools that feed on the metrics, or to balance/split load, provide redundancy, partition the data, etc. This pattern lets alerting and event-processing systems act on the data as it is received, which is much better than repeatedly reading it back from your storage.


Documentation

Concepts

There is one master routing table. It contains 0-N routes, and each carbon route can contain 0-M destinations (TCP endpoints).

First, "matching": you can match metrics on one or more of prefix, substring, or regex. All three default to "" (the empty string, i.e. match everything), and the conditions are AND-ed. Regexes are more resource-intensive and hence should (and often can) be avoided.
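The matching rules above can be sketched in Go as follows. This is a minimal illustration, not carbon-relay-ng's actual matcher: the `Matcher` type and `NewMatcher` constructor are hypothetical names. Empty conditions match everything, all set conditions are AND-ed, the regex is compiled once up front, and the cheaper prefix and substring checks run before it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matcher is a hypothetical sketch of a route matcher.
// Empty fields match everything; all non-empty conditions must hold (AND).
type Matcher struct {
	Prefix    string
	Substring string
	Regex     string
	re        *regexp.Regexp // compiled once, only if Regex is set
}

// NewMatcher compiles the regex up front so per-metric matching stays cheap.
func NewMatcher(prefix, substring, regex string) (*Matcher, error) {
	m := &Matcher{Prefix: prefix, Substring: substring, Regex: regex}
	if regex != "" {
		re, err := regexp.Compile(regex)
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// Match checks the cheap conditions first and only runs the regex last.
func (m *Matcher) Match(name string) bool {
	if m.Prefix != "" && !strings.HasPrefix(name, m.Prefix) {
		return false
	}
	if m.Substring != "" && !strings.Contains(name, m.Substring) {
		return false
	}
	if m.re != nil && !m.re.MatchString(name) {
		return false
	}
	return true
}

func main() {
	m, err := NewMatcher("servers.", "cpu", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Match("servers.web1.cpu.idle")) // true
	fmt.Println(m.Match("servers.web1.mem.free")) // false
	all, _ := NewMatcher("", "", "")
	fmt.Println(all.Match("anything.at.all")) // true
}
```

Ordering the checks this way means the regex only runs for metrics that already passed the prefix and substring tests, which is one practical reason to prefer those cheaper conditions.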

  • All incoming metrics are validated and go into the table when valid.

  • The table then checks metrics against the blacklist and discards those that match.

  • Then metrics pass through the rewriters and are modified if applicable. Rewrite rules wrapped with forward slashes are interpreted as regular expressions.

  • The table sends the metric to:

    • the aggregators, which match metrics against their rules, compute aggregations and feed the results back into the table. See the Aggregation section below for details.
    • any routes that match.
  • The route can have different behaviors, based on its type:

    • for grafanaNet / kafkaMdm / Google PubSub routes, there is only a single endpoint, so that's where the data goes.
    • for standard/carbon routes, you can control how data gets routed into destinations:
      • sendAllMatch: send all metrics to all the defined endpoints (possibly, and commonly, only one endpoint).
      • sendFirstMatch: send each metric to the first endpoint that matches it.
      • consistentHashing: the algorithm is the same as Carbon's consistent hashing.
      • round robin: the route is a round-robin pool (not implemented).
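The order of operations above (validate, blacklist, rewrite, route) and the two implemented non-hashing carbon dispatch modes can be sketched as a toy Go program. It is a simplification under stated assumptions, not the relay's real code: `Table`, `Route`, `Destination` and `Rewrite` are hypothetical types, validation only checks the plaintext `name value timestamp` shape, blacklist entries and route/destination conditions are reduced to prefixes, and aggregators and consistent hashing are omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Rewrite is a hypothetical rule: Old wrapped in forward slashes is a regex,
// anything else is a literal substring.
type Rewrite struct{ Old, New string }

func (rw Rewrite) apply(name string) string {
	if len(rw.Old) > 2 && strings.HasPrefix(rw.Old, "/") && strings.HasSuffix(rw.Old, "/") {
		re := regexp.MustCompile(rw.Old[1 : len(rw.Old)-1]) // per call for brevity
		return re.ReplaceAllString(name, rw.New)
	}
	return strings.ReplaceAll(name, rw.Old, rw.New)
}

// Destination and Route use a prefix as a stand-in for a full matcher.
// FirstMatch selects sendFirstMatch; false means sendAllMatch.
type Destination struct{ Addr, Prefix string }
type Route struct {
	Prefix     string
	FirstMatch bool
	Dests      []Destination
}

// dispatch returns the addresses of the destinations that receive name.
func (r Route) dispatch(name string) []string {
	var out []string
	for _, d := range r.Dests {
		if strings.HasPrefix(name, d.Prefix) {
			out = append(out, d.Addr)
			if r.FirstMatch {
				break // sendFirstMatch: stop at the first matching destination
			}
		}
	}
	return out
}

type Table struct {
	Blacklist []string // prefixes to discard
	Rewrites  []Rewrite
	Routes    []Route
}

// valid only checks the plaintext "name value timestamp" shape.
func valid(f []string) bool {
	if len(f) != 3 {
		return false
	}
	_, errV := strconv.ParseFloat(f[1], 64)
	_, errT := strconv.ParseUint(f[2], 10, 32)
	return errV == nil && errT == nil
}

// Process returns the (possibly rewritten) name and the destination addresses.
func (t Table) Process(line string) (string, []string) {
	f := strings.Fields(line)
	if !valid(f) {
		return "", nil // invalid: never enters the table
	}
	name := f[0]
	for _, p := range t.Blacklist {
		if strings.HasPrefix(name, p) {
			return "", nil // blacklisted: discarded
		}
	}
	for _, rw := range t.Rewrites {
		name = rw.apply(name)
	}
	var addrs []string
	for _, r := range t.Routes { // every matching route receives the metric
		if strings.HasPrefix(name, r.Prefix) {
			addrs = append(addrs, r.dispatch(name)...)
		}
	}
	return name, addrs
}

func main() {
	t := Table{
		Blacklist: []string{"test."},
		Rewrites:  []Rewrite{{Old: `/^prod\.(.*)$/`, New: "production.$1"}},
		Routes: []Route{
			{Dests: []Destination{{Addr: "a:2003"}, {Addr: "b:2003"}}},
			{Prefix: "production.", FirstMatch: true,
				Dests: []Destination{{Addr: "c:2003", Prefix: "production.web"}, {Addr: "d:2003"}}},
		},
	}
	fmt.Println(t.Process("prod.web1.cpu 42 1541721600")) // production.web1.cpu [a:2003 b:2003 c:2003]
	fmt.Println(t.Process("test.foo 1 1541721600"))       // blacklisted: no destinations
}
```

Note that routing happens on the rewritten name, so the second route only sees `prod.` metrics because the rewrite turned them into `production.` first.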

carbon-relay-ng (for now) focuses on staying up and not consuming many resources.

For carbon routes:

  • if the connection is up but slow, we drop the data.
  • if the connection is down and spooling is enabled, we try to spool, but if spooling is slow we drop the data.
  • if the connection is down and spooling is disabled, we drop the data.
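The carbon-route policy above never lets a slow or dead endpoint stall the relay. A minimal sketch, assuming each destination is fed through buffered Go channels and that a full channel counts as "slow": a `select` with a `default` branch drops the metric instead of waiting. The `Dest` type, its fields and `trySend` are hypothetical, not carbon-relay-ng's API.

```go
package main

import "fmt"

// Dest is a hypothetical carbon destination. connCh feeds the TCP writer and
// spoolCh feeds the disk spooler; both are buffered, and a full buffer is
// treated as "slow".
type Dest struct {
	Connected bool
	Spool     bool
	connCh    chan string
	spoolCh   chan string
	Dropped   int
}

// trySend enqueues m without blocking and reports whether it succeeded.
func trySend(ch chan string, m string) bool {
	select {
	case ch <- m:
		return true
	default:
		return false
	}
}

// Send applies the drop policy; it never blocks the caller.
func (d *Dest) Send(m string) {
	switch {
	case d.Connected && trySend(d.connCh, m):
		// connection up and keeping up: queued for the writer
	case !d.Connected && d.Spool && trySend(d.spoolCh, m):
		// connection down, spooling enabled and keeping up: queued for disk
	default:
		// up but slow, down with a slow spool, or down without spooling
		d.Dropped++
	}
}

func main() {
	d := &Dest{Connected: true, connCh: make(chan string, 2), spoolCh: make(chan string, 1)}
	for _, m := range []string{"a 1 1", "b 2 2", "c 3 3"} {
		d.Send(m) // the third send finds connCh full and is dropped
	}
	d.Connected = false
	d.Send("d 4 4") // down, spooling disabled: dropped
	d.Spool = true
	d.Send("e 5 5") // down, spooling enabled: spooled
	d.Send("f 6 6") // spool queue full: dropped
	fmt.Println("dropped:", d.Dropped, "queued:", len(d.connCh), "spooled:", len(d.spoolCh))
	// dropped: 3 queued: 2 spooled: 1
}
```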

Kafka, Google PubSub and grafanaNet routes have an in-memory buffer and can be configured in blocking or non-blocking mode for when the buffer runs full.
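One way to picture the two buffer modes is a buffered Go channel. This is a minimal sketch, assuming that blocking mode makes the sender wait for free space (backpressure) while non-blocking mode drops the metric; the `Buffer` type and `NewBuffer` are hypothetical names, not the relay's API, and the goroutine merely stands in for whatever flushes the buffer to Kafka, PubSub or grafanaNet.

```go
package main

import "fmt"

// Buffer is a hypothetical stand-in for a route's in-memory buffer.
type Buffer struct {
	ch       chan string
	blocking bool
	dropped  int
}

func NewBuffer(size int, blocking bool) *Buffer {
	return &Buffer{ch: make(chan string, size), blocking: blocking}
}

// Add reports whether the metric was accepted into the buffer.
func (b *Buffer) Add(m string) bool {
	if b.blocking {
		b.ch <- m // waits until a consumer frees space (backpressure)
		return true
	}
	select {
	case b.ch <- m:
		return true
	default:
		b.dropped++ // buffer full in non-blocking mode: drop
		return false
	}
}

func main() {
	nb := NewBuffer(1, false)
	fmt.Println(nb.Add("a"), nb.Add("b")) // true false

	b := NewBuffer(1, true)
	done := make(chan struct{})
	go func() { // consumer standing in for the route's flusher
		for i := 0; i < 3; i++ {
			fmt.Println("flushed", <-b.ch)
		}
		close(done)
	}()
	for _, m := range []string{"x", "y", "z"} {
		b.Add(m) // may wait until the consumer drains the buffer
	}
	<-done
}
```

In this sketch, non-blocking mode keeps the caller moving at the cost of lost data, while blocking mode loses nothing but slows down whoever is adding metrics until the consumer catches up.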