Fast carbon relay+aggregator with admin interfaces for making changes online - production ready
Latest commit 5c20a26 Oct 18, 2018


carbon-relay-ng

A relay for carbon streams, in go. Like carbon-relay from the graphite project, except it:

  • performs better: should be able to handle roughly 100k to 1M metrics per second, depending on configuration and CPU speed.
  • lets you adjust the routing table at runtime, in real time, using the web or telnet interface (though they may have some rough edges).
  • has aggregator functionality built-in for cross-series, cross-time and cross-time-and-series aggregations.
  • supports plaintext and pickle graphite routes (output) and metrics2.0/grafana.net, as well as kafka, Google PubSub and Amazon CloudWatch.
  • graphite routes support a per-route spooling policy (i.e. in case of an endpoint outage, data can be temporarily queued to disk and sent later).
  • performs validation on all incoming metrics (see below)
  • supports plaintext, pickle and AMQP (rabbitmq) inputs.

This makes it easy to fan out to other tools that feed on the metrics, to balance or split load, to provide redundancy, to partition the data, and so on. This pattern allows alerting and event processing systems to act on the data as it is received, which is much better than repeatedly reading from your storage.

screenshot

Documentation

Concepts

There is one master routing table. It contains 0-N routes. Each carbon route can contain 0-M destinations (TCP endpoints).

First: "matching": you can match metrics on one or more of: prefix, substring, or regex. All 3 default to "" (empty string, i.e. allow all). The conditions are AND-ed. Regexes are more resource intensive and hence should - and often can be - avoided.

  • All incoming metrics are validated and go into the table when valid.

  • The table will then check metrics against the blacklist and discard when appropriate.

  • Then metrics pass through the rewriters and are modified if applicable. Rewrite rules wrapped with forward slashes are interpreted as regular expressions.

  • The table sends the metric to:

    • the aggregators, which match the metrics against their rules, compute aggregations and feed the results back into the table. See the Aggregation section below for details.
    • any routes that match
  • The route can have different behaviors, based on its type:

    • for grafanaNet / kafkaMdm / Google PubSub routes, there is only a single endpoint, so that's where the data goes.
    • for standard/carbon routes, you can control how data gets routed to destinations:
      • sendAllMatch: send all metrics to all the defined endpoints (possibly, and commonly, only 1 endpoint).
      • sendFirstMatch: send the metrics to the first endpoint that matches it.
      • consistentHashing: the algorithm is the same as Carbon's consistent hashing.
      • round robin: the route is a round-robin pool (not implemented)
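
To make the difference between sendAllMatch and sendFirstMatch concrete, here is a minimal sketch. It assumes each destination carries its own match condition (a plain prefix here); the `Destination` type, the function names and the addresses are hypothetical and do not reflect the project's internal types.

```go
package main

import (
	"fmt"
	"strings"
)

// Destination is a hypothetical stand-in for a TCP endpoint with a prefix condition.
type Destination struct {
	Addr   string
	Prefix string // empty prefix accepts every metric
}

// Matches reports whether this destination accepts the metric key.
func (d Destination) Matches(key string) bool {
	return strings.HasPrefix(key, d.Prefix)
}

// sendAllMatch returns the address of every destination that accepts key.
func sendAllMatch(dests []Destination, key string) []string {
	var out []string
	for _, d := range dests {
		if d.Matches(key) {
			out = append(out, d.Addr)
		}
	}
	return out
}

// sendFirstMatch returns at most one address: the first destination that accepts key.
func sendFirstMatch(dests []Destination, key string) []string {
	for _, d := range dests {
		if d.Matches(key) {
			return []string{d.Addr}
		}
	}
	return nil
}

func main() {
	dests := []Destination{
		{Addr: "10.0.0.1:2003", Prefix: "servers."},
		{Addr: "10.0.0.2:2003", Prefix: ""},
	}
	fmt.Println(sendAllMatch(dests, "servers.web1.load"))   // both destinations
	fmt.Println(sendFirstMatch(dests, "servers.web1.load")) // only the first one
}
```

With sendFirstMatch, the order of destinations matters: a catch-all destination placed first would absorb every metric.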

carbon-relay-ng (for now) focuses on staying up and not consuming many resources.

For carbon routes:

  • if the connection is up but slow, we drop the data
  • if the connection is down and spooling is enabled, we try to spool, but if spooling is slow we drop the data
  • if the connection is down and spooling is disabled, we drop the data
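
The three rules for carbon routes can be written as a small decision function. This is a sketch of the policy as stated above, not the project's code; the `Action` type, the `decide` function and its boolean parameters are names invented for this example.

```go
package main

import "fmt"

// Action is a hypothetical label for what happens to a metric.
type Action string

const (
	Send  Action = "send"
	Spool Action = "spool"
	Drop  Action = "drop"
)

// decide applies the carbon-route policy described in the list above.
func decide(connUp, connSlow, spoolEnabled, spoolSlow bool) Action {
	switch {
	case connUp && connSlow:
		return Drop // connection is up but cannot keep up
	case connUp:
		return Send // healthy connection
	case spoolEnabled && !spoolSlow:
		return Spool // connection is down, queue to disk
	default:
		return Drop // down with no spool, or the spool is too slow
	}
}

func main() {
	fmt.Println(decide(true, true, true, false))    // up but slow
	fmt.Println(decide(false, false, true, false))  // down, spool available
	fmt.Println(decide(false, false, false, false)) // down, spooling disabled
}
```

In every slow or unavailable case the data is dropped rather than waited on, which is what keeps the relay from stalling or accumulating memory.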

kafka, Google PubSub, and grafanaNet routes have an in-memory buffer and can be configured for blocking or non-blocking mode when the buffer fills up.
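
The difference between the two modes can be illustrated with a buffered Go channel. This is a generic sketch of the technique, assuming a channel-backed buffer; the `Buffer` type, `NewBuffer` and `Put` are hypothetical names and not the routes' actual implementation or configuration options.

```go
package main

import "fmt"

// Buffer is a hypothetical in-memory buffer backed by a buffered channel.
type Buffer struct {
	ch       chan string
	blocking bool
}

// NewBuffer creates a buffer holding up to size metrics.
func NewBuffer(size int, blocking bool) *Buffer {
	return &Buffer{ch: make(chan string, size), blocking: blocking}
}

// Put enqueues a metric and reports whether it was accepted.
// In blocking mode it waits for free space; in non-blocking mode
// it returns false (the metric is dropped) when the buffer is full.
func (b *Buffer) Put(metric string) bool {
	if b.blocking {
		b.ch <- metric
		return true
	}
	select {
	case b.ch <- metric:
		return true
	default:
		return false
	}
}

func main() {
	b := NewBuffer(2, false) // non-blocking, room for two metrics
	for _, m := range []string{"a 1 1539820800", "b 2 1539820800", "c 3 1539820800"} {
		fmt.Println(m, "accepted:", b.Put(m))
	}
}
```

Blocking mode trades data loss for backpressure: a full buffer stalls the sender until the route catches up, whereas non-blocking mode keeps the sender moving and loses the overflow.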