Fast carbon relay+aggregator with admin interfaces for making changes online - production ready
Latest commit 70bdf9a Nov 14, 2019



carbon-relay-ng

A relay for carbon streams, in Go. Like carbon-relay from the graphite project, except it:

  • performs better: should be able to handle about 100k to 1M metrics per second, depending on configuration and CPU speed.
  • lets you adjust the routing table at runtime, in real time, using the web or telnet interface (this feature has rough edges and is not production ready)
  • has built-in aggregator functionality for cross-series, cross-time, and cross-time-and-series aggregations.
  • supports plaintext and pickle graphite routes (output) and metrics2.0/grafana.net, as well as Kafka, Google PubSub, and Amazon CloudWatch.
  • supports a per-route spooling policy for graphite routes (i.e. in case of an endpoint outage, the data can be temporarily queued up to disk and sent later)
  • performs validation on all incoming metrics (see below)
  • supports plaintext, pickle, and AMQP (RabbitMQ) inputs

This makes it easy to fan out to other tools that consume the metrics, to balance or split load, to provide redundancy, to partition the data, etc. This pattern allows alerting and event processing systems to act on the data as it is received (which is much better than repeatedly reading it back from your storage).


Documentation

Concepts

There is 1 master routing table, which contains 0-N routes. There are different route types. A carbon route can contain 0-M destinations (TCP endpoints).

First, "matching": you can match metrics on one or more of prefix, substring, or regex. All 3 default to "" (empty string, i.e. match everything). The conditions are AND-ed. Regexes are more resource-intensive and hence should (and often can) be avoided.
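The matching rules above can be sketched in Go as follows. This is a minimal illustration, not carbon-relay-ng's actual matcher: the `Matcher` type and `NewMatcher` constructor are hypothetical names. It shows that empty conditions match everything, that the three conditions are AND-ed, and one way to keep regex cost down (compile once, and run the cheap string checks first).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matcher is a hypothetical type holding the three optional conditions.
// An empty condition imposes no constraint, so an all-empty Matcher matches everything.
type Matcher struct {
	Prefix string
	Sub    string
	re     *regexp.Regexp // nil when no regex was given
}

// NewMatcher compiles the regex once up front, so per-metric matching
// does not pay the compilation cost.
func NewMatcher(prefix, sub, regex string) (*Matcher, error) {
	m := &Matcher{Prefix: prefix, Sub: sub}
	if regex != "" {
		re, err := regexp.Compile(regex)
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// Match ANDs the conditions. The cheap string checks run first, so the
// regex is only evaluated for names that already passed them.
func (m *Matcher) Match(name string) bool {
	if m.Prefix != "" && !strings.HasPrefix(name, m.Prefix) {
		return false
	}
	if m.Sub != "" && !strings.Contains(name, m.Sub) {
		return false
	}
	if m.re != nil && !m.re.MatchString(name) {
		return false
	}
	return true
}

func main() {
	all, _ := NewMatcher("", "", "")
	web, err := NewMatcher("servers.", "cpu", `\.web\d+\.`)
	if err != nil {
		panic(err)
	}
	fmt.Println(all.Match("anything.at.all"))        // true: no conditions set
	fmt.Println(web.Match("servers.web01.cpu.idle")) // true: all three conditions hold
	fmt.Println(web.Match("servers.db01.cpu.idle"))  // false: regex does not match
	fmt.Println(web.Match("servers.web01.mem.free")) // false: substring "cpu" missing
}
```

Ordering the checks this way is one reasonable design choice: since the conditions are AND-ed, any failing cheap check lets the regex be skipped entirely.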

  • All incoming metrics are validated and go into the table when valid.

  • The table will then check metrics against the blacklist and discard when appropriate.

  • Then metrics pass through the rewriters and are modified if applicable. Rewrite rules wrapped with forward slashes are interpreted as regular expressions.

  • The table sends the metric to:

    • the aggregators, which match the metrics against their rules, compute aggregations, and feed the results back into the table. See the Aggregation section below for details.
    • any routes that match
  • The route can have different behaviors, based on its type:

    • for grafanaNet, kafkaMdm, and Google PubSub routes, there is only a single endpoint, so that's where the data goes. For standard/carbon routes, you can control how data gets routed into destinations (note that destinations have settings to match on prefix/sub/regex, just like routes):
      • sendAllMatch: send each metric to all the defined endpoints that match it (possibly, and commonly, only 1 endpoint).
      • sendFirstMatch: send each metric to the first endpoint that matches it.
      • consistentHashing: the algorithm is the same as Carbon's consistent hashing.
      • round robin: the route is a RR pool (not implemented)
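The flow through the table, and the difference between sendAllMatch and sendFirstMatch, can be illustrated with a small Go sketch. It is not carbon-relay-ng's code: `Table`, `Route`, `Destination`, `Rewriter`, and `Dispatch` are hypothetical names, validation and aggregators are left out, and blacklist and destination matching are simplified to prefix checks. It does follow the order described above (blacklist, then rewriters, then matching routes) and treats a rewrite rule wrapped in forward slashes as a regex.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rewriter is a hypothetical rewrite rule. A "from" value wrapped in forward
// slashes is compiled as a regex; otherwise it is a plain substring replacement.
type Rewriter struct {
	from, to string
	re       *regexp.Regexp
}

func NewRewriter(from, to string) Rewriter {
	rw := Rewriter{from: from, to: to}
	if len(from) > 1 && strings.HasPrefix(from, "/") && strings.HasSuffix(from, "/") {
		rw.re = regexp.MustCompile(from[1 : len(from)-1])
	}
	return rw
}

func (rw Rewriter) Apply(name string) string {
	if rw.re != nil {
		return rw.re.ReplaceAllString(name, rw.to)
	}
	return strings.ReplaceAll(name, rw.from, rw.to)
}

// Destination uses a prefix-only matcher for brevity; "" matches every metric.
type Destination struct {
	Addr   string
	Prefix string
}

// Route holds destinations and a dispatch mode: "sendAllMatch" or "sendFirstMatch".
type Route struct {
	Mode  string
	Dests []Destination
}

// Targets returns the addresses of the destinations that would receive name.
func (r Route) Targets(name string) []string {
	var out []string
	for _, d := range r.Dests {
		if !strings.HasPrefix(name, d.Prefix) {
			continue
		}
		out = append(out, d.Addr)
		if r.Mode == "sendFirstMatch" {
			break // only the first matching destination gets the metric
		}
	}
	return out
}

// Table applies the steps in order: blacklist, rewriters, then routes.
type Table struct {
	Blacklist []string // prefixes to discard (simplified)
	Rewriters []Rewriter
	Routes    []Route
}

// Dispatch returns the (possibly rewritten) name and every target address;
// a nil target list means the metric was discarded or nothing matched.
func (t Table) Dispatch(name string) (string, []string) {
	for _, b := range t.Blacklist {
		if strings.HasPrefix(name, b) {
			return name, nil
		}
	}
	for _, rw := range t.Rewriters {
		name = rw.Apply(name)
	}
	var targets []string
	for _, r := range t.Routes {
		targets = append(targets, r.Targets(name)...)
	}
	return name, targets
}

func main() {
	tbl := Table{
		Blacklist: []string{"tmp."},
		Rewriters: []Rewriter{NewRewriter(`/^prod\./`, "production.")},
		Routes: []Route{
			{Mode: "sendAllMatch", Dests: []Destination{{"a:2003", ""}, {"b:2003", "production."}}},
			{Mode: "sendFirstMatch", Dests: []Destination{{"c:2003", "production."}, {"d:2003", ""}}},
		},
	}
	fmt.Println(tbl.Dispatch("prod.web.cpu")) // production.web.cpu [a:2003 b:2003 c:2003]
	fmt.Println(tbl.Dispatch("tmp.scratch"))  // tmp.scratch []
}
```

In the example, the sendAllMatch route delivers the rewritten metric to both of its matching destinations, while the sendFirstMatch route stops at `c:2003` even though `d:2003` would also have matched.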

carbon-relay-ng (for now) focuses on staying up and consuming few resources.

For carbon routes:

  • if the connection is up but slow, we drop the data
  • if the connection is down and spooling is enabled, we try to spool, but if spooling is slow we drop the data
  • if the connection is down and spooling is disabled, we drop the data
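The policy above boils down to "never wait": whenever the next stage cannot accept a metric immediately, the metric is dropped. A minimal Go sketch of that idea follows, assuming buffered channels stand in for the connection's write buffer and the spool's input; `offer` and `handle` are hypothetical names, not carbon-relay-ng functions.

```go
package main

import "fmt"

// offer attempts a non-blocking send: if the buffer is full (the consumer is
// slow), it returns false immediately instead of waiting.
func offer(ch chan<- string, m string) bool {
	select {
	case ch <- m:
		return true
	default:
		return false
	}
}

// handle applies the carbon-route policy listed above to one metric.
// conn stands in for the connection's write buffer and spool for the disk
// spool's input buffer; spool is nil when spooling is disabled.
func handle(m string, connUp bool, conn, spool chan<- string) string {
	if connUp {
		if offer(conn, m) {
			return "sent"
		}
		return "dropped (connection slow)"
	}
	if spool == nil {
		return "dropped (down, spooling disabled)"
	}
	if offer(spool, m) {
		return "spooled"
	}
	return "dropped (spool slow)"
}

func main() {
	conn := make(chan string, 1) // capacity 1, so the second send finds it full
	spool := make(chan string, 1)
	fmt.Println(handle("a.b 1 1573689600", true, conn, spool))  // sent
	fmt.Println(handle("a.b 2 1573689600", true, conn, spool))  // dropped (connection slow)
	fmt.Println(handle("a.b 3 1573689600", false, conn, spool)) // spooled
	fmt.Println(handle("a.b 4 1573689600", false, conn, spool)) // dropped (spool slow)
	fmt.Println(handle("a.b 5 1573689600", false, conn, nil))   // dropped (down, spooling disabled)
}
```

The `select` with a `default` case is the standard Go idiom for a send that never blocks, which is what keeps a slow endpoint from stalling the whole relay in this sketch.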

Kafka, Google PubSub, and grafanaNet routes have an in-memory buffer and can be configured to use blocking mode (wait for space) or non-blocking mode (drop the data) when the buffer is full.
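The two full-buffer modes can be contrasted with a small Go sketch. It assumes that blocking mode means the producer waits for free space and non-blocking mode means the metric is dropped; `Buffer`, `NewBuffer`, and `Put` are hypothetical names, and a bounded channel stands in for the route's in-memory buffer.

```go
package main

import "fmt"

// Buffer is a hypothetical bounded in-memory queue with a configurable
// full-buffer policy. A single producer is assumed (dropped is not synchronized).
type Buffer struct {
	ch       chan string
	blocking bool
	dropped  int
}

func NewBuffer(size int, blocking bool) *Buffer {
	return &Buffer{ch: make(chan string, size), blocking: blocking}
}

// Put enqueues m. In blocking mode it waits for free space, applying
// backpressure to the caller; in non-blocking mode it drops m if the buffer is full.
func (b *Buffer) Put(m string) bool {
	if b.blocking {
		b.ch <- m
		return true
	}
	select {
	case b.ch <- m:
		return true
	default:
		b.dropped++
		return false
	}
}

func main() {
	// Non-blocking: nothing drains the buffer, so the third metric is dropped.
	nb := NewBuffer(2, false)
	for _, m := range []string{"a", "b", "c"} {
		nb.Put(m)
	}
	fmt.Println("non-blocking dropped:", nb.dropped) // non-blocking dropped: 1

	// Blocking: Put waits whenever the consumer lags, so nothing is lost.
	bl := NewBuffer(1, true)
	done := make(chan int)
	go func() {
		n := 0
		for range bl.ch {
			n++
		}
		done <- n
	}()
	for _, m := range []string{"a", "b", "c"} {
		bl.Put(m)
	}
	close(bl.ch)
	fmt.Println("blocking delivered:", <-done) // blocking delivered: 3
}
```

The trade-off is the usual one: blocking mode preserves data but lets a slow backend push back on everything upstream, while non-blocking mode keeps the relay responsive at the cost of losing metrics.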
