Added Prometheus Plugin #692

pradeepchhetri · 2016-05-22T12:39:59Z

This pull request adds support for a Prometheus backend. Since Prometheus uses a pull model for collecting metrics, it provides a Pushgateway server, which acts as an intermediate proxy for pushing datapoints.

Sample Screenshot:

pradeepchhetri · 2016-05-26T15:26:06Z

Can someone please review this pull request?

jamtur01 · 2016-05-26T15:31:04Z

@pradeepchhetri Might be a little while - we're all very part-time maintainers! I'll try to look on the weekend. But please be patient. :)

pradeepchhetri · 2016-05-26T15:34:10Z

@jamtur01 James, thank you for the quick reply. :)

mfournier · 2016-05-28T11:02:41Z

This looks promising @pradeepchhetri! Thanks for working on this!

One thing that strikes me is that the attributes from the riemann event get dropped. Only tags and the "host" field seem to get submitted to the push-gateway. IMHO it's important to also pass (most of) the attributes along to prometheus. We'll typically want to see "service", "state" and friends on the prometheus side too. What do you think?

pradeepchhetri · 2016-05-28T12:05:16Z

@mfournier Thank you for looking into it. We are actually passing the "service" and "metric" fields as the body of the HTTP POST request (https://github.com/pradeepchhetri/riemann/blob/prometheus/src/riemann/prometheus.clj#L18-L22). I wasn't able to figure out how to pass the "state" field into each datapoint.

The reason I am not attaching the timestamp is that prometheus overwrites the timestamp with the time at which it scrapes the pushgateway. For more details: https://github.com/prometheus/pushgateway#about-timestamps

Let me know if you feel we can improve the plugin in some way.

mfournier · 2016-05-28T12:37:31Z

Oh, just noticed this problem when submitting an event with an empty metric field:

clojure.lang.ExceptionInfo: clj-http: status 500 {:status 500, :headers {"Content-Type" "text/plain; charset=utf-8", "X-Content-Type-Options" "nosniff", "Date" "Sat, 28 May 2016 12:32:59 GMT", "Content-Length" "84", "Connection" "close"}, :body "text format parsing error in line 1: expected float as value, got \"5940577/1000000\"\n", :request-time 3, :trace-redirects ["http://localhost:9091/metrics/job/riemann/host/lonquimay"], :orig-content-encoding nil}

Here is what the event looked like when dumped to the logs:

INFO [2016-05-28 14:35:22,268] defaultEventExecutorGroup-2-1 - riemann.config - #riemann.codec.Event{:host toto, :service foobar, :state nil, :description nil, :metric nil, :tags nil, :time 1.464438922266E9, :ttl 60, :x-client riemann-c-client}
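
The value in the error body, 5940577/1000000, is how Clojure prints a Ratio, so one plausible cause (an assumption, not confirmed in this thread) is a metric being rendered with str before it is coerced to a float. A minimal sketch of a guard, where datapoint-line is a hypothetical helper and not the plugin's generate-datapoint:

```clojure
(defn datapoint-line
  "Hypothetical helper: builds one text-format line, `<name> <value>\\n`.
  Returns nil when the event has no :service or no numeric :metric,
  so the caller can skip the HTTP request entirely."
  [event]
  (let [{:keys [service metric]} event]
    (when (and service (number? metric))
      ;; Coerce first: (str 5940577/1000000) prints the ratio as-is,
      ;; while (double 5940577/1000000) is a plain floating point value.
      (str service " " (double metric) "\n"))))

(datapoint-line {:service "foobar" :metric 5940577/1000000})
;; => "foobar 5.940577\n"

(datapoint-line {:service "foobar" :metric nil})
;; => nil
```

Returning nil for events without a metric keeps the nil check in one place, which is also the concern behind the duplicated-check review comment further down.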

jamtur01 · 2016-05-28T13:42:51Z

src/riemann/prometheus.clj

+    (fn [event]
+      (let [url (generate-url opts event)
+            datapoint (generate-datapoint event)]
+        (when (and (:metric event) (:service event))


Is this logic repeated? Don't you already check for :service and :metric in generate-datapoint?

pradeepchhetri · 2016-05-28

Nice catch. I will fix it.

jamtur01 · 2016-05-28T13:47:29Z

@pradeepchhetri Can you pass :state et al as a label? Like you did tags?

pradeepchhetri · 2016-05-28T14:00:54Z

@mfournier I can reproduce the error. Looking into it.

mfournier · 2016-05-28T20:43:03Z

Looking at the implementation in more detail, I must say I'm not really sold on the tagN, tagN+1, etc. idea. On the prometheus side, building a graph or alert rule on top of this implies knowing in advance how many tags there are, and then searching each of them for the element we're looking for. Wouldn't it be better to join them together in a single label, separated by a comma for example (better: a configurable character)? I.e. some_event{tags="foo,bar,baz"}. Then it would simply be a matter of using the =~ operator in the prometheus query.

About the attributes, again I think it's really important to keep them all. You want as much info as possible about your events in prometheus too. Cleaning up events from extraneous tags/attributes before submission to prometheus is easy in riemann. Having to work with incomplete data in prometheus, less so.

So here's a sample event as seen from riemann:
#riemann.codec.Event{:host toto, :service example, :state ok, :description this is some description, :metric 123, :tags [foo bar blah], :time 1.464465272033E9, :ttl 60, :x-client riemann-c-client, :attr1 val1, :attr2 val2}
...and here is what it looks like in prometheus:
example{host="toto",instance="",job="riemann",tag0="foo",tag1="bar",tag2="blah"} 123
My suggestion is that it should look like:
example{host="toto",instance="",job="riemann",state="ok",description="this is some description",x-client="riemann-c-client",attr1="val1",attr2="val2",tags="foo,bar,blah"} 123

Finally, a couple of thoughts/wishlist items for later (i.e. non-requirements for an initial implementation, IMHO):

Ability to delete metrics, typically to call from the (expired) function in riemann.
Add the riemann server's hostname in the "instance" field.
Ability to add the timestamp & ttl to outgoing events, as an option. We're basically at the interface between a push-based and a pull-based system here, so there is a risk of missing ephemeral state changes, detecting stale data will be tricky, etc. By providing this, we give users a key element to make it less painful.
I'm also wondering whether firing an HTTP request for every event, at a high rate, will be harmful to the pushgateway. But I'm not sure it's really supposed to be used with frequently updated metrics anyways.

faxm0dem · 2016-05-29T18:47:21Z

👍 for keeping all attributes

brian-brazil · 2016-05-30T11:36:39Z

Prometheus developer here. I'm not too familiar with Riemann, I've only read through the docs a few times.

> Wouldn't it be better to join them together in a single label, separated by a comma for example (better: a configurable character)

This is our standard way of dealing with lists of items for service discovery. We also add a leading and a trailing comma so that the regex is easier to write, e.g. tags=",foo,bar,baz,". http://www.robustperception.io/little-things-matter/ explains why.
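
A minimal sketch of the comma-wrapped convention described above. The name tags-label and the optional separator argument are illustrative assumptions, not code from this pull request:

```clojure
(require '[clojure.string :as string])

(defn tags-label
  "Joins tags into a single label value with a leading and trailing
  separator, e.g. [\"foo\" \"bar\"] -> \",foo,bar,\".
  Returns nil when there are no tags."
  ([tags] (tags-label tags ","))
  ([tags sep]
   (when (seq tags)
     (str sep (string/join sep tags) sep))))

(tags-label ["foo" "bar" "baz"])
;; => ",foo,bar,baz,"

;; Because every tag is wrapped in separators, "has tag bar" becomes one
;; unambiguous pattern. Prometheus anchors =~ patterns, so the query side
;; would be some_event{tags=~".*,bar,.*"}; re-matches is anchored the same way.
(boolean (re-matches #".*,bar,.*" (tags-label ["foo" "bar" "baz"])))
;; => true
(boolean (re-matches #".*,bar,.*" (tags-label ["foo" "barbaz"])))
;; => false
```

Without the surrounding commas, the first and last tags would need their own regex alternatives, and a plain "bar" pattern would also match "barbaz".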

> It provides a Pushgateway server which acts as an intermediate proxy for pushing datapoints.

This is explicitly not what the pushgateway is for, see https://prometheus.io/docs/practices/pushing/

I think what you want here is twofold. First, something similar to the influxdb and graphite exporters that can take in data in the Riemann format and expose it in the Prometheus data format. Second, an integration in Riemann itself to talk out to the Prometheus Alertmanager (https://prometheus.io/docs/alerting/clients/) as you would to something like Pagerduty.

> We're basically at the interface between a push-based and a pull-based system here, so there is a risk of missing ephemeral state changes, detecting stale data will be tricky, etc

Push vs pull is not the issue here; there are ways to handle that. The real challenge is that this is an interface between an event logging system and a metric system. I'm not sure it's sane to try to create a generic zero-configuration link between such systems; something more query based is probably the most you can do (https://github.com/chop-dbhi/prometheus-sql is probably the closest example, and https://github.com/prometheus/nagios_plugins from the other direction).

You want your time series to be continuous over time, which doesn't work with a dynamic label such as state: every time it changes you get a new time series, which will not work out well either semantically (staleness) or performance-wise (churn).
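
The point about dynamic labels can be illustrated with a small sketch. It assumes only that a series is identified by its metric name plus its full label set; series-id and the sample events are made up for illustration:

```clojure
(defn series-id
  "Illustrative only: the identity of a time series is the metric name
  together with the complete set of label names and values."
  [event label-keys]
  [(:service event) (select-keys event label-keys)])

(def events
  [{:service "example" :host "toto" :state "ok"       :metric 1}
   {:service "example" :host "toto" :state "critical" :metric 2}
   {:service "example" :host "toto" :state "ok"       :metric 3}])

;; With only stable labels, all three events land in the same series.
(count (distinct (map #(series-id % [:host]) events)))
;; => 1

;; With :state as a label, each state change starts (or resumes) a
;; different series, so no single series is continuous over time.
(count (distinct (map #(series-id % [:host :state]) events)))
;; => 2
```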

> But I'm not sure it's really supposed to be used with frequently updated metrics anyways.

No, the intention is relatively rarely run service-level batch jobs that might have, say, a few pushes a minute in aggregate. Accordingly, we haven't done any tuning or benchmarking of the pushgateway.

brian-brazil · 2016-05-30T11:42:57Z

Taking a bit of a look at your data format, your host maps to our instance and your service probably maps to our job label.

pradeepchhetri · 2016-05-30T12:50:49Z

@brian-brazil Thank you for joining the discussion and providing great suggestions for improving the plugin.

One quick question: is the pushgateway the only way to push metrics to prometheus? Since riemann pushes streams of metrics while the pushgateway is for batch kinds of jobs, I personally feel that there is a high possibility of missing some datapoints while pushing through the pushgateway. Can I push datapoints directly to prometheus in realtime?

brian-brazil · 2016-05-30T13:15:42Z

> I personally feel that there is a high possibility of missing some datapoints while pushing through the pushgateway. Can I push datapoints directly to prometheus in realtime?

We don't and won't support this. However, if you look at things like the influxdb, collectd and graphite exporters, they all take in realtime pushes of metrics.

You can lose information with exporter style approaches depending on exactly what you're doing, which is why we recommend using the Prometheus client libraries which are designed to be more resilient (you could even get them to output data in the Riemann format).

> Since riemann pushes streams of metrics while the pushgateway is for batch kinds of jobs

As far as I can tell Riemann pushes events, not metrics. This mismatch is what makes this trickier than usual, as it's not just the usual question of how to map your tags and attributes to ours.

How has this problem been solved with graphite, opentsdb and kairosdb for riemann? They also deal in metrics rather than events.

pradeepchhetri · 2016-05-30T13:41:00Z

> How has this problem been solved with graphite, opentsdb and kairosdb for riemann? They also deal in metrics rather than events.

Graphite, opentsdb, kairosdb and other time-series databases pre-define the attributes of each datapoint, while riemann provides the flexibility of adding attributes beyond the pre-defined ones. Hence a riemann event is a superset of a metric in any of these time-series databases.

pradeepchhetri · 2016-05-30T14:35:05Z

@jamtur01 @mfournier I have updated the plugin with the following fixes:

Fixed the issue when there is an empty riemann attribute.
Fixed the tags.
Added extra test for the case with tags.
Any extra riemann attributes will be available as prometheus labels.

pradeepchhetri · 2016-06-04T15:13:57Z

Bumping this.

jamtur01 · 2016-06-04T15:22:42Z

src/riemann/prometheus.clj

+
+(def special-fields
+  "A set of event fields in Riemann with special handling logic."
+  #{:service :metric :tags :time :ttl})


I feel like there needs to be some way to exclude fields from labels. Let's say I create a new field with a value, like a metric, for which a label doesn't make sense?

pradeepchhetri · 2016-06-04

special-fields is the variable for excluding riemann fields from prometheus labels (it should probably have a better name). It should also be exposed as a configuration variable so that anyone can override it. @jamtur01 Does that sound right?

jamtur01 · 2016-06-04

Yes, I meant exposing it.
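
A minimal sketch of what exposing the exclusion set could look like. Only the contents of the default set come from the diff above; the names default-excluded-fields and event->labels, and the way the override is passed in, are assumptions for illustration rather than the plugin's actual option:

```clojure
(def default-excluded-fields
  "Hypothetical default, mirroring special-fields from the diff above."
  #{:service :metric :tags :time :ttl})

(defn event->labels
  "Returns the event fields to send as labels: every field that is
  neither excluded nor nil. `excluded` is a set of keywords that the
  caller may override."
  ([event] (event->labels event default-excluded-fields))
  ([event excluded]
   (into {}
         (remove (fn [[k v]] (or (excluded k) (nil? v))))
         event)))

(def sample-event
  {:host "toto" :service "example" :state "ok" :description nil
   :metric 123 :tags ["foo"] :time 1.464465272033E9 :ttl 60 :attr1 "val1"})

(event->labels sample-event)
;; => {:host "toto", :state "ok", :attr1 "val1"}

;; A caller who stores a second numeric value in :attr1 can keep it out
;; of the label set by extending the exclusion set.
(event->labels sample-event (conj default-excluded-fields :attr1))
;; => {:host "toto", :state "ok"}
```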

pradeepchhetri · 2016-06-07T06:27:51Z

Updated the plugin with the above feedback.

pradeepchhetri · 2016-06-09T11:57:25Z

Bumping this :)

jamtur01 · 2016-06-09T12:02:47Z

@pradeepchhetri You can keep bumping it but I am afraid it's still going to be "we'll get to it soon"!

jamtur01 · 2016-07-28T12:30:18Z

Thanks @pradeepchhetri !

Added Prometheus Plugin (https://prometheus.io)

815632b

pradeepchhetri changed the title ~~Added Prometheus Plugin (https://prometheus.io)~~ Added Prometheus Plugin May 22, 2016

jamtur01 reviewed May 28, 2016

Feedbacks incorporated

83597e5

pradeepchhetri added 2 commits May 31, 2016 11:45

Minor cleanups

51d89b3

Minor cleanups

86b900e

jamtur01 reviewed Jun 4, 2016

pradeepchhetri added 2 commits June 7, 2016 11:15

Minor Cleanup

85a8d84

Incorporated feedbacks

604ee43

jamtur01 merged commit 9d522ce into riemann:master Jul 28, 2016
