Prometheus metrics for docker registry & haproxy #3916

Closed
jimmidyson opened this Issue Jul 27, 2015 · 24 comments

@jimmidyson
Contributor

jimmidyson commented Jul 27, 2015

Most components expose metrics for ingestion into Prometheus. I'd like to see the same for haproxy & the docker registry.

HAProxy should be simple enough, running a container in the same pod using the Prometheus HAProxy exporter (https://github.com/prometheus/haproxy_exporter/).

Does the docker registry expose prometheus metrics natively?

@smarterclayton

Member

smarterclayton commented Aug 9, 2015

For the routers we'd also like to expose tenant metrics so we have a central gathering point. @ramr I would prefer to expose a stats endpoint via prometheus so that we don't have two stat gathering technologies at play. If you think there is something that would work better let me know - raw stats are not as effective and I'd like to correlate service+namespace to the traffic metric.

@ncdc on the registry, I think the answer is no, but I think we should do so.

@jimmidyson

Contributor

jimmidyson commented Aug 9, 2015

We should be able to use Prometheus relabelling to add service & namespace as labels, as long as we can parse that info out of the backend/frontend config. Alternatively, it would be quite simple to write a custom bridge to add this metadata.

@smarterclayton

Member

smarterclayton commented Aug 9, 2015

With routers we need to be sensitive to scale - we may have a hundred thousand or more routes to a single router pair, and in HA setups we'll want to gather from both. We also need to be able to support metrics for other kinds of front ends like Apache and Nginx, even if we don't do that in the initial implementation. It seems like the router manager proc is going to sample the stats endpoint anyway.

Any alternative solution will have to be scalable and flexible in a similar way. I know there was a simple HAProxy scraper for Prometheus but I have no idea what its gaps would be.

@ncdc

Contributor

ncdc commented Aug 10, 2015

Correct, the registry doesn't have any Prometheus integration at the moment. It currently has reporting integration points with bugsnag and newrelic. What is needed to add support for Prometheus? What sort of data are you looking for?

@jimmidyson

Contributor

jimmidyson commented Aug 10, 2015

@ncdc Details of how to add Prometheus metrics & expose them are at https://godoc.org/github.com/prometheus/client_golang/prometheus. As for what data, I'm not really sure... Anything that can be used to monitor the performance of the registry - that requires knowledge of the internals of the registry, I guess. Stuff like response times, number of images per namespace, storage used, etc. sound like good candidates, but as I said, anything that could be used to monitor the registry, both for alerting on issues & to build trends over time.

@ncdc

Contributor

ncdc commented Aug 10, 2015

@jimmidyson ok, we'll want to ultimately turn this into an upstream proposal for docker/distribution. At the very least, we could probably wrap the main app http.Handler with https://godoc.org/github.com/prometheus/client_golang/prometheus#InstrumentHandler, similar to how they already are doing for bugsnag and newrelic.

@jimmidyson

Contributor

jimmidyson commented Aug 10, 2015

@ncdc That sounds like a quick (hopefully easy) win.

@jimmidyson

Contributor

jimmidyson commented Aug 10, 2015

@smarterclayton Here (http://git.io/v3GTb) is an example of the output from the Prometheus exporter for HAProxy. There's only one route in there - fabric8, with only one endpoint - 172.17.0.5:9090. You can see that the metrics are labelled appropriately, e.g.:

haproxy_server_bytes_in_total{backend="be_http_default-fabric8",server="172.17.0.5:9090"} 22020

During Prometheus relabelling when ingesting metrics, we could roll stats up to namespace (default in this case) & service (fabric8 in this case), dropping labels we're not interested in, perhaps server (endpoint). We can also aggregate these metrics on ingestion so that we can have stats per namespace, etc. as required.
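As a rough sketch of the parsing that such relabelling (or a custom bridge) would have to do, the Go snippet below splits the backend label from the sample metric into namespace and service. It assumes the backend name always has the form be_<protocol>_<namespace>-<service> and that the namespace contains no hyphen; neither is confirmed here beyond the single example above, and the function name splitBackend is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBackend derives namespace and service from an HAProxy backend name
// such as "be_http_default-fabric8".
//
// ASSUMPTION: the name looks like be_<protocol>_<namespace>-<service> and the
// namespace itself has no hyphen. A hyphenated namespace would be split in the
// wrong place, so a real implementation needs an unambiguous separator.
func splitBackend(backend string) (namespace, service string, ok bool) {
	parts := strings.SplitN(backend, "_", 3)
	if len(parts) != 3 || parts[0] != "be" {
		return "", "", false
	}
	nsSvc := strings.SplitN(parts[2], "-", 2)
	if len(nsSvc) != 2 || nsSvc[0] == "" || nsSvc[1] == "" {
		return "", "", false
	}
	return nsSvc[0], nsSvc[1], true
}

func main() {
	ns, svc, ok := splitBackend("be_http_default-fabric8")
	fmt.Println(ns, svc, ok) // prints "default fabric8 true"
}
```

The same split can be expressed as a regular expression in a Prometheus relabelling rule; the Go version only makes the assumption about the naming scheme explicit.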

What do you think? Adding the Prometheus haproxy_exporter as a sidecar container in the router pod would be simplest, although we can also scrape it remotely if need be.

@smarterclayton

Member

smarterclayton commented Aug 10, 2015

Sidecar is a good place to start - because that decouples the router
component from the Go code we use (that way you can switch to apache and
you just need to get its own sidecar).

Clayton Coleman | Lead Engineer, OpenShift

@ramr

Contributor

ramr commented Aug 10, 2015

Just saw this - bad filter rules!! Yeah, given that there may be different router implementations, exposing the metrics via some standard interface à la Prometheus is definitely better. Just FYI, we do expose the stats host/port for haproxy today, so collecting the metrics is easy enough with a Prometheus ${router-type}.exporter sidecar container.
That said, the main router command code is sort of generic and it creates the deployment configuration, so adding a sidecar container for one type of router (haproxy) and not for the other might be somewhat clunky. An alternative might be to run the infra router (which runs as the docker container watching for routes/endpoints and launches/reconfigures haproxy) with the collection sidecar code - for the specific plugin type - running in-process rather than outside as a sidecar. That might work better from a process management standpoint as well.

@smarterclayton

Member

smarterclayton commented Aug 10, 2015

Ultimately the router command probably should just be a template. It was kind of a bridge until we had service accounts and some other tools.

Where possible, I would prefer not to require code plugins for the router, because that sets a much higher bar for 3rd parties.

@jimmidyson

Contributor

jimmidyson commented Aug 11, 2015

@ramr Using the HAProxy stats endpoint & the Prometheus haproxy_exporter as a sidecar is exactly how I ingested metrics into Prometheus - it worked nicely & allows us to re-label metrics with namespace & service, which is nice.

I prefer the idea of running the exporter as a sidecar container - for one thing, it allows us to swap/upgrade impls if need be without affecting the core infra router code. Also getting fixes/features into exporters as required (which I'm sure there will be) without vendoring & carrying in infra router is going to be simpler.

@jimmidyson

Contributor

jimmidyson commented Sep 22, 2015

@ramr @smarterclayton Any news on this? I'd like to get this in, but with the current implementation of oadm router cmd this is pretty tricky.

I could make the addition of the prometheus exporter sidecar optional via a flag (defaulted to true?). Also could only add the sidecar if there's a compatible sidecar for the router type so only for haproxy & nginx to begin with.

Thoughts?

@ramr

Contributor

ramr commented Sep 24, 2015

@jimmidyson - I can only look at it sometime towards the end of next week. But that plan does sound good - doing it only for the compatible router, with a flag to add it in (I am on the fence about the default, but true should be OK, I think).

@jimmidyson

Contributor

jimmidyson commented Nov 24, 2015

We have metrics for the router now.

@ncdc Any thoughts on registry metrics?

@ncdc

Contributor

ncdc commented Nov 30, 2015

@pweil- @miminar for registry metrics ideas

@ramr

Contributor

ramr commented Dec 17, 2015

@danmcp the router bits are complete - I guess the registry bits are pending, so can you please assign to @pweil- or @miminar Thx

pweil- assigned miminar and unassigned ramr Dec 17, 2015

@miminar

Contributor

miminar commented Dec 18, 2015

There is already an upstream request for providing Prometheus metrics, which was turned down. Upstream prefers to stay metrics-backend agnostic and suggests processing the registry log, which contains all the information needed.

The registry's logging framework supports a wide variety of logging sinks. We could use another sidecar container inside the registry pod to process the log and provide the metrics.

Also, there are webhooks that could be used to gather metrics. I would have to make a deeper analysis because I'm not sure if they provide all the data needed.

Other ideas?

@jimmidyson

Contributor

jimmidyson commented Dec 18, 2015

Using logs sounds fine - might be worth looking at https://github.com/google/mtail?

@smarterclayton

Member

smarterclayton commented Feb 4, 2016

Can we easily convert expvar to prometheus? If not, let's just expose a simple prometheus endpoint and collect the metrics we do have.

@jimmidyson

Contributor

jimmidyson commented Feb 4, 2016

Prometheus does have an expvar collector (https://godoc.org/github.com/prometheus/client_golang/prometheus#ExpvarCollector):

ExpvarCollector collects metrics from the expvar interface. It provides a quick way to expose numeric values that are already exported via expvar as Prometheus metrics. Note that the data models of expvar and Prometheus are fundamentally different, and that the ExpvarCollector is inherently slow. Thus, the ExpvarCollector is probably great for experiments and prototyping, but you should seriously consider a more direct implementation of Prometheus metrics for monitoring production systems.

I guess we'd need to quantify what slow means & what the impact is. It's a shame we can't do more direct instrumentation of course.

@smarterclayton

Member

smarterclayton commented Feb 4, 2016

We could simply have the prometheus expvar collector shim inside of the
registry code.

@smarterclayton

Member

smarterclayton commented Apr 12, 2017

Router now has metrics as of v3.6.0-alpha.1. Registry is in the process of getting some.

@pweil-

Member

pweil- commented Jun 26, 2017

registry metrics implemented in #12711

pweil- closed this Jun 26, 2017
