Create a public registry interface and separate out HTTP exposition
General context and approach
============================

This is the first part of the long awaited wider refurbishment of
`client_golang/prometheus/...`. After a lot of struggling, I decided
to not go for one breaking big-bang, but cut things into smaller steps
after all, mostly to keep the changes manageable and easy to
review. I'm aiming for having the invasive breaking changes
concentrated in as few steps as possible (ideally one). Some steps
will not be breaking at all, but typically there will be breaking
changes that only affect quite special cases so that 95+% of users
will not be affected. This first step is an example for that, see
details below.

What's happening in this commit?
================================

This step is about finally creating an exported registry
interface. This could not be done by simply exporting the existing
internal implementation because the interface would be _way_ too
fat. This commit introduces a very lean `Registry` interface. Most of
the existing functionality that is not part of that interface is
provided by helper functions, not by methods
(e.g. `MustRegisterWith`). The functions that act on the default
registry are retained (with very few exceptions) so that most use
cases won't see a change.

The default registry is kept in the public variable
`DefaultRegistry`. This follows the example of the http package in the
standard library (cf. `http.DefaultServeMux`, `http.DefaultClient`)
with the same implications. (This pattern is somewhat disputed within
the Go community but I chose to go with the devil you know instead of
creating something more complex or even disallowing any changes to the
default registry. The current approach gives everybody the freedom to
not touch DefaultRegistry or to do everything with a custom registry
to play safe.)

Another important part in making the registry lean is the extraction
of the HTTP exposition, which also allows for customization of the
HTTP exposition. Note that the separation of metric collection and
exposition has the side effect that managing the MetricFamily and
Metric protobuf objects in a free-list or pool isn't really feasible
anymore. By now (with better GC in more recent Go versions), the
returns were diminishing anyway. To be effective at all, scrapes had
to happen more often than GC cycles, and even then most elements of
the protobufs (everything except the MetricFamily and Metric structs
themselves) would still cause allocation churn. In a future breaking
change, the signature of the Write method in the Metric interface will
be adjusted accordingly. In this commit, avoiding breakage is more
important.

The following issues are fixed by this commit (some solved "on the
fly" now that I was touching the code anyway and it would have been
stupid to port the bugs):

#46
#100
#170
#205

Documentation, including examples, has been amended as required.

What future changes does this commit enable?
============================================

The following items are not yet implemented, but this commit opens the
possibility of implementing these independently.

- The separation of the HTTP exposition allows the implementation of
  other exposition methods based on the Registry interface, as known
  from other Prometheus client libraries, e.g. sending the metrics to
  Graphite.
  Cf. #197

- The public `Registry` interface allows the implementation of
  convenience tools for testing metrics collection. Those tools can
  inspect the collected MetricFamily protobufs and compare them to
  expectation. Also, tests can use their own testing instance of a
  registry.
  Cf. #58

Notable non-goals of this commit
================================

Non-goals that will be tackled later
------------------------------------

The following issues are quite closely connected to the changes in
this commit but the line has been drawn deliberately to address them
in later steps of the refurbishment:

- `InstrumentHandler` has many known problems. The plan is to create a
  saner way to conveniently instrument HTTP handlers and remove the old
  `InstrumentHandler` altogether. To keep breakage low for now, even
  the default handler to expose metrics is still using the old
  `InstrumentHandler`. This leads to weird naming inconsistencies but
  I have deemed it better to not break the world right now but do it
  in the change that provides better ways of instrumenting HTTP
  handlers.
  Cf. #200

- There is work underway to make the whole handling of metric
  descriptors (`Desc`) more intuitive and transparent for the user
  (including an ability for less strict checking,
  cf. #47). That's
  quite invasive from the perspective of the internal code, namely the
  registry. I deliberately kept those changes out of this commit.

- While this commit adds a new external dependency, the effort to vendor
  anything within the library that is not visible in any exported
  types will have to be done later.

Non-goals that _might_ be tackled later
---------------------------------------

There is a strong and understandable urge to divide the `prometheus`
package into a number of sub-packages (like `registry`, `collectors`,
`http`, `metrics`, …). However, to not run into a multitude of
circular import chains, this would need to break every single existing
usage of the library. (As just one example, if the ubiquitous
`prometheus.MustRegister` (with more than 2,000 uses on GitHub alone)
is kept in the `prometheus` package, but the other registry concerns
go into a new `registry` package, then the `prometheus` package would
import the `registry` package (to call the actual register method),
while at the same time the `registry` package needs to import the
`prometheus` package to access `Collector`, `Metric`, `Desc` and
more. If we moved `MustRegister` into the `registry` package,
thousands of code lines would have to be fixed (which would be easy if
the world was a mono repo, but it is not). If we moved everything else
the proposed registry package needs into packages of their own, we
would break thousands of other code lines.)

The main problem is really the top-level functions like
`MustRegister`, `Handler`, `Push`, …, which effectively pull
everything into one package. Those functions are however very
convenient for the easy and very frequent use-cases.

This problem has to be revisited later.

For now, I'm trying to keep the amount of exported names in the
package as low as possible (e.g. I unexported expvarCollector in this
commit because the NewExpvarCollector constructor is enough to export,
and it is now consistent with other collectors, like the goCollector).

Non-goals that won't be tackled anytime soon
--------------------------------------------

Something that I have played with a lot is "streaming collection",
i.e. allow an implementation of the `Registry` interface that collects
metrics incrementally and serves them while doing so. As it has turned
out, this has many many issues and makes the `Registry` interface very
clunky. Eventually, I made the call that it is unlikely we will really
implement streaming collection; and making the interface more clunky
for something that might not even happen is really a big no-no. Note
that the `Registry` interface only creates the in-memory
representation of the metric family protobufs in one go. The
serialization onto the wire can still be handled in a streaming fashion
(which hasn't been done so far, without causing any trouble, but might
be done in the future without breaking any interfaces).

What are the breaking changes?
==============================

- Signature of functions pushing to Pushgateway has changed to allow
  arbitrary grouping (which was planned for a long time anyway, and
  now that I had to work on the Push code anyway for the registry
  refurbishment, I finally did it,
  cf. #100).

- The registry is doing more consistency checks by default now. Past
  creators of inconsistent metrics could have masked the problem by
  not setting `EnableCollectChecks`. Those inconsistencies will now be
  detected. (But note that a "best effort" metrics collection is now
  possible with `HandlerOpts.ErrorHandling = ContinueOnError`.)

- `EnableCollectChecks` is gone. The registry is now performing some
  of those checks anyway (see previous item), and a registry with all
  of those checks can now be created with `NewPedanticRegistry` (it is
  only ever needed to test custom Collectors).

- `PanicOnCollectError` is gone. This behavior can now be configured
  when creating a custom HTTP handler.

- `SetMetricFamilyInjectionHook` is gone. A registry with a
  MetricFamily injection hook has to be created now with
  `NewRegistryWithInjectionHook`.
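
The difference between failing a scrape and a "best effort" collection
can be sketched with the standard library alone. The names
`HandlerOpts`, `ErrorHandling`, `ContinueOnError` and `HandlerFor`
appear in this commit, but the types and signatures below are
simplified assumptions made for the example, and `AbortOnError`,
`collectFunc`, `flaky` and `scrape` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ErrorHandling selects what the handler does when collection fails.
type ErrorHandling int

const (
	AbortOnError    ErrorHandling = iota // hypothetical: reply with HTTP 500
	ContinueOnError                      // serve whatever could be collected
)

// HandlerOpts is a toy options struct, not the library's.
type HandlerOpts struct{ ErrorHandling ErrorHandling }

// collectFunc returns the lines that could be collected, plus an error
// if some part of the collection failed.
type collectFunc func() ([]string, error)

// HandlerFor wraps collect in an http.Handler honoring opts.
func HandlerFor(collect collectFunc, opts HandlerOpts) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		lines, err := collect()
		if err != nil && opts.ErrorHandling == AbortOnError {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		for _, l := range lines {
			fmt.Fprintln(w, l)
		}
	})
}

// flaky simulates one consistent and one inconsistent metric.
func flaky() ([]string, error) {
	return []string{"good_metric 1"}, errors.New("bad_metric: inconsistent labels")
}

// scrape performs one in-process request and returns status and body.
func scrape(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, _ := scrape(HandlerFor(flaky, HandlerOpts{ErrorHandling: AbortOnError}))
	fmt.Println(code)
	code, body := scrape(HandlerFor(flaky, HandlerOpts{ErrorHandling: ContinueOnError}))
	fmt.Print(code, " ", body)
}
```

In the best-effort mode the scrape succeeds with the consistent
metrics only, which is the trade-off the second item in the list
above refers to.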
beorn7 committed Jul 31, 2016
1 parent 28be158 commit 78b3a87
Showing 19 changed files with 781 additions and 742 deletions.
5 changes: 0 additions & 5 deletions NOTICE
@@ -7,11 +7,6 @@ SoundCloud Ltd. (http://soundcloud.com/).

The following components are included in this product:

goautoneg
http://bitbucket.org/ww/goautoneg
Copyright 2011, Open Knowledge Foundation Ltd.
See README.txt for license details.

perks - a fork of https://github.com/bmizerany/perks
https://github.com/beorn7/perks
Copyright 2013-2015 Blake Mizerany, Björn Rabenstein
54 changes: 1 addition & 53 deletions prometheus/README.md
@@ -1,53 +1 @@
# Overview
This is the [Prometheus](http://www.prometheus.io) telemetric
instrumentation client [Go](http://golang.org) client library. It
enable authors to define process-space metrics for their servers and
expose them through a web service interface for extraction,
aggregation, and a whole slew of other post processing techniques.

# Installing
$ go get github.com/prometheus/client_golang/prometheus

# Example
```go
package main

import (
"net/http"

"github.com/prometheus/client_golang/prometheus"
)

var (
indexed = prometheus.NewCounter(prometheus.CounterOpts{
Namespace: "my_company",
Subsystem: "indexer",
Name: "documents_indexed",
Help: "The number of documents indexed.",
})
size = prometheus.NewGauge(prometheus.GaugeOpts{
Namespace: "my_company",
Subsystem: "storage",
Name: "documents_total_size_bytes",
Help: "The total size of all documents in the storage.",
})
)

func main() {
http.Handle("/metrics", prometheus.Handler())

indexed.Inc()
size.Set(5)

http.ListenAndServe(":8080", nil)
}

func init() {
prometheus.MustRegister(indexed)
prometheus.MustRegister(size)
}
```

# Documentation

[![GoDoc](https://godoc.org/github.com/prometheus/client_golang?status.png)](https://godoc.org/github.com/prometheus/client_golang)
See [![go-doc](https://godoc.org/github.com/prometheus/client_golang/prometheus?status.svg)](https://godoc.org/github.com/prometheus/client_golang/prometheus).
20 changes: 10 additions & 10 deletions prometheus/collector.go
@@ -37,16 +37,16 @@ type Collector interface {
// executing this method, it must send an invalid descriptor (created
// with NewInvalidDesc) to signal the error to the registry.
Describe(chan<- *Desc)
// Collect is called by Prometheus when collecting metrics. The
// implementation sends each collected metric via the provided channel
// and returns once the last metric has been sent. The descriptor of
// each sent metric is one of those returned by Describe. Returned
// metrics that share the same descriptor must differ in their variable
// label values. This method may be called concurrently and must
// therefore be implemented in a concurrency safe way. Blocking occurs
// at the expense of total performance of rendering all registered
// metrics. Ideally, Collector implementations support concurrent
// readers.
// Collect is called by the Prometheus registry when collecting
// metrics. The implementation sends each collected metric via the
// provided channel and returns once the last metric has been sent. The
// descriptor of each sent metric is one of those returned by
// Describe. Returned metrics that share the same descriptor must differ
// in their variable label values. This method may be called
// concurrently and must therefore be implemented in a concurrency safe
// way. Blocking occurs at the expense of total performance of rendering
// all registered metrics. Ideally, Collector implementations support
// concurrent readers.
Collect(chan<- Metric)
}

13 changes: 13 additions & 0 deletions prometheus/desc.go
@@ -1,3 +1,16 @@
// Copyright 2016 The Prometheus Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package prometheus

import (
43 changes: 30 additions & 13 deletions prometheus/doc.go
@@ -11,18 +11,16 @@
// See the License for the specific language governing permissions and
// limitations under the License.

// Package prometheus provides embeddable metric primitives for servers and
// standardized exposition of telemetry through a web services interface.
// Package prometheus provides metrics primitives to instrument code for
// monitoring. It also offers a registry for metrics and ways to expose
// registered metrics via an HTTP endpoint or push them to a Pushgateway.
//
// All exported functions and methods are safe to be used concurrently unless
// specified otherwise.
//
// To expose metrics registered with the Prometheus registry, an HTTP server
// needs to know about the Prometheus handler. The usual endpoint is "/metrics".
// A Basic Example
//
// http.Handle("/metrics", prometheus.Handler())
//
// As a starting point a very basic usage example:
// As a starting point, a very basic usage example:
//
// package main
//
@@ -44,6 +42,7 @@
// )
//
// func init() {
// // Metrics have to be registered to be exposed:
// prometheus.MustRegister(cpuTemp)
// prometheus.MustRegister(hdFailures)
// }
@@ -52,6 +51,8 @@
// cpuTemp.Set(65.3)
// hdFailures.Inc()
//
// // The Handler function provides a default handler to expose metrics
// // via an HTTP server. "/metrics" is the usual endpoint for that.
// http.Handle("/metrics", prometheus.Handler())
// http.ListenAndServe(":8080", nil)
// }
@@ -61,6 +62,8 @@
// It also exports some stats about the HTTP usage of the /metrics
// endpoint. (See the Handler function for more detail.)
//
// TODO: Rework from here on. Use titles
//
// Two more advanced metric types are the Summary and Histogram. A more
// thorough description of metric types can be found in the prometheus docs:
// https://prometheus.io/docs/concepts/metric_types/
@@ -74,8 +77,8 @@
// Those are all the parts needed for basic usage. Detailed documentation and
// examples are provided below.
//
// Everything else this package offers is essentially for "power users" only. A
// few pointers to "power user features":
// Everything else this package and its sub-packages offer is essentially for
// "power users" only. A few pointers to "power user features":
//
// All the various ...Opts structs have a ConstLabels field for labels that
// never change their value (which is only useful under special circumstances,
@@ -84,9 +87,6 @@
// The Untyped metric behaves like a Gauge, but signals the Prometheus server
// not to assume anything about its type.
//
// Functions to fine-tune how the metric registry works: EnableCollectChecks,
// PanicOnCollectError, Register, Unregister, SetMetricFamilyInjectionHook.
//
// For custom metric collection, there are two entry points: Custom Metric
// implementations and custom Collector implementations. A Metric is the
// fundamental unit in the Prometheus data model: a sample at a point in time
@@ -105,7 +105,24 @@
// collection time, MetricVec to bundle custom Metrics into a metric vector
// Collector, SelfCollector to make a custom Metric collect itself.
//
// A good example for a custom Collector is the ExpVarCollector included in this
// A good example for a custom Collector is the expvarCollector included in this
// package, which exports variables exported via the "expvar" package as
// Prometheus metrics.
//
// The functions Register, Unregister, MustRegister, RegisterOrGet, and
// MustRegisterOrGet all act on the default registry. They wrap other calls as
// described in their doc comment. For advanced use cases, you can work with
// custom registries (created by NewRegistry and similar) and call the wrapped
// functions directly.
//
// The functions Handler and UninstrumentedHandler create an HTTP handler to
// serve metrics from the default registry in the default way, which covers most
// of the use cases. With HandlerFor, you can create a custom HTTP handler for
// custom registries.
//
// The functions Push and PushAdd push the metrics from the default registry via
// HTTP to a Pushgateway. With PushFrom and PushAddFrom, you can push the
// metrics from custom registries. However, often you just want to push a
// handful of Collectors only. For that case, there are the convenience
// functions PushCollectors and PushAddCollectors.
package prometheus
94 changes: 41 additions & 53 deletions prometheus/example_clustermanager_test.go
@@ -13,11 +13,7 @@

package prometheus_test

import (
"sync"

"github.com/prometheus/client_golang/prometheus"
)
import "github.com/prometheus/client_golang/prometheus"

// ClusterManager is an example for a system that might have been built without
// Prometheus in mind. It models a central manager of jobs running in a
@@ -29,10 +25,9 @@ import (
// make use of ConstLabels to be able to register each ClusterManager instance
// with Prometheus.
type ClusterManager struct {
Zone string
OOMCount *prometheus.CounterVec
RAMUsage *prometheus.GaugeVec
mtx sync.Mutex // Protects OOMCount and RAMUsage.
Zone string
OOMCountDesc *prometheus.Desc
RAMUsageDesc *prometheus.Desc
// ... many more fields
}

@@ -55,76 +50,69 @@ func (c *ClusterManager) ReallyExpensiveAssessmentOfTheSystemState() (
return
}

// Describe faces the interesting challenge that the two metric vectors that are
// used in this example are already Collectors themselves. However, thanks to
// the use of channels, it is really easy to "chain" Collectors. Here we simply
// call the Describe methods of the two metric vectors.
// Describe simply sends the two Descs in the struct to the channel.
func (c *ClusterManager) Describe(ch chan<- *prometheus.Desc) {
c.OOMCount.Describe(ch)
c.RAMUsage.Describe(ch)
ch <- c.OOMCountDesc
ch <- c.RAMUsageDesc
}

// Collect first triggers the ReallyExpensiveAssessmentOfTheSystemState. Then it
// sets the retrieved values in the two metric vectors and then sends all their
// metrics to the channel (again using a chaining technique as in the Describe
// method). Since Collect could be called multiple times concurrently, that part
// is protected by a mutex.
// creates constant metrics for each host on the fly based on the returned data.
//
// Note that Collect could be called concurrently, so we depend on
// ReallyExpensiveAssessmentOfTheSystemState to be concurrency-safe.
func (c *ClusterManager) Collect(ch chan<- prometheus.Metric) {
oomCountByHost, ramUsageByHost := c.ReallyExpensiveAssessmentOfTheSystemState()
c.mtx.Lock()
defer c.mtx.Unlock()
for host, oomCount := range oomCountByHost {
c.OOMCount.WithLabelValues(host).Set(float64(oomCount))
ch <- prometheus.MustNewConstMetric(
c.OOMCountDesc,
prometheus.CounterValue,
float64(oomCount),
host,
)
}
for host, ramUsage := range ramUsageByHost {
c.RAMUsage.WithLabelValues(host).Set(ramUsage)
ch <- prometheus.MustNewConstMetric(
c.RAMUsageDesc,
prometheus.GaugeValue,
ramUsage,
host,
)
}
c.OOMCount.Collect(ch)
c.RAMUsage.Collect(ch)
// All metrics in OOMCount and RAMUsage are sent to the channel now. We
// can safely reset the two metric vectors now, so that we can start
// fresh in the next Collect cycle. (Imagine a host disappears from the
// cluster. If we did not reset here, its Metric would stay in the
// metric vectors forever.)
c.OOMCount.Reset()
c.RAMUsage.Reset()
}

// NewClusterManager creates the two metric vectors OOMCount and RAMUsage. Note
// NewClusterManager creates the two Descs OOMCountDesc and RAMUsageDesc. Note
// that the zone is set as a ConstLabel. (It's different in each instance of the
// ClusterManager, but constant over the lifetime of an instance.) The reported
// values are partitioned by host, which is therefore a variable label.
// ClusterManager, but constant over the lifetime of an instance.) Then there is
// a variable label "host", since we want to partition the collected metrics by
// host. Since all Descs created in this way are consistent across instances,
// with a guaranteed distinction by the "zone" label, we can register different
// ClusterManager with the same registry.
func NewClusterManager(zone string) *ClusterManager {
return &ClusterManager{
Zone: zone,
OOMCount: prometheus.NewCounterVec(
prometheus.CounterOpts{
Subsystem: "clustermanager",
Name: "oom_count",
Help: "number of OOM crashes",
ConstLabels: prometheus.Labels{"zone": zone},
},
OOMCountDesc: prometheus.NewDesc(
"clustermanager_oom_count",
"Number of OOM crashes.",
[]string{"host"},
prometheus.Labels{"zone": zone},
),
RAMUsage: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Subsystem: "clustermanager",
Name: "ram_usage_bytes",
Help: "RAM usage as reported to the cluster manager",
ConstLabels: prometheus.Labels{"zone": zone},
},
RAMUsageDesc: prometheus.NewDesc(
"clustermanager_ram_usage_bytes",
"RAM usage as reported to the cluster manager.",
[]string{"host"},
prometheus.Labels{"zone": zone},
),
}
}

func ExampleCollector_clustermanager() {
workerDB := NewClusterManager("db")
workerCA := NewClusterManager("ca")
prometheus.MustRegister(workerDB)
prometheus.MustRegister(workerCA)

// Since we are dealing with custom Collector implementations, it might
// be a good idea to enable the collect checks in the registry.
prometheus.EnableCollectChecks(true)
// be a good idea to try it out with a pedantic registry.
reg := prometheus.NewPedanticRegistry()
prometheus.MustRegisterWith(reg, workerDB)
prometheus.MustRegisterWith(reg, workerCA)
}