Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: support additional '/metrics' endpoints #8242

Merged
merged 3 commits into from
Jul 14, 2017
Merged

*: support additional '/metrics' endpoints #8242

merged 3 commits into from
Jul 14, 2017

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Jul 11, 2017

Fix #8060.

https://github.com/prometheus/prometheus/wiki/Default-port-allocations

9378 - etcd gRPC Proxy Exporter
9379 - etcd Exporter

@gyuho gyuho added this to the v3.3.0 milestone Jul 11, 2017
@gyuho
Copy link
Contributor Author

gyuho commented Jul 11, 2017

Note: /metrics uses the same client TLS, and v2 proxy still uses the same port

TODO: update migration guide to highlight this breaking change, add new flags to https://github.com/coreos/container-linux-config-transpiler.

@gyuho gyuho changed the title *: expose '/metrics' in separate ports (9378 for etcd, 9379 for gRPC proxy) *: expose '/metrics' in separate ports (9379 for etcd, 9378 for gRPC proxy) Jul 11, 2017
@gyuho gyuho added the WIP label Jul 11, 2017
@gyuho gyuho force-pushed the ppp branch 6 times, most recently from 0dc9c80 to aa9cfe8 Compare July 11, 2017 19:55
@gyuho gyuho removed the WIP label Jul 11, 2017
etcdmain/help.go Outdated
Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
--expose-metrics 'false'
Additionally expose '/metrics' endpoints via '/metrics-urls'.
--metrics-urls 'http://localhost:9379'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--listen-metrics-urls and default to "" so expose-metrics isn't needed? blank list => no listen

embed/etcd.go Outdated
// Start a client server goroutine for each listen address
var h http.Handler
if e.Config().EnableV2 {
h = v2http.NewClientHandler(e.Server, e.Server.Cfg.ReqTimeout())
h = v2http.NewClientHandler(e.Server, metricsMux, e.Server.Cfg.ReqTimeout())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewClientHandler and HandleBasic shouldn't need to change. I think all that's needed is:

var metricsMux *http.ServeMux
if len(e.cfg.MetricsUrls) > 0 {
    metricsMux = http.NewServeMux()
    metricsMux.Handle("metrics", prometheus.Handler()) // maybe etcdhttp.MetricsPath or get the path from the user-provided URL
}

@@ -79,7 +82,8 @@ func newGRPCProxyStartCommand() *cobra.Command {
}

cmd.Flags().StringVar(&grpcProxyListenAddr, "listen-addr", "127.0.0.1:23790", "listen address")
cmd.Flags().StringVar(&grpcProxyDNSCluster, "discovery-srv", "", "DNS domain used to bootstrap initial cluster")
cmd.Flags().BoolVar(&grpcProxyExposeMetrics, "expose-metrics", false, "Additionally expose '/metrics' endpoints via '/metrics-url'.")
cmd.Flags().StringVar(&grpcProxyMetricsListenAddr, "metrics-url", "http://127.0.0.1:9378", "'/metrics' listen address (secured via client TLS)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics-addr to be consistent with listen-addr and default to nothing?

@@ -5,7 +5,8 @@ Each etcd server exports metrics under the `/metrics` path on its client port.
The metrics can be fetched with `curl`:

```sh
$ curl -L http://localhost:2379/metrics | grep -v debugging # ignore unstable debugging metrics
# OR http://localhost:9379/metrics when etcd is run with '--expose-metrics' flag
$ curl -L http://localhost:9379/metrics | grep -v debugging # ignore unstable debugging metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should stick to the default 2379 here since if the user is changing the ports, they probably don't need this guide any more. Instead of hiding an explanation in a comment, the doc can directly say that optionally metrics can be exported to other listen addresses with --listen-metrics-urls where it's talking about the client port

Procfile Outdated
etcd1: bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof
etcd2: bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof
etcd3: bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof
etcd1: bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --expose-metrics --metrics-urls http://127.0.0.1:9379
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change? would like to keep the procfile from being too cluttered

@gyuho gyuho added the WIP label Jul 12, 2017
@gyuho gyuho changed the title *: expose '/metrics' in separate ports (9379 for etcd, 9378 for gRPC proxy) *: optionally duplicate '/metrics' in separate ports Jul 12, 2017
@gyuho gyuho force-pushed the ppp branch 2 times, most recently from cd82e05 to 893a76a Compare July 12, 2017 17:14
@gyuho gyuho removed the WIP label Jul 12, 2017
embed/etcd.go Outdated
@@ -400,6 +409,17 @@ func (e *Etcd) serve() (err error) {
e.errHandler(s.serve(e.Server, &e.cfg.ClientTLSInfo, h, e.errHandler))
}(sctx)
}
for _, murl := range e.cfg.ListenMetricsUrls {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will Etcd.Close() shut this down?

@@ -131,6 +133,7 @@ func newConfig() *config {
fs.StringVar(&cfg.WalDir, "wal-dir", cfg.WalDir, "Path to the dedicated wal directory.")
fs.Var(flags.NewURLsValue(embed.DefaultListenPeerURLs), "listen-peer-urls", "List of URLs to listen on for peer traffic.")
fs.Var(flags.NewURLsValue(embed.DefaultListenClientURLs), "listen-client-urls", "List of URLs to listen on for client traffic.")
fs.StringVar(&cfg.ListenMetricsUrlsJSON, "listen-metrics-urls", "", "List of URLs to listen on for metrics (secured via client TLS if any).")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this smart about http/https schemes instead of defaulting to TLS? there may be cases where it's OK to expose the metrics as plaintext but access to the kv store is locked down with tls authentication

@@ -79,7 +81,7 @@ func newGRPCProxyStartCommand() *cobra.Command {
}

cmd.Flags().StringVar(&grpcProxyListenAddr, "listen-addr", "127.0.0.1:23790", "listen address")
cmd.Flags().StringVar(&grpcProxyDNSCluster, "discovery-srv", "", "DNS domain used to bootstrap initial cluster")
cmd.Flags().StringVar(&grpcProxyMetricsListenAddr, "metrics-addr", "", "Additional '/metrics' listen address (secured via client TLS)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listen for /metrics requests on an additional interface?

@gyuho gyuho added the WIP label Jul 12, 2017
@gyuho gyuho force-pushed the ppp branch 3 times, most recently from 4a4d627 to 21588e0 Compare July 12, 2017 20:46
@gyuho gyuho changed the title *: optionally duplicate '/metrics' in separate ports *: support additional '/metrics' endpoints with non-TLS Jul 12, 2017
@gyuho gyuho removed the WIP label Jul 12, 2017
Copy link
Contributor

@heyitsanthony heyitsanthony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few last nits

@@ -131,6 +133,7 @@ func newConfig() *config {
fs.StringVar(&cfg.WalDir, "wal-dir", cfg.WalDir, "Path to the dedicated wal directory.")
fs.Var(flags.NewURLsValue(embed.DefaultListenPeerURLs), "listen-peer-urls", "List of URLs to listen on for peer traffic.")
fs.Var(flags.NewURLsValue(embed.DefaultListenClientURLs), "listen-client-urls", "List of URLs to listen on for client traffic.")
fs.StringVar(&cfg.ListenMetricsUrlsJSON, "listen-metrics-urls", "", "List of URLs to listen on for metrics (non-secured ignoring client TLS).")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just List of URLs to listen on for metrics? The parenthetical will probably cause confusion

@@ -79,7 +81,7 @@ func newGRPCProxyStartCommand() *cobra.Command {
}

cmd.Flags().StringVar(&grpcProxyListenAddr, "listen-addr", "127.0.0.1:23790", "listen address")
cmd.Flags().StringVar(&grpcProxyDNSCluster, "discovery-srv", "", "DNS domain used to bootstrap initial cluster")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep this?

etcdmain/help.go Outdated
Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
--listen-metrics-urls ''
List of URLs to listen on for metrics (non-secured ignoring client TLS).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove parenthetical bit here too?

embed/etcd.go Outdated
@@ -205,6 +208,11 @@ func (e *Etcd) Close() {
e.Clients[i].Close()
}
}
for i := range e.metricsListeners {
if e.metricsListeners[i] != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the nil check? metricsListeners is only appended with non-nil listeners

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@gyuho gyuho merged commit d283348 into etcd-io:master Jul 14, 2017
@gyuho gyuho deleted the ppp branch July 14, 2017 22:06
@etcd-io etcd-io deleted a comment from codecov-io Jul 17, 2017
@owenthereal
Copy link

The changes are not available in the latest stable release v3.2.9. When will a newer version be cut with --listen-metrics-urls?

@gyuho
Copy link
Contributor Author

gyuho commented Nov 6, 2017

This will be in v3.3, which is planned for late this year, or early next year.

@Quentin-M
Copy link
Contributor

Quentin-M commented Jan 3, 2018

I think it may be a bit unfortunate to couple the ClientTLSInfo with the new /metrics endpoint.

  1. Users may want TLS on the /metrics endpoint.
  2. Users may run the etcd clients (i.e. kube-apiserver) on the same nodes as Prometheus
  3. Users may protect the the etcd client endpoint with TLS Client Cert auth.

This means that both client port and metrics port must be reachable from the the same nodes. If Prometheus is given a TLS Client Certs to access the /metrics endpoint, it will be also be able to reach the client endpoint with the same TLS Client Cert, and with full privileges.

It is also generally a burden and a security risk to carry TLS Client Certs to the monitoring system, as it often run in a separate environment. For instance, in the case of Prometheus on Kubernetes:

  • While the TLS Client Certs are stored on the master nodes, with host network and host mounts forbidden, scheduler constraints preventing anyone but the administrator to schedule there, and good chmod - these certificates are pretty much unreachable from tenants' pods (unless there is container escape).
  • To get Prometheus to scrape etcd, the TLS Client Certs will have to be stored in a Secret, now potentially reachable by any tenant of Kubernetes, who manage to read that secret (e.g. bad RBAC permission, or hack of a pod that has permissions for it).

I am of the opinion that if etcd listens over TLS, and the metrics URLs are https://, the /metrics endpoint could also listen over TLS, but without the TLS Client Cert auth. This could be a flag though if opinions are strong that this use case of auth'd /metrics should be auth'd.

Ideally tho, and as mentioned in kubernetes/kubernetes#53405, we could just have another TLS configuration for TLS Client Cert Auth with different certs, to the /metrics endpoint. But I am not sure if that's something you would like to support.

/cc @fabxc @xiang90

@brancz
Copy link
Contributor

brancz commented Jan 3, 2018

@Quentin-M we've already talked to @xiang90 about this and he mentioned it is possible to have a separate set of client certs for Prometheus that only has permission to access the /metrics endpoint of etcd, using etcd's authorization. I imagine that's what you are looking for, as then you are serving the metrics via https, and have them authorized, but not full access to etcd.

Quentin-M added a commit to Quentin-M/etcd-cloud-operator that referenced this pull request Jan 7, 2018
Quentin-M added a commit to Quentin-M/etcd-cloud-operator that referenced this pull request Feb 1, 2018
CyberDem0n pushed a commit to zalando-stups/stups-etcd-cluster that referenced this pull request Mar 7, 2018
etcd v3.3 introduced a new flag to allow serving `/metrics` and `/health` under a different port than e.g. `/v2/keys`. This allows us to protect etcd's data via firewall rules but still let monitoring tools to access the monitoring information.

See feature request in etcd repo: etcd-io/etcd#8060.
The implementation landed in v3.3: etcd-io/etcd#8242

This PR instructs etcd to serve metrics and health under the additonal port `2381` *unconditionally* **when the used etcd binary is** `>=v3.3.x`. However, if not explicitely set in the `senza.yaml` this port won't be mapped to the outside and therefore isn't accessible. It doesn't expose more information than anything under `2379` already does.
@wenjiaswe
Copy link
Contributor

@gyuho Do you think it would be OK to cherrypick this PR back to v3.2 and v3.1? I took a quick look at the fix and it seems fine to backport to me. Would you please let me know if otherwise?

If OK, I can do the cherry pick work. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants