Improve apiserver storage size metric #118812

serathius · 2023-06-22T09:58:26Z

/kind feature

Replace `apiserver_storage_db_total_size_in_bytes` with `apiserver_storage_size_bytes` metric

/cc @mborsz @logicalhan @dgrisonnet

Most names are just temporary stand-ins; for now I wanted to confirm that computing the metric on demand is even possible.

The goal of this PR is to make the changes to `apiserver_storage_db_total_size_in_bytes` that are necessary to graduate it in the future:

- Change the name to make it compliant with Prometheus guidelines.
- Calculate it on demand instead of periodically, to comply with Prometheus standards.
- Replace the "endpoint" label with "instance" to make it semantically consistent with the storage factory.

kubernetes/pkg/registry/core/rest/storage_core.go

Lines 421 to 423 in 3e28404

    for ix, cfg := range s.storageFactory.Configs() {
    	serversToValidate[fmt.Sprintf("etcd-%d", ix)] = &componentstatus.EtcdServer{Config: cfg}
    }

serathius · 2023-06-22T10:01:43Z

/cc @dgrisonnet

serathius · 2023-06-22T10:09:58Z

staging/src/k8s.io/apiserver/pkg/server/options/etcd.go

@@ -238,6 +239,21 @@ func (s *EtcdOptions) ApplyWithStorageFactoryTo(factory serverstorage.StorageFac
 		return err
 	}

+	metrics.SetStorageMonitorGetter(func() (monitors []metrics.Monitor, err error) {


Need this monster to avoid dependency cycle
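For context, a common way to avoid an import cycle like this is to let the low-level package hold a function variable that a higher-level package fills in at startup. The sketch below shows that pattern in a single file. Only the name `SetStorageMonitorGetter` comes from the diff above; the `Monitor` interface, `currentMonitors`, `staticMonitor`, and all function bodies are assumptions made for illustration, not the PR's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Monitor is a hypothetical minimal interface; the real metrics.Monitor is
// not shown in this thread.
type Monitor interface {
	Size() (int64, error)
}

// --- would live in the low-level metrics package ---

var (
	getterMu sync.RWMutex
	getter   func() ([]Monitor, error)
)

// SetStorageMonitorGetter registers the callback used to obtain monitors.
// The metrics package never imports the package that provides the callback.
func SetStorageMonitorGetter(g func() ([]Monitor, error)) {
	getterMu.Lock()
	defer getterMu.Unlock()
	getter = g
}

// currentMonitors invokes the registered callback, if any.
func currentMonitors() ([]Monitor, error) {
	getterMu.RLock()
	g := getter
	getterMu.RUnlock()
	if g == nil {
		return nil, errors.New("no storage monitor getter registered")
	}
	return g()
}

// --- would live in the higher-level options package ---

// staticMonitor reports a fixed size; it stands in for a real backend.
type staticMonitor int64

func (s staticMonitor) Size() (int64, error) { return int64(s), nil }

func main() {
	SetStorageMonitorGetter(func() ([]Monitor, error) {
		return []Monitor{staticMonitor(1024)}, nil
	})
	monitors, err := currentMonitors()
	if err != nil {
		panic(err)
	}
	for i, m := range monitors {
		size, err := m.Size()
		if err != nil {
			panic(err)
		}
		fmt.Printf("monitor %d: %d bytes\n", i, size)
	}
}
```

The cost of this pattern is a piece of mutable package-level state, which is presumably why it earns the "monster" label.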

staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go

serathius · 2023-06-23T20:30:02Z

/retest

serathius · 2023-06-26T09:57:35Z

/assign @jpbetz

serathius · 2023-07-04T08:35:17Z

ping @deads2k @jpbetz @logicalhan

serathius · 2023-07-04T08:35:34Z

ping @dgrisonnet

dgrisonnet · 2023-07-05T06:36:58Z

staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go

-	dbTotalSize = compbasemetrics.NewGaugeVec(
-		&compbasemetrics.GaugeOpts{
-			Subsystem:      "apiserver",
-			Name:           "storage_db_total_size_in_bytes",
-			Help:           "Total size of the storage database file physically allocated in bytes.",
-			StabilityLevel: compbasemetrics.ALPHA,
-		},
-		[]string{"endpoint"},
-	)


Removing this metric will be a breaking change for some users. It would be safer to leave a one-release deprecation period before removing it completely.

https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3498-extending-stability#alpha-metrics

Although this is what was decided when API stability was designed, we have been careful not to remove old metrics without giving users a grace period.

IMO, we will need to continue being careful when removing alpha metrics and apply deprecation periods until we start moving metrics to beta.

I leave it up to @logicalhan to make a decision here.

True, but I would argue that old metrics are treated more like Beta than Alpha. SIG Instrumentation should make that explicit and just move them to Beta to avoid making exceptions. For now I am treating this as an Alpha metric and therefore removing it.

I would deprecate it, people likely depend on this one.

dgrisonnet · 2023-07-05T06:42:56Z

staging/src/k8s.io/apiserver/pkg/storage/etcd3/metrics/metrics.go

+	}
+
+	for i, m := range monitors {
+		server := fmt.Sprintf("etcd-%d", i)


I don't think we will be able to correlate this label with any other one from the other metrics since AFAIK this is the only place we have something like that. So wouldn't having an endpoint like before be easier to use?

No, it's the third place that does this. It's also done by the apiserver health probes and the apiserver component status. As for the usefulness of endpoints, they are worse, because the apiserver would expose every etcd member it connects to. For an HA etcd cluster that is 3 endpoints with the same value (+/- time shift), and each API override adds another 3 endpoints.

The semantic meaning of server is:

- etcd-0: the default etcd
- etcd-1: the first API resource override, based on the order of flags provided to the apiserver
- etcd-2: the second API resource override, and so forth

Of course it should be documented. Maybe I can include it in the metric description.
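A small sketch of the labelling described above, assuming a hypothetical `StorageConfig` type that carries only a list of endpoints (the real storage config type is not shown in this thread). `serverLabels` and `endpointSeriesCount` are illustrative helpers, not functions from the PR. The sketch also shows the series-count argument: labelling by endpoint yields one series per etcd member, while labelling by server yields one series per storage config.

```go
package main

import "fmt"

// StorageConfig is a hypothetical, trimmed-down view of one storage config:
// index 0 is the default etcd, later entries are API resource overrides in
// the order their flags were given.
type StorageConfig struct {
	Endpoints []string
}

// serverLabels derives one "etcd-<index>" label per storage config.
func serverLabels(configs []StorageConfig) []string {
	labels := make([]string, len(configs))
	for i := range configs {
		labels[i] = fmt.Sprintf("etcd-%d", i)
	}
	return labels
}

// endpointSeriesCount is the number of series an endpoint-based label would
// produce: one per etcd member across all configs.
func endpointSeriesCount(configs []StorageConfig) int {
	n := 0
	for _, c := range configs {
		n += len(c.Endpoints)
	}
	return n
}

func main() {
	// Placeholder endpoints: a 3-member default cluster plus one override
	// that also points at a 3-member cluster.
	configs := []StorageConfig{
		{Endpoints: []string{"https://etcd-a:2379", "https://etcd-b:2379", "https://etcd-c:2379"}},
		{Endpoints: []string{"https://events-a:2379", "https://events-b:2379", "https://events-c:2379"}},
	}
	fmt.Println("server labels:", serverLabels(configs))
	fmt.Println("series with endpoint label:", endpointSeriesCount(configs))
	fmt.Println("series with server label:", len(serverLabels(configs)))
}
```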

Makes sense, thank you for the clarification 👍

serathius · 2023-07-11T07:30:34Z

ping @deads2k @jpbetz @logicalhan

Change name to make it compliant with prometheus guidelines. Calculate it on demand instead of periodic to comply with prometheus standards. Replace "endpoint" with "server" label to make it semantically consistent with storage factory

serathius · 2023-07-12T13:34:16Z

/retest

logicalhan

/lgtm
/approve

k8s-ci-robot · 2023-07-12T15:12:56Z

LGTM label has been added.

Git tree hash: 7676176268460083184f1d9be59c3ab83caff518

jpbetz · 2023-07-12T15:14:54Z

/approve

k8s-ci-robot · 2023-07-12T15:15:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, logicalhan, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~staging/src/k8s.io/apiserver/OWNERS~~ [jpbetz]
~~test/instrumentation/OWNERS~~ [logicalhan,serathius]
~~test/integration/metrics/OWNERS~~ [logicalhan,serathius]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from logicalhan and mborsz June 22, 2023 09:58

k8s-ci-robot requested a review from dgrisonnet June 22, 2023 10:01

serathius force-pushed the storage-metric branch 11 times, most recently from da7ffff to 64d761e Compare June 22, 2023 10:26

k8s-ci-robot assigned jpbetz Jun 26, 2023

dgrisonnet reviewed Jul 5, 2023

serathius force-pushed the storage-metric branch from 158a056 to 78ad99b Compare July 12, 2023 08:50

k8s-ci-robot added the size/XL label and removed the size/L label Jul 12, 2023

serathius force-pushed the storage-metric branch from 78ad99b to d7b9dca Compare July 12, 2023 08:54

k8s-ci-robot added the size/L label and removed the size/XL label Jul 12, 2023

serathius force-pushed the storage-metric branch 2 times, most recently from 03a59b0 to 70312fe Compare July 12, 2023 12:31

serathius force-pushed the storage-metric branch from 70312fe to 7a63997 Compare July 12, 2023 12:33

logicalhan approved these changes Jul 12, 2023

k8s-ci-robot assigned logicalhan Jul 12, 2023

k8s-ci-robot added the lgtm label Jul 12, 2023

k8s-ci-robot added the approved label Jul 12, 2023

k8s-ci-robot merged commit 2ec4e14 into kubernetes:master Jul 12, 2023
13 checks passed

k8s-ci-robot added this to the v1.28 milestone Jul 12, 2023

philipgough mentioned this pull request Aug 31, 2023

Update compatibility matrix to include 1.28 prometheus-operator/kube-prometheus#2209

Merged

5 tasks

oliver-goetz mentioned this pull request Sep 19, 2023

Support for Kubernetes v1.28 gardener/gardener#8479

Merged

simonpasquier mentioned this pull request Sep 25, 2023

[bot] Update jsonnet dependencies openshift/cluster-monitoring-operator#2076

Merged
