provide configuration to include custom node and pod labels on metrics #859

Closed
26 changes: 26 additions & 0 deletions README.md
@@ -53,6 +53,7 @@ Table of Contents
- [High Availability](#high-availability)
- [Configure HA Mode](#configure-ha-mode)
- [Metrics](#metrics)
- [Metrics Config](#metrics-config)
- [Compatibility Matrix](#compatibility-matrix)
- [Getting Involved and Contributing](#getting-involved-and-contributing)
- [Communicating With Contributors](#communicating-with-contributors)
@@ -136,6 +137,7 @@ The policy includes a common configuration that applies to all the strategies:
| `maxNoOfPodsToEvictPerNode` | `nil` | maximum number of pods evicted from each node (summed through all strategies) |
| `maxNoOfPodsToEvictPerNamespace` | `nil` | maximum number of pods evicted from each namespace (summed through all strategies) |
| `evictFailedBarePods` | `false` | allow eviction of pods without owner references and in failed phase |
| `metricsConfig` | `nil` | configuration to instruct Descheduler's metrics to include node and pod labels |

As part of the policy, the parameters associated with each strategy can be configured.
See each strategy for details on available parameters.
@@ -151,6 +153,7 @@ evictLocalStoragePods: true
evictSystemCriticalPods: true
maxNoOfPodsToEvictPerNode: 40
ignorePvcPods: false
metricsConfig: {}
strategies:
...
```
@@ -821,6 +824,29 @@ To get best results from HA mode some additional configurations might require:
The metrics are served through https://localhost:10258/metrics by default.
The address and port can be changed by setting `--binding-address` and `--secure-port` flags.

## Metrics Config
Eviction metrics can be enriched by including node and pod labels.

| name | type | description |
|-------|-------|----------------|
| `nodeLabels` | list(string) | list of node labels to include in metrics |
| `podLabels` | list(string) | list of pod labels to include in metrics |

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
metricsConfig:
  nodeLabels:
    - "topology.kubernetes.io/zone"
  podLabels:
    - "app.kubernetes.io/name"
```

To conform to the Prometheus [metric names and labels data model](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels),
label names are converted to contain only ASCII letters, digits, and underscores; each run of other characters becomes a single underscore.
For example, node label `topology.kubernetes.io/zone` becomes metric label `topology_kubernetes_io_zone`.
The label value, however, is left untouched.

## Compatibility Matrix
The below compatibility matrix shows the k8s client package(client-go, apimachinery, etc) versions that descheduler
is compiled with. At this time descheduler does not have a hard dependency to a specific k8s release. However a
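The label-name conversion described in the README's Metrics Config section can be tried in isolation. The sketch below reuses the regular expression that appears in `metrics/metrics.go` in this change; `sanitizeLabel` is an illustrative name and is not part of the descheduler API.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as metricsNameRegex in metrics/metrics.go of this change.
var disallowed = regexp.MustCompile("[^a-zA-Z0-9_]+")

// sanitizeLabel is a hypothetical stand-in for ConvertToMetricLabel:
// every run of characters outside [a-zA-Z0-9_] becomes one underscore.
func sanitizeLabel(label string) string {
	return disallowed.ReplaceAllString(label, "_")
}

func main() {
	labels := []string{
		"topology.kubernetes.io/zone",
		"app.kubernetes.io/name",
		"team//owner", // two slashes collapse into a single underscore
	}
	for _, l := range labels {
		fmt.Printf("%s -> %s\n", l, sanitizeLabel(l))
	}
}
```

One caveat under this pattern: a leading digit is left in place, although the Prometheus data model does not allow a label name to start with a digit.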
10 changes: 10 additions & 0 deletions examples/custom-metrics-labels.yaml
@@ -0,0 +1,10 @@
+apiVersion: "descheduler/v1alpha1"
+kind: "DeschedulerPolicy"
+metricsConfig:
+  nodeLabels:
+    - "topology.kubernetes.io/zone"
+  podLabels:
+    - "app.kubernetes.io/name"
+strategies:
+  "RemovePodsViolatingTopologySpreadConstraint":
+    enabled: true
2 changes: 1 addition & 1 deletion go.mod
@@ -4,6 +4,7 @@ go 1.18

 require (
 	github.com/client9/misspell v0.3.4
+	github.com/prometheus/client_golang v1.12.1
 	github.com/spf13/cobra v1.4.0
 	github.com/spf13/pflag v1.0.5
 	k8s.io/api v0.24.0
@@ -68,7 +69,6 @@ require (
 	github.com/modern-go/reflect2 v1.0.2 // indirect
 	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/pkg/errors v0.9.1 // indirect
-	github.com/prometheus/client_golang v1.12.1 // indirect
 	github.com/prometheus/client_model v0.2.0 // indirect
 	github.com/prometheus/common v0.32.1 // indirect
 	github.com/prometheus/procfs v0.7.3 // indirect
50 changes: 32 additions & 18 deletions metrics/metrics.go
@@ -17,6 +17,7 @@ limitations under the License.
 package metrics

 import (
+	"regexp"
 	"sync"

 	"k8s.io/component-base/metrics"
@@ -30,14 +31,6 @@ const (
 )

 var (
-	PodsEvicted = metrics.NewCounterVec(
-		&metrics.CounterOpts{
-			Subsystem:      DeschedulerSubsystem,
-			Name:           "pods_evicted",
-			Help:           "Number of evicted pods, by the result, by the strategy, by the namespace, by the node name. 'error' result means a pod could not be evicted",
-			StabilityLevel: metrics.ALPHA,
-		}, []string{"result", "strategy", "namespace", "node"})
-
 	buildInfo = metrics.NewGauge(
 		&metrics.GaugeOpts{
 			Subsystem: DeschedulerSubsystem,
@@ -48,25 +41,46 @@ var (
 		},
 	)

-	metricsList = []metrics.Registerable{
-		PodsEvicted,
-		buildInfo,
-	}
+	metricsNameRegex, _ = regexp.Compile("[^a-zA-Z0-9_]+")
+
+	PodsEvicted *metrics.CounterVec
 )

 var registerMetrics sync.Once

 // Register all metrics.
-func Register() {
+func Register(customLabels []string) {
 	// Register the metrics.
 	registerMetrics.Do(func() {
-		RegisterMetrics(metricsList...)
+		metrics := initialize(customLabels)
+		for _, metric := range metrics {
+			legacyregistry.MustRegister(metric)
+		}
 	})
 }

-// RegisterMetrics registers a list of metrics.
-func RegisterMetrics(extraMetrics ...metrics.Registerable) {
-	for _, metric := range extraMetrics {
-		legacyregistry.MustRegister(metric)
+// ensure that labels conform to https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
+func ConvertToMetricLabel(label string) string {
+	return metricsNameRegex.ReplaceAllString(label, "_")
+}
+
+func initialize(customLabels []string) []metrics.Registerable {
+	labels := []string{"result", "strategy", "namespace", "node"}
+	for _, customLabel := range customLabels {
+		labels = append(labels, ConvertToMetricLabel(customLabel))
+	}
+
+	PodsEvicted = metrics.NewCounterVec(
+		&metrics.CounterOpts{
+			Subsystem:      DeschedulerSubsystem,
+			Name:           "pods_evicted",
+			Help:           "Number of evicted pods, by the result, by the strategy, by the namespace, by the node name. 'error' result means a pod could not be evicted",
+			StabilityLevel: metrics.ALPHA,
+		},
+		labels,
+	)
+
+	return []metrics.Registerable{
+		PodsEvicted,
 	}
 }
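Because the conversion is lossy, two different Kubernetes label keys can end up with the same metric label name (for example `app.name` and `app/name` both become `app_name`), and a key listed under both `nodeLabels` and `podLabels` is appended twice. How the registry reacts to a repeated label name is not shown in this diff. A minimal sketch of a defensive check, assuming the same fixed labels and pattern as above, follows; `buildLabelNames` is a hypothetical helper and not part of this change.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as metricsNameRegex in metrics/metrics.go of this change.
var disallowed = regexp.MustCompile("[^a-zA-Z0-9_]+")

// buildLabelNames (hypothetical) appends the converted custom labels to the
// fixed ones and returns an error for the first name that would repeat.
func buildLabelNames(custom []string) ([]string, error) {
	names := []string{"result", "strategy", "namespace", "node"}
	seen := make(map[string]string, len(names)+len(custom))
	for _, n := range names {
		seen[n] = n
	}
	for _, c := range custom {
		name := disallowed.ReplaceAllString(c, "_")
		if prev, dup := seen[name]; dup {
			return nil, fmt.Errorf("label %q converts to %q, already used by %q", c, name, prev)
		}
		seen[name] = c
		names = append(names, name)
	}
	return names, nil
}

func main() {
	names, err := buildLabelNames([]string{"topology.kubernetes.io/zone"})
	fmt.Println(names, err)

	_, err = buildLabelNames([]string{"app.name", "app/name"})
	fmt.Println(err)
}
```

Running such a check before the counter is constructed would turn a collision into a configuration error rather than leaving it to the metrics library.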
8 changes: 8 additions & 0 deletions pkg/api/types.go
@@ -49,6 +49,14 @@ type DeschedulerPolicy struct {

 	// MaxNoOfPodsToEvictPerNamespace restricts maximum of pods to be evicted per namespace.
 	MaxNoOfPodsToEvictPerNamespace *uint
+
+	// MetricsConfig to control information included in metrics
+	MetricsConfig *MetricsConfig
+}
+
+type MetricsConfig struct {
+	NodeLabels []string
+	PodLabels  []string
 }

 type StrategyName string
8 changes: 8 additions & 0 deletions pkg/api/v1alpha1/types.go
@@ -49,6 +49,14 @@ type DeschedulerPolicy struct {

 	// MaxNoOfPodsToEvictPerNamespace restricts maximum of pods to be evicted per namespace.
 	MaxNoOfPodsToEvictPerNamespace *int `json:"maxNoOfPodsToEvictPerNamespace,omitempty"`
+
+	// MetricsConfig to control information included in metrics
+	MetricsConfig *MetricsConfig `json:"metricsConfig"`
+}
+
+type MetricsConfig struct {
+	NodeLabels []string `json:"nodeLabels"`
+	PodLabels  []string `json:"podLabels"`
 }

 type StrategyName string
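The effect of the new field tags can be illustrated with the standard library alone. The sketch below uses trimmed copies of the `v1alpha1` types from this diff and decodes JSON rather than YAML, which is an assumption made to avoid external packages; `parsePolicy` is a hypothetical helper. It shows that a policy without `metricsConfig` leaves the pointer nil, which is the case the nil check in `descheduler.go` guards against.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed copies of the v1alpha1 types; only the new fields are kept.
type MetricsConfig struct {
	NodeLabels []string `json:"nodeLabels"`
	PodLabels  []string `json:"podLabels"`
}

type DeschedulerPolicy struct {
	MetricsConfig *MetricsConfig `json:"metricsConfig"`
}

// parsePolicy (hypothetical) decodes a JSON policy document.
func parsePolicy(data string) (*DeschedulerPolicy, error) {
	var p DeschedulerPolicy
	if err := json.Unmarshal([]byte(data), &p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	withCfg, err := parsePolicy(`{"metricsConfig":{"nodeLabels":["topology.kubernetes.io/zone"]}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(withCfg.MetricsConfig.NodeLabels, len(withCfg.MetricsConfig.PodLabels))

	withoutCfg, err := parsePolicy(`{}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(withoutCfg.MetricsConfig == nil)
}
```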
34 changes: 34 additions & 0 deletions pkg/api/v1alpha1/zz_generated.conversion.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

31 changes: 31 additions & 0 deletions pkg/api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

31 changes: 31 additions & 0 deletions pkg/api/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

18 changes: 15 additions & 3 deletions pkg/descheduler/descheduler.go
@@ -19,6 +19,7 @@ package descheduler
 import (
 	"context"
 	"fmt"
+	"sigs.k8s.io/descheduler/metrics"

 	v1 "k8s.io/api/core/v1"
 	policy "k8s.io/api/policy/v1beta1"
@@ -36,7 +37,6 @@ import (
 	schedulingv1informers "k8s.io/client-go/informers/scheduling/v1"

 	"sigs.k8s.io/descheduler/cmd/descheduler/app/options"
-	"sigs.k8s.io/descheduler/metrics"
 	"sigs.k8s.io/descheduler/pkg/api"
 	"sigs.k8s.io/descheduler/pkg/descheduler/client"
 	"sigs.k8s.io/descheduler/pkg/descheduler/evictions"
@@ -49,8 +49,6 @@ import (
 )

 func Run(ctx context.Context, rs *options.DeschedulerServer) error {
-	metrics.Register()
-
 	rsclient, err := client.CreateClient(rs.KubeconfigFile)
 	if err != nil {
 		return err
@@ -276,7 +274,20 @@ func RunDeschedulerStrategies(ctx context.Context, rs *options.DeschedulerServer
 			podEvictorClient = rs.Client
 		}

+		if !rs.DisableMetrics {
+			klog.V(3).Infof("Registering metrics")
+
+			var customLabels []string
+			if deschedulerPolicy.MetricsConfig != nil {
+				customLabels = append(customLabels, deschedulerPolicy.MetricsConfig.NodeLabels...)
+				customLabels = append(customLabels, deschedulerPolicy.MetricsConfig.PodLabels...)
+			}
+
+			metrics.Register(customLabels)
+		}
+
 		klog.V(3).Infof("Building a pod evictor")
+
 		podEvictor := evictions.NewPodEvictor(
 			podEvictorClient,
 			evictionPolicyGroupVersion,
@@ -285,6 +296,7 @@ func RunDeschedulerStrategies(ctx context.Context, rs *options.DeschedulerServer
 			deschedulerPolicy.MaxNoOfPodsToEvictPerNamespace,
 			nodes,
 			!rs.DisableMetrics,
+			deschedulerPolicy.MetricsConfig,
 		)

 		for name, strategy := range deschedulerPolicy.Strategies {
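The registration above appends node labels first and pod labels second, so whatever records an eviction has to supply one value per custom label in that same order. The evictor side is not part of the rendered diff; the following is only a sketch of how the values might be assembled, assuming a label missing from the node or pod is reported as an empty string. `customLabelValues` and the local `MetricsConfig` copy are illustrative names.

```go
package main

import "fmt"

// Local copy of the internal API type from pkg/api/types.go in this change.
type MetricsConfig struct {
	NodeLabels []string
	PodLabels  []string
}

// customLabelValues (hypothetical) returns the values for the configured
// labels: node labels first, then pod labels, matching registration order.
func customLabelValues(cfg *MetricsConfig, node, pod map[string]string) []string {
	if cfg == nil {
		return nil
	}
	values := make([]string, 0, len(cfg.NodeLabels)+len(cfg.PodLabels))
	for _, key := range cfg.NodeLabels {
		values = append(values, node[key]) // a missing key yields ""
	}
	for _, key := range cfg.PodLabels {
		values = append(values, pod[key])
	}
	return values
}

func main() {
	cfg := &MetricsConfig{
		NodeLabels: []string{"topology.kubernetes.io/zone"},
		PodLabels:  []string{"app.kubernetes.io/name"},
	}
	node := map[string]string{"topology.kubernetes.io/zone": "eu-west-1a"}
	pod := map[string]string{} // this pod lacks the configured label

	fmt.Printf("%q\n", customLabelValues(cfg, node, pod))
}
```

Emitting an empty string for an absent label keeps the number of values equal to the number of registered label names, which counter vectors in Prometheus-style clients require.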