Fix cluster IP allocator metrics #110027

Merged
merged 1 commit into from
May 25, 2022
2 changes: 2 additions & 0 deletions pkg/registry/core/rest/storage_core.go
@@ -213,6 +213,7 @@ func (c LegacyRESTStorageProvider) NewLegacyRESTStorage(apiResourceConfigSource
if err != nil {
return LegacyRESTStorage{}, genericapiserver.APIGroupInfo{}, fmt.Errorf("cannot create cluster IP allocator: %v", err)
}
+ serviceClusterIPAllocator.EnableMetrics()
restStorage.ServiceClusterIPAllocator = serviceClusterIPRegistry
Member
I need to remember how this ends up in the repair loops:

repairClusterIPs := servicecontroller.NewRepair(c.ServiceClusterIPInterval, c.ServiceClient, c.EventClient, &c.ServiceClusterIPRange, c.ServiceClusterIPRegistry, &c.SecondaryServiceClusterIPRange, c.SecondaryServiceClusterIPRegistry)

Member
tracing

func (m *Instance) InstallLegacyAPI(c *completedConfig, restOptionsGetter generic.RESTOptionsGetter) error {
legacyRESTStorageProvider := corerest.LegacyRESTStorageProvider{
StorageFactory: c.ExtraConfig.StorageFactory,
ProxyTransport: c.ExtraConfig.ProxyTransport,
KubeletClientConfig: c.ExtraConfig.KubeletClientConfig,
EventTTL: c.ExtraConfig.EventTTL,
ServiceIPRange: c.ExtraConfig.ServiceIPRange,
SecondaryServiceIPRange: c.ExtraConfig.SecondaryServiceIPRange,
ServiceNodePortRange: c.ExtraConfig.ServiceNodePortRange,
LoopbackClientConfig: c.GenericConfig.LoopbackClientConfig,
ServiceAccountIssuer: c.ExtraConfig.ServiceAccountIssuer,
ExtendExpiration: c.ExtraConfig.ExtendExpiration,
ServiceAccountMaxExpiration: c.ExtraConfig.ServiceAccountMaxExpiration,
APIAudiences: c.GenericConfig.Authentication.APIAudiences,
}
legacyRESTStorage, apiGroupInfo, err := legacyRESTStorageProvider.NewLegacyRESTStorage(c.ExtraConfig.APIResourceConfigSource, restOptionsGetter)
if err != nil {
return fmt.Errorf("error building core storage: %v", err)
}
if len(apiGroupInfo.VersionedResourcesStorageMap) == 0 { // if all core storage is disabled, return.
return nil
}
controllerName := "bootstrap-controller"
coreClient := corev1client.NewForConfigOrDie(c.GenericConfig.LoopbackClientConfig)
eventsClient := eventsv1client.NewForConfigOrDie(c.GenericConfig.LoopbackClientConfig)
bootstrapController, err := c.NewBootstrapController(legacyRESTStorage, coreClient, coreClient, eventsClient, coreClient.RESTClient())


// allocator for secondary service ip range
@@ -233,6 +234,7 @@ func (c LegacyRESTStorageProvider) NewLegacyRESTStorage(apiResourceConfigSource
if err != nil {
return LegacyRESTStorage{}, genericapiserver.APIGroupInfo{}, fmt.Errorf("cannot create cluster secondary IP allocator: %v", err)
}
+ secondaryServiceClusterIPAllocator.EnableMetrics()
restStorage.SecondaryServiceClusterIPAllocator = secondaryServiceClusterIPRegistry
}

53 changes: 33 additions & 20 deletions pkg/registry/core/service/ipallocator/allocator.go
@@ -40,6 +40,7 @@ type Interface interface {
IPFamily() api.IPFamily
Has(ip net.IP) bool
Destroy()
+ EnableMetrics()

// DryRun offers a way to try operations without persisting them.
DryRun() Interface
@@ -86,12 +87,12 @@ type Range struct {
family api.IPFamily

alloc allocator.Interface
+ // metrics is a metrics recorder that can be disabled
+ metrics metricsRecorderInterface
Member
I like this solution, and it seems aligned with the rest of the codebase.
@thockin, what do you think?

Member
@dgrisonnet (sig-instrumentation), I'd like your opinion too. The fact that metrics are global is 😬. What do you think about this solution?

Member
Yeah, it is not ideal to have metrics declared globally, but most of the codebase and ecosystem does that, and I don't think we have had any issues in the past because of it. The reason is that to "enable" a metric you have to register it in a registry, so there is always an extra step before a metric starts being exposed. So I don't really think having them global is a problem as such.

That said, the approach you've taken is one of the potential ways to make metrics initialization cleaner, but I personally prefer the one taken here https://github.com/prometheus-operator/prometheus-operator/blob/main/pkg/operator/operator.go#L180-L272, which creates structures that group similar metrics together. You are then able to initialize and register metrics via a simple call to "New". We might have some precedent for that or something similar in the codebase, but off the top of my head I can't think of any that I have seen.
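
For readers unfamiliar with the grouped-metrics style mentioned above, here is a minimal sketch of the idea. It uses only the standard library: `counter`, `registry`, `allocatorMetrics`, `newAllocatorMetrics`, and the metric names are all hypothetical stand-ins, not the prometheus-operator code and not the Prometheus client API. The point is only that construction and registration happen together in one "New" call, against a registry that is passed in rather than global.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical minimal metric; real code would use a Prometheus counter.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// registry is a hypothetical stand-in for a metrics registry.
type registry struct {
	mu      sync.Mutex
	metrics map[string]*counter
}

func newRegistry() *registry {
	return &registry{metrics: map[string]*counter{}}
}

// register exposes c under name and rejects duplicate names.
func (r *registry) register(name string, c *counter) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.metrics[name]; ok {
		return fmt.Errorf("metric %q already registered", name)
	}
	r.metrics[name] = c
	return nil
}

// allocatorMetrics groups the related metrics in one struct.
type allocatorMetrics struct {
	allocations *counter
	errors      *counter
}

// newAllocatorMetrics creates the group and registers every member in reg.
func newAllocatorMetrics(reg *registry) (*allocatorMetrics, error) {
	m := &allocatorMetrics{allocations: &counter{}, errors: &counter{}}
	if err := reg.register("allocation_total", m.allocations); err != nil {
		return nil, err
	}
	if err := reg.register("allocation_errors_total", m.errors); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	reg := newRegistry()
	m, err := newAllocatorMetrics(reg)
	if err != nil {
		panic(err)
	}
	m.allocations.inc()
	fmt.Println(m.allocations.value(), len(reg.metrics)) // prints "1 2"
}
```

In this sketch a second call to `newAllocatorMetrics` with the same registry fails, which is one way such a design surfaces double registration instead of hiding it behind a global.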

Member
I think the current proposal is good enough; maybe it is a TODO for sig-instrumentation to work on standardizing this ;)

Member
Yeah, it is good enough, we haven't standardized anything for now, so the implementation is up to the component owners.

That's a good point, I'll bring it to the group.

}

// New creates a Range over a net.IPNet, calling allocatorFactory to construct the backing store.
func New(cidr *net.IPNet, allocatorFactory allocator.AllocatorWithOffsetFactory) (*Range, error) {
- registerMetrics()

max := netutils.RangeSize(cidr)
base := netutils.BigForIP(cidr.IP)
rangeSpec := cidr.String()
@@ -116,10 +117,11 @@ func New(cidr *net.IPNet, allocatorFactory allocator.AllocatorWithOffsetFactory)
max--

r := Range{
- net: cidr,
- base: base,
- max: maximum(0, int(max)),
- family: family,
+ net: cidr,
+ base: base,
+ max: maximum(0, int(max)),
+ family: family,
+ metrics: &emptyMetricsRecorder{}, // disabled by default
Member
Then should we call registerMetrics() at L96, or only when we enable the metrics?

}

offset := 0
@@ -201,8 +203,10 @@ func (r *Range) allocate(ip net.IP, dryRun bool) error {
label := r.CIDR()
ok, offset := r.contains(ip)
if !ok {
- // update metrics
- clusterIPAllocationErrors.WithLabelValues(label.String(), "static").Inc()
+ if !dryRun {
+ // update metrics
+ r.metrics.incrementAllocationErrors(label.String(), "static")
+ }
return &ErrNotInRange{ip, r.net.String()}
}
if dryRun {
thockin marked this conversation as resolved.
@@ -214,20 +218,20 @@ func (r *Range) allocate(ip net.IP, dryRun bool) error {
allocated, err := r.alloc.Allocate(offset)
if err != nil {
// update metrics
- clusterIPAllocationErrors.WithLabelValues(label.String(), "static").Inc()
+ r.metrics.incrementAllocationErrors(label.String(), "static")

return err
}
if !allocated {
// update metrics
- clusterIPAllocationErrors.WithLabelValues(label.String(), "static").Inc()
+ r.metrics.incrementAllocationErrors(label.String(), "static")

return ErrAllocated
}
// update metrics
- clusterIPAllocations.WithLabelValues(label.String(), "static").Inc()
- clusterIPAllocated.WithLabelValues(label.String()).Set(float64(r.Used()))
- clusterIPAvailable.WithLabelValues(label.String()).Set(float64(r.Free()))
+ r.metrics.incrementAllocations(label.String(), "static")
+ r.metrics.setAllocated(label.String(), r.Used())
+ r.metrics.setAvailable(label.String(), r.Free())

return nil
}
@@ -249,20 +253,20 @@ func (r *Range) allocateNext(dryRun bool) (net.IP, error) {
offset, ok, err := r.alloc.AllocateNext()
if err != nil {
// update metrics
- clusterIPAllocationErrors.WithLabelValues(label.String(), "dynamic").Inc()
+ r.metrics.incrementAllocationErrors(label.String(), "dynamic")

return nil, err
}
if !ok {
// update metrics
- clusterIPAllocationErrors.WithLabelValues(label.String(), "dynamic").Inc()
+ r.metrics.incrementAllocationErrors(label.String(), "dynamic")

return nil, ErrFull
}
// update metrics
- clusterIPAllocations.WithLabelValues(label.String(), "dynamic").Inc()
- clusterIPAllocated.WithLabelValues(label.String()).Set(float64(r.Used()))
- clusterIPAvailable.WithLabelValues(label.String()).Set(float64(r.Free()))
+ r.metrics.incrementAllocations(label.String(), "dynamic")
+ r.metrics.setAllocated(label.String(), r.Used())
+ r.metrics.setAvailable(label.String(), r.Free())

return netutils.AddIPOffset(r.base, offset), nil
}
@@ -287,8 +291,8 @@ func (r *Range) release(ip net.IP, dryRun bool) error {
if err == nil {
// update metrics
label := r.CIDR()
- clusterIPAllocated.WithLabelValues(label.String()).Set(float64(r.Used()))
- clusterIPAvailable.WithLabelValues(label.String()).Set(float64(r.Free()))
+ r.metrics.setAllocated(label.String(), r.Used())
+ r.metrics.setAvailable(label.String(), r.Free())
}
return err
}
@@ -364,6 +368,12 @@ func (r *Range) Destroy() {
r.alloc.Destroy()
}

+ // EnableMetrics enables metrics recording.
+ func (r *Range) EnableMetrics() {
+ registerMetrics()
+ r.metrics = &metricsRecorder{}
Member
Should registerMetrics() be called here now?
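
As a rough illustration of why calling registration from `EnableMetrics` is safe even when several allocators enable metrics, here is a stdlib-only sketch. It assumes registration is guarded by a `sync.Once`, as the closing `})` in the `registerMetrics` hunk of metrics.go suggests. The names `registerOnce`, `registrations`, and `ipRange` are hypothetical, and the counter stands in for the real `legacyregistry.MustRegister` calls.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	registerOnce  sync.Once
	registrations int // how many times the registration body actually ran
)

// registerMetrics is a stand-in for the real function; the body runs at most once.
func registerMetrics() {
	registerOnce.Do(func() {
		registrations++ // real code would register the metric vectors here
	})
}

// ipRange is a toy allocator that only tracks whether metrics are enabled.
type ipRange struct {
	metricsEnabled bool
}

// enableMetrics registers lazily, so ranges that never enable metrics
// never trigger registration.
func (r *ipRange) enableMetrics() {
	registerMetrics()
	r.metricsEnabled = true
}

func main() {
	a, b := &ipRange{}, &ipRange{}
	a.enableMetrics()
	b.enableMetrics()
	fmt.Println(registrations) // prints 1
}
```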

+ }

// calculateIPOffset calculates the integer offset of ip from base such that
// base + offset = ip. It requires ip >= base.
func calculateIPOffset(base *big.Int, ip net.IP) int {
@@ -436,3 +446,6 @@ func (dry dryRunRange) Has(ip net.IP) bool {

func (dry dryRunRange) Destroy() {
}

+ func (dry dryRunRange) EnableMetrics() {
+ }
55 changes: 55 additions & 0 deletions pkg/registry/core/service/ipallocator/allocator_test.go
@@ -434,10 +434,12 @@ func TestClusterIPMetrics(t *testing.T) {
if err != nil {
t.Fatalf("unexpected error creating CidrSet: %v", err)
}
+ a.EnableMetrics()
// create IPv6 allocator
cidrIPv6 := "2001:db8::/112"
_, clusterCIDRv6, _ := netutils.ParseCIDRSloppy(cidrIPv6)
b, err := NewInMemory(clusterCIDRv6)
+ b.EnableMetrics()
if err != nil {
t.Fatalf("unexpected error creating CidrSet: %v", err)
}
@@ -546,6 +548,7 @@ func TestClusterIPAllocatedMetrics(t *testing.T) {
if err != nil {
t.Fatalf("unexpected error creating CidrSet: %v", err)
}
+ a.EnableMetrics()

em := testMetrics{
free: 0,
@@ -595,6 +598,58 @@ func TestClusterIPAllocatedMetrics(t *testing.T) {
}
}

+ func TestMetricsDisabled(t *testing.T) {
+ // create metrics enabled allocator
+ cidrIPv4 := "10.0.0.0/24"
+ _, clusterCIDRv4, _ := netutils.ParseCIDRSloppy(cidrIPv4)
+ a, err := NewInMemory(clusterCIDRv4)
+ if err != nil {
+ t.Fatalf("unexpected error creating CidrSet: %v", err)
+ }
+ a.EnableMetrics()
+
+ // create metrics disabled allocator with same CIDR
+ // this metrics should be ignored
+ b, err := NewInMemory(clusterCIDRv4)
+ if err != nil {
+ t.Fatalf("unexpected error creating CidrSet: %v", err)
+ }
+
+ // Check initial state
+ em := testMetrics{
+ free: 0,
+ used: 0,
+ allocated: 0,
+ errors: 0,
+ }
+ expectMetrics(t, cidrIPv4, em)
+
+ // allocate in metrics enabled allocator
+ for i := 0; i < 100; i++ {
+ _, err := a.AllocateNext()
+ if err != nil {
+ t.Fatal(err)
+ }
+ }
+ em = testMetrics{
+ free: 154,
+ used: 100,
+ allocated: 100,
+ errors: 0,
+ }
+ expectMetrics(t, cidrIPv4, em)
+
+ // allocate in metrics disabled allocator
+ for i := 0; i < 200; i++ {
+ _, err := b.AllocateNext()
+ if err != nil {
+ t.Fatal(err)
+ }
+ }
+ // the metrics should not be changed
+ expectMetrics(t, cidrIPv4, em)
+ }

// Metrics helpers
func clearMetrics() {
clusterIPAllocated.Reset()
35 changes: 35 additions & 0 deletions pkg/registry/core/service/ipallocator/metrics.go
@@ -85,3 +85,38 @@ func registerMetrics() {
legacyregistry.MustRegister(clusterIPAllocationErrors)
})
}

+ // metricsRecorderInterface is the interface to record metrics.
+ type metricsRecorderInterface interface {
+ setAllocated(cidr string, allocated int)
+ setAvailable(cidr string, available int)
+ incrementAllocations(cidr, scope string)
+ incrementAllocationErrors(cidr, scope string)
+ }
+
+ // metricsRecorder implements metricsRecorderInterface.
+ type metricsRecorder struct{}
+
+ func (m *metricsRecorder) setAllocated(cidr string, allocated int) {
+ clusterIPAllocated.WithLabelValues(cidr).Set(float64(allocated))
+ }
+
+ func (m *metricsRecorder) setAvailable(cidr string, available int) {
+ clusterIPAvailable.WithLabelValues(cidr).Set(float64(available))
+ }
+
+ func (m *metricsRecorder) incrementAllocations(cidr, scope string) {
+ clusterIPAllocations.WithLabelValues(cidr, scope).Inc()
+ }
+
+ func (m *metricsRecorder) incrementAllocationErrors(cidr, scope string) {
+ clusterIPAllocationErrors.WithLabelValues(cidr, scope).Inc()
+ }
+
+ // emptyMetricsRecorder is a null object implements metricsRecorderInterface.
+ type emptyMetricsRecorder struct{}
+
+ func (*emptyMetricsRecorder) setAllocated(cidr string, allocated int) {}
+ func (*emptyMetricsRecorder) setAvailable(cidr string, available int) {}
+ func (*emptyMetricsRecorder) incrementAllocations(cidr, scope string) {}
+ func (*emptyMetricsRecorder) incrementAllocationErrors(cidr, scope string) {}
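
The recorder pair above is a null-object pattern: an allocator starts with a recorder that drops every call and swaps in a real one only when metrics are enabled. The sketch below reproduces that behaviour with the standard library only, mirroring the scenario in TestMetricsDisabled. All names here (`recorder`, `countingRecorder`, `noopRecorder`, `ipRange`) are hypothetical, and a plain map stands in for the global Prometheus vectors.

```go
package main

import "fmt"

// recorder is a simplified, hypothetical stand-in for metricsRecorderInterface.
type recorder interface {
	incrementAllocations(cidr, scope string)
}

// countingRecorder records into a shared map instead of Prometheus vectors.
type countingRecorder struct {
	counts map[string]int
}

func (c *countingRecorder) incrementAllocations(cidr, scope string) {
	c.counts[cidr+"|"+scope]++
}

// noopRecorder is the null object: every call is silently dropped.
type noopRecorder struct{}

func (noopRecorder) incrementAllocations(cidr, scope string) {}

// ipRange is a toy allocator that only reports allocations to its recorder.
type ipRange struct {
	cidr    string
	metrics recorder
}

// newIPRange returns a range with metrics disabled by default.
func newIPRange(cidr string) *ipRange {
	return &ipRange{cidr: cidr, metrics: noopRecorder{}}
}

// enableMetrics swaps the null object for a recorder that writes to shared.
func (r *ipRange) enableMetrics(shared *countingRecorder) {
	r.metrics = shared
}

func (r *ipRange) allocateNext() {
	r.metrics.incrementAllocations(r.cidr, "dynamic")
}

func main() {
	shared := &countingRecorder{counts: map[string]int{}}
	a := newIPRange("10.0.0.0/24")
	a.enableMetrics(shared)
	b := newIPRange("10.0.0.0/24") // same CIDR, metrics left disabled

	for i := 0; i < 100; i++ {
		a.allocateNext()
	}
	for i := 0; i < 200; i++ {
		b.allocateNext()
	}
	fmt.Println(shared.counts["10.0.0.0/24|dynamic"]) // prints 100
}
```

Because the call sites never branch on whether metrics are enabled, the allocation paths stay identical for both kinds of allocator; only the recorder value differs.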