HPA panics #39680

Closed
gmarek opened this issue Jan 10, 2017 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@gmarek
Contributor

gmarek commented Jan 10, 2017

I just ran into a panic in controller-manager:

/usr/local/go/src/runtime/asm_amd64.s:2086
panic: runtime error: integer divide by zero [recovered]
        panic: runtime error: integer divide by zero [recovered]
        panic: runtime error: integer divide by zero

goroutine 327 [running]:
panic(0x229cb80, 0xc420010040)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
k8s.io/kubernetes/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:56 +0x126
panic(0x229cb80, 0xc420010040)
        /usr/local/go/src/runtime/panic.go:458 +0x243
k8s.io/kubernetes/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:56 +0x126
panic(0x229cb80, 0xc420010040)
        /usr/local/go/src/runtime/panic.go:458 +0x243
k8s.io/kubernetes/pkg/controller/podautoscaler/metrics.GetResourceUtilizationRatio(0xc4218cb1a0, 0xc421064ce0, 0xc40000003c, 0x26, 0xc4217c7408, 0x38cc801, 0xc421995740)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/metrics/utilization.go:37 +0x196
k8s.io/kubernetes/pkg/controller/podautoscaler.(*ReplicaCalculator).GetResourceReplicas(0xc420907f60, 0x3c00000002, 0x26fdc1a, 0x3, 0xc42090ae54, 0x7, 0x38cc8e0, 0xc421995740, 0xc4212da480, 0x26ffbca, ...)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/replica_calculator.go:97 +0x958
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).computeReplicasForCPUUtilization(0xc4208fbc70, 0xc4203d2f08, 0xc4212da480, 0x37, 0x39460e0, 0xc4212da400, 0x0, 0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:167 +0x3e1
k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).reconcileAutoscaler(0xc4208fbc70, 0xc4203d2f08, 0x27855ae, 0x37)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:315 +0xc42
k8s.io/kubernetes/pkg/controller/podautoscaler.newInformer.func3(0x26b07a0, 0xc4203d2f08)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:95 +0xc3
k8s.io/kubernetes/pkg/client/cache.ResourceEventHandlerFuncs.OnAdd(0xc4208d2200, 0xc4208d2210, 0x0, 0x26b07a0, 0xc4203d2f08)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/cache/controller.go:173 +0x49
k8s.io/kubernetes/pkg/client/cache.(*ResourceEventHandlerFuncs).OnAdd(0xc4208fe0c0, 0x26b07a0, 0xc4203d2f08)
        <autogenerated>:50 +0x78
k8s.io/kubernetes/pkg/client/cache.NewInformer.func1(0x22f5240, 0xc4219952e0, 0xc4219952e0, 0x22f5240)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/cache/controller.go:251 +0x270
k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).Pop(0xc420864630, 0xc42087bd10, 0x0, 0x0, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:420 +0x22a
k8s.io/kubernetes/pkg/client/cache.(*Controller).processLoop(0xc42018ae00)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/cache/controller.go:131 +0x3c
k8s.io/kubernetes/pkg/client/cache.(*Controller).(k8s.io/kubernetes/pkg/client/cache.processLoop)-fm()
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/cache/controller.go:102 +0x2a
k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc421065f60)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:96 +0x5e
k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc421065f60, 0x3b9aca00, 0x0, 0x2164401, 0xc420126a20)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:97 +0xad
k8s.io/kubernetes/pkg/util/wait.Until(0xc421065f60, 0x3b9aca00, 0xc420126a20)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:52 +0x4d
k8s.io/kubernetes/pkg/client/cache.(*Controller).Run(0xc42018ae00, 0xc420126a20)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/cache/controller.go:102 +0x1af
created by k8s.io/kubernetes/pkg/controller/podautoscaler.(*HorizontalController).Run
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/controller/podautoscaler/horizontal.go:133 +0xc0

This clearly shouldn't happen.

cc @fgrzadkowski @jszczepkowski @kubernetes/sig-autoscaling-misc

@gmarek gmarek added kind/bug Categorizes issue or PR as related to a bug. priority/P0 sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. labels Jan 10, 2017
@gmarek
Contributor Author

gmarek commented Jan 10, 2017

The bug is here: when none of the metrics match a pod that has a request, requestsTotal stays at zero and the division on line 37 panics:

 22 func GetResourceUtilizationRatio(metrics PodResourceInfo, requests map[string]int64, targetUtilization int32) (float64, int32, error) {
 23   metricsTotal := int64(0)
 24   requestsTotal := int64(0)
 25 
 26   for podName, metricValue := range metrics {
 27     request, hasRequest := requests[podName]
 28     if !hasRequest {
 29       // we check for missing requests elsewhere, so assuming missing requests == extraneous metrics
 30       continue
 31     }
 32 
 33     metricsTotal += metricValue
 34     requestsTotal += request
 35   }
 36 
 37   currentUtilization := int32((metricsTotal * 100) / requestsTotal)
 38 
 39   return float64(currentUtilization) / float64(targetUtilization), currentUtilization, nil
 40 }
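
The fix referenced below checks for a zero request sum before dividing. A minimal sketch of the guarded function, assuming PodResourceInfo is a map from pod name to metric value (the error message is illustrative, not a quote from the merged patch):

```go
package metrics

import "fmt"

// PodResourceInfo is assumed here to map pod names to raw metric values,
// matching how the snippet above ranges over it.
type PodResourceInfo map[string]int64

func GetResourceUtilizationRatio(metrics PodResourceInfo, requests map[string]int64, targetUtilization int32) (float64, int32, error) {
	metricsTotal := int64(0)
	requestsTotal := int64(0)

	for podName, metricValue := range metrics {
		request, hasRequest := requests[podName]
		if !hasRequest {
			// missing requests are handled elsewhere, so treat this as an extraneous metric
			continue
		}

		metricsTotal += metricValue
		requestsTotal += request
	}

	// The missing guard: if no metric matched a pod with a request,
	// requestsTotal is 0 and the division below would panic.
	if requestsTotal == 0 {
		return 0, 0, fmt.Errorf("no metrics returned matched known pods")
	}

	currentUtilization := int32((metricsTotal * 100) / requestsTotal)

	return float64(currentUtilization) / float64(targetUtilization), currentUtilization, nil
}
```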

@DirectXMan12
Contributor

I'll get a fix in this afternoon.

@gmarek
Contributor Author

gmarek commented Jan 10, 2017

@DirectXMan12 please do. We want it to get into the 1.5.2 release.

@saad-ali - fix for this is important to cherry-pick.

@j3ffml j3ffml added this to the v1.5 milestone Jan 10, 2017
@fgrzadkowski
Contributor

Reassigned to @DirectXMan12

DirectXMan12 added a commit to DirectXMan12/kubernetes that referenced this issue Jan 10, 2017
In certain conditions in which the set of metrics returned by Heapster
is completely disjoint from the set of pods returned by the API server,
we can have a request sum of zero, which can cause a panic (due to
division by zero).  This checks for that condition.

Fixes kubernetes#39680
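
To make the "completely disjoint" condition concrete, a hypothetical regression test against the guarded version sketched above could look like this (pod names and values are invented):

```go
package metrics

import "testing"

func TestUtilizationRatioWithDisjointPods(t *testing.T) {
	// Heapster reports metrics for one pod while the API server knows a different
	// pod, so the request sum is zero; the unguarded code divided by it and panicked.
	metrics := PodResourceInfo{"web-1": 500}
	requests := map[string]int64{"web-2": 1000}

	if _, _, err := GetResourceUtilizationRatio(metrics, requests, 50); err == nil {
		t.Fatal("expected an error when no metric matches a pod with a request")
	}
}
```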
@saad-ali
Member

@saad-ali - fix for this is important to cherry-pick.

Ack, please get the cherry-pick out ASAP. I'll keep an eye out.

saad-ali pushed a commit to saad-ali/kubernetes that referenced this issue Jan 11, 2017
In certain conditions in which the set of metrics returned by Heapster
is completely disjoint from the set of pods returned by the API server,
we can have a request sum of zero, which can cause a panic (due to
division by zero).  This checks for that condition.

Fixes kubernetes#39680
k8s-github-robot pushed a commit that referenced this issue Jan 11, 2017
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)

HPA Controller: Check for 0-sum request value

In certain conditions in which the set of metrics returned by Heapster
is completely disjoint from the set of pods returned by the API server,
we can have a request sum of zero, which can cause a panic (due to
division by zero).  This checks for that condition.

Fixes #39680

**Release note**:

```release-note
Fixes an HPA-related panic due to division-by-zero.
```
jayunit100 pushed a commit to jayunit100/kubernetes that referenced this issue Jan 13, 2017
In certain conditions in which the set of metrics returned by Heapster
is completely disjoint from the set of pods returned by the API server,
we can have a request sum of zero, which can cause a panic (due to
division by zero).  This checks for that condition.

Fixes kubernetes#39680