initial heapster redeploy leaves ghosted replicasets #241

Closed
cm-graham opened this Issue Mar 23, 2017 · 6 comments


cm-graham commented Mar 23, 2017

After a clean cluster install, I came across what, for lack of a better term, look like two ghosted heapster replica sets that weren't running any pods. Here are screenshots from the dashboard when I first noticed:

[screenshot: heapster-v1.2.0.1 deployment page]

(It is worth noting that there are no old replica sets listed on the deployment page either. The only one listed is the one shown in the screenshot.)

[screenshot: heapster-v1.2.0.1 replica sets]

Output from CLI:

$ kubectl get po,svc,rc --all-namespaces
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE
default       po/default-http-backend-gzv6r              1/1       Running   0          1h
default       po/nginx-ingress-controller-hn1k0          1/1       Running   0          1h
default       po/nginx-ingress-controller-w91sk          1/1       Running   0          1h
default       po/nginx-ingress-controller-xxz70          1/1       Running   0          1h
kube-system   po/heapster-v1.2.0.1-2320000070-zswjf      4/4       Running   0          1h
kube-system   po/kube-dns-4101612645-c23lv               4/4       Running   0          1h
kube-system   po/kubernetes-dashboard-3543765157-t902s   1/1       Running   0          1h
kube-system   po/monitoring-influxdb-grafana-v4-hmznk    2/2       Running   0          1h

NAMESPACE     NAME                       CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
default       svc/default-http-backend   10.152.183.203   <none>        80/TCP              1h
default       svc/kubernetes             10.152.183.1     <none>        443/TCP             1h
kube-system   svc/heapster               10.152.183.232   <none>        80/TCP              1h
kube-system   svc/kube-dns               10.152.183.10    <none>        53/UDP,53/TCP       1h
kube-system   svc/kubernetes-dashboard   10.152.183.96    <none>        80/TCP              1h
kube-system   svc/monitoring-grafana     10.152.183.135   <none>        80/TCP              1h
kube-system   svc/monitoring-influxdb    10.152.183.4     <none>        8083/TCP,8086/TCP   1h

NAMESPACE     NAME                                DESIRED   CURRENT   READY     AGE
default       rc/default-http-backend             1         1         1         1h
default       rc/nginx-ingress-controller         3         3         3         1h
kube-system   rc/monitoring-influxdb-grafana-v4   1         1         1         1h
Collaborator

chuckbutler commented Mar 23, 2017

That's certainly interesting!

It doesn't appear that the "ghosted heapster" replication controllers are listed in the CLI output.

Can you try to kubectl describe one of the RCs that have no pods? e.g.:
kubectl describe rc heapster-v1.2.0.1-907310421

I suspect what's happened here is that the addon service was rescheduled, and there is still an RC definition that can't be fulfilled because the running pods already occupy whatever bindings the other RCs are expecting. So there are two sitting in a not-quite-error state, but they aren't being helpful either.

What's confusing to me is the presence of the RC in the dashboard but not in the CLI output. Highly suspect...

cm-graham commented Mar 23, 2017

Looks like it is definitely an issue with the dashboard and not the CLI:

$ kubectl describe rc heapster-v1.2.0.1-2548925502
Error from server (NotFound): replicationcontrollers "heapster-v1.2.0.1-2548925502" not found

cm-graham commented Mar 23, 2017

Spoke too soon:

$ kubectl get rs --all-namespaces
NAMESPACE     NAME                              DESIRED   CURRENT   READY     AGE
kube-system   heapster-v1.2.0.1-2320000070      1         1         1         1h
kube-system   heapster-v1.2.0.1-2548925502      0         0         0         1h
kube-system   heapster-v1.2.0.1-907310421       0         0         0         1h
kube-system   kube-dns-4101612645               1         1         1         1h
kube-system   kubernetes-dashboard-3543765157   1         1         1         1h

cm-graham commented Mar 23, 2017

But you can't describe them:

$ kubectl describe rs heapster-v1.2.0.1-2548925502
Error from server (NotFound): replicasets "heapster-v1.2.0.1-2548925502" not found
$ kubectl describe rs heapster-v1.2.0.1-907310421 
Error from server (NotFound): replicasets "heapster-v1.2.0.1-907310421" not found

It's Schrödinger's cat!
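
(A possible explanation, not confirmed in this thread: the describe commands above don't pass a namespace, so kubectl looks in the default namespace, while the replica sets live in kube-system. A namespace-qualified describe should find them:)

$ kubectl describe rs heapster-v1.2.0.1-2548925502 -n kube-system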

cm-graham commented Apr 5, 2017

OK, I believe this one can be closed. While testing a couple of in-house-built deployments, I saw that they also have older replica sets hanging around. It appears that K8s keeps the last 3 revisions of a deployment's replica sets, and looking at the deployment.kubernetes.io/revision annotation it looks like heapster is "deployed" 3 times during the initial CDK install.

It does seem to make sense to do that from a rollback perspective. This is configurable in the deployment spec via revisionHistoryLimit, which defaults to 2 plus the current revision. Apparently at one point last year the default was to keep all old replica sets.
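
For anyone following along, a rough sketch of how to check this (the deployment name heapster-v1.2.0.1 and the limit value of 1 below are assumptions for illustration, not taken from the thread):

# show which deployment revision each replica set in kube-system belongs to
$ kubectl get rs -n kube-system -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.deployment\.kubernetes\.io/revision}{"\n"}{end}'

# optionally trim the rollback history (hypothetical value; 1 keeps only one old replica set)
$ kubectl patch deployment heapster-v1.2.0.1 -n kube-system --type merge -p '{"spec":{"revisionHistoryLimit":1}}'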

Collaborator

chuckbutler commented Apr 21, 2017

Thanks for the feedback loop here, cm-graham. Good investigative work. I'm going to close this for now. If it continues to be problematic for you, don't hesitate to reply to the bug and we'll re-open and evaluate for a fix.
